Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Genome-scale insights into the underlying genetics of background effects
(USC Thesis Other)
Genome-scale insights into the underlying genetics of background effects
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Genome-scale Insights into the Underlying Genetics of Background Effects
by
Joseph Hale
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
May 2024
Copyright 2024 Joseph Hale
Acknowledgements
I would like to thank my advisor, Ian Ehrenreich, for both the outstanding guidance and
direction, and for making every effort to help my projects and my PhD succeed. Additionally, I
would like to thank Sasha Levy, who provided invaluable insight and advice on every project I
worked on. Thank you to my fellow lab members and collaborators for their input, discussions,
and support, as well as my committee members: Matt Dean, Oscar Aparicio, and Jazlyn
Mooney. Thank you to Takeshi Matsui, Martin Mullis, Rachel Schell, and Ilan Goldstein, for
being the best coworkers I could have asked for. Thank you to my cohort: Caleb Ghione,
Meghan Petrie, Joseph Hale, Yiwei He, Dan Ma, Joshua Park, and Nicole Stuhr, for always
celebrating graduations and birthdays.
Finally, thank you to my family for all of the assistance and support you have given me over
these past years.
ii
TABLE OF CONTENTS
Acknowledgements........................................................................................................................ii
List of Tables................................................................................................................................ vii
List of Figures..............................................................................................................................viii
List of Supplementary Figures...................................................................................................... ix
Abstract.........................................................................................................................................xi
Chapter 1: Introduction...............................................................................................................1
1.1 Obstacles in studying background effects............................................................. 1
1.2 Genetic architecture of background effects............................................................2
1.3 Relationship between epistasis, additivity, and dominance..................................2
1.4 Combinatorial DNA Barcoding................................................................................. 3
1.5 Goals of this dissertation..........................................................................................4
1.6 Summary of the chapters..........................................................................................5
Chapter 2: Genetic basis of a spontaneous mutation’s expressivity.....................................6
Summary of author contributions..................................................................................6
2.1 Abstract...................................................................................................................... 6
2.2 Introduction................................................................................................................7
2.3 Materials and Methods.............................................................................................. 9
2.3.1 Generation of Segregants.............................................................................9
2.3.2 Genotyping..................................................................................................14
2.3.3 Phenotyping................................................................................................ 16
2.3.4 Comparison of phenotypic variance between mutant and wild-type
segregants........................................................................................................... 17
2.3.5 Linkage Mapping.........................................................................................17
2.3.6 Classification of inviable segregants...........................................................20
2.3.7 Resolving loci and identifying candidate genes.......................................... 22
2.3.8 Reciprocal hemizygosity experiments.........................................................24
2.3.9 Construction of nucleotide replacement strains.......................................... 25
2.3.10 Mitochondrial genome instability experiments.......................................... 27
2.3.11 Modeling growth and examining the model in additional segregant
populations...........................................................................................................27
2.3.12 Relationship between detrimental alleles, growth, and inviability............. 28
2.4 Results and Discussion.......................................................................................... 28
2.4.1 A spontaneous mutation increases phenotypic variance in the BYx3S cross.
28
2.4.2 A large effect locus shows epistasis with mrp20-A105E.............................29
iii
2.4.3 Epistasis between MRP20 and MKT1 differs in cross parents and
segregants........................................................................................................... 32
2.4.4 Many additional loci affect the expressivity of mrp20-A105E......................34
2.4.5 The Chromosome XIV locus contains multiple causal variants.................. 35
2.4.6 Aneuploidy also contributes to the expressivity of mrp20-A105E............... 36
2.4.7 Multiple cellular mechanisms underlie poor growth in the presence of
mrp20-A105E.......................................................................................................37
2.4.8 Genetic underpinnings of mrp20-A105E’s expressivity in segregants
and parents.......................................................................................................... 38
2.5 Conclusion............................................................................................................... 39
2.6 Supplementary Material.......................................................................................... 43
Chapter 3: The interplay of additivity, dominance, and epistasis on fitness in a
diploid yeast cross.................................................................................................................... 53
Summary of author contributions................................................................................53
3.1 Abstract.................................................................................................................... 53
3.2 Introduction..............................................................................................................54
3.3 Results......................................................................................................................57
3.3.1 Phenotyping of a large diploid cross by barcode sequencing.....................57
3.3.2 Genetic mapping within interrelated families...............................................59
3.3.3 Loci frequently show dominance effects..................................................... 63
3.3.4 Epistatic hubs govern both additivity and non-additivity..............................64
3.3.5 Relationships between epistasis and dominance in diploids...................... 67
3.3.6 Characteristics of the Chromosome III dominance modifier....................... 69
3.4 Discussion................................................................................................................70
3.5 Methods.................................................................................................................... 72
3.5.1 Generation of haploid segregants...............................................................72
3.5.2 Barcoding of haploid segregants.................................................................73
3.5.3 Whole-genome sequencing of parental haploid segregants.......................75
3.5.4 Determining the barcode sequences.......................................................... 76
3.5.5 Generation of diploid segregants................................................................ 77
3.5.6 Translocating barcodes onto the same chromosome................................. 78
3.5.7 Growth Assays............................................................................................78
3.5.8 Library Preparation......................................................................................79
3.5.9 Barcode sequencing................................................................................... 81
3.5.10 Counting double barcode reads................................................................83
3.5.11 Fitness Estimation..................................................................................... 84
3.5.12 Heritability Estimates.................................................................................84
3.5.13 Quantile Normalization of Fitness............................................................. 85
3.5.14 Genome-wide scan for one-locus effects..................................................85
3.5.15 Family-level scans for one-locus effects................................................... 86
3.5.16 Identification of loci enriched for detections across families..................... 88
iv
3.5.17 Removing additive effects and correcting for family structure...................88
3.5.18 Genome-wide scans for loci with dominance............................................88
3.5.19 Degrees of dominance..............................................................................89
3.5.20 Comprehensive scan for pairwise interactions..........................................90
3.5.21 Scan for three-locus effects...................................................................... 91
3.5.22 Resolving hub loci.....................................................................................92
3.5.23 Estimating the fraction of epistatic effects that involve dominance...........92
3.5.24 Examining how combinations of modifiers explain effect size variance
of hubs................................................................................................................. 93
3.6 Supplementary Material.......................................................................................... 95
Chapter 4: Genome-scale analysis of interactions between genetic perturbations
and natural variation................................................................................................................116
Summary of author contributions.............................................................................. 116
4.1 Abstract...................................................................................................................116
4.2 Introduction............................................................................................................ 117
4.3 Results.................................................................................................................... 119
4.3.1 Construction of a double barcoded library of segregants with
inducible CRISPRi perturbations........................................................................119
4.3.2 Phenotyping of the segregant-gRNA pool.................................................121
4.3.3 Identification and characterization of background effects......................... 126
4.3.4 Hub loci control the phenotypic effects of many gRNAs........................... 131
4.3.5 Properties of hub loci that interact with many genetic perturbations……..133
4.3.6 Insights from the genetic interaction network of the yeast cell..................134
4.4 Discussion..............................................................................................................136
4.5 Methods.................................................................................................................. 138
4.5.1 Generation of barcoded haploid segregants.............................................138
4.5.2 CRISPRi plasmid construction..................................................................139
4.5.3 Generation of the CRISPRi library............................................................ 140
4.5.4 Linking barcodes to gRNAs in the CRISPRi library...................................141
4.5.5 Transformation of haploid strains with the CRISPRi library...................... 142
4.5.6 Fitness Assay............................................................................................142
4.5.7 Library preparation for barcode sequencing............................................. 143
4.5.8 Barcode sequencing................................................................................. 144
4.5.9 Removing PCR chimeras..........................................................................146
4.5.10 Fitness estimation................................................................................... 147
4.5.11 Identification and quantification of gRNA effects.....................................149
4.5.12 Linkage mapping.....................................................................................151
4.5.13 Heritability estimation..............................................................................151
4.5.14 Analysis of hub loci................................................................................. 152
iv
4.6 Supplementary Material........................................................................................ 153
Chapter 5: Conclusion............................................................................................................ 173
5.1 Impact of my work................................................................................................. 173
5.2 Future Directions................................................................................................... 177
5.2.1 Chapters 2 and 4.......................................................................................177
5.2.2 Chapters 3 and 4.......................................................................................177
References............................................................................................................................... 179
vi
LIST OF TABLES
Table 2.1 Candidate genes at detected loci have diverse cellular functions............................... 24
Table S2.1 Programs and settings used for sequence analyses.................................................50
Table S2.2 Crosses and segregant populations examined in this study..................................... 50
Table S2.3 Loci other than MKT1 that influence growth in mrp20-A105E segregants................ 51
Table S2.4 Candidate genes at loci in Table S3.......................................................................... 51
Table S2.5 Presence of Chromosome II duplication differs among BY x 3S crosses..................52
Table S3.1 List of chemical additives and their concentrations used in the competition
experiments............................................................................................................................... 113
Table S3.2 Number of replicate diploid strains in each fitness assay........................................ 113
Table S3.3 Broad and narrow-sense heritability estimates for each fitness assay.................... 114
Table S3.4 The number of unique genotypes and the number of barcode replicates per
genotype detected before filtering for quality.............................................................................114
Table S3.5 Estimated rate of PCR chimeras..............................................................................115
Table S4.1 Total raw reads used for fitness estimation..............................................................169
Table S4.2 Stability of fitness estimates across different subsets of time points.......................170
Table S4.3 Table of gRNAs that are likely to be affected by SNPs near their binding
location...................................................................................................................................... 171
Table S4.4 Table of gRNAs with a mapped QTL near their binding location and without
an overlapping polymorphism....................................................................................................172
Table S4.5 Positional information about hubs............................................................................173
vii
LIST OF FIGURES
Figure 2.1 The mrp20-A105E mutation occurred spontaneously, increasing phenotypic
variance in the BYx3S cross........................................................................................................11
Figure 2.2 MKT1 genetically interacts with mrp20-A105E...........................................................12
Figure 2.3 Additional loci govern response to the mutation.........................................................14
Figure 2.4 Mitochondrial genome instability partially underlies the expressivity of
mrp20-A105E…...........................................................................................................................21
Figure 2.5 Detected loci quantitatively and qualitatively explain mutant phenotypes..................22
Figure 2.6 Epistasis between MRP20 and Chr XIV appears to mostly explain response
to mrp20-A105E...........................................................................................................................31
Figure 2.7 Genetic engineering of parents revealed unexpected responses to
mrp20-A105E...............................................................................................................................33
Figure 2.8 Analysis of MKT1 and SAL1 genotypes reveals an aneuploidy also contributes
to mrp20-A105E’s variable expressivity.......................................................................................35
Figure 3.1 Generating a large panel of diploid segregants with known genotypes that can
be phenotyped as a pool............................................................................................................. 57
Figure 3.2 Identification of loci that affect fitness.........................................................................62
Figure 3.3 Interactions often affect both the additive and dominance effects of involved
loci............................................................................................................................................... 66
Figure 3.4 Multiple modifier loci cause hubs to exhibit a range of effect sizes across
different genetic backgrounds......................................................................................................68
Figure 3.5 Parent-of-origin of the mating locus influences dominance at hubs...........................69
Figure 4.1 Generating and phenotyping a panel of double-barcoded segregants carrying
CRISPRi perturbations.............................................................................................................. 122
Figure 4.2 Calculating deviation values for all segregant-gRNA combinations......................... 128
Figure 4.3 Using deviation values as traits in linkage mapping.................................................130
Figure 4.4 Features of and comparisons between hubs............................................................133
Figure 4.5 Interaction networks for each hub............................................................................ 136
viii
LIST OF SUPPLEMENTARY FIGURES
Figure S2.1 Generation of BYx3S segregants.............................................................................43
Figure S2.2 Identification of mrp20-A105E..................................................................................44
Figure S2.3 Representative MRP20 and mrp20-A105E segregants on ethanol......................... 46
Figure S2.4 Linkage mapping in the F3 panel more finely resolves the Chromosome XIV
locus............................................................................................................................................ 47
Figure S2.5 Growth effects of loci detected in BY x 3S mrp20-A105E crosses.......................... 48
Figure S2.6 Loci affecting expressivity of mrp20-A105E show minimal effects in MRP20
segregants...................................................................................................................................49
Figure S3.1 Workflow of how haploid and diploid segregants were barcoded............................ 96
Figure S3.2 Genome-wide allele frequencies of haploid segregants and the inferred
genome-wide allele frequencies of diploid segregants................................................................97
Figure S3.3 Correlation of fitness estimates between strain replicates in each experiment....... 98
Figure S3.4 Accuracy of fitness estimates (measurement noise) is significantly impacted
by the fitness and initial frequency of a diploid strain................................................................ 100
Figure S3.5 Correlation of fitness across environments............................................................102
Figure S3.6 Quantile normalization of fitnesses........................................................................ 103
Figure S3.7 Most loci are detected as significant without family correction...............................103
Figure S3.8 Loci with individual effects on fitness in each environment....................................104
Figure S3.9 Distinct loci show substantial variation in detection across families.......................106
Figure S3.10 Accounting for differences in mean parental fitness corrects for family-level
fitness effects in the population................................................................................................. 108
Figure S3.11 Pairwise genetic interactions detected in each environment................................109
Figure S3.12 Three-way genetic interactions detected in each environment............................ 111
Figure S3.13 Genetic variation at the mating locus results in distinct heterozygote classes.....112
Figure S4.1 Landing Pad design............................................................................................... 153
Figure S4.2 Allele frequencies across all genotypes.................................................................154
Figure S4.3 Plasmid design and double-barcoding scheme..................................................... 155
Figure S4.4 CRISPRi plasmid map........................................................................................... 156
Figure S4.5 Multiple replicate gRNAs target the same gene.....................................................157
Figure S4.6 Transformation efficiency across 58 random segregants.......................................158
Figure S4.7 Distribution of data per gRNA................................................................................ 159
Figure S4.8 Distribution of fitness values from each assay.......................................................160
Figure S4.9 Distribution of barcodes and normalized reads per genotype in each flask...........161
Figure S4.10 Distribution of barcodes and normalized reads per gRNA in each flask..............162
Figure S4.11 XY plot between the effect of SNPs that directly overlap gRNA binding
sites and the distance between the gRNA’s PAM site and that SNP.........................................163
Figure S4.12 Effects of gRNAs that target the same gene........................................................164
ix
Figure S4.13 Location of all gRNAs across the genome...........................................................165
Figure S4.14 X-Y plot between gRNA effect and all deviation values for that gRNA................ 166
Figure S4.15 Linkage mapping results on deviation values for the replicate experimental
flask ATC2................................................................................................................................. 167
Figure S4.16 Potential hubs below the significance threshold used during analysis................ 168
x
Abstract
A major goal of modern genetics is to fully understand how genetic variation influences
phenotypic variation. A great deal of our knowledge on the genetic basis of phenotype comes
from large-scale studies of many individuals. However, this does not always tell the entire story.
The effects of genetic variants can be highly contextual, with some individuals exhibiting
dramatically different responses to the same genetic variant. This phenomenon, known as a
genetic background effect, means that simply understanding the average effect of a variant
across an entire population may not translate to predictive power for individuals. Recent
advances in DNA barcoding, sequencing, and CRISPR interference have made it possible to
study these effects on an unprecedented scale, potentially enabling general, genome-scale
insights into how genetic background effects play a role in determining phenotype.
The aim of this dissertation is to utilize these new methods to provide a broad view into the
background effects that occur in Saccharomyces cerevisiae. In this model system, I investigate
how background effects alter the impact of spontaneous mutations, standing genetic variants
segregating in a cross, and new genetic perturbations introduced through CRISPR interference.
In chapter two, I investigate a specific mutation with a very high degree of variable expressivity,
revealing that over 15 separate loci collectively influence the mutation’s phenotypic impact. In
chapter three, I use a similar panel of diploid yeast strains to investigate the relationship
between additivity, dominance, and epistasis in the context of standing genetic variation,
mapping the loci that underlie these effects across seven environments. In chapter four, I
measure the effects of perturbing ~1,700 yeast genes across a panel of ~170 haploid yeast
genotypes, identifying over 450 genetic perturbations with background-specific effects and
mapping their underlying loci. These studies provide several genome-scale observations into
the mechanisms of background effects as a whole and their underlying genetic bases.
xi
Chapter 1: Introduction
1.1 Obstacles in studying background effects
Background effects are a form of epistasis that occurs when the same genetic variant has
different phenotypic effects in distinct individuals. Assuming a constant environment, this is due
to genetic interactions between the variant and other genomic loci that modify its effect (Mullis
et al. 2018; Chandler, Chari, and Dworkin 2013). Several previous studies have identified
examples of background effects, and have found that they can vary greatly in their magnitude of
effect, frequency in a population, appearance across environments, and complexity of their
underlying genetic basis (Mullis et al. 2018; Taylor and Ehrenreich 2015; Taylor et al. 2016; Hou
et al. 2019; J. T. Lee et al. 2019). While many broad categories and classifications of
background effects have been identified, such as incomplete penetrance and variable
expressivity, many questions about the prevalence and mechanisms underlying background
effects as a whole remain. This is in part due to the technical limitations involved in studying
background effects at a genomic scale.
This level of understanding is important due to how background effects can limit predictions of
phenotype from genotype. While the prevalence and impact of epistasis is an active subject of
debate, it is generally accepted that the majority of genetic effects behave additively (Mackay
2014), and predictive models operating under this assumption have performed well (Crow
2010). However, additive models do not typically provide accurate predictions for the
phenotypes of specific individuals. This is important because it is common for some specific
individuals to exhibit dramatically different responses to the same genetic variant (Mullis et al.
2018; I. Goldstein and Ehrenreich 2021a). A broader, genome-scale understanding of
background effects could potentially improve predictions of phenotype for individuals, which is
relevant for applications such as personalized medicine.
1
1.2 Genetic architecture of background effects
Elucidating the genetic elements underlying background effects is an important step in
understanding and predicting their effects on phenotype. While several examples of background
effects have been studied previously, the underlying genetics are often less well-understood
(Chandler, Chari, and Dworkin 2013; Nadeau 2001), especially on a genomic scale. Additionally,
the connections between the features of the genetic architecture and the properties of the
resulting background effect are currently not well understood. For example, some specific loci
have been shown to alter the effects of significantly more genetic perturbations than expected
by chance (Johnson et al. 2019). However, this has only been investigated in a small number of
studies. Additional studies are needed to understand the common features of these underlying
genetic architectures, in order to eventually reveal the biological mechanisms connecting these
alleles to the genetic perturbations they alter.
1.3 Relationship between epistasis, additivity, and dominance
In the same way that genetic perturbations can have variable effects across individuals, genetic
variants segregating in a population or a cross may also have differing effects on phenotype
depending on the genetic background in which they appear (Mackay, Stone, and Ayroles 2009).
While this phenomenon has been extensively studied in the context of haploid, inbred strains,
there are complex forms of epistasis that can only be observed in a diploid population
(Cheverud and Routman 1995; Cheverud and Routman 1996; Campbell, McGrath, and Paaby
2018). For instance, two alleles could exhibit a relationship of complete dominance in one
genetic background and incomplete dominance in another. Additionally, this is separate from
any effects the same genetic background may have on additivity. The relationship epistasis has
with additivity and dominance has not been explored extensively, as generating and
phenotyping sufficiently large populations of diploid strains has presented a technical challenge
2
(Hallin 2016; Märtens et al. 2016). Further studies that overcome this technical limitation could
provide valuable insight into the fundamental mechanisms of epistasis and background effects.
1.4 Combinatorial DNA barcoding
Background effects are apparent only when studying how a population of diverse individuals
responds to genetic perturbation. In order to study multiple background effects at once, it is
desirable to have a population where the same genetic backgrounds are present with every
possible perturbation. Creating such populations is a major technical obstacle, as the total
diversity rapidly scales up as more backgrounds and perturbations are included. Many methods
exist for introducing genetic perturbations, such as CRISPRi (Qi 2013; Smith 2016) or
transposon-based techniques (Johnson et al. 2019), but these methods are not always feasible
to individually apply to the millions of strains required for these studies. Additionally, even after
such a population has been created, efficiently phenotyping millions of strains presents a
second technical challenge. An experimental approach that can apply pool-based methods
during both the generation of the study population and the subsequent phenotyping would
address this limitation.
Combinatorial DNA barcoding (Levy 2015; Schlecht et al. 2017; Mullis 2022) can address this
issue, by allowing information about each strain to be represented by short nucleotide
sequences present in the genome. These DNA barcodes can indicate information such as the
genetic background of a strain, the genetic perturbations it carries, or the parental strains from
which it originated. By integrating multiple barcodes in close proximity at the same genomic
locus, it becomes possible to construct large strain panels very efficiently. For example,
hundreds of barcoded genotypes could be combined with thousands of barcoded perturbations
to create a theoretical population of millions of combinations, each of which can be identified via
its unique set of two barcodes. This also allows for efficient phenotyping of each strain’s fitness,
by pooling millions of strains together in a single fitness assay (Levy 2015). The frequency of
3
each barcode combination can be tracked over time and used to estimate fitness for that strain,
with a high degree of accuracy (F. Li, Salit, and Levy 2018). This approach thus addresses
several technical obstacles to studying background effects on a large scale.
1.5 Goals of this dissertation
The factors described above both limit our current understanding of background effects and
impact our ability to study them on a larger scale. My goal as a PhD student has been to
implement techniques that both address gaps in our current knowledge of background effects
and enable future research to more efficiently investigate this topic. In addition to designing and
implementing an efficient experimental framework that can identify and quantify as many
background effects as possible, my main goals included addressing the following questions: 1)
What patterns or trends of background effects are apparent when taking a genome-scale view?
2) What types or categories of loci influence background effects? 3) To what extent does genetic
background impact the response to genetic perturbation?
The budding yeast Saccharomyces cerevisiae is an ideal model system for investigating these
questions. The high natural recombination rate and short doubling time of yeast allows many
genetic backgrounds to be rapidly generated via tetrad dissection. Methods for efficiently
introducing DNA barcodes into the yeast genome have been previously developed, in addition
to techniques that allow for subsequent modification of integrated barcodes (Mullis et al. 2022a).
Along with a combinatorial DNA barcoding scheme, this allows sample populations of millions of
strains to be generated in a short time. Finally, the small, well-annotated genome and high
natural recombination rate enable higher-power linkage mapping studies, which is essential for
investigating the genetic basis of background effects.
4
1.6 Summary of the chapters
In Chapter 2, I investigate the genetic basis of a spontaneous mutation that exhibits a high
degree of variable expressivity. I use a variety of genetic mapping and engineering approaches
to dissect the genetic basis of this trait, revealing 18 loci that influence this variable expressivity.
I note that multiple different genetic backgrounds can exhibit similar responses to the
spontaneous mutation, especially if they carry a similar number of detrimental alleles.
In Chapter 3, I generate a large panel of barcoded haploid segregants, and mate these
segregants together to create a population of genetically diverse barcoded diploid strains. I
measure the fitness of each diploid strain, and map loci that impact these fitness values. Further
mapping scans identify genetic interactions, including those that alter dominance at the fitness
loci. I find that loci with strong fitness effects tend to exhibit more of these genetic interactions,
and identify that the most common locus influencing dominance effects is the mating locus,
which has little effect on fitness alone.
In Chapter 4, I introduce thousands of genetic perturbations into a similar panel of genetically
diverse S. cerevisiae haploid segregants via CRISPR interference (Qi et al. 2013; Smith et al.
2016; Smith et al. 2017), and determine the fitness of each resulting lineage. I then quantify the
fitness effect of each perturbation on a genotype-by-genotype basis, identifying hundreds of
background effects. I also map the genetic basis of these background effects, and identify
several loci that are associated with significantly more background effects than would be
expected by chance. I also find that baseline segregant fitness has a significant relationship with
response to genetic perturbation, with higher-fitness strains exhibiting more extreme responses
to detrimental perturbations.
In Chapter 5, I discuss the impact of these findings and provide possible future directions for
further large-scale studies of background effects.
5
Chapter 2: Genetic basis of a spontaneous mutation’s expressivity
This work appears essentially as published in 2022 in Genetics 220 (3), with minor edits for
readability. It was done in collaboration with Rachel Schell, Martin N Mullis, Takeshi Matsui,
Ryan Foree, and Ian M Ehrenreich.
Summary of author contributions
My primary contribution to this work involved the identification of the mrp20-A105E mutation that
occurred in a BY/3S HOS3/hos3Δ::KanMX diploid. Some of the segregant progeny from this
diploid exhibited greatly impaired growth on ethanol, while others grew at near-wild type levels.
It was not initially known that a de novo mutation was affecting growth. I performed tetrad
dissections of the HOS3/hos3Δ::KanMX diploid and phenotyped the resulting segregant
progeny on ethanol. I then sequenced these segregants on an Illumina HiSeq lane and
analyzed the data. With the help of Rachel Schell, we were able to identify the region of the
genome responsible for the impaired growth, which was found to contain the de novo
mrp20-A105E mutation.
2.1 Abstract
Genetic background often influences the phenotypic consequences of mutations, resulting in
variable expressivity. How standing genetic variants collectively cause this phenomenon is not
fully understood. Here, we comprehensively identify loci in a budding yeast cross that impact the
growth of individuals carrying a spontaneous missense mutation in the nuclear-encoded
mitochondrial ribosomal gene MRP20. Initial results suggested that a single large effect locus
influences the mutation’s expressivity, with 1 allele causing inviability in mutants. However,
further experiments revealed this simplicity was an illusion. In fact, many additional loci shape
the mutation’s expressivity, collectively leading to a wide spectrum of mutational responses.
6
These results exemplify how complex combinations of alleles can produce a diversity of
qualitative and quantitative responses to the same mutation.
2.2 Introduction
Mutations frequently exhibit different phenotypic effects across individuals in the same
population or species (“background effects”) (Nadeau 2001; Chandler et al. 2013; Riordan and
Nadeau 2017). Individuals with the same mutation may differ in whether they show an effect
(“incomplete penetrance”) or in their degrees of response (“variable expressivity”) (Griffiths et al.
2020). For example, people with the same disease-causing mutation often vary in whether they
express the associated disorder and in their manifestation of symptoms (Cooper et al. 2013).
Background effects can arise due to a myriad of reasons, including genetic interactions between
a mutation and segregating loci (“epistasis”), dominance in individuals heterozygous for a
mutation, and variability in environmental conditions experienced by individuals (Nadeau 2001),
as well as differences in microbiome composition among individuals (Wagner et al. 2021) and
stochastic noise (Raj et al. 2010). Multiple of these factors may jointly cause a mutation to show
a background effect (Lee et al. 2016, 2019).
In this study, we focus on the role of epistasis in background effects, an important topic for
evolution and inheritance (Gibson and Dworkin 2004; Jarosz et al. 2010; Paaby and Rockman
2014; Siegal and Leu 2014). Genetic interactions with loci can impact the evolutionary
trajectories of mutations, influencing selection for and against beneficial and deleterious
mutations, respectively (Weinreich et al. 2005; Lang et al. 2011; Kryazhimskiy et al. 2014;
Johnson et al. 2019). Such epistasis can also have bearing on genotype–phenotype
relationships, as a mutation may alter the phenotypic effects of interacting loci (Dworkin et al.
2003; Jarosz et al. 2010; Geiler-Samerotte et al. 2016; Schell et al. 2016; Mullis et al. 2018). In
extreme cases, a mutation may mask genetically interacting loci or convert them from cryptic to
7
phenotypically visible (Gibson and Hogness 1996; Rutherford and Lindquist 1998; Queitsch et
al. 2002; Jarosz and Lindquist 2010).
Epistasis between mutations and loci is also relevant to modern genetic applications. In
medicine, loci that genetically interact with mutations are referred to as “modifiers” (Nadeau
2001). Knowing the modifiers of a disease mutation may enable personalized prediction of
disease severity and treatment (Riordan and Nadeau 2017). This knowledge may also be
necessary to ensure potentially curative genome editing therapies, such as those for sickle cell
disease and beta thalassemia (Frangoul et al. 2021), have their intended consequences. In
agriculture, genome editing will likely become a common crop improvement strategy (Nasti and
Voytas 2021), but its success depends on edits having their predicted phenotypic effects.
Studies in diverse species have found a large fraction of mutations show background effects
due to epistasis with loci (Dowell et al. 2010; Chari and Dworkin 2013; Paaby et al. 2015; Vu et
al. 2015; Mullis et al. 2018; Galardini et al. 2019; Johnson et al. 2019). Yet, questions about the
number of loci that typically show epistasis with a mutation and the architecture of genetic
interactions between mutations and loci remain (Goldstein and Ehrenreich 2021). Addressing
these questions is difficult because natural populations harbor substantial genetic diversity,
which presents a technical challenge for genetic mapping and enables complex forms of
epistasis between mutations and multiple loci (Dowell et al. 2010; Chari and Dworkin 2013;
Chandler et al. 2014, 2017; Taylor and Ehrenreich 2014, 2015b; Paaby et al. 2015; Vu et al.
2015; Lee et al. 2016; Taylor et al. 2016; Mullis et al. 2018; Galardini et al. 2019; Hou et al.
2019; Lee et al. 2019; Parts et al. 2021).
Lab crosses can overcome these challenges (Dowell et al. 2010; Chandler et al. 2014; Taylor
and Ehrenreich 2014, 2015b; Mullis et al. 2018; Hou et al. 2019; Parts et al. 2021). Crosses can
8
be grown in controlled environments and have balanced allele and multilocus genotype
frequencies providing statistical power to detect epistatic loci (Ehrenreich 2017). In inbred or
haploid crosses, dominance will be absent, eliminating another potential concern. To date,
crosses have shown that background effects often involve multiple loci that may genetically
interact not only with a focal mutation, but also each other (Dowell et al. 2010; Chandler et al.
2014; Taylor and Ehrenreich 2014, 2015b; Lee et al. 2016, 2019; Taylor et al. 2016; Mullis et al.
2018; Hou et al. 2019; Parts et al. 2021). However, these studies mainly focused on mutations
that cause binary phenotypes and show incomplete penetrance. Similar work is needed on
mutations with quantitative phenotypic effects and variable expressivity.
In this study, we employ a series of crosses in the budding yeast Saccharomyces cerevisiae to
determine how loci cause a spontaneous mutation to show variable expressivity. We focus on a
missense mutation in MRP20, a nuclear-encoded subunit of the mitochondrial ribosome that is
essential for cellular respiration (Fearon and Mason 1992). This mutation occurred by chance in
a previously generated cross between the reference strain BY4716 (“BY”) and the clinical
isolate 322134S (“3S”) (Mullis et al. 2018). Here, we identify the MRP20 mutation and
demonstrate its variable expressivity among haploid BYx3S cross progeny (“segregants”). We
then determine how loci in the BYx3S cross individually and collectively cause this variable
expressivity. These data show that expressivity can be shaped by many loci that show a
spectrum of effect sizes and act in a predominantly additive manner.
2.3 Materials and Methods
2.3.1 Generation of Segregants
The mrp20-A105E mutation occurred spontaneously in a BY/3S HOS3/hos3Δ::KanMX diploid
produced in Mullis et al. (2018) (Fig. 2.1). As described in Fig. 2.1, this mutant strain was
generated through targeted disruption of 1 copy of the HOS3 gene in a BY/3S diploid by
9
transformation of a KanMX cassette with homology tails (Wach et al. 1994). The BY/3S diploid
used in this transformation had been produced by mating a BY MATa ho
can1Δ::STE2pr-SpHIS5 his3Δ haploid and a 3S MATα ho::HphMX his3Δ::NatMX haploid,
resulting in a BY/3S MATa/MATα CAN1/can1Δ::STE2pr-SpHIS5 his3Δ/his3Δ ho/ho::HphMX
diploid. This diploid genotype made it possible to obtain MATa haploid F2 segregants by
sporulating the initial diploid strain or its HOS3/hos3Δ::KanMX derivative and selecting for
random spores on His- plates containing canavanine (Tong and Boone 2006). Haploid
segregants from the HOS3/hos3Δ::KanMX diploid also went through a second selection for the
hos3Δ mutation on G418. The mrp20-A105E mutation was heterozygous in all BY/3S
HOS3/hos3Δ::KanMX cells, implying it was likely present in the original cell in which 1 HOS3
copy had been deleted. Following discovery of mrp20-A105E, we sporulated the BY/3S
HOS3/hos3Δ::KanMX again and performed tetrad dissection, avoiding any marker selections.
This made it possible to obtain all 4 products from each meiotic event, which was necessary to
distinguish the phenotypic effects of the mrp20-A105E and hos3Δ mutations and to test whether
these 2 mutations genetically interact.
10
Figure 2.1 The mrp20-A105E mutation occurred spontaneously, increasing phenotypic
variance in the BYx3S cross. a) A spontaneous mutation in a BY/3S diploid gave rise to a
BYx3S segregant population in which mrp20-A105E segregated. b) The mrp20-A105E
segregants exhibited increased phenotypic variance and a bimodal distribution of phenotypes.
Throughout the study, blue and orange are used to denote BY and 3S genetic material,
respectively. All growth data presented in the paper are measurements of colonies on agar
plates containing rich medium with ethanol as the carbon source.
To fine map the Chromosomes XIV locus harboring MKT1, we produced haploid F3 segregants
by mating 2 mrp20-A105E F2 segregants from Mullis et al. (2018), one with the BY allele of the
locus and the other with the 3S allele (Fig. 2.2a). Because these segregants were both MATa,
we mating-type switched the segregant with the 3S allele of the Chromosome XIV locus. To
enable mating-type switching, we first deleted URA3 from the segregant using a HphMX
cassette with homology tails (Goldstein and McCusker 1999). We then mating-type switched the
strain by transforming it with a counterselectable URA3 plasmid containing an inducible HO
11
endonuclease (Herskowitz and Jensen 1991), inducing HO, and plating single cells. After
mating the 2 haploids, the resulting diploid was sporulated and random haploid F3 segregants
were obtained by plating on His- media.
12
Figure 2.2 MKT1 genetically interacts with mrp20-A105E. a) To improve mapping resolution
of the Chromosome XIV locus, we crossed 2 mrp20-A105E F2 segregants with different alleles
of the locus and gathered a panel of F3 segregants. b) Linkage mapping in the F3 segregants
identified the Chromosome XIV locus at high resolution, with a peak at position 467,219. Tick
marks denote every 100,000 bases along the chromosome. c) Recombination breakpoints in
the F3 segregants delimited the Chromosome XIV locus to a single SNP in MKT1 at position
467,219. Vertical dashed line highlights the delimited causal polymorphism, while small vertical
lines along the x-axis indicate different SNPs in the window that is shown. d) Engineering the BY
allele into mrp20-A105E XIV3S segregants changed growth (left), while substitutions at the
nearest upstream and downstream variants did not (right).
In addition, to better understand the genetic basis of mrp20-A105E’s expressivity, we generated
new haploid F2 segregant panels, 1 in which all individuals were mrp20-A105E MKT1BY and 1
in which all individuals were mrp20-A105E MKT1
3S
(Fig. 2.3a). To facilitate this, we engineered
mrp20-A105E, as well as the 3S and BY causal variants at MKT1 position 467,219 into BY and
3S, respectively, using a 2-step CRISPR/Cas9 strategy described later. BY mrp20-A105E was
mated to 3S mrp20-A105E MKT1
BY
twice independently, the resulting diploids were sporulated,
and BYx3S mrp20-A105E MKT1
BY haploid segregants were obtained by tetrad dissection. The
same process was followed with BY mrp20-A105E MKT1
3S and 3S mrp20-A105E strains to
obtain BYx3S mrp20-A105E MKT1
3S haploid segregants.
13
Figure 2.3 Additional loci govern response to the mutation. a) We generated BY × 3S
crosses in which all segregants carried mrp20-A105E. Two crosses were performed: 1 in which
all segregants carried MKT1
BY and 1 in which all segregants carried MKT1
3S
. Tetrads were
dissected and spores were phenotyped for growth on ethanol. b) Each of the new crosses
showed increased growth that extended from inviable to wild type, differing from the more
qualitative bimodal phenotypes seen among the original mrp20-A105E MKT1 segregant
populations. c) Linkage mapping identified a total of 16 loci that influenced growth. After 4
iterations of a forward regression, no additional loci were identified.
2.3.2 Genotyping
Genotyping of previously generated F2 segregants is described in Mullis et al. (2018) and was
performed in the same manner as segregants produced in the current study. All haploid F2 and
F3 segregants generated in this study were genotyped by low coverage whole genome
14
sequencing, using previously reported methods that we summarize here (Bloom et al. 2013;
Taylor and Ehrenreich 2014; Lee et al. 2016, 2019; Mullis et al. 2018). Strains were inoculated
into liquid overnight cultures and grown to stationary phase at 30°C on a shaker. DNA was
extracted from cell pellets using Qiagen 96-well DNeasy kits. Sequencing libraries were
prepared using the Illumina Nextera Kit and custom barcoded adapters. Segregants from each
respective cross were pooled in equimolar fractions into multiplexed libraries, run on a gel, size
selected, and purified with the Qiagen Gel Extraction kit. 150 × 150-bp paired-end sequencing
was done on an Illumina HiSeq4000 with Novogene.
All sequence processing and analyses were conducted in bash and R, and specific programs
and functions used are listed in Supplementary Table S2.1. Sequencing reads were mapped
against the S288C reference genome (version
S288C_reference_sequence_R64-2-1_20150113.fsa from the Saccharomyces Genome
Database https://www.yeastgenome.org (accessed 28 January 2022)) using BWA version
0.7.7-r44 (Li and Durbin 2009). Note, BY is a close derivative of S288C, the main reference
strain of yeast. Samtools v1.9 was then used to create a pileup file for each segregant (Li et al.
2009). For both BWA and Samtools, default settings were employed. Base calls and coverages
were gathered for 44,429 SNPs that segregate in the cross and are spaced approximately every
300 bp (Taylor et al. 2016). Low coverage individuals (<0.7× average per site coverage) were
removed from analyses. Ninety percentage of segregants had an average per site coverage of
2.5 or more across all SNPs and the median per site coverage across all segregants in this
study was 12. Diploid and contaminated individuals were identified by abnormal patterns of
heterozygosity or sequencing coverage and excluded from all analyses.
For each segregant, a raw genotype vector was determined based on the percentage of 3S
calls at each marker. We then used a Hidden Markov Model (HMM) implemented in the “HMM”
15
package v 1.0 in R to correct each raw genotype vector. The HMM was used both to impute
genotype at markers lacking direct sequencing information and to correct for sequencing errors
at low coverage sites. Both types of corrections were rare. The following probability matrices
(Rabiner 1989) were used, transitionProbabilitiy = matrix{c[0.(\d+),.0001,.0001,.9999],2} and
emissionProbability = matrix{c[0.(\d+).25,0.75,0.75,0.25],2}. These values were taken from past
work (Taylor and Ehrenreich 2014; Lee et al. 2016, 2019; Mullis et al. 2018); their
appropriateness here was confirmed by visual comparison of HMM outputs to raw genotype
vectors.
Aneuploidies were identified based on segregants showing elevated sequencing coverage
across entire chromosomes. Specifically, we employed the normalmixEM() function from the
mixtools library v 1.2.0 (Benaglia et al. 2009) in R to each segregant, identifying chromosomes
that had unusually high coverage relative to the rest of the nuclear genome. A Chromosome II
duplication in a subset of BYx3S mrp20-A105E MKT1
BY and BYx3S mrp20-A105E MKT1
3S
segregants was the only detected aneuploidy. The coverage of Chromosome II in aneuploid
individuals was roughly double the rest of the genome, suggesting the presence of 1 extra copy
of that chromosome.
2.3.3 Phenotyping
Segregants were inoculated into broth containing 1% yeast extract, 2% peptone, and 2%
dextrose (“YPD”). Cultures were grown to stationary phase at 30°C over 2 days and were then
pinned onto YP plates containing ethanol (“YPE”) and 2% agar. YPE is 1% yeast extract, 2%
peptone, and 2% ethanol. Each pinned colony results from clonal growth of a genetically distinct
strain. Plates were grown at 30°C for 2 days. Growth assays were conducted in a minimum of 3
replicates across 3 plates, with randomized positioning of strains. On each plate, a BY parental
strain was included as a control. Plates were imaged with the BioRAD Gel Doc XR+ Molecular
16
Imager at a standard size of 11.4 × 8.52 cm2
(width × length) and imaged with epi-illumination
using an exposure time of 0.5 s. Images were saved as 600 dpi tiffs. ImageJ
(http://rsbweb.nih.gov/ij/ (accessed 28 January 2022)) was used to quantify pixel intensity of
each colony through the Plate Analysis JRU v1 plugin
(https://research.stowers.org/imagejplugins/zipped_plugins.html (accessed 28 January 2022)),
as discussed in a previous study (Matsui and Ehrenreich 2016). In brief, each colony was
normalized to the BY control grown on the same plate to account for batch effects. Then, the 3
normalized biological replicates for a given strain were averaged to produce a single growth
value for each genetically distinct segregant. We used replicates to calculate the broad sense
heritability of growth on ethanol. Specifically, all replicates of the 749 segregants engineered to
carry mrp20-A105E and specific MKT1 alleles were utilized as data points in the fixed effects
linear model growth ∼ strain + error. From this model, we took the Sum of Squares of the strain
term and divided it by the Sum of Squares Total. This produced a broad sense heritability
estimate of 0.93.
2.3.4 Comparison of phenotypic variance between mutant and wild-type segregants
To compare phenotypic variance between wild-type and mutant segregants, we used the
leveneTest() function in the car library (v 3.0.6) in R (Fox and Weisberg 2018). We employed the
default settings for this function, which use the median as the center for each group rather than
the mean, providing a more robust and conservative test (Fox and Weisberg 2018).
2.3.5 Linkage Mapping
Initial discovery of the spontaneous mrp20-A105E mutation was done by performing linkage
mapping with 385 haploid F2 segregants (164 wild type and 221 hos3Δ) generated in Mullis et
al. (2018). The hos3Δ segregants had been engineered to lack the gene encoding the histone
deacetylase Hos3. As discussed in the “Generation of segregants” section of the Materials and
17
Methods, this gene deletion had been introduced by transformation of PCR tailed KanMX
cassette into a BY/3S diploid. To identify mrp20-A105E, we employed the fixed effects linear
model growth ∼ hos3Δ + locus + hos3Δxlocus + error, from which the hos3Δxlocus interaction
term was used to identify loci that differentially explained growth in hos3Δ segregants.
Examination of the hos3Δxlocus interaction term led to discovery of the spontaneous
mrp20-A105E mutation, which was present in the BY copy of MRP20 in only the hos3Δ
segregants.
Following the identification of mrp20-A105E, we attempted to identify genetically interacting loci
using the fixed effects linear model growth ∼ MRP20 + locus + MRP20xlocus + error with only
hos3Δ segregants. From this scan, we examined the MRP20xlocus interaction term. Only a
single locus was identified in this scan at a −log10(P) exceeding 4 and it passed even the most
conservative thresholds, including a Bonferroni threshold (Bland and Altman 1995) accounting
for all SNPs in the cross. Three hundred and sixty-one mrp20-A105E F3 segregants were then
used to better resolve the Chromosome XIV locus. In the experiment, we employed the fixed
effects linear model growth ∼ locus + error and examined the locus term, focusing on the
Chromosome XIV locus, whose significance also exceeded a Bonferroni threshold.
To map loci affecting growth among mrp20-A105E segregants, we generated new populations
of mrp20-A105E MKT1
BY
(353) and mrp20-A105E MKT1
3S
(396) haploid segregants. The
combined 749 mrp20-A105E segregants were used in linkage mapping with a forward
regression approach. We obtained residuals (“residuals1”) from the fixed effects linear model
growth ∼ MKT1 + error and then implemented a first genome-wide scan using the model
residuals1 ∼ locus + error. We examined the locus term and called loci significant if they
exceeded the 95th quantile of maximal −log10(P-values) from 1,000 permutations (Churchill and
Doerge 1994). Permutations were implemented by randomly shuffling the residuals1 vector and
18
rerunning the genome-wide scan. In the first scan, we only called the single most significant
locus on each chromosome.
Following the first scan, we accounted for all detected loci using the fixed effects linear model
residuals1 ∼ locus 1 + locus 2 + … locus n + error and obtained the residuals (“residuals2”).
These new residuals were used in another genome-wide scan with the fixed effects linear model
residuals2 ∼ locus + error. Permutation thresholds were calculated anew for this second stage
scan in the same manner as the first scan and again a maximum of 1 significant locus per
chromosome was called. This process was repeated for 3 additional iterations, at which point no
additional loci were detectable using a permutation-based significance threshold. Chromosome
II was excluded from linkage mapping due to the presence of its aneuploidy in a subset of
individuals. The Chromosome II duplication was tested for significance using the model growth ∼
MKT1 + Chromosome II + error, from which the Chromosome II term was examined.
We then conducted genome-wide scans in mutant segregants for pairwise interactions with
detected loci, similar to Brem et al. (2005) and Storey et al. (2005). We used the fixed effects
linear model growth ∼ detected_locus + new_locus + detected_locusxnew_locus + error. In this
model, detected_Locus corresponded to the peak marker at one of the loci detected in the
forward scan and new_Locus represented a different marker. From each test, the P-value of the
detected_locusxnew_locus term was extracted. Significance thresholds were determined in
each scan using the same permutation strategy described above, with different permutations
conducted for each of the previously detected loci. No genetic interactions exceeded the
scan-specific permutation thresholds. We also explicitly tested for epistasis between MKT1 and
the newly detected locus on Chromosome XIV that we refer to as “SAL1” using the fixed effects
linear model growth ∼ MKT1 + SAL1 + MKT1xSAL1 + error. We examined the P-value for the
MKT1xSAL1 interaction term, which was greater than 0.05.
19
All linkage mapping was performed in R. Fixed effects linear models were implemented using
the lm() function in base R. Although we had genotype calls for 44,429 SNPs, we only used a
subset of these markers in genetic mapping. The difference between the total and tested
number of markers reflects the fact that certain linked markers had exactly the same genotype
calls across all segregants. Such markers containing the same genetic information will produce
identical test results. To avoid unnecessarily inflating the number of tests, we randomly selected
1 marker in each set and used it in linkage mapping. Because each genetic mapping population
contained different numbers of segregants and different recombination breakpoints, the number
of markers with unique information varied among them, from a minimum of 7,025 in the F2
population used in identifying mrp20-A105E to 18,246 in the F2 populations genetically
engineered to carry mrp20-A105E and particular MKT1 alleles.
In all scans, we required that the most significant site (“peak marker”) be a minimum of
150,000 kb away from any other locus. We also required peaks to be more than 20 kb from the
edge of a chromosome. We report confidence intervals using established protocols based on
drops in the logarithm of the odds (“LOD”), here approximated by −log10(P) values, surrounding
a peak marker at a locus; here, we use conventional 2 LOD drops (Lander and Botstein 1989).
2.3.6 Classification of inviable segregants
Initial discovery of the genetic interaction between mrp20-A105E and MKT1 suggested that
expressivity of mrp20-A105E was largely determined by variation at MKT1. All mrp20-A105E
MKT1
BY segregants exhibited very poor growth, while all mrp20-A105E MKT1
3S segregants
showed higher levels of growth. We termed this initial mrp20-A105E MKT1
BY segregant
population as “inviable.” Figures 2.4 and 2.5 include a gray dashed line to denote the highest
growth value observed among the original inviable segregants.
20
Figure 2.4 Mitochondrial genome instability partially underlies the expressivity of
mrp20-A105E. We measured petite formation frequency, which estimates the proportion of cells
within a clonal population capable of respiratory growth. Higher petite frequency is a proxy for
greater mitochondrial genome instability. a) We examined MRP20 and mrp20-A105E versions of
the BY and 3S parent strains. For each, average values and 95% bootstrapped confidence
intervals are shown. BY showed elevated mitochondrial genome instability in the presence of
mrp20-A105E, while 3S showed no change. b) We examined 16 BYx3S MRP20 segregants.
These segregants were randomly selected and spanned the range of growth values for MRP20
segregants. c) 45 BYx3S mrp20-A105E segregants. Poorer growing segregants tended to
exhibit higher mitochondrial genome instability, though some exhibited wild-type levels of
mitochondrial genome instability. The gray dashed line indicates the threshold used to call
inviability.
21
Figure 2.5 Detected loci quantitatively and qualitatively explain mutant phenotypes. a) We
fit a linear model accounting for the effects of all detected loci and the aneuploidy on the growth
of mrp20-A105E segregants. This model not only explained the growth of the new BYx3S
mrp20-A105E crosses generated in this study, but also accurately predicted the phenotypes of
the mutant parents and previously generated segregants. b) We examined growth relative to the
sum of detrimental alleles carried by a segregant. This relationship shows how collections of loci
produce a quantitative spectrum of phenotypes, including instances of qualitative phenotypic
responses. This relationship explains the full range of responses, from inviable to wild-type
growth, across MKT1-SAL1 genotypes. The gray dashed line indicates the threshold used to
call inviability.
2.3.7 Resolving loci and identifying candidate genes
To help resolve loci, we utilized all 44,429 SNPs to analyze recombination breakpoints within
loci containing mrp20-A105E, MKT1, and SAL1/PMS1. In each case, we split the appropriate
segregants into 2 groups depending on whether they carried the BY or the 3S allele at the peak
marker. Segregants’ haplotypes across the adjacent genomic window were then examined. The
22
likely causal regions were determined by identifying the SNPs fixed for BY among all BY
individuals and fixed for 3S among all 3S individuals. All recombination breakpoints were
confirmed by visual inspection of raw Illumina sequencing using the view function of BWA
version 0.7.7-r44 (Li and Durbin 2009). Previously generated F2 segregants (Mullis et al. 2018),
newly generated F3 segregants, and newly generated genetically engineered segregants were
used to localize MRP20, MKT1, and SAL1/PMS1, respectively. Delimiting these loci in this
manner was only possible because of their large phenotypic effects. For other loci with smaller
effects, we identified all markers showing peak significance and listed any genes containing
these markers as “candidates.” These loci and the candidate genes in them are provided in
Table 2.1 and Supplementary Tables S2.3 and S2.4.
Candidate
gene(s)
Summary of associated function(s)
AFR1 Pheromone-induced projection (shmoo) formation; Septin architecture
during mating
HOS2 Histone deacetylase, subunit of Set3 and Rpd3L complexes
YGL193C Haploid specific gene
IME4 Methyltransferase, conditionally essential for meiosis
SDH1 Flavoprotein involved in TCA cycle in mitochondria
BPT1 Vacuolar transmembrane protein
RNH203 Ribonuclease H2 subunit, ribonucleotide excision repair
BOP2 Unknown
ECM7 Putative integral membrane protein with a role in calcium uptake
23
YMR090W Unknown
NPL6 Component of the RSC chromatin remodeling complex
ECM5 Subunit of Snt2C complex involved in gene regulation in response to
oxidative stress
AEP2 Mitochondrial protein; likely involved in translation
PBR1 Putative oxidoreductase; required for cell viability
SAL1 ADP/ATP transporter in mitochondria
PMS1 ATP-binding protein required for mismatch repair; required for both
mitosis and meiosis
BRX1 Nucleolar protein involved in rRNA processing
YOR008C-A Unknown, potential transmembrane domain
TIR4 Cell wall mannoprotein; required for anaerobic growth
ISN1 Ionosine metabolism
ISW2 ATP-dependent DNA translocase involved in chromatin remodeling
Table 2.1 Candidate genes at detected loci have diverse cellular functions. Candidate
genes were identified by looking at all genes within an interval that contained markers showing
maximal significance. For this analysis, we considered all SNPs within an interval, not just the
subset used in linkage mapping. Candidate genes in these intervals are listed with brief
annotations for the candidates are provided here from the Saccharomyces Genome Database
(Cherry et al. 2012). More information about these loci is provided in Supplementary Tables
S2.3 and S2.4.
2.3.8 Reciprocal hemizygosity experiments
Four hos3Δ F2MATa segregants were used in all reciprocal hemizygosity (RH) experiments
focused on the Chromosome IV locus (“IV”) (Steinmetz et al. 2002): 2 were hos3Δ IV
BY XIV
BY
and 2 were hos3Δ IV
3S XIV
BY
. The 4 segregants were first mating-type switched to enable
mating of these segregants to produce homozygous IV
BY
/IV
BY
, homozygous IV
3S
/IV
3S
, or
24
heterozygous IV
BY
/IV
3S diploids. Each pairwise mating was performed and confirmed by plating
on mating-type tester plates. These diploid strains were then phenotyped on YPE plates, which
verified that IV
BY has an effect in diploids and acts in a recessive manner. Using the haploid
MATa and MATα versions of these 4 segregants, we individually engineered premature stop
codons into DIT1, MRP20, and PDR15 using CRISPR-mediated targeted gene disruption and
lithium acetate transformations (Gietz and Woods 2002). Plasmid-based CRISPR-Cas9 was
employed to target the beginning of each coding region and 20-bp repair templates which
contained a premature stop codon followed by 1-bp deletions were incorporated. Each sgRNA
and repair template was designed so that only the first 15 (of 537), 26 (of 264), and 33 (of
1,530) amino acids would be translated for DIT1, MRP20, and PDR15, respectively. Engineered
strains were confirmed by PCR and Sanger sequencing. After confirmation, wild-type and
knockout strains for each gene were then mated in particular combinations to produce
reciprocal hemizygotes that were otherwise isogenic. A minimum of 2 distinct hemizygotes were
generated for each allele of each gene.
2.3.9 Construction of nucleotide replacement strains
Single nucleotide replacement strains were generated for MRP20 and MKT1 using a
CRISPR/Cas9-mediated approach. For a given replacement, the appropriate strain was first
transformed with a modified version of pML104 that constitutively expresses Cas9 using LiAc
transformation (Gietz and Woods 2002; Laughery et al. 2015). We then inserted the KanMX
gene using cotransformation of a double-stranded DNA containing KanMX with 30-bp upstream
and 30-bp downstream homology tails and gRNAs targeting the region containing the site of
interest (Kannan et al. 2016). DNA oligos and PCR were used to construct custom sgDNA
templates which included crRNA and tracrRNA in a single molecule. Next, we employed T7
RNA Polymerase to express sgDNA templates in vitro. DNase treatment and phenol extraction
were used to obtain purified sgRNAs. Transformants were selected on media containing G418,
25
and KanMX integration was confirmed by PCR. Next, KanMX was replaced with the nucleotide
of interest. To do this, integrants were cotransformed with 4 gRNAs targeting KanMX, a 60-bp
single-stranded DNA repair oligo, and a marker plasmid expressing either HphMX or NatMX
using electroporation (Thompson et al. 1998). Marker plasmids were constructed by Gibson
assembly with HphMX or NatMX and pRS316 (Sikorski and Hieter 1989; Gibson et al. 2009).
Repair constructs were 60-bp ssDNA oligos ordered from Integrated DNA technologies that
included upstream homology, the desired nucleotide at the site of interest, and downstream
homology. Transformants were selected on media containing either hygromycin or
nourseothricin, depending on what marker plasmid was used. Replacement strains were then
confirmed by Sanger sequencing.
Following this strategy, the mrp20-A105E nucleotide was engineered into 2 hos3Δ IV3S XIV
BY
segregants, and 2 hos3Δ IV
BY XIV
BY segregants were restored to MRP20. Similarly, at MKT1 the
causal, nearest upstream and downstream SNPs were engineered into 2 hos3Δ IV
BY XIV
3S
segregants. Similarly, we generate BY mrp20-A105E, BY MKT1
3S
, 3S mrp20-A105E, and 3S
MKT1
BY strains in this manner. Each single nucleotide parental replacement strain was then
backcrossed to its own progenitor. Each subsequent diploid was sporulated and tetrad
dissected, and we confirmed haploid genotypes by Sanger sequencing. The same approach
was used to generate 3S mrp20-A105E MKT1
BY haploids by crossing 3S mrp20-A105E and 3S
MKT1
BY strains. However, this strategy could not be followed to generate BY mrp20-A105E
MKT1
3S haploids, because, crossing BY mrp20-A105E and BY MKT1
3S strains failed to produce
any tetrads with 4 viable spores. Instead, we took BY MKT1
3S strains and converted MRP20 to
mrp20-A105E.
26
2.3.10 Mitochondrial genome instability experiments
Budding yeast are facultative anaerobes and can survive without functional mitochondria
(Contamine and Picard 2000). Cells with defective mtDNA or no mtDNA are referred to as
“petites” because they form small colonies relative to cells with mtDNA which form normal-sized
“grande” colonies (Ephrussi et al. 1949; Dujon 1981). Mitochondrial genome stability can be
quantitatively measured as the frequency of spontaneous petites in a population of cells
(Sherman 2002; Dimitrov et al. 2009). We performed petite frequency assays as described in
Dimitrov et al. (2009). In brief, freezer stocks were streaked onto solid YPD media and grown for
2 days at 30°C. Single colonies were then resuspended in PBS, plated across dilutions onto
YPDG plates (1% yeast extract, 2% peptone, 0.1% glucose, and 3% glycerol) and grown for 5
days at 30°C. Plates were then imaged with the BioRAD Gel Doc XR+ Molecular Imager at a
standard size of 12.4 × 8.9 cm2 (width × length) and imaged with epi-illumination using an
exposure time of 0.5 s. Images were saved as 600 dpi tiffs. ImageJ (http://rsbweb.nih.gov/ij/
(accessed 28 January 2022)) was used to examine growth and quantitate colony size as
discussed in Dimitrov et al. (2009). Colonies were then classified as petite and grande using a
threshold defined as the maximum colony diameter of observed petites among BY and 3S
wild-type strains. Petite frequency is the ratio of small colonies to total colonies.
2.3.11 Modeling growth and examining the model in additional segregant populations
We modeled growth for mrp20-A105E segregants from the BYx3S crosses fixed for
mrp20-A105E and engineered at MKT1. We incorporated MKT1, the 16 detected loci, and the
Chromosome II aneuploidy into a fixed effects linear model growth ∼ MKT1 + locus1 + locus2 +
… locus16 + Chromosome II + error. This model was used to generate predicted growth values.
We then compared our observed growth values to these predictions. We also obtained
predictions for the original mutant population and cloned parent strains from this linear model.
These predictions were compared to observed growth measurements, shown in Fig. 2.5a. In
27
addition, a ten-fold cross-validation approach was employed in which we fit the model to distinct
9/10th subsets of data and then used it to obtain phenotypic predictions for the remaining
1/10ths of data. We then compared the predicted values to the observed growth values and
obtained Pearson correlations. We also fit a genomic best linear unbiased prediction (“gBLUP”)
model to these data in which genetic relatedness was used to predict phenotype (Henderson
1975). Specifically, we used the A.mat() function from the “sommer” library (v.4.0.9) in R to
generate the genetic relatedness matrix and the mmer() function to fit this matrix to our growth
data and obtain phenotypic predictions (Henderson 1975; Endelman and Jannink 2012;
Covarrubias-Pazaran 2016).
2.3.12 Relationship between detrimental alleles, growth, and inviability
At each detected locus influencing response to mrp20-A105E, we determined the allele
associated with worse growth (“detrimental allele”). Next, we counted the number of detrimental
alleles carried by each mrp20-A105E segregant and examined how phenotypic response to
mrp20-A105E related to it. The MKT1 and SAL1 loci were not included when counting
detrimental alleles, so that this relationship could be examined across different MKT1-SAL1
genotype classes.
2.4 Results and Discussion
2.4.1 A spontaneous mutation increases phenotypic variance in the BYx3S cross
In a BY/3S diploid, a spontaneous mutation occurred in a core domain of Mrp20 that is
conserved from bacteria to humans (Fig. 2.1a, Supplementary Fig. S2.2, and Supplementary
Table S2.1) (Fearon and Mason 1992; Koc et al. 2001). This mutation resulted in an alanine to
glutamic acid substitution at amino acid 105 (mrp20-A105E). We sporulated this diploid,
unaware of the mutation, and obtained haploid BYx3S F2 segregants, some of which
possessed mrp20-A105E. We then phenotyped the segregants for colony growth on agar plates
28
containing glucose, a fermentable carbon source, or ethanol, a respirable carbon source. In
yeast, colony growth assays are commonly used to measure the phenotypic effects of mutations
(e.g. Dowell et al. 2010; Costanzo et al. 2016), as well as to characterize phenotypic differences
among genetically diverse strains (e.g. Bloom et al. 2013, 2015; Matsui and Ehrenreich 2016).
Here, cells from each segregant were pinned onto agar plates and colony size was measured at
a common endpoint, with final size a proxy for the number of mitotic divisions that occurred.
The mrp20-A105E mutation was discovered because some haploid BYx3S segregants showed
near-zero growth (“inviability”) on ethanol but not glucose. On respirable carbon sources
specifically, loss-of-function mutations in MRP20 should cause poor growth by disrupting
translation in the mitochondrial compartment and impairing cellular respiration. Unexpectedly,
many mrp20-A105E segregants exhibited high levels of growth on ethanol, with some
possessing phenotypes similar to wild-type segregants. The BYx3S mrp20-A105E segregants
exhibited substantially elevated phenotypic variance relative to wild-type segregants (Fig. 2.1b;
Levene’s test using median, P = 5.9 × 10−22). Among the mrp20-A105E segregants, there were
2 modes, centered on 10% and 57% growth relative to the haploid BY parent (bimodal fit log
likelihood = 30; Supplementary Fig. S2.3). This increase in phenotypic variance, as well as
certain segregants showing a lack of impairment by mrp20-A105E, suggested the mutation
displays variable expressivity on ethanol, our focal environment hereafter.
2.4.2 A large effect locus shows epistasis with mrp20-A105E
Multiple lines of evidence suggest that epistasis with segregating loci in the BYx3S cross
causes mrp20-A105E’s variable expressivity. The same polymorphisms segregated among
MRP20 and mrp20-A105E segregants, which were derived from a common diploid progenitor.
All segregants were haploid and phenotyped under the same conditions, ruling out dominance
and environmental effects as explanations, respectively. Also, we measured the growth of yeast
29
colonies, which contain hundreds of thousands to millions of cells, making stochastic phenotypic
differences among cells an unlikely explanation as well.
To identify loci that genetically interact with mrp20-A105E, we performed linkage mapping using
all MRP20 and mrp20-A105E segregants, seeking loci that show pairwsie epistasis with the
mutation. In a genome-wide scan, we detected a single locus on Chromosome XIV (interaction
term in a full-factorial 2-way ANOVA, P = 4.3 × 10−16; Fig. 2.6a). This locus (“XIV”) exceeded
even a conservative threshold based on a Bonferroni correction (Bland and Altman 1995)
accounting for all 44,429 genotyped SNPs in the cross. Individuals with the XIV
BY allele grew
worse than individuals with XIV
3S allele in both MRP20 and mrp20-A105E segregants, but to a
greater extent among the mutants (Fig. 2.6b). XIV explained 79% of the phenotypic variance
within mrp20-A105E segregants (ANOVA, P = 3.2 × 10
−31
) and XIV
BY was present in all inviable
segregants (Fig. 2.6b).
30
Figure 2.6 Epistasis between MRP20 and Chr XIV appears to mostly explain response to
mrp20-A105E. a) Linkage mapping in the BYx3S segregants shown in Fig. 2.1 identified a
locus on Chromosome XIV that exhibits a 2-way genetic interaction with MRP20. b) The
Chromosome XIV locus had effects in both MRP20 and mrp20-A105E segregants but had a
greater effect among mrp20-A105E segregants.
Yeast is a highly recombinogenic organism, with 1 centimorgan in genetic map distance roughly
equal to 2.5 kb in physical map distance (Cherry et al. 1997; Mancera et al. 2008). Thus, we
performed another cross, with the goal of using yeast’s high recombination rate to help resolve
the causal polymorphism. We mated 2 mrp20-A105E segregants that differed at XIV
(mrp20-A105E XIVBY × mrp20-A105E XIV3S; Fig. 2.2a and Supplementary Table S2.2). From
this advanced intercross, 361 F3mrp20-A105E progeny were sequenced and phenotyped.
31
Linkage mapping with these data reidentified XIV at a P-value of 2.50 × 10−43 (ANOVA; Fig.
2.2b and Supplementary Fig. S2.4). This locus also far exceeded a Bonferroni threshold.
Analysis of the peak marker and recombination breakpoints surrounding it delimited XIV to a
single SNP in the coding region of MKT1 (Fig. 2.2c). This SNP causes a substitution at amino
acid 30, with BY and 3S encoding glycine and serine, respectively, and was validated by
nucleotide replacement in mrp20-A105E segregants (Fig. 2.2d). Notably, this specific SNP was
previously shown to play a role in mitochondrial genome stability (Dimitrov et al. 2009).
2.4.3 Epistasis between MRP20 and MKT1 differs in cross parents and segregants
Yeast is also highly amenable to genetic engineering (Schindler 2020), making it possible to
validate results from linkage mapping using allele replacement. To validate the epistasis
between mrp20-A105E and MKT1, we introduced all 4 possible combinations of the causal
nucleotides at these genes into BY and 3S haploids using CRISPR/Cas9-mediated genetic
engineering (Fig. 2.7). The mrp20-A105E mutation affected growth in both parent strains
(t-tests, P in BY strain = 4.3 × 10−24 and P in 3S strain = 4.0 × 10−4). However, the magnitude of
the phenotypic effect differed between the 2: mrp20-A105E caused inviability in BY but had a
more modest effect in 3S. In addition, response to mrp20-A105E was modified by MKT1 in a
parent background-dependent manner. Specifically, response to the mutation was more severe
in the presence of MKT1BY than MKT13S in 3S (ANOVA, P = 0.01) but not BY [ANOVA,
P = 0.99], presumably because BY mrp20-A105E strains were generally inviable.
32
Figure 2.7 Genetic engineering of parents revealed unexpected responses to
mrp20-A105E. We engineered all combinations of MRP20 and MKT1 into the BY and 3S cross
parents. Expected phenotypes are shown as shaded boxes denoting 95% confidence interval
based on the originally obtained segregant phenotypes.
Although linkage mapping initially suggested that MKT1 explains most of mrp20-A105E’s
variable expressivity, our genetic engineering results clearly showed this was not the case.
Specifically, BY mrp20-A105E MKT1
3S
, 3S mrp20-A105E MKT1
BY
, and 3S mrp20-A105E
MKT1
3S all exhibited different phenotypes than expected based on mrp20-A105E segregants
(Fig. 2.7). These results imply additional loci must also genetically interact with mrp20-A105E
and contribute to its variable expressivity.
33
2.4.4 Many additional loci affect the expressivity of mrp20-A105E
To enable the identification of other loci influencing the expressivity of mrp20-A105E, we
generated 2 new BYx3S haploid F2 crosses using genetically engineered BY and 3S parents
(Fig. 2.3a and Supplementary Table S2.2). Both crosses were designed so that all segregants
in the same cross would carry mrp20-A105E and the same MKT1 allele, either MKT1BY or
MKT13S. By engineering the crosses in this way, we increased the chance of detecting
additional loci contributing to the variable expressivity of mrp20-A105E. From these engineered
crosses, 749 total segregants were obtained through tetrad dissection, ensuring these
individuals would have balanced allele frequencies at all loci and random multilocus genotypes.
These segregants were then sequenced and phenotyped for growth on ethanol.
In contrast to the bimodal phenotypic distribution observed in the original mrp20-A105E
segregants, these new crosses exhibited continuous ranges of phenotypes, as well as higher
phenotypic variance (Fig. 2.3b). This finding suggests that genetically modifying the cross
parents altered the phenotypic effects of loci that genetically interact with mrp20-A105E or
MKT1, uncoupled linked loci that also genetically interact with mrp20-A105E from MKT1, or
both. In both the MKT1BY and MKT13S crosses, mrp20-A105E segregants ranged from
inviable to nearly wild type. The distributions of phenotypes in the 2 crosses differed in a manner
consistent with their MKT1 alleles, with the mean of the MKT1BY segregants lower than the
MKT13S segregants (t-test, P = 4.8 × 10−34). These data show that regardless of the MKT1
allele present, additional loci can cause mrp20-A105E to show phenotypic effects ranging from
lethal to benign.
We next mapped these other loci influencing the expressivity of mrp20-A105E. Excluding MKT1,
which explained 18% of the phenotypic variance in the new crosses, linkage mapping identified
16 new loci (Fig. 2.3c, Supplementary Fig. S2.5, and Supplementary Table S2.3). We did not
detect genetic interactions among the loci, suggesting any epistasis between them is weak. Of
34
the new loci, the BY allele was inferior at 10 and superior at 6. These loci individually explained
between 0.79% and 14% of the phenotypic variance in the new crosses. Thirteen of these loci
resided on a subset of chromosomes but were distantly linked: 4 on Chromosomes XII, 3 on
XIII, 2 on XIV, and 4 on XV. The 3 remaining loci were detected on Chromosomes IV, VII, and
XI. We delimited these other loci to small genomic intervals spanning 1 (12 loci), 2 (3 loci), or 3
(1 locus) candidate genes based on markers showing peak significance (Supplementary Table
S2.4). The candidate genes in these intervals functioned in many compartments of the cell and
implicated multiple pathways rather than a single molecular process, suggesting the
mechanistic basis of mrp20-A105E’s variable expressivity is complex (Table 2.1).
2.4.5 The Chromosome XIV locus contains multiple causal variants
Among the newly detected loci, the largest effect (14% phenotypic variance explained) was on
Chromosome XIV near the first locus we had detected at MKT1. The position of maximal
significance at this new site was 2 genes away from the end of MKT1, with a confidence interval
that did not encompass the causal variant in MKT1 (Supplementary Table S2.4). Thus, the
originally identified large effect XIV locus in fact represents multiple distinct closely linked
nucleotides that both genetically interact with MRP20 and occur in different genes (Fig. 2.8a).
35
Figure 2.8 Analysis of MKT1 and SAL1 genotypes reveals an aneuploidy also contributes
to mrp20-A105E’s variable expressivity. a) Inviable segregants were present among all
mrp20-A105E MKT1 SAL1 genotype classes. b) Aneuploid individuals with duplicated
Chromosome II showed reduced growth. Aneuploid individuals were not evenly detected across
the different MKT1-SAL1 genotype classes.
The new locus on Chromosome XIV was delimited to 2 genes, one of which was SAL1,
encoding a mitochondrial ADP/ATP transporter that physically interacts with Mrp20 (Singh et al.
2020). A SNP in SAL1 that segregates in this cross was previously linked to increased
mitochondrial genome instability in BY (Dimitrov et al. 2009), suggesting it is likely also causal in
our study. For this reason, we refer to this additional Chromosome XIV locus as “SAL1.” We
found no evidence for epistasis between MKT1 and SAL1 (interaction term in a full-factorial
2-way ANOVA, P = 0.77). Although the MKT1-SAL1 locus had a large effect, it explained a
minority of the phenotypic variance among mrp20-A105E segregants in a model including all
detected loci (32% for MKT1-SAL1 vs 36% for all other loci collectively). Thus, by enabling
MKT1 and SAL1 to segregate independently through genetic engineering and examining a large
number of mrp20-A105E segregants with different MKT1-SAL1 genotypes, we observed a
greater diversity of phenotypes among individuals with the mutation than was originally seen
and detected many additional loci contributing to the mutation’s expressivity.
2.4.6 Aneuploidy also contributes to the expressivity of mrp20-A105E
Despite the fact that the identified loci explain most of mrp20-A105E’s expressivity, some
individuals exhibited growth that was appreciably lower than expected, suggesting yet another
unidentified genetic factor was present (Fig. 2.8b). This finding led to the detection of a
Chromosome II duplication that reduced growth (ANOVA, 1.2 × 10−48). The aneuploidy was
common among mrp20-A105E segregants, with a higher prevalence when MKT1
3S was also
present (Fisher’s exact test, P = 1.5 × 10−43; Supplementary Table S2.5). The Chromosome II
aneuploidy was not seen among wild-type segregants. These data suggest that mrp20-A105E
36
increases the rate of aneuploidization and that genetic variation in MKT1 influences the degree
to which mrp20-A105E segregants duplicate Chromosome II. The aneuploidy’s contribution to
phenotypic variation was relatively minor, explaining 5% of phenotypic variance among
mrp20-A105E segregants in a model also including all identified loci.
2.4.7 Multiple cellular mechanisms underlie poor growth in the presence of mrp20-A105E
Evidence suggests mitochondrial genome instability contributes to the variable expressivity of
mrp20-A105E. First, mitochondrial genome instability is known to cause poor growth on
respirable carbon sources, such as ethanol (Shadel 1999; Lipinski et al. 2010). Second, the
exact variants that segregate in our cross at MKT1 and SAL1 were previously linked to
mitochondrial genome instability (Dimitrov et al. 2009). Third, both Mrp20 and Sal1 function in
the mitochondria (Koc et al. 2001; Kucejova et al. 2008). Fourth, 2 other candidate genes in the
newly detected loci encode proteins that function in the mitochondria (Table 2.1).
To examine whether mitochondrial genome instability contributes to the variable expressivity of
mrp20-A105E, we quantified petite formation, a measure of spontaneous mitochondrial genome
loss (Fig. 2.4) (Ephrussi and Slonimski 1955; Sherman 2002; Dimitrov et al. 2009). This
phenotype can be examined by plating individual cells on media containing a respirable carbon
source and inspecting the resulting colonies; cells with defective mitochondria will produce
petite colonies, while cells with functional mitochondria will produce normal-sized colonies
(Ephrussi et al. 1949; Dujon 1981; Dimitrov et al. 2009). The proportion of petite and normal
colonies produced by a strain provides an estimate of its mitochondrial genome instability. Petite
formation and colony growth are distinct but related phenotypes: a high rate of petite formation
is one, but not the only, reason a strain might grow poorly on ethanol.
37
In the petite assays, we examined MRP20 and mrp20-A105E BY and 3S parent strains, as well
as 16 MRP20 segregants and 42 mrp20-A105E segregants. Despite causing reduced growth in
both parents, mrp20-A105E only led to elevated mitochondrial genome instability in BY (t-test,
P = 0.013 in BY and P = 0.39 in 3S; Fig. 2.4a). Also, although mrp20-A105E segregants
exhibited increased mitochondrial genome instability relative to MRP20 segregants (Wilcoxon
rank-sum test, P = 0.023), especially at lower levels of growth, a subset of inviable segregants
did not show elevated petite formation (Fig. 2.4, b and c). These results show that mitochondrial
genome instability is 1 cellular process contributing to mrp20-A105E’s variable expressivity, but
also imply that other mechanisms are involved as well. This finding agrees with our mapping
results, which suggested that a number of distinct molecular pathways influence the phenotypic
effect of mrp20-A105E (Table 2.1).
2.4.8 Genetic underpinnings of mrp20-A105E’s expressivity in segregants and parents
We determined the extent to which the identified loci explained phenotypic variability among
mutants. We fit a fixed effects linear model of growth as a function of all identified loci and the
aneuploidy. This model accounted for most (78%) of the broad sense heritability among
mrp20-A105E segregants. Furthermore, phenotypic predictions for segregants based on this
model were strongly correlated with their observed phenotypes (Pearson’s r = 0.85,
P = 4.4 × 10−209; Fig. 2.5a). A ten-fold cross-validation analysis found this result was robust,
with the ten-fold producing correlations between observed and predicted phenotypes that
ranged from 0.78 to 0.91 (mean Pearson’s r = 0.84). In addition, genomic best linear unbiased
predictions (gBLUPs) had a Pearson’s r of 0.82 with observed phenotypes, which was roughly
equal to the correlation obtained using predictions from the fixed effects linear model. These
results suggest we have explained nearly all of the genetic basis of mrp20-A105E’s variable
expressivity.
38
These results show that the variable expressivity of mrp20-A105E is driven by many loci.
Confirming this point, the fixed effects linear model including all detected loci was also effective
at predicting the phenotypes of genotypes that were not present in the new engineered crosses,
but had been generated throughout the course of this work. For example, the model accurately
predicted the phenotypes of the original mrp20-A105E segregant population (Pearson’s r = 0.90,
P = 1.6 × 10−39), as well as the phenotypes of cross parents engineered to carry mrp20-A105E
(Fig. 2.5a). Moreover, the model explained both qualitative and quantitative variation within and
between the 2 XIV classes that were originally seen among mrp20-A105E segregants.
Finally, we examined how diverse combinations of loci collectively produced similar phenotypic
responses to mrp20-A105E. We examined the relationship between growth and the total
number of detrimental alleles carried by mrp20-A105E segregants, keeping track of each
individual’s genotype at MKT1 and SAL1, the largest effect loci (Fig. 2.5b). The number of
detrimental alleles carried by a segregant showed a strong negative relationship with growth,
which was not observed in wild-type segregants (Supplementary Fig. S2.6). Furthermore,
regardless of genotype at MKT1 and SAL1, the phenotypic effect of mrp20-A105E ranged from
lethal to benign in a manner dependent on the number of detrimental alleles present at other
loci. These findings demonstrate that many segregating loci beyond the large effect MKT1-SAL1
locus influence the expressivity of mrp20-A105E and enable different genotypes in the cross to
exhibit a broad range of responses to the mutation, including inviability.
2.5 Conclusion
We have provided a detailed genetic characterization of the expressivity of a spontaneous
mutation, mrp20-A105E. Response to this mutation in a budding yeast cross is influenced by at
least 18 genetic factors in total, with the largest effect due to 2 closely linked variants. However,
at least 15 additional loci segregate and jointly exert greater effects than the largest 2, MKT1
39
and SAL1. Different combinations of alleles across these loci produce a continuous spectrum of
phenotypic responses to the mutation. Due to tight linkage between MKT1 and SAL1 in the
original cross parents, the full extent of this continuum was not originally observed, leading to an
initial understanding of the expressivity of the mrp20-A105E mutation that was simplistic. It was
only once we disrupted linkage disequilibrium between MKT1 and SAL1 through genetically
engineering the parent strains and producing new segregants that the full extent of
mrp20-A105E’s variable expressivity was visible.
The identified loci largely explain mrp20-A105E’s variable expressivity and thus make it possible
to answer practical and theoretical questions about background effects. For example, our work
helps to connect the manifestation of discrete, qualitative mutational responses to their
quantitative genetic underpinnings. Whether responses to a mutation appear qualitative may
depend on the combinations of responsive alleles in examined individuals. In our case, tight
linkage between the largest effect loci, MKT1 and SAL1, made it appear at first that response to
mrp20-A105E might be more discrete in nature, as 2 phenotypic modes were observed among
mutants such that segregants with the MKT1BY allele were often inviable (Fig. 2.1b). Yet, later
work showed these results were misleading and that segregants in all 4 genotype classes
involving MKT1 and SAL1 in fact exhibit continuous responses to mrp20-A105E. Our data
suggest this is because of the large number of additional loci that contribute to variable
expressivity in our study, which despite having small effects individually are able to exert a large
influence over segregants’ responses to mrp20-A105E collectively.
In addition, our findings imply that different individuals may show similar responses to the same
mutation due to distinct molecular and cellular mechanisms. When we examined mitochondrial
genome instability, we found that most, but not all, mrp20-A105E segregants that were inviable
on ethanol exhibited substantially elevated instability. However, a minority of the inviable
40
mrp20-A105E segregants had highly stable mitochondrial genomes, implying they grew poorly
on respirable carbon sources because of a different cellular mechanism. The cause of inviability
in these individuals is less clear, as identified candidate genes functioned in many different
processes (Table 2.1). This finding likely reflects the fact that growth is a composite phenotype
shaped by many cellular processes. Supporting this point, gene–gene deletion studies in yeast
have shown genes in diverse cellular processes can genetically interact to affect growth
(Costanzo et al. 2016), illustrating the distinction between genetic interactions and direct
functional relationships between gene products (Boone et al. 2007). With this said, we note that
in other work on discrete traits (Taylor and Ehrenreich 2014, 2015b; Lee et al. 2016, 2019;
Taylor et al. 2016), we found a different outcome: identified genes underlying background effects
functioned in common pathways and impacted the regulation of a single key gene. These
findings illustrate how the genetic basis of a background effect may depend on the mechanisms
giving rise to the affected phenotype.
Another major area of interest regarding background effects is their impact on evolution. Within
a population, background effects may influence whether beneficial alleles enable adaptation or
deleterious alleles are purged (Lang et al. 2011; Kryazhimskiy et al. 2014; Johnson et al. 2019).
Regarding the latter, in the case of mrp20-A105E, a highly deleterious allele in respiratory
conditions, we found that some segregants with the mutation exhibited wild-type levels of
growth. This finding demonstrates how variable expressivity that is caused by epistatic loci may
enable some individuals to avoid phenotypic or fitness impairments in the presence of
deleterious mutations (Siegal and Leu 2014), potentially muting purifying selection on these
alleles. Depending on the prevalence of background effects across different mutations, epistasis
between mutations and segregating loci could have a substantial impact on the persistence of
deleterious mutations within populations over time.
41
Our results also inform efforts to understand variable expressivity in other systems, including
humans. For example, there is interest in determining why people who carry highly penetrant
alleles known to cause disease do not develop pathological conditions (Chen et al. 2016;
Narasimhan et al. 2016; Riordan and Nadeau 2017). Such resilience could provide insights into
potential therapies, but our work indicates it is likely to involve numerous loci. This speaks to the
complicated and unexpected epistasis that can arise between mutations and segregating loci in
genetically diverse populations (Carlborg and Haley 2004; Shao et al. 2008; Dowell et al. 2010;
Chari and Dworkin 2013; Mackay 2014; Chandler et al. 2014, 2017; Taylor and Ehrenreich
2014, 2015a, 2015b; Paaby et al. 2015; Vu et al. 2015; Lee et al. 2016, 2019; Taylor et al. 2016;
Forsberg et al. 2017; Campbell et al. 2018; Mullis et al. 2018; Hou et al. 2019). At the same
time, our study also provides reason for optimism. Despite the highly complex genetic basis of
mrp20-A105E’s variable expressivity, a gBLUP model lacking any explicitly defined loci
explained the mutation’s expressivity as well as a model explicitly including all identified loci.
This implies that directly mapping the epistatic loci causing variable expressivity may not be
necessary to achieve a predictive understanding of genotype–phenotype relationships among
individuals carrying particular mutations. These insights illustrate how characterizing
background effects in genetically diverse populations is immediately relevant to inheritance,
disease, and evolution.
42
2.6 Supplementary Material
Figure S2.1 Generation of BYx3S segregants. Two distinct BY/3S diploid progenitors were
used to obtain wild type and hos3∆ segregants respectively. Each diploid was generated and
sporulated. The synthetic genetic array marker system was used to generate MATa segregants.
Wild type segregants were collected directly from MATa selection plates. MATa selection plates
were replica-plated onto G418 plates from which hos3∆::KanMX segregants were selected.
43
Figure S2.2 Identification of mrp20-A105E. (A) Wild type and hemizygous BY/3S diploids
were generated and sporulated to produce HOS3 and hos3∆ F2 BYx3S segregants. BYx3S
hos3∆ segregants exhibited a large increase in phenotypic variability relative to wild type
segregants. (B) Linkage mapping using the HOS3 and hos3∆ segregants identified a single
locus on Chromosome IV. The peak marker was from 1,277,231 to 1,277,959 and the
44
confidence interval extended from position 1,272,164 to position 1,278,407, encompassing
(from left to right) part of URH1 and all of DIT2, DIT1, RPB7, and MRP20. (C) The BY allele of
the Chromosome IV locus had a large effect in hos3∆ segregants, but no effect in HOS3
segregants. (D) Recombination breakpoints in hos3∆ segregants delimited the Chromosome IV
locus to five SNPs (small vertical black lines along the x-axis) in the RPB7-MRP20 region of the
chromosome. Dashed vertical lines show the window delimited by the recombination
breakpoints. One of these variants was a spontaneous mutation in MRP20. Blue and orange
respectively refer to the BY and 3S alleles of the locus. (E) Reciprocal hemizygosity analysis in
a hos3∆ BY/3S diploid was conducted at closely linked non-essential genes and found that
MRP20 is the causal gene underlying the Chromosome IV locus. In these experiments the IVBY
allele includes the mrp20-A105E mutation and results in a substantial decrease in growth. Black
triangles denote the absence of one allele and colored triangles indicate the alleles that are
present. (F) The causality of mrp20-A105E was validated by engineering in segregants with
MRP20 (left) and mrp20-A105E (right). (G) Tetrad dissection of the original BY/3S HOS3/hos3∆
MRP20/mrp20-A105E diploid showed that increased variation was due to mrp20-A105E, not
hos3∆. Throughout the paper, blue and orange are used to denote BY and 3S genetic material,
respectively. All growth data presented in the paper are measurements of colonies on agar
plates containing rich medium with ethanol as the carbon source.
45
Figure S2.3 Representative MRP20 and mrp20-A105E segregants on ethanol. Each colony
is a genetically distinct BYx3S segregant grown on ethanol. A wide range of growth phenotypes
was observed among mrp20-A105E segregants, some of which were inviable in this condition.
46
Figure S2.4 Linkage mapping in the F3 panel more finely resolves the Chromosome XIV
locus. The model growth ~ locus + error was used. The genome-wide significance plot of the
locus term is shown in (A) and the relationship between genotype at the Chromosome XIV locus
are shown in (B). The peak and 99% confidence interval solely included the position 467,219.
47
Figure S2.5 Growth effects of loci detected in BY x 3S mrp20-A105E crosses. The
relationship between genotype is shown at each of the 16 loci detected among BYx3S
mrp20-A1015E segregants shown in Figure 2.3C.Effects are shown from greatest to least effect
size, left to right, top to bottom.
48
Figure S2.6 Loci affecting expressivity of mrp20-A105E show minimal effects in MRP20
segregants. Growth relative to the sum of detrimental alleles is shown for MRP20 segregants.
While predictions for MRP20 segregants correlated with observed growth (Pearson’s r = 0.70, p
= 9.6 x 10-25), the cumulative effects of loci differed between mrp20-A105E and MRP20
segregants (ANOVA, observedGrowth ~ predictedGrowth*MRP20; interaction term p = 2.8 x
10-23). This is likely, in part, due to the fact that wild type segregants exhibited a narrower range
of phenotypes which did not include inviable segregants.
49
Table S2.1 Programs and settings used for sequence analyses. This table includes the
programs used to map and process sequencing reads in bash, and additional programs and
functions beyond base R that were used in subsequent analyses included in this paper.
Table S2.2 Crosses and segregant populations examined in this study. All BY x 3S crosses
and segregant populations examined in this study are listed. Note, at times different methods of
obtaining segregants, either random spores or tetrad dissection were employed.
50
Table S2.3 Loci other than MKT1 that influence growth in mrp20-A105E segregants. These
loci were detected by mapping growth ~ locus in mrp20-A105E MKT
BY F2 and in mrp20-A105E
MKT
3S F2
individuals shown in Figures 2.3 - 2.6 and Figure S2.4. Confidence intervals are
reported as 2 LOD drops around the peak position at a locus.
Table S2.4 Candidate genes at loci in Table S3. These loci additively affect the expressivity of
mrp20-A105E mutation. Recombination delimits each peak to between one candidate (12 loci),
two candidate genes (3 loci), or three genes (1 locus). Location of the delimited SNPs is
included.
51
Table S2.5 Presence of Chromosome II duplication differs among BY x 3S crosses. We
observed a Chromosome II duplication in crosses fixed for mrp20-A105E. Among these two
crosses, the cross fixed for MKT1
3S had much higher prevalence of the aneuploidy relative to
the cross fixed for MKT1
BY
.
52
Chapter 3: The interplay of additivity, dominance, and epistasis on fitness in a diploid
yeast cross
This work appears essentially as published in 2022 in Nature Communications 13 (1). It was
done in collaboration with Takeshi Matsui, Martin N. Mullis, Kevin R. Roy, Rachel Schell, Sasha
F. Levy, and Ian M. Ehrenreich.
Summary of author contributions
My primary contribution to this work involved the generation of the barcoded haploid segregant
panel used in this study. This was a highly collaborative effort involving multiple labs and
personnel, including Takeshi Matsui, Martin Mullis, and Rachel Schell. Together, we performed
sporulations, tetrad dissections, and phenotyping to identify roughly 1000 viable four-spore
tetrads for use in transformations. With Martin Mullis, I helped transform these viable four-spore
tetrads with a library of barcoded plasmids, such that a unique genotype barcode was integrated
into the genome of each individual haploid segregant. This barcoded haploid panel was used to
generate the diploids studied in this publication. I remained involved in this project via weekly
meetings.
3.1 Abstract
In diploid species, genetic loci can show additive, dominance, and epistatic effects. To
characterize the contributions of these different types of genetic effects to heritable traits, we
use a double barcoding system to generate and phenotype a panel of ~200,000 diploid yeast
strains that can be partitioned into hundreds of interrelated families. This experiment enables
the detection of thousands of epistatic loci, many whose effects vary across families. Here, we
show traits are largely specified by a small number of hub loci with major additive and
dominance effects, and pervasive epistasis. Genetic background commonly influences both the
additive and dominance effects of loci, with multiple modifiers typically involved. The most
53
prominent dominance modifier in our data is the mating locus, which has no effect on its own.
Our findings show that the interplay between additivity, dominance, and epistasis underlies a
complex genotype-to-phenotype map in diploids.
3.2 Introduction
Most complex traits, including many phenotypes of agricultural, clinical, and evolutionary
significance, are specified by multiple loci (Mackay, Stone, and Ayroles 2009). How alleles at
these loci collectively produce the heritable trait variation in genetically diverse populations
remains unresolved (Bloom et al. 2013; Boyle, Li, and Pritchard 2017). While additive loci play a
major role in most traits, non-additive genetic effects are also likely important (Shao 2008;
Mackay 2014; Huang and Mackay 2016; Taylor and Ehrenreich 2015; Forsberg et al. 2017;
Rowe et al. 2008). However, loci with non-additive genetic effects are often difficult to detect,
limiting knowledge of their properties (Wei, Hemani, and Haley 2014; Ehrenreich 2017).
The two purely genetic sources of non-additivity are dominance among alleles of individual loci
and epistasis between alleles at different loci (or genetic interactions) (Falconer and Mackay
1996; Lynch and Walsh 1998). Most empirical studies of non-additive genetic effects have
focused on haploid or inbred individuals (Bloom 2015; Taylor and Ehrenreich 2014; Huang
2012), which provide higher statistical power to detect loci due to their minimal levels of
heterozygosity. However, by design, these populations cannot furnish insight into dominance or
its relationship with epistasis. This is a problem because many eukaryotic species that matter to
humans, including our species itself, exist predominantly as diploids that outbreed and have
high levels of heterozygosity (“The 1000 Genomes Project Consortium. A Map of Human
Genome Variation from Population-Scale Sequencing” 2010; Magwene 2011; Churchill et al.
2012). Dominance may be an important contributor to traits in these species.
54
When epistasis occurs in diploids, a locus may influence only the additive effects, only the
dominance effects, or both the additive and dominance effects of its interactor(s) (Cheverud and
Routman 1995; 1996; Campbell, McGrath, and Paaby 2018). Such interplay has implications for
efforts to genetically dissect phenotypes, predict heritable traits from genotypes, and understand
the evolutionary trajectories of beneficial and deleterious alleles. Yet, exploration of the
relationship between additivity, dominance, and epistasis has mainly been limited to theory
because of technical challenges in identifying non-additive loci.
The budding yeast Saccharomyces cerevisiae is a potentially powerful system for studying
non-additive genetics in diploids. Haploid yeast segregants with known genotypes can be mated
to produce diploid strains that also have known genotypes (Hallin 2016). This strategy facilitates
the generation of diploid mapping populations that are roughly the square of the number of
haploid progenitors. However, phenotyping large diploid populations of more than ~10,000
individuals has been technically difficult (Hallin 2016; Märtens et al. 2016), limiting the use of
this strategy.
Here, we develop a chromosomally-encoded barcoding system that enables phenotyping of
large yeast diploid mapping populations. We fuse two genomic barcodes, one from each haploid
parent, in vivo to create a unique double barcode for each diploid strain (Fig. 3.1a). This system
enables linkage mapping in a population of ~200,000 diploid strains and examination of the
relationship between additivity, dominance, and epistasis at detected loci.
55
56
Figure 3.1 Generating a large panel of diploid segregants with known genotypes that can
be phenotyped as a pool. a. Overview of the experimental design. Parental haploids, BY and
3S, were mated and sporulated. The resulting MATα and MATa segregants were barcoded at a
common genomic location and sequenced. Segregants were mated as pairs to generate a
panel of ~200,000 double-barcoded diploid strains with known genotypes. All diploid strains
originating from a single haploid parent are referred to as a ‘family’. b. MATα and MATa
barcodes were brought to the same genomic location by inducing recombination between
homologous chromosomes via Cre-loxP. c. Diploid strains were pooled and grown in
competition for 12–15 generations. Barcode sequencing over the course of the competition was
used to estimate the fitness of each strain. d. Density plot of the raw fitness of double barcodes
representing the same diploid strain in the same pooled growth condition (Glucose 1). e.
Density plot of the mean raw fitness of the same diploid strain measured in two replicate growth
cultures (Glucose 1 and Glucose 2). f. The mean broad-sense and narrow-sense heritability
estimates for the 8 environments. The standard errors for both heritability estimates are shown
as error bars for each point. g. Violin plots of the fitnesses of diploid strains in 8 environments
(n > 187,000 in each environment). Raw fitness estimates of BY/BY, BY/3S, 3S/BY, and 3S/3S
diploid strains are shown as colored lines. Overlaid boxplots, here and in subsequent figures,
indicate the median (white dots), interquartile range (IQR; black boxes), and lower and upper
adjacent values (black lines extending from the black boxes), defined as first quartile − 1.5 IQR
and third quartile + 1.5 IQR, respectively.
3.3 Results
3.3.1 Phenotyping of a large diploid cross by barcode sequencing
We started with two S. cerevisiae isolates, the commonly used lab strain BY4716 (BY) and a
haploid derivative of the clinical isolate 322134S (3S) (Fig. 3.1a). These strains differ at ~45,000
SNPs (~0.4% of genome) (Taylor et al. 2016; Mullis et al. 2018). To ensure segregation of the
mating locus, both BY MATa x 3S MATα and 3S MATa x BY MATα crosses were performed
using isogenic strains that had been mating type switched. From these crosses, 600 MATα and
400 MATa segregants from distinct four-spore tetrads were marked at the neutral YBR209W
locus by integrating a random barcode (Levy 2015; Liu 2019). At least two uniquely barcoded
strains were recovered per haploid segregant and the genome of each segregant was
sequenced to define the genotype represented by each barcode (Supplementary Figs. S3.1
and S3.2A).
MATa and MATα strains (2 barcodes per segregant) were mated as pairs and grown on media
that induced site-directed recombination between the MATa barcode and MATα barcode on
57
homologous chromosomes (Fig. 3.1b and Supplementary Fig. S3.1E) (Liu 2019; Schlecht et
al. 2017). This process resulted in a double barcode on one chromosome that uniquely
identifies both parents of a diploid strain and therefore its presumptive genotype
(Supplementary Fig. S3.2B). Using similar methods, we also constructed BY/BY, BY/3S,
3S/BY, and 3S/3S parental diploid strains.
After the matings, diploid strains were pooled and competed in seven conditions: cobalt
chloride, copper sulfate, glucose, hydrogen peroxide, sodium chloride, rapamycin, and zeocin
with the glucose condition performed twice (Supplementary Table S3.1). Cells were grown for
~15 generations in serial batch culture, with 1:8 dilution every ~3 generations and a bottleneck
population size greater than 2 × 109 cells (Fig. 3.1c). Double barcodes were enumerated over
4–5 timepoints by sequencing amplicons from the double barcode locus, and the resulting
frequency trajectories were used to estimate the relative fitness of each strain (L. Zhao et al.
2018; F. Li, Salit, and Levy 2018).
We recovered on average 197,267 diploid strains per environment with a minimum of two
biological replicates that were marked with different barcodes within a growth pool
(Supplementary Table S3.2). These biological replicates showed a relatively high correlation
(average Spearman’s rho across environments = 0.67, 0.524 < rho < 0.8, Fig. 3.1d and
Supplementary Fig. S3.3A). Noise between biological replicates was primarily caused by one
or more replicates being present at a low abundance within the pool, resulting in a less accurate
fitness estimate (Supplementary Fig. S3.4) (F. Li, Salit, and Levy 2018). To minimize the
effects of measurement noise, the average fitness measure of all biological replicates was used
as the phenotype for each strain. The average fitness measures of strains assayed in replicate
glucose growth cultures were also highly correlated (Spearman’s correlation = 0.863, Fig. 3.1e).
This suggests that our pooled fitness assay is accurate and reproducible.
58
Substantial phenotypic diversity was observed in every environment. The majority of this
variation was due to genetic factors: broad-sense heritabilities were on average 61% (52–76%
across environments), with 40% (19–53%) being additive and 21% (19–26%) being non-additive
(Fig. 3.1f, Supplementary Fig. S3.5, and Supplementary Table S3.3). These heritability
estimates are similar to other yeast studies in which fitness was measured using colony growth
assays on agar plates (Mullis et al. 2018; Bloom et al. 2013; Forsberg et al. 2017). Every
environment contained many diploids with more extreme fitness than either the BY/BY or 3S/3S
parent (i.e., transgressive segregation). BY/3S and 3S/BY segregants were more fit than the
BY/BY or 3S/3S diploids in all environments but one (i.e., heterozygote advantage, Fig. 3.1g).
3.3.2 Genetic mapping within interrelated families
Using quantile normalized fitness estimates from barcode sequencing, we mapped loci that
contribute to growth (Supplementary Fig. S3.6). Due to our experimental design, diploid strains
generated from the same haploid parent (families) are more genetically related than diploid
strains generated from different parents (Fig. 3.1a). Such family structure causes false positives
in genetic mapping (Pritchard et al. 2000; K. Zhao 2005). Here, we found that most sites
throughout the genome exceeded nominal significance thresholds when fixed effects linear
models were applied in a given environment (Supplementary Fig. S3.7). To enable mapping
despite the family structure, we used mixed effects linear models, which are commonly
employed in genetic mapping studies involving populations in which individuals show
nonrandom relatedness (Kang 2008; 2010; Zhou and Stephens 2012). Specifically, we used
Factored Spectrally Transformed Linear Mixed Models (FaST-LMM) (Lippert 2011; Widmer
2015) to identify an average of 18 loci per environment (10–26).
59
Contrary to expectations that larger sample sizes should yield better statistical power and
therefore more detections, the numbers of loci identified here were comparable to studies that
were at least 60-fold smaller (Bloom et al. 2013; Bloom 2015; Hallin 2016). To identify more loci,
we also used an alternative strategy that did not require explicitly controlling for family structure.
Fixed effects linear models were conducted individually within each of 392 families of diploids
that descended from distinct MATa parents and consisted of ~600 individuals each. The
family-level scans yielded an average of 6.5 detections per family across environments (Fig.
3.2a, b and Supplementary Fig. S3.8), which were largely reproducible between replicate
glucose cultures (Fig. 3.2c).
60
61
Figure 3.2 Identification of loci that affect fitness. a, b. Loci mapped in CoCl2 (a) and
CuSO4 (b). Panels from top to bottom are (1) loci detected using the mixed effects linear model
FaST-LMM (red bars), (2) loci with dominance effects detected using a fixed effects linear model
on the non-additive portion of each diploid’s phenotype (green bars), (3) loci enriched for
detections in family-level scans (orange bars), (4) loci detected using family-tests (black or blue
points), where each row is a different MATa family, and (5) the total number of detections across
families for each 20 kb interval (gray bars). c. Violin plot showing the % of loci that were
detected in both glucose replicates for each family (n = 392 MATa families). d. Scatterplot
showing distinct loci detected using family-level tests with permutation-based thresholds in
CoCl2 and their maximal −log10(p) values in each family in which they were identified. Red,
green, and blue labels denote distinct loci in family-level scans that were also identified by
FaST-LMM, dominance scans, or enrichment tests, respectively. Distinct loci showed substantial
variability in statistical significance across families and mapping methods. e. Barplot comparing
the number of enriched family-level loci (red) and FaST-LMM loci (blue) detected across
environments. Loci that were detected using both mapping methods are in dark colors, while
loci that were specific to either FaST-LMM or family-level scans are in light colors. f. Examples
of loci with only additive effects (or low dominance), incomplete dominance, complete
dominance, overdominance, and underdominance. All genotype classes had n > 41,000. Black
lines are the mean fitness of diploids subsetted by the genotype state at the focal locus. Gray
lines are the standard errors. Green lines are the expected mean fitness of heterozygotes
assuming no dominance. Genotype state at each locus is denoted by colored boxes: BY/BY
(blue), 3S/3S (orange), is BY/3S (half blue, half orange). Dominance and additive effects (blue
and red bars, respectively) for each subset of the data are shown next to the relevant genotype
classes. The degree of dominance at a locus is included in parentheses. g. Violin plot showing
the degree of dominance for all loci detected in the dominance scan (n = 142). Loci with positive
values are dominant towards the allele conferring higher fitness (green), while loci with negative
values are dominant towards the deleterious allele (red). All loci with degree of dominance
>100% or < −100% exhibit overdominance and underdominance, respectively.
Often the same loci were detected in multiple families within an environment, as expected if
many detections within families are true positives. Detections across the families were
consolidated, resulting in approximately 58 distinct loci per environment (49–65), >2.5-fold more
loci than detected by FaST-LMM (Fig. 3.2a and b). Across environments, these distinct loci
were detected in as few as one family and as many as 388 families (Fig. 3.2d). To distinguish
loci most likely to represent true positives, we identified distinct loci showing more detections
across families than expected by chance using an approach first employed to map loci with
widespread effects on transcription (Brem et al. 2002). We found on average 22 of these
enriched loci per environment (15–30; Fig. 3.2e).
62
Many of the loci identified by FaST-LMM overlapped with the conservative set of enriched
family-level loci, but there were also a number of differences between the methods (Fig. 3.2d).
29% of loci detected by FaST-LMM were detected in fewer families than the enrichment
threshold in a given environment (10% to 38%; Fig. 3.2e; Supplementary Fig. S3.9). Similarly,
43% of loci enriched across families were not identified by FaST-LMM (35% to 57%; Fig. 3.2e).
These findings likely reflect numerous complexities in our diploid mapping population, including
varying degrees of relatedness among strains within and between families and a potentially
large number of segregating loci that vary in their effect sizes, dominance, epistasis, and linkage
to each other. Neither FaST-LMM nor family-level scans can fully address all these factors and
thus they may produce somewhat different results, suggesting the two methods should be
viewed as complementary.
3.3.3 Loci frequently show dominance effects
In diploids, non-additivity can arise due to dominance among alleles at the same locus, epistasis
between alleles at different loci, or a mixture of the two. To identify such non-additive loci from
the aggregate data, we extracted the non-additive portion of each diploid strain’s phenotype
(Supplementary Fig. S3.10) (Hallin 2016). Using these values accounts for family structure and
enables mapping of non-additive loci in the full segregant panel with fixed effects linear models.
Regarding dominance effects, we identified an average of 18 loci showing dominance per
environment (12–30). 49% of these loci were also identified by FaST-LMM, while 72% of these
loci were detected in the conservative set of loci from the family-level scans. Among loci with
dominance effects, the average degree of dominance was ~51% (i.e., heterozygotes’ fitnesses
were roughly halfway between the average of the two homozygotes and one of the
homozygotes), with 82% of the loci showing incomplete dominance (Fig. 3.2f, g). Only ~7% of
the loci exhibited complete dominance, while overdominance (~8%) and underdominance (~3%)
were seen among the remaining loci with dominance effects. ~77% of the loci showed
63
dominance towards the allele conferring higher fitness (Fig. 3.2g), which may explain why
segregants were more fit than the BY/BY or 3S/3S diploid strains (Fig. 3.1g).
3.3.4 Epistatic hubs govern both additivity and non-additivity
We also used the non-additive portion of phenotype to perform comprehensive genome-wide
scans for genetic interactions. We identified an average of 440 two-locus interactions per
environment (377–538; Fig. 3.3a and Supplementary Fig. S3.11). Our large sample size had a
pronounced impact on detection: ~40-fold more interactions per environment were detected
than previous studies that phenotyped smaller mapping populations using conventional
approaches (Bloom 2015; Hallin 2016). Our large sample size also enabled comprehensive
scans for three-locus interactions with a reduced set of markers, identifying an average of 6152
per environment (4845–7301; Fig. 3.3a and Supplementary Fig. S3.12). Loci involved in
three-locus interactions were identified across all chromosomes and distributed widely
throughout the genome.
64
65
Figure 3.3 Interactions often affect both the additive and dominance effects of involved
loci. a. Interaction plots of all two-locus (left) and three locus (right) effects for two
representative environments. Significant interactions between loci are shown as connecting
lines. Green bars are the absolute effect size of a locus, calculated as the absolute difference
between the mean fitness of diploids that are 3S/3S and BY/BY at the focal locus. Orange bars
are the number of interactions detected for each locus. b. Scatter plot of the absolute effect size
of a locus and the number of two-locus (left) and three-locus (right) interactions in which it is
involved. Local regressions are shown as blue lines. c. Scatter plot of the number of two-locus
and three-locus interactions per locus. d. Examples of genetic interactions with, from left to right,
low (0.04), moderate (0.48), and high (0.97) fractions of epistasis involving dominance. All
genotype classes had n > 8,700. Black lines are the mean fitness of diploids subsetted by the
genotype state at the two involved loci. Gray lines are the standard errors. Green lines are the
expected mean fitness of heterozygotes assuming no dominance. Genotype state at each locus
is denoted by colored boxes: BY/BY (blue), 3S/3S (orange), is BY/3S (half blue, half orange).
The first locus is the locus whose effect is being modified, and the second locus is the modifier
locus. Dominance and additive effects (blue and red bars, respectively) for each subset of the
data are shown next to the relevant genotype classes. e. Density plot of the fraction of epistasis
involving dominance for all interactions (red; n = 3,522 genetic interactions), hub-hub (yellow;
n = 87), non-hub-hub (blue; n = 2197), hub-non-hub (green; n = 2197), and non-hub-non-hub
interactions (purple; n = 1,238).
We next analyzed the relationship between individual loci and their genetic interactions. We
found a strong positive relationship between the effect of a locus and its involvement in two- and
three-locus interactions (Fig. 3.3b). This suggests that loci with larger effects tend to genetically
interact with many loci or that their interactions are easier to detect. We also observed a clear
linear relationship between the number of two- and three-locus interactions of a given locus
(Fig. 3.3c). Notably, certain loci exhibited many more interactions than others, acting as ‘hubs’
(here defined as loci with >20 two-locus interactions in at least one environment) (Forsberg et
al. 2017). On average, ~4.5 hubs were detected per environment, and the same hub was often
detected in multiple environments. A majority (>54%) of all two- and three-locus interactions
involved at least one hub. Fine-mapping localized the Chromosome VI, VIII, X, and XII hubs to
genes involved in amino acid sensing (PTR3), copper resistance (CUP1), vacuolar protein
sorting (VPS70), and a gene of unknown function (YLR257W), respectively.
66
3.3.5 Relationships between epistasis and dominance in diploids
In haploids, epistasis can only influence the additive effect of a locus because there are no
heterozygotes. In diploids, however, epistasis can modify a locus’ additive effects, dominance
effects, or both additive and dominance effects (Cheverud and Routman 1995; 1996; Campbell,
McGrath, and Paaby 2018). To better characterize how loci are modified, each two-locus
interaction was partitioned into additive and dominance components. We found that changes in
dominance account for ~44% of the average epistatic effect (Fig. 3.3d, e), implying that
interactions often affect both additivity and dominance. However, this fraction varied depending
on whether the modifying locus was a hub. When the modifier was a hub, dominance accounted
for little of the epistatic effect (11.9% on average), implying that hubs mostly modify the additive
component of the interacting loci. By comparison, when the modifier was not a hub, epistasis
was mostly composed of dominance (64% of interactions had a larger dominance component).
These data suggest that epistasis commonly involves modification of dominance in diploids and
that hubs act in a distinct manner from loci that are not hubs.
We next examined how the additive and dominance effects of hubs were modified by genetic
interactions. In most cases, hubs genetically interacted with a small number of major effect
modifiers and many minor effect modifiers (Fig. 3.4a). The major effect modifiers typically
influenced only the additive or only the dominance effect of a hub, suggesting that distinct sets
of loci govern additive and dominance effect sizes (Fig. 3.4a). Whereas the most frequent major
effect modifiers of the additive effects of hubs were other hubs (Fig. 3.4b), the single most
frequent major effect modifier of the dominance effects of hubs was a locus on Chromosome III.
Collectively, multiple modifier loci could cause a hub locus to show a broad range of effect sizes
across different genetic backgrounds (Fig. 3.4c).
67
Figure 3.4 Multiple modifier loci cause hubs to exhibit a range of effect sizes across
different genetic backgrounds. a. Specific examples of hubs on Chromosome VI, X, and XII
(each row) and their additive (left) and dominance (right) modifiers. The height of the bar
corresponds to the magnitude of the modifying effect. The dotted red line shows the threshold in
which loci were considered as major effect modifiers. b. Barplot showing the total number of
times loci were detected as a major effect modifier of additive (left) or dominance (right) effects
of hubs across environments. Colored dots indicate hub loci. An asterisk indicates a non-hub
locus on ChrIII. c. Additive (top; n > 443 for each genotype class) or dominance (bottom; n > 175
for each genotype class) effect size of Chromosome VI hub in NaCl across different allelic
combinations of its four largest effect modifiers. Red points are the mean effect size of a
genotype class based on the genotype state of the four modifiers. Black point is the overall
mean effect size of the locus. Black lines are bootstrapped 95% confidence intervals.
68
3.3.6 Characteristics of the Chromosome III dominance modifier
Although not a hub, the Chromosome III locus nevertheless had a prominent impact on
phenotype by modifying the dominance effects of multiple variable effect loci. Interactions with
the Chromosome III locus had greater impacts on dominance than additivity at all focal loci. For
example, in hydrogen peroxide, dominance at the Chromosome X variable effect locus
depended on the Chromosome III locus, ranging from complete to nearly absent in a
genotype-dependent manner (Fig. 3.5a). We delimited the Chromosome III locus to a 3 kb
region containing the mating locus and a few other genes (BUD5, TAF2, and YCR041W).
69
Figure 3.5 Parent-of-origin of the mating locus influences dominance at hubs. a. Violin
plots of the fitness distribution of diploid strains split by the genotype at the Chromosome X
locus (top), further split by the genotype at the mating locus on Chromosome III (middle), and
the parent-of-origin at the mating locus (bottom). All genotype classes had n > 9000. Genotype
state at each locus is denoted by colored boxes: BY/BY (blue), 3S/3S (orange), is BY/3S (half
blue, half orange). Lines are the observed mean fitness in the homozygous genotype classes
(gray), the observed mean fitness in heterozygous genotype classes (red), and the expected
heterozygous fitness if there was no dominance (blue). b. Violin plots of the fitness distribution
of diploid strains split by the genotype state of a hub locus and the parent-of-origin of the mating
locus: Chromosome VI hub (top), Chromosome XII hub (middle), Chromosome XIV hub
(bottom). All genotype classes had n > 9500.
Yeast mating types possess different nonhomologous gene cassettes at the mating locus, which
encode distinct transcription factors that are master regulators of the MATa, MATα, and diploid
transcriptional programs (Haber 2012). This region of the genome is unique because four
genotype classes segregate (BY MATa, 3S MATa, 3S MATα, and BY MATα), and as a result, the
two heterozygotes are not identical (Supplementary Fig. S3.13). To test if the mating locus is
the dominance modifier, we partitioned heterozygotes at the Chromosome III locus based on
their parents-of-origin for the MATa and MATα cassettes and found a difference: dominance at
the Chromosome X locus was only visible in the 3S MATa/BY MATα genotype class (Fig. 3.5a).
Other hub loci modified by Chromosome III showed the same relationship between dominance
and the parent-of-origin of the mating loci (Fig. 3.5b). These results suggest that BY and 3S
harbor functional differences in one or both mating cassettes.
3.4 Discussion
We used a double barcoding system to generate and phenotype an extremely large panel of
diploid yeast strains that can be partitioned into hundreds of interrelated families. This
experimental design enabled the detection of thousands of loci, including at least an order of
magnitude more genetic interactions than discovered in previous yeast crosses. Analysis of
these epistatic loci identified a modest number of hubs that have large effects, show pervasive
70
epistasis, and control most phenotypic variation across environments, as well as many other loci
that genetically interact with these hubs.
Genetic background commonly modified the magnitude of, or completely masked, the effects of
the hubs, indicating that even loci with the largest effects are highly sensitive to genetic
background. Such non-additive genetic background effects are likely to hamper efforts to predict
phenotype from genotype by limiting the extrapolation of effect estimates from one genetic
context to others. However, our finding that large effect loci were most impacted by other major
effect loci does provide some optimism that characterizing a limited set of interactions may
account for a substantial portion of these genetic background effects.
Because our experiments were performed in outbred diploids rather than haploids or inbred
diploids, we could detect dominance effects and whether dominance is modified by epistasis.
We showed that dominance effects are common and that the magnitude of dominance can
strongly depend on the alleles of interacting loci. The potential existence of dominance modifiers
has been discussed in theory, but to date, only a single dominance modifier has been found in a
plant self-incompatibility locus (Tarutani 2010; Billiard and Castric 2011). Our results show that
dominance modifiers are prevalent and raise the intriguing possibility that sites with atypical
allele dynamics within natural populations, the yeast mating locus here and a self-incompatibility
locus in plants, are more likely to harbor dominance modifiers with major effects.
Generally, we found that heritable traits in yeast are more genetically complex than formerly
appreciated. Relative to the cross that we examined, natural populations may harbor
substantially higher genetic diversity, meaning traits could be even more complex and difficult to
dissect. Our work supports the premise that, to the extent possible, focusing on groups of more
closely related individuals, such as the families studied here, can enhance statistical power and
71
precision relative to populations with greater diversity (Hallin 2016; Märtens et al. 2016; Young
et al. 2019). The genetic insights gained from these more closely related groups can then be
leveraged to inform the genetic architecture of traits in more diverse populations in which many
critical genetic effects may otherwise be obscured.
3.5 Methods
3.5.1 Generation of haploid segregants
All haploid segregants and diploid segregants described in this paper were generated from a
cross using two isolates of Saccharomyces cerevisiae, the commonly used lab strain BY4716
(BY) and a haploid derivative of the clinical isolate 322134S (3S). To generate counterselectable
markers in each strain, we first introduced clean deletions of FCY1 and URA3. Each gene was
deleted using a two step approach: first, genes were replaced with a KanMX cassette (Wach et
al. 1994) via lithium acetate transformation (Gietz and Schiestl 2007). Next, the cassette was
targeted using CRISPR/Cas9 and a gRNA specific to the pTEF promoter region in each
cassette (DiCarlo 2013). A repair template homologous to the upstream and downstream region
of the target gene was co-transformed with the CRISPR system to generate a clean deletion.
The MATa BY and 3S strains were then mating-type-switched by transforming a plasmid
containing galactose-inducible HO and URA3 using the lithium acetate protocol (Herskowitz and
Jensen 1991). Strains with the plasmid were selected on SCM-Ura plates and inoculated into
SCM-Ura + 2% galactose media overnight. Individual colonies were then obtained by plating out
102 cells onto YPD plates. Colonies were tested for their mating type using the yeast mating
halo assay (Julius et al. 1983). Successfully mating-type-switched BY and 3S MATα clones
were cured of the HO plasmid by growing the cells on YPD + 5-FOA plates. The BY MATa and
MATα cross parents were preserved at −80 °C with the unique identifiers IEY1176 and IEY1177,
respectively. The 3S MATa and MATα cross parents were preserved at −80 °C with the unique
identifiers IEY1178 and IEY1179, respectively.
72
To prevent aggregation of cells in pools grown in liquid, FLO8, a transcriptional activator of
many flocculins (Kobayashi et al. 1996), and FLO11, the flocculin responsible for many
aggregation phenotypes in S. cerevisiae (Lo and Dranginis 1998), were first knocked out in both
mating types of the BY fcy1∆ ura3∆ and 3 S fcy1∆ ura3∆ strains. Each parental strain was then
engineered to have a genomic ‘landing pad’ (Levy 2015; Liu 2019; Schlecht et al. 2017)
containing two partially crippled LoxP sites (G. Lee and Saito 1998), Lox71 and Lox2272/71,
and a galactose-inducible Cre recombinase (Sauer 1987) at the YBR209W locus via
CRISPR/Cas9-mediated homologous recombination (Supplementary Fig. S3.1A). Earlier
studies have shown that deletion of YBR209W or incorporation of our barcoding system has no
effect on fitness (Levy 2015).
Opposite mating types of BY fcy1Δ flo8Δ flo11Δ ura3Δ ho YBR209W::pGal1-Cre - Lox71 -
Lox2272/71 and 3S fcy1Δ flo8Δ flo11Δ ura3Δ ho YBR209W::pGal1-Cre - Lox71 - Lox2272/71
were next mated, creating two BY/3S heterozygous diploids. Diploids were sporulated and >500
tetrads were dissected from each diploid. Performing these two crosses would, in theory, enable
us to achieve ~50% allele frequency at all sites, including the mating type locus
(Supplementary Fig. S3.2). Either one MATa segregant or one MATα segregant was then
randomly selected from each tetrad (a total of >500 each) to maximize the number of unique
recombination breakpoints. To avoid segregants carrying aneuploidies, only tetrads that
produced 4 spores were utilized.
3.5.2 Barcoding of haploid segregants
Segregants were uniquely barcoded using two different methods (Supplementary Fig.
S3.1A–D). MATα segregants were barcoded by integrating a randomly barcoded plasmid via
Cre-mediated homologous recombination at lox2272/71 (Supplementary Fig. S3.1C).
Barcoded plasmids were made by modifying the pBAR6 plasmid (Liu 2019). First, pBAR6 was
73
digested with KpnI and EcoRI (Supplementary Fig. S3.1B). Linearized pBAR6 was then
assembled by Gibson assembly with a PCR product containing a Lox2272/66 site, a random
20-mer barcode sequence, and a partial TruSeq read 2 adapter sequence. The resulting
product was transformed into chemically competent NEB 10-beta cells using standard heat
shock protocol, and transformants were selected on LB + 50 μg/ml Carbenicillin plates. To
ensure high barcode complexity, ~100,000 transformants were scraped and pooled prior to
plasmid extraction. Purified barcoded plasmids (300 ng) were then transformed into the yeast
cells using the lithium acetate protocol (Gietz and Schiestl 2007). After transformation, the
barcoded plasmids were recombined into the yeast genome by inducing the galactose-inducible
Cre recombinase by growing the yeast cells in YP + 2% galactose media. Homologous
recombination between the two partially crippled Lox2272 variants, Lox2272/66 and
Lox2272/71, resulted in the formation of a fully crippled Lox2272/66/71 and a fully functional
Lox2272. Transformants with successful integration were selected on YPD + 200 μg/ml G418
agar plates.
MATa segregants were barcoded by CRISPR/Cpf1-driven (Verwaal et al. 2018) homologous
recombination at the genomic landing pad (Supplementary Fig. S3.1D). The genomic region
containing the two partially crippled Lox sites, Lox71 and Lox2272/71, was replaced with a DNA
fragment containing (in the following order) a 60 bp sequence homologous to the region
upstream of the Lox sites, an HphMX cassette (A. L. Goldstein and McCusker 1999), the 5′ end
of a split URA3 marker, a 5′ artificial intron splice site, a partial TruSeq read 1 adapter sequence,
a random 20-mer barcode sequence, a partially crippled Lox66 site, and a 60 bp sequence
homologous to the region downstream of the Lox sites. To integrate the randomly barcoded
DNA fragment in the yeast genome, 200 ng of the DNA fragment, 200 ng of a PCR amplicon
containing pTEF-CPF1 flanked by 2 nuclear localization sequences, and 200 ng of a PCR
amplicon containing a polyA-tailed pSNR52−20-mer guide sequence were co-transformed into
74
the yeast cells using lithium acetate (Gietz and Schiestl 2007). Transformants with successful
integration were then selected on YPD + 300 μg/ml Hygromycin B agar plates.
For each segregant transformation, we chose ~5 colonies that presumably represented
independent integration events and barcodes. In addition to the segregants, both mating types
of the parental strains were barcoded in the same manner as the segregants and ~50 colonies
were picked for each parental strain.
3.5.3 Whole-genome sequencing of parental haploid segregants
Clones containing different barcodes for 1003 MATα and 500 MATa segregants were pooled
and these pools were whole-genome sequenced at low-coverage (~10×) to determine where
crossover events occured. A sequencing library was prepared using the Illumina Nextera Kit
with custom multiplexing barcodes (Bloom et al. 2013; Taylor and Ehrenreich 2014; Mullis et al.
2018). Libraries from different segregants were pooled in equimolar fractions and these
multiplex pools were size selected using the Qiagen Gel Extraction Kit. Multiplexed samples
were then sequenced on an Illumina HiSeq 2500 using 150 bp × 150 bp paired-end reads. For
each strain, reads were mapped against the S288c genome using BWA with default settings (H.
Li and Durbin 2009). Alignments were converted to a bam format and sorted using SAMTOOLS
(default settings) (H. Li 2009). Read duplicates were then removed and bam files were
converted to pileups using SAMTOOLS (default settings). Base calls and coverage values were
obtained from the pileup files for 43,865 high-confidence SNPs that segregate in the BYx3S
cross. Any segregants that showed signs of cross-contamination, where the allele frequency at
the SNP markers significantly deviated from the expected 0% or 100% 3S allele, were excluded
from further analysis. To avoid segregants with aneuploidy, all segregants whose average
coverage of each individual chromosome or segments of the chromosome significantly deviated
from the overall average coverage were also removed. Additionally, all segregants with a mean
coverage of less than 2 were removed. In total, 76 MATα and 11 MATa segregants were
75
removed after filtering for quality. For the remaining 927 MATa and 489 MATα segregants, a
vector containing the fraction of 3S calls at each SNP was generated and used to make initial
genotype calls with sites above and below 50% classified as 3S and BY, respectively. This
vector of initial genotype calls was then corrected with a Hidden Markov Model (HMM),
implemented using the HMM package version 1.057 in R (Team 2013). We used the following
transition and emission probability matrices: transProbs = matrix(c(0.9999, 0.0001, 0.0001,
0.9999), emissionProbs = matrix(c(.0.25, 0.75, 0.75, 0.25). All SNP markers within the first and
last 30 kb of each chromosome were omitted because we observed higher sequencing error
rate and/or lower read mapping quality for these specific regions. Adjacent SNPs in the
HMM-corrected genotype calls that lack recombination in the segregants were collapsed into a
single SNP, reducing the number of SNP markers in subsequent analysis from 43,865 to 7742.
3.5.4 Determining the barcode sequences
For all segregants, the associated barcodes were determined by pooling all clones and
sequencing the genomic region containing the barcode using Novogene Illumina HiSeq 2500
150 bp × 150 bp paired-end reads at ~2000× coverage per barcode. For the MATa segregants,
the genomic region containing the barcode was PCR amplified using primer pairs where one
primer was located on the partial TruSeq read 1 adapter sequence and the other primer was
located downstream of the barcode. Similarly, barcodes for MATα segregants were PCR
amplified using primer pairs where one primer was located on the partial TruSeq read 2 adapter
sequence and the other primer was located upstream of the barcode. PCR products were
purified using a Qiagen MinElute PCR purification kit, amplified using Illumina P1 and P2
primers, and then size selected via gel extraction prior to sequencing. The 20-mer barcode
sequences for each segregant were then extracted from the sequencing reads and barcodes
within a Hamming distance of 2 were clustered with Bartender (L. Zhao et al. 2018). Only
barcode clusters comprising >5% of the total reads for each sample were considered true
76
barcodes. One possible concern with using random barcodes is that sequencing errors from an
abundant barcode could erroneously contribute to read counts of a barcode with a similar
sequence. To prevent this, we determined the number of mismatches away from a nearest
neighbor for each barcode. All barcodes were at least 4 mismatches away from each other.
Thus, a sequencing read was unlikely to be assigned to the wrong barcode cluster unless it
contained 3 or more errors. Overall, we recovered 403 MATa segregants and 679 MATα
segregants with good genotype calls and at least 3 barcodes that are unique from all others.
3.5.5 Generation of diploid segregants
400 MATa and 600 MATα segregants were mated to create a panel of ~240,000 diploid
segregants, each labeled with 4 unique pairs of barcodes (~960,000 total double barcodes). To
minimize skews in the initial frequency distribution of genotypes, each MATa segregant was
individually mated to each MATα segregant to generate the panel of diploid segregants.
Specifically, two barcoded versions of each segregant were first grown to saturation and mixed
in equal proportion. Each MATa segregant mixture was then systematically mated with a MATα
segregant mixture using a LabCyte Echo 550 acoustic liquid handling robot. For each pairwise
mating, the Echo transferred 100 nl of MATa and MATα segregants onto overlapping positions
on a new YPD plate, resulting in the formation of diploid segregants labeled with ~4 double
barcodes. After growing the cells overnight at 30°C, successfully mated diploids were isolated
from unmated haploids by pinning the colonies onto YPD + 200 μg/ml G418 + 300 μg/ml
Hygromycin B agar plates using Singer ROTOR pinning robot and grown for 2 days. Haploid
strains carried either KanMX or HphMX, therefore only mated diploids that carry both selection
markers should grow. In addition to the diploid segregants, different barcoded versions of the
homozygous and heterozygous parental diploid strains were made as controls, using the same
approaches described for segregants (total of 81 unique pairs of barcodes per parental diploid
strain).
77
3.5.6 Translocating barcodes onto the same chromosome
After mating, the two barcodes are located on different chromosomes (Supplementary Fig.
S3.1E). To bring the barcodes onto the same chromosome, site-directed chromosomal
translocation was induced via Cre-LoxP-mediated site-specific recombination. To do this, mated
diploid colonies were pinned onto YP + 2% galactose plates and grown for 2 days. Presence of
galactose induces the expression of the galactose-inducible Cre recombinase, causing
Cre-mediated homologous recombination at the LoxP site and translocation of the
chromosomes, bringing the two barcodes to close proximity. In addition to the barcodes,
chromosomal translocation brings the two halves of the split URA3 marker onto the same
chromosome, resulting in the reconstitution of a functional URA3 marker. Diploids that have
undergone successful recombination were selected by pinning the diploid colonies onto SCM
-Ura agar plates and growing for 2 days. To minimize bias due to differences in growth rate
between diploids, the colonies were then pinned onto fresh SCM -Ura plates and pooled
immediately with SC -Ura media. The pooled sample of ~960,000 double barcoded segregants
was then spun down, re-suspended in SC -Ura media at a concentration of 2 × 109 cells/ml, and
frozen in 25% glycerol for later use. In parallel, the BY/BY, 3S/3S, BY/3S, and 3S/BY parental
diploids were generated in the same manner and stored at the same concentration as the
segregants.
3.5.7 Growth Assays
The panel of diploid segregants was evolved for 15 generations by serial batch culture under
carbon limitation in 100 ml of SC -Ura media with 4% ammonium sulfate and 2% dextrose. First,
2 ml of the segregant frozen stock (2 × 109 cells/ml) was inoculated in 198 ml of SC -Ura media
and then grown in a 500 ml flask at 30 °C with 300 rpm shaking for 48 h. At saturation, the cell
concentration was 1.5 × 108 cells/ml for a total of 3 × 1010 cells, or ~3 × 104 cells per double
78
barcode. In parallel, 10 μl of the parental frozen stocks were inoculated in 990 μl of SCM-Ura
media and grown at 30 °C with 300 rpm for 48 h. The parental diploids were then spiked into the
segregant culture at a concentration of 10−5. This mixed culture served as the seed culture
(time point 0 or T0) for subsequent growth time points.
Next, one-eighth or 12.5 ml of the T0 culture (~3.75 × 103 cells per double barcode at
bottleneck) was transferred into eight 500 ml flasks, each containing 87.5 ml of SC -Ura media
supplemented with different drugs or chemicals (Supplementary Table S3.1). Two flasks
contained no supplement (Glucose 1 and Glucose 2) and served as growth replicates for the
experiment. The remaining T0 culture was spun down at 3000 rpm and frozen down for
subsequent DNA library preparation. Each culture was grown in serial batch conditions for 15
generations, bottlenecking every ~3 generations. For each transfer, cultures were grown at 30°C
with 300 rpm shaking for 48 h until saturation. One-eighth (12.5 ml) of the culture was then
transferred into new 500 ml flasks, each containing 87.5 ml of the appropriate media. The
remaining 87.5 ml of culture was frozen down for subsequent DNA library preparation. At each
transfer, the number of cells was counted using a hemocytometer. Contamination checks for
bacteria or other non-yeast microbes were performed regularly by observing the cultures under
a microscope.
3.5.8 Library Preparation
Frozen cultures were thawed and DNA was extracted for library preparation using the
MasterPure Yeast DNA Purification kit. Any remaining RNA was removed by adding RNaseA
(10 mg/ml) and incubating the sample at 37 °C for one hour. DNA was then cleaned by adding
one volume of phenol:chloroform:isoamyl alcohol (25:24:1). The sample was gently mixed using
a tube rotator at 30 rpm for 10 min and then centrifuged at 16,000 × g for 10 min. The upper
aqueous layer was transferred to a new tube and cleaned of phenol by adding one volume of
79
chloroform:isoamyl alcohol (24:1). The sample was again mixed at 30 rpm for 10 min,
centrifuged at 16,000 × g for 10 min, and the upper aqueous layer was transferred to a new
tube. Finally, to remove residual chloroform, the DNA sample was ethanol precipitated using
one-tenth volume of 3 M sodium acetate (pH 5.5) and 2.5 volume of 100% ethanol. The
resulting DNA pellet was washed with ice-cold 70% ethanol twice, and dissolved in 10 mM
Tris-HCl pH 8.0. After the DNA was extracted and cleaned, a two-step PCR was used to amplify
the double barcodes for sequencing. Because only a small fraction of the total genome contains
relevant information (~250 bp amplicon out of a 12 Mb genome size), we amplified 30 µg of
template per time point sample, which corresponds to ~2 × 109 genomes or ~2000 copies per
double barcode.
First, a 4-cycle PCR with OneTaq polymerase (New England Biolabs) was performed in 60 wells
of 96-well PCR plates, with ~500 ng of DNA template, 4.5 mM MgCl2, 3% DMSO, and 125 µL
total volume per well. Primers for this reaction were:
AATGATACGGCGACCACCGAGATCTACACNXXXXXNNACACTCTTTCCCTACACGACGCTCT
T and
CAAGCAGAAGACGGCATACGAGATNXXXXXNNGTGACTGGAGTTCAGACGTGTGCTCTTCC
GATCT.
The Ns in these sequences correspond to any random nucleotide and are used as unique
molecular identifiers (UMIs) in downstream analysis to identify PCR duplicates. The Xs
correspond to one of several multiplexing tags, which were used to distinguish different samples
when loaded on the same sequencing flow cell. Multiplexing tags were designed to have a
Levenshtein distance of 3 from each other, such that reads with 1 or less sequencing errors in
the multiplexing tag can be assigned to the correct sample.
80
After amplification, 16 wells of PCR products were pooled (4 pools total), run through a
NucleoSpin PCR cleanup column, and eluted in 120 μL of elution buffer. The pooled PCR
products were then cleaned for the second time to remove any residual primers using a
NucleoSpin column and eluted in 120 μL of elution buffer. A second 22-cycle PCR was
performed with PrimeStar HS DNA polymerase (Takara) in 16 reaction tubes, with 30 μl of
cleaned product from the first PCR as template and 125 μL total volume per tube. Illumina
primers P1 and P2 were used for this reaction.
PCR products from all reaction tubes were concentrated into 100 μl using ethanol precipitation.
The appropriate PCR band (264 bp) was isolated by agarose gel electrophoresis and quantified
by Bioanalyzer (Agilent) and Qubit fluorometer (Life Technologies).
3.5.9 Barcode sequencing
Sequencing (150 bp paired-end) was performed on an Illumina HiSeq 4000 or NovaSeq 6000.
Each flow cell contained at least 2 multiplexed time points and 25% genomic DNA or PhiX.
Genomic DNA/PhiX was included to increase the read complexity for proper calibration of the
instrument since most of the bases in each barcode read are fixed. Sequencing reads were
analyzed using custom written code in Python and R, which are available on GitHub. Reads
were sorted by their multiplexing tags (the Xs in the primers above) and removed if they failed to
pass either of two quality filters: 1) The average Illumina quality score for the double barcode
insert region must be greater than 30, and 2) the double barcode region must contain the fixed
sequence ATAACTTCGTATAATGTATGCTAT with less than 2 mismatches. After filtering, MATa
and MATα barcodes were extracted from the sequencing reads and fused into one double
barcode.
81
We next used Bartender to cluster similar sequences into consensus double barcodes, with the
maximum allowable sequence distance (-d) set to 230. One source of bias in these counts that
we wanted to avoid is PCR duplicates or other non-linearities between the amount of template
for a double barcode and the number of sequences observed for that double barcode. We
removed these errors by using UMIs (the Ns in the primers above). Specifically, 2 random
3-mers were attached to each template DNA molecule in the first few rounds of PCR (see
section ‘Library preparation’ for more detail). Because the total sequence space of 2 random
3-mers (4^6 = 4,096 possibilities) is much larger than the target coverage for each double
barcode (~100×), it is unlikely that any two template DNA molecules from the same time point
that contain the same double barcode will be attached to identical pairs of 3-mers. Thus,
sequence reads for a double barcode that contained the same pair of 3-mers were counted as
PCR duplicates and removed from our final counts.
After the double barcode clusters were counted and filtered, only double barcode clusters
carrying the previously determined MATa and MATα segregant and parental barcodes were
extracted. All double barcodes with an average read count of <5 across the 4–5 time points
were removed, as accurately estimating fitness for samples with very low read counts is difficult
(Supplementary Fig. S3.4) (F. Li, Salit, and Levy 2018). In total, ~227,999 unique genotypes
with an average of ~2.82 barcode replicates per genotype were detected across conditions
(Supplementary Table S3.4). These numbers are lower than the theoretical number of double
barcodes, 240,000 diploids × 4 biological replicates = 960,000. This discrepancy may be due to
segregants failing to mate during the generation of diploid segregants or being lost during
rearraying procedures.
82
3.5.10 Counting double barcode reads
Previous studies have shown that PCR amplification of DNA sequences containing two variable
regions (for this study the MATa and MATα barcodes) separated by a fixed region can lead to
formation of undesired chimeric molecules due to template switching (Liu 2019; Schlecht et al.
2017). This can result in erroneous double barcode counts, leading to errors in fitness
estimation.
To identify PCR chimeras in each sample, we pooled data from all 4–5 time points and counted
the number of reads that contain each combination of a parental barcode and an haploid
segregant barcode. Because the parental strains were never mated to the haploid segregants,
these parental-segregant double barcodes must be due to a PCR chimera. We next counted the
number of times each single barcode is present in the entire data set regardless of the barcode
it is paired with. We then fit a linear model between the count of each PCR chimera and the
product of the total counts of each single barcode within the pool: # of copies of PCR chimera ~
# of copies of barcode1 x # of copies of barcode2. Because template switching occurs
randomly, we expect to see a linear relationship between the number of PCR chimeras and the
abundance of the involved barcodes (average R2 ≈ 0.623 across environments)
(Supplementary Table S3.5). Using this linear model, the expected number of PCR chimeras
for each double barcode was calculated. The expected number of PCR chimeras were
subtracted from the actual read counts to calculate the corrected counts. We estimate that the
PCR chimera rate was 0.11% across environments (Supplementary Table S3.5). Corrected
counts less than 0 were set to 0. All double barcodes with an average corrected read count of
<5 across the 4–5 time points were removed.
83
3.5.11 Fitness Estimation
The corrected read counts from the 4–5 time points were used to estimate fitness for each
double barcode using a maximum likelihood algorithm PyFitSeq using default settings (F. Li,
Salit, and Levy 2018). Fitness estimates with low maximum likelihood scores were removed, as
these fitness estimates are most likely technical artifacts. Outliers with low maximum likelihood
scores were defined as data scores that are more than 1.5 interquartile range below the first
quartile. Additionally, for strains with three or more replicates, outlier replicates with significantly
different fitness estimates were removed. Outlier replicates were determined by examining all
pairwise differences in fitness estimates between barcode replicates for the same strain. Based
on this distribution, outlier replicates were defined as replicates whose fitness is more than 0.5
fitness units different from the other replicates. For strains with only two replicates, if the fitness
estimate difference between the two replicates was greater than 0.5, both replicates were
removed. Finally, all strains with only 1 replicate were omitted from further analysis as we have
no way to assess the quality of the fitness estimate. Overall, fitness estimates for ~548,046
(~190,604 genotypes with average of ~2.88 replicates per genotype) double barcodes per
environment were used for the remaining downstream analysis (Supplementary Table S3.2).
3.5.12 Heritability Estimates
Broad-sense heritability was estimated using the reproducibility of phenotype across replicates.
The linear model, fitness ~ genotype was used, where genotype is a categorical value
corresponding to the 2–4 biological replicates for each diploid segregant. Broad-sense
heritability was calculated by taking the sum of squares of genotype and dividing it by the total
sum of squares. Because large memory overhead is required to calculate broad-sense
heritability for the entire dataset, we calculated 1,000 broad-sense heritability estimates for a
smaller subset of randomly selected 2500 genotypes. The reported broad-sense heritability and
standard errors are the mean value and the standard error across the 1000 tests.
84
Variances explained by additive, dominance, and epistasis effects were estimated using the
‘sommer’ package in R (Covarrubias-Pazaran 2016). The additive, dominance, and epistasis
relationship matrices were calculated using the A.mat(), D.mat(), and E.mat() functions,
respectively. Variances explained by additive, dominance, and epistasis were then estimated by
dividing the variance of the respective components by the total variance. Similar to the
broad-sense heritability estimates, we calculated 1,000 variance estimates for a smaller subset
of randomly selected 2500 genotypes. The reported variance estimates and standard errors are
the mean value and the standard error across the 1000 tests.
3.5.13 Quantile Normalization of Fitness
For each genotype in each condition, the average fitness estimate across all barcode replicates
was calculated. Because the distribution of the average fitness estimates was slightly left
skewed, average fitness estimates were quantile normalized such that the data is normally
distributed using the ‘bestNormalize’ R package (Supplementary Fig. S3.6) (Peterson and
Cavanaugh 2020). This quantile normalized fitness estimate was used as the fitness phenotype
for all downstream analysis.
3.5.14 Genome-wide scan for one-locus effects
Our experimental design resulted in genotypes that are not equally related to each other. For
example, two diploid segregants that share a parental haploid segregant are more related to
each other than two diploids that do not share a parental haploid segregant. We found that
differences in relatedness had a large effect on the fitness of the diploid segregants. Thus,
failing to account for family structure could inflate type I error and result in false positives. To
account for these potential errors, we detected one-locus effect loci using FaST-LMM (Lippert
85
2011; Widmer 2015), which runs a mixed effects linear model (MLM) with a spectrally
decomposed genetic similarity matrix (GSM).
We used FaST-LMM’s single_snp function to perform a genome-wide linear mixed effect
analysis. The genotype table was reformatted as a binary biallelic genotype table (BED) prior to
running FaST-LMM using PLINK version 1.0761. To avoid proximal contamination, all SNP
markers that are located on the same chromosome as the SNP marker that is being tested were
omitted from the calculation of the GSM by setting the leave_out_one_chrom function as TRUE.
To determine appropriate significance thresholds, 1000 permutations were conducted with the
correspondence between genotypes and phenotypes randomly shuffled each time. Among the
minimum p values obtained in the permutations, the fifth quantile was identified and used as the
threshold for determining significant loci.
To detect loci that are closely linked to a nearby locus, we ran multiple rounds of stepwise
forward regression in FaST-LMM. For the second round of forward regression, we included the
most significant SNP marker from each chromosome that was above the significance threshold
from the initial genome-wide scan as a covariate in the mixed effect linear model. In the third
round, all SNP markers detected in the first two rounds were added as covariates in the mixed
effect model. This process was repeated until no significant SNP markers were detected. The
confidence interval of a locus was defined as the region surrounding a peak marker in which the
significance of detection was within three −log10(p-value) of the maximal significance observed
at the peak position (hereafter referred to as “3-LOD drop”).
3.5.15 Family-level scans for one-locus effects
We also performed scans for one-locus effects within families generated from the same MATa
haploid parental strain. Results from families generated from the same MATα haploid parental
86
stains were not used as the difference in sample size (~600 diploids in MATa families vs ~400
diploids in MATα families) had a huge impact on the statistical power to detect loci. Because all
genetic differences between diploids within a family are random, they do not need to be
corrected for family structure prior to mapping. Genome-wide mapping was conducted in each
family using a fixed effects linear model in R using the lm() function: lm(fitness ~ locus). The
fitness term corresponded to the vector of quantile normalized fitness values of individuals
within a family and the locus term corresponded to the vector of these individuals’ genotypes at
a given marker. Because diploids within a family are derived from the same MATa haploid, they
possess only two possible genotype states: BY/BY and BY/3S or BY/3S and 3S/3S, depending
on the allele present in the MATa progenitor. Although the locus term was encoded as a
categorical variable here, the coding of the locus term should not matter. The categorical coding
should be equivalent to a count of an allele because either defines two categories. The p-value
of the locus term was obtained using the summary.aov() in R. Among markers exceeding a
significance threshold, the most significant marker from each chromosome was identified and
utilized in subsequent rounds of forward regression. Specifically, identified loci were included in
the fixed effects linear model as a covariate, e.g. fitness ~ known_locus1 + known_locus2 + …
known_locusN + locus. This process was repeated until no additional locus terms were
discovered within an environment. The significance threshold for these forward scans was
established by pooling the p values from 1000 permutations of 10 randomly selected families. In
each permutation, the correspondence between genotype and phenotype was shuffled within a
family. From the resulting distribution of minimum p-values obtained in each permutation, the
fifth quantile was identified and used as the significance threshold. After calling peaks, loci
detected in multiple families were consolidated using 3-LOD drops within each family.
87
3.5.16 Identification of loci enriched for detections across families
We identified distinct loci that were enriched for detections across families (Brem et al. 2002).
We divided the genome into nonoverlapping 20 kb bins and determined the maximum number of
detections expected by chance at a Bonferroni-corrected p = 0.05/(the number of 20 kb
nonoverlapping bins). The threshold was then obtained from a Poisson distribution with λ = (# of
detections across all families within the environment)/(the number of 20 kb nonoverlapping bins
tiling the genome) using the qpois() function in R, with lower.tail set to FALSE.
3.5.17 Removing additive effects and correcting for family structure
For detection of dominance effects and interactions, family structure was corrected using a
previously described strategy (Hallin 2016). Specifically, we first estimated the fitnesses of each
parental haploid segregant by calculating the mean fitness of all diploids that originated from
each parental strain (additive genetic background contribution to fitness). Next, the expected
midparent value for each diploid was obtained by taking the average of the 2 parental fitnesses.
If phenotypic variation is due to only additive effects, then we expect the fitness of a diploid to be
the same as its midparent value. Deviations from the midparent value were then determined by
fitting a linear model between fitness and the midparent value, fitness ~ midparent value, and
taking the residuals (hereafter referred to as ‘residuals’). Not only does this correct out the
additive portion of each diploid’s phenotype, it also effectively accounts for family structure
(Supplementary Fig. S3.10) (Hallin 2016). These residuals, or the non-additive portions of
each diploid’s phenotype, were used as phenotype for the subsequent scans for loci involved in
non-additivity, such as dominance and genetic interactions.
3.5.18 Genome-wide scans for loci with dominance
To detect loci with one-locus non-additive effects (e.g. dominance), we ran multiple rounds of
forward regression. In the first round, a genome-wide scan was conducted using the following
88
fixed effects linear model: residuals ~ locus, where locus is the genotype of the diploids at a
given SNP marker encoded as a categorical variable. Here, the significance of the locus term
was tested. The significance threshold was determined by conducting 1000 permutations with
the correspondence between genotypes and residuals shuffled each time. Among the minimum
p values obtained in the permutations, the fifth quantile was identified and used as the threshold
for determining significant loci. The most significant SNP marker from each chromosome that
was above the significance threshold was identified as a significant locus. For subsequent
rounds of forward regression, identified loci were included in the fixed effects linear model as a
covariate, e.g. residuals ~ known_locus1 + known_locus2 + … known_locusN + locus. The most
significant SNP marker from each chromosome that was above the significance threshold was
again identified as a significant locus. This process was repeated until no additional locus terms
were discovered for an environment.
3.5.19 Degrees of dominance
For each locus detected in the scan for dominance effects, the magnitude of dominance was
calculated in the following manner. First, the data was subsetted based on the genotype state at
the focal locus (e.g., BY/BY, BY/3S, 3S/3S). Then, the mean fitness value for each genotype
state was calculated. Additive effect sizes of the 3S allele were calculated by subtracting the
mean fitness of diploids that are BY/BY at the focal locus from the mean fitness of diploids that
are 3 S/3 S at the focal locus and dividing by two. Dominance effect sizes for each focal locus
were calculated by subtracting the mean fitness of diploids that are BY/3S at the focal locus
from the average of the mean fitness of diploids that are BY/BY and 3S/3S at the focal locus
(i.e., the additive expectation for heterozygotes). Dominance effect sizes were then normalized
based on the absolute additive effect sizes of the 3S allele relative to the BY allele at the same
loci. Normalized dominance values ranging from −0.9 to 0 were classified as incomplete
dominance for the deleterious allele, while values ranging from 0 to 0.9 were classified as
89
incomplete dominance for the allele conferring higher fitness. Values that were between −0.9 to
−1.1 or 0.9 to 1.1 were classified as complete dominance for the deleterious and fit allele,
respectively. Values < −1.1 or >1.1 were classified as underdominant or overdominant,
respectively.
3.5.20 Comprehensive scan for pairwise interactions
We conducted a genome-wide scan for pairwise interactions using the following linear fixed
effect model: residuals ~ locus1 + locus2 + locus1:locus2. Here, the significance of the
locus1:locus2 interaction term was tested. Simpler terms were included in each model to ensure
that variances due to one-locus non-additive effects (e.g., dominance) were not erroneously
attributed to more complex terms. All locus terms were treated as categorical values, including
the locus1:locus2 interaction term, such that all 9 genotype classes (BY/BY-BY/BY,
BY/BY-BY/3S, BY/BY-3S/3S, BY/3S-BY/BY, BY/3S-BY/3S, BY/3S-3S/3S, 3 S/3S-BY/BY,
3S/3S-BY/3 S, 3S/3S-3S/3S) are treated as independent categories rather than continuous
numerical values. The significance threshold was determined by conducting 1,000 permutations
with the correspondence between genotype and residuals shuffled each time. Among the
minimum p-values obtained in each permutation, the fifth quantile was identified and used as
the significance threshold for our comprehensive scan of all two-loci interactions. To reduce
computational time, 10,000 random pairs of loci were randomly selected for each permutation
rather than all possible pairs of loci. All significant pairs of loci where both loci were within
3-LOD drop of each other were consolidated. For each set of overlapping loci, the SNP marker
with the most significant p-value was used for downstream analysis while less significant
markers were recorded but treated as the same genetic effect and not used downstream.
During our comprehensive scan for pairwise interactions, we detected several hubs that were
involved in large numbers of interactions. These interactions often spanned the entire length of
the genome, making it difficult to differentiate sites interacting with these hubs. In such cases, a
90
forward regression approach was conducted. Specifically, we first ran the same fixed effects
linear model as the comprehensive scan for pairwise interactions, but with locus1 fixed for a hub
locus, e.g., residuals ~ hub + locus2 + hub:locus2. The significance of the hub:locus2 interaction
term was tested. Significance thresholds were determined using the same permutation strategy
as the comprehensive scan. The most significant interacting locus from each chromosome that
was above the significance threshold was identified as a significant interactor of a hub locus.
For subsequent rounds of forward regression, all identified interactions were included in the
fixed effects linear model as covariates, including the simpler terms, e.g. residuals ~
hub + locus2 + known_locus1 + known_locus2 + …
known_locusN + hub:known_locus1 + hub:known_locus2 + … hub:known_locusN + hub:locus2.
The most significant interacting locus from each chromosome that was above the significance
threshold was again identified as a significant interactor with a hub locus. This process was
repeated until no additional hub:locus2 terms were discovered for each environment. To avoid
double counting interactions, all pairwise interactions identified in the comprehensive scan that
involved a locus within 50 kb of a hub were removed.
3.5.21 Scan for three-locus effects
A comprehensive scan for three-locus effects using all SNP markers is computationally
expensive. Instead, we scanned for three-locus effects using a smaller subset of 579 SNP
markers, where each SNP marker was at least 5 cM apart. The following fixed effects linear
model was used: residuals ~
locus1 + locus2 + locus3 + locus1:locus2 + locus1:locus3 + locus2:locus3 + locus1:locus2:locus3.
Similar to the scan for pairwise interactions, all locus and interaction terms were treated as
categorical values. Here, the significance of the locus1:locus2:locus3 interaction term was
tested. Simpler terms were included in each model to ensure that variances due to one-locus
non-additive effects (e.g., dominance) and pairwise interactions were not erroneously attributed
91
to more complex terms. As before, the significance threshold was determined by conducting
1000 permutations with the correspondence between genotype and residuals shuffled each
time. Among the minimum p-values obtained in each permutation, the fifth quantile was
identified and used as the significance threshold for our comprehensive scan of all two-loci
interactions. Similar to the pairwise interaction scan, for each permutation, 10,000 trios of loci
were randomly selected for each permutation rather than all possible trios of loci.
3.5.22 Resolving hub loci
Hub loci were resolved using data from family-level scans in each of the MATa families. The
peak position of each hub was defined as the most significant marker for that locus across all
families. Family-level detections with confidence intervals overlapping the peak position of a hub
were identified. The first and last coordinates of all confidence intervals overlapping a peak
position were compared, and the coordinates closest to each side of a peak position were
recorded as the minimum bounds for a given hub.
3.5.23 Estimating the fraction of epistatic effects that involve dominance
To understand how a locus modifies the effect of an interacting locus in a pairwise interaction,
each pairwise interaction was partitioned into four epistasis types: additive-additive,
dominance-additive, additive-dominance, and dominance-dominance (Yu 1997). The following
linear fixed effect model was used: residuals
~ locus1 + locus2 + additive_locus1:additive_locus2(a1:a2) + dominance_locus1:additive_locus2
(d1:a2) + additive_locus1:dominance_locus2(a1:d2) + dominance_locus1:dominance_locus2(d1:
d2). Additive_locus terms were treated as numerical values, while locus and dominance_locus
terms were treated as categorical values. The percent variance explained (PVE) by each of the
four epistatic types was then estimated by taking the sum of squares of each interaction term
and dividing it by the total sum of squares. Using these PVE values for each interaction term,
92
the fraction of epistatic effect involving dominance was then calculated. To estimate the fraction
in which locus1’s modifying effect on locus2 acts on dominance, the following equation was
used: (d1:a2 + d1:d2)/(a1:a2 + d1:a2 + a1:d2 + d1:d2). Conversely, to estimate the fraction in
which locus2’s modifying effect on locus1 acts on dominance, the following equation was used:
(a1:d2 + d1:d2)/(a1:a2 + d1:a2 + a1:d2 + d1:d2).
3.5.24 Examining how combinations of modifiers explain effect size variance of hubs
Using the PVE values for the four epistatic types (see section ‘Estimating the fraction of epistatic
effect that involves dominance’), we examined how loci that interact with a hub contribute to
additive and dominance effect size variance across different genetic backgrounds. For each
two-locus interaction involving a hub, the magnitude in which the hub modifier affects the
additive component of the hub (‘add_mod’) was determined by adding the PVE of a1:a2 and
a1:d2 interaction terms, where locus1 is the hub. The magnitude in which the hub modifier
affects the dominance component of the hub (‘dom_mod’) was determined by adding the PVE of
d1:a2 and d1:d2 interaction terms. Significantly large outliers in add_mod or dom_mod, defined
as values more than 1.5 interquartile range above the third quartile, were identified as major
effect modifiers.
To examine how modifiers collectively change the effect size of hubs, four modifiers with the
largest modifying effects on additivity and dominance were chosen for each hub. Data was then
subsetted into 81 genotype classes based on the genotype state at the four modifiers. For each
genotype class, the data were further subsetted based on the genotype state of the focal hub
locus. Additive effect sizes for each genotype class were calculated by subtracting the mean
fitness of diploids that are BY/BY at the focal hub from the mean fitness of diploids that are
3S/3S at the focal hub. Dominance effect sizes for each genotype class were calculated by
93
subtracting the mean fitness of diploids that are heterozygous at the focal hub from the average
of the mean fitness of diploids that are BY/BY and 3S/3S at the focal hub.
94
3.6 Supplementary Material
95
Figure S3.1 Workflow of how haploid and diploid segregants were barcoded. A. Each
parental strain was engineered to have a genomic “landing pad” containing two partially crippled
LoxP sites, LoxP/71 and Lox2272/71, and a galactose-inducible Cre recombinase at the
YBR209W locus. B. pBAR6 plasmid used to make the library of barcoding plasmids for MATα
segregants. C. Barcoding plasmids were made by combining linearized pBAR6 with a PCR
product containing a partially crippled Lox2272/66 site, a random 20-mer barcode sequence,
and a partial TrueSeq read 2 adapter sequence using Gibson assembly. The barcoding
plasmids were then individually transformed into each MATα segregant and integrated into the
genome at the genomic “landing pad” using cre-recombinase mediated homologous
recombination at the Lox2272 site. D. MATa segregants were barcoded by integrating a PCR
product containing a split URA3 marker, a partial TruSeq read1 adapter sequence, a random
20-mer barcode, and a partially crippled LoxP/66 at the genomic “landing pad” using
CRISPR/Cas9 mediated homologous recombination. E. After the MATa and MATα segregants
were mated, the two barcodes were brought onto the same chromosome using site-directed
chromosomal translocation via Cre-LoxP homologous recombination.
96
Figure S3.2 Genome-wide allele frequencies of haploid segregants and the inferred
genome-wide allele frequencies of diploid segregants. A. Genome-wide allele frequencies
of haploid BYx3S segregants. With the exception of a region on Chromosome IV, allele
frequencies were balanced in the BYx3S segregants used to generate the diploid population. B.
Inferred genome-wide allele frequencies of diploid BYx3S strains. Genotypes for diploid strains
were generated in silico using sequencing data from segregants containing the barcodes
present in each diploid.
97
98
Figure S3.3 Correlation of fitness estimates between strain replicates in each experiment.
Between two and four barcoded replicates of all strains were included in each pooled fitness
assay. Fitness estimates between strain replicates were well correlated (0.524 ≤ Spearman’s ⍴
≤ 0.8) within each experiment.
99
100
Figure S3.4 Accuracy of fitness estimates (measurement noise) is significantly impacted
by the fitness and initial frequency of a diploid strain. A. A negative relationship was
observed between the mean fitness of replicates and the standard error of the mean. Diploid
strains with the lowest fitnesses had the highest standard errors in fitness estimates between
replicates while strains with the highest fitness had the lowest measurement noise. B.
Examination of how the correlation of fitness between replicate strains changes if we only
include replicate strains that are abundant in the pool. Correlation of fitness between replicates
improved significantly as the required abundance threshold (double barcode count at the first
time point) was increased.
101
Figure S3.5 Correlation of fitness across environments. Cells in the heatmap contain
Spearman’s correlation coefficients of the mean fitnesses of strains across all environments.
Two environments, CoCl2 and CuSO4, were less correlated with other environments and each
other.
102
Figure S3.6 Quantile normalization of fitnesses. Because the distributions of fitness were
slightly left skewed, mean fitnesses were quantile normalized. This quantile normalized fitness
estimate was used as the fitness phenotype for all downstream analyses.
Figure S3.7 Most loci are detected as significant without family correction.
Histogram showing the p-values of all 7,742 SNP markers when tested for significance using a
fixed effects linear model without correcting for family structure. The p-values shown in this plot
are not corrected for multiple testing. The red line shows the nominal significance threshold at
p-value = 0.05.
103
Figure S3.8 Loci with individual effects on fitness in each environment. Panels from top to
bottom are 1) loci detected by the mixed effects linear model FaST-LMM (red bars), 2)
dominance loci detected by the fixed effect linear model using the non-additive portions of each
diploid’s phenotype (green bars), 3) loci enriched for detection in family-level scans (orange
bars) 4) loci detected using family-level scans (black or blue points), where each row is a
104
different MATa family, and 5) the total number of detections across families for each 20 kb
interval (grey bars).
105
106
Figure S3.9 Distinct loci show substantial variation in detection across families.
Genome-wide mapping was conducted in each family using a fixed effects linear model fitness ~
locus, where the fitness term corresponds to the vector of quantile normalized fitness values of
individuals within a family and the locus term corresponds to the vector of these individuals’
genotypes at a given marker. To determine appropriate significance thresholds, 1,000
permutations were conducted with the correspondence between genotypes and phenotypes
randomly shuffled each time. Only p-values from family scans that were significant after multiple
testing corrections are shown, with a log10 scaling employed. Red, green, and blue labels
denote distinct loci in family-level scans that were identified by FaST-LMM, dominance scans, or
enrichment tests, respectively.
107
Figure S3.10 Accounting for differences in mean parental fitness corrects for family-level
fitness effects in the population. A. The mean fitness of each of the 392 MATa and 591 MATɑ
families in glucose, ordered from lowest to highest mean fitness. B. The mean residuals, or the
non-additive portions of each diploid’s phenotype, for each family after accounting for mean
parental fitness.
108
Figure S3.11 Pairwise genetic interactions detected in each environment. In each plot,
interior lines connect regions of the genome involved in pairwise interactions. Outer rings
109
contain barplots with height corresponding to the number of pairwise interactions at each locus
(orange) and its absolute effect size (green) in the entire population.
110
Figure S3.12 Three-way genetic interactions detected in each environment.
In each plot, interior lines connect regions of the genome involved in three-way interactions.
Outer rings contain barplots with height corresponding to the number of three-way interactions
at each locus (orange) and its absolute effect size (green) in the entire population.
111
Figure S3.13 Genetic variation at the mating locus results in distinct heterozygote
classes. To ensure segregation of the mating locus, both BY MATa x 3S MATα and 3S MATa x
BY MATα crosses were performed using isogenic strains that had been mating type switched.
Pairwise matings between these haploids results in two possible heterozygous genotypes at the
mating locus in the resulting diploids: BY MATa / 3S MATɑ and 3S MATa / BY MATɑ.
112
Table S3.1 List of chemical additives and their concentrations used in the competition
experiments. Chemicals are added to SCM-Ura + 2% glucose (the Glucose1 and Glucose2
growth condition).
Table S3.2 Number of replicate diploid strains in each fitness assay. Column 1
(‘Environment’) lists the eight pooled fitness assays (including glucose replicates) conducted in
this study. Columns 2 - 5 (‘1_rep’ - ‘4_rep’) list how many diploid strains are present in one, two,
three, or four barcoded strain replicates within a given fitness assay. Column 6 (‘total’) lists the
total number of diploid strains present in each fitness assay. Column 7 (‘2+_rep’) is how many of
the total strains are present at least twice in an experiment.
113
Table S3.3 Broad and narrow-sense heritability estimates for each fitness assay. Column
1 (‘env’) lists the eight pooled fitness assays (including glucose replicates) conducted in this
study. Columns 2 (‘broad’) and 3 (‘broad_se’) list the broad sense heritability estimates and
corresponding standard error values. Columns 4 (‘narrow’) and 5 (‘narrow_se’) list the narrow
sense heritability estimates and corresponding standard error values. Columns 6 (‘dom’) and 7
(‘dom_se’) list estimates of phenotypic variance explained by dominance and the corresponding
standard error values. Columns 8 (‘epi’) and 9 (‘epi_se’) list estimates of phenotypic variance
explained by genetic interactions and the corresponding standard error values.
Table S3.4 The number of unique genotypes and the number of barcode replicates per
genotype detected before filtering for quality. Column 1 (‘Environment’) lists the eight pooled
fitness assays (including glucose replicates) conducted in this study. Column 2 (‘Number of
genotypes’) list the total number of diploid strains detected before filtering for quality. Column 3
(‘Number of rep per genotype’) lists the average number of barcoded strain replicates each
diploid was represented before filtering for quality.
114
Table S3.5 Estimated rate of PCR chimeras. Column 1 (‘Environment’) lists the eight pooled
fitness assays (including glucose replicates) conducted in this study. Column 2 (‘Coefficient of
determination’) lists the R2 value when observing the linear relationship between the number of
PCR chimeras and the abundance of the involved barcodes: # of copies of PCR chimera ~ # of
copies of barcode1 * # of copies of barcode2. Column 3 (‘PCR chimera rate’) lists the overall
PCR chimera rate estimated using this linear model.
115
Chapter 4: Genome-scale analysis of interactions between genetic perturbations and
natural variation
This work is currently in revisions at Nature Communications. It was done in collaboration with
Takeshi Matsui, Ilan Goldstein, Martin N. Mullis, Kevin R. Roy, Chris Ne Ville, Darach Miller,
Charley Wang, Trevor Reynolds, Lars M. Steinmetz, Sasha F. Levy, and Ian M. Ehrenreich.
Summary of author contributions
As the first author on this publication, I was involved in most major steps of the experiments and
data analysis. I became involved in this project when the barcoded segregant haploid panel was
being generated, when Martin Mullis was leading the project. Together, we performed the initial
transformations of the CRISPRi plasmid library into the segregant panel. Ilan Goldstein and I
optimized and finished the CRISPRi transformations, then designed and performed the fitness
assay experiment. With help from Charley Wang, the three of us performed the DNA extraction
and library preparation necessary to sequence the fitness assays on Illumina NovaSeq lanes.
Ian Ehrenreich, Sasha Levy, and I performed the data processing and analysis, and I wrote the
custom Python and R scripts needed. The three of us also wrote the final manuscript, while I
generated the figures.
4.1 Abstract
Interactions between genetic perturbations and segregating loci can cause perturbations to
show different phenotypic effects across genetically distinct individuals. To study these
interactions on a genome scale in many individuals, we used combinatorial DNA barcode
sequencing to measure the fitness effects of 7,700 CRISPRi perturbations targeting 1,712
distinct genes in 169 yeast cross progeny (or segregants). We identified 460 genes whose
perturbation has different effects across segregants. Several factors caused perturbations to
show variable effects, including baseline segregant fitness, the mean effect of a perturbation
116
across segregants, and interacting loci. We mapped 234 interacting loci and found four hub loci
that interact with many different perturbations. Perturbations that interact with a given hub
exhibit similar epistatic relationships with the hub and show enrichment for cellular processes
that may mediate these interactions. These results suggest that an individual’s response to
perturbations is shaped by a network of perturbation-locus interactions that cannot be measured
by approaches that examine perturbations or natural variation alone.
4.2 Introduction
Genetic perturbations can interact with genetic variants (or loci) segregating in a population
(Mullis et al. 2018). These interactions cause perturbations to show different phenotypic effects
in genetically distinct individuals (also known as genetic background effects) (Chandler, Chari,
and Dworkin 2013). For example, disease mutations often exhibit incomplete penetrance and
variable expressivity in humans (Nadeau 2001). Similarly, background effects could cause
genome editing to have variable efficacy, as applied to human disease, productivity of crops or
livestock, and engineering of cells and microorganisms for industrial applications. In an
evolutionary context, the fitness effect of de novo mutations can depend on interactions with
specific alleles present in an individual. These interactions can impact the dynamics of
beneficial mutations spreading through a population (Weinreich et al. 2006; Kryazhimskiy et al.
2014) or deleterious mutations persisting within a population (Johnson et al. 2019).
Interactions between genetic perturbations and loci also impact the phenotypic effects of loci. A
perturbation may alter the degree to which an interacting locus influences a trait (magnitude
epistasis) or which allele of that locus confers a higher trait value (sign epistasis) (Phillips 2008;
de Visser, Cooper, and Elena 2011). In some cases, a perturbation can uncover cryptic loci that
do not typically show phenotypic effects (Rutherford and Lindquist 1998; Queitsch, Sangster,
and Lindquist 2002), or mask loci that usually exhibit phenotypic effects (Jarosz and Lindquist
117
2010). The net effect of a perturbation can depend on epistasis with multiple loci that interact not
only with the perturbation but also each other (higher-order epistasis) (Matthew B. Taylor and
Ehrenreich 2015). That is, the response of a given individual to a perturbation will depend on its
genotype at all interacting loci (Goldstein and Ehrenreich 2021).
While numerous examples of background effects have been reported (Dworkin et al. 2003;
Chari and Dworkin 2013; Chandler et al. 2014; 2017; Matthew B. Taylor and Ehrenreich 2014;
M. B. Taylor and Ehrenreich 2015; Matthew B. Taylor et al. 2016; Lee et al. 2016; 2019; Dowell
et al. 2010; Hou et al. 2019; Paaby et al. 2015; Galardini et al. 2019; Caudal et al. 2022; Schell
et al. 2022; Ang et al. 2023), we lack a detailed understanding of interactions between genetic
perturbations and genetic backgrounds. For example, we do not yet know what characteristics
of perturbations and individuals’ genotypes increase the likelihood of interactions, what types of
epistasis are most likely to underlie background effects, and the properties of interaction
networks between perturbations and loci. To answer these questions systematically, we require
two things: perturbations at a genomic scale and extremely high-throughput measurement of the
phenotypic effects of these perturbations across many individuals. Studies of this scale have
been technically challenging and cost prohibitive.
Here, we developed a scalable platform for assaying interactions between genetic backgrounds
and genetic perturbations using a genomic double barcoding system (Levy et al. 2015;
Schlecht, Liu, Blundell, St.Onge, et al. 2017) and inducible CRISPR interference (CRISPRi) (Qi
et al. 2013; Smith et al. 2016; Smith et al. 2017). We utilized 169 haploid progeny (segregants)
from a cross between the BY4742 (BY) lab and 322134S (3S) clinical strains of Saccharomyces
cerevisiae, each of which contained a barcode that marked a segregant (Matsui et al. 2022;
Mullis et al. 2022a). We then integrated a genome-scale library of 8,046 gRNA RNAs (gRNAs),
each marked with a barcode, at an adjacent site. Pooled competitions, double barcode
118
sequencing, and lineage trajectory analyses enabled precise measurement of the fitnesses of
~875,000 segregant-gRNA combinations. This large dataset allowed us to identify 460 genes
whose perturbation has different effects across segregants and to characterize factors that
cause these background effects.
4.3 Results
4.3.1 Construction of a double barcoded library of segregants with inducible CRISPRi
perturbations
We previously generated, genotyped, and barcoded 822 haploid, MATɑ BYx3S ura3∆
segregants carrying a genomic landing pad at the neutral YBR209W locus (Matsui et al. 2022;
Mullis et al. 2022a) (Supplemental Figure S4.1 and S4.2). The landing pad contains a loxP site
and a partial URA3 marker for site-directed integration of constructs and recovery of integrants
by selection for URA3. The barcodes enable measurement of each segregant’s frequency in a
pool by amplicon sequencing. When measured at multiple time points, these frequencies can
be used to infer the relative fitness of each segregant in a pool (Smith et al. 2017; Matsui et al.
2022). To prevent cell clumping and flocculation during pooled experiments, the main flocculin
(FLO11) (Lo and Dranginis 1998) and the primary transcriptional activator of cell-cell and
cell-surface adhesion phenotypes (FLO8) (Kobayashi et al. 1996) in S. cerevisiae were deleted
from BY and 3S prior to mating. We randomly chose 169 barcoded BYx3S segregants from
different tetrads for use in the current study (Supplemental Data 1 and 2). Most of these
segregants were represented by a single barcode in the panel. To enable tests for
reproducibility, 30 segregants were represented by 2-3 distinct barcodes.
To introduce genetic perturbations into the segregants at high throughput, we also designed a
barcoded CRISPRi plasmid library that could be integrated into the genomic landing pad
adjacent to the segregant barcode (Supplemental Figures S4.3 and S4.4). An inducible
119
CRISPRi system (Smith et al. 2016) was chosen because it provides several key advantages.
CRISPRi perturbations are only induced during phenotyping experiments, which helps prevent
accumulation of de novo suppressor mutations during strain construction and culturing. The use
of inducible CRISPR interference also allows essential genes to be perturbed, expanding the
space of possible target genes. Finally, CRISPRi-mediated gene repression is orthogonal to a
cell’s endogenous transcriptional machinery and should perform similarly across strains. Each
plasmid in this library contained an endonuclease-dead Cas9 (dCas9) fused with the Mxi1
repressor domain, an inducible single guide RNA (gRNA) targeting dCas9 to a specific
promoter, a unique 20-nucleotide barcode, a loxP site to enable chromosomal integration, and
the remaining portion of URA3. This plasmid library targeted a total of 1,739 genes that have
been reported to be essential under fermentative or respiratory growth conditions with 8,864
unique gRNAs, with an average of five distinct gRNAs per gene (Smith et al. 2017)
(Supplemental Data 3, Supplemental Figure S4.5). 91 unique gRNAs targeting intergenic and
noncoding regions rather than promoters were also included with the intent of serving as
controls. Roughly 10 distinct barcodes were generated for each gRNA on average, resulting in
an estimated plasmid library size of ≥90,000. All gRNAs were under the control of a
tetO-modified RPR1 promoter, enabling gRNA induction by the addition of anhydrotetracycline
(ATC) to liquid growth medium. ATC itself has no effect on yeast growth at the concentrations
used in this assay (Smith et al. 2016; Bak et al. 2010; Farzadfard, Perli, and Lu 2013).
We integrated this library of barcoded CRISPRi plasmids into each segregant and the BY parent
using large-scale transformations and Cre/loxP-mediated site-specific recombination. Natural
variation in transformability impeded integration of the library into certain segregants, as well as
into the 3S parent (Supplemental Figure S4.6). Based on colony counts and amplicon
sequencing of double barcodes, we estimated that ~10,000-15,000 integrants, or ~1.1-1.7
barcodes per gRNA, were recovered per successfully transformed strain (Supplemental Data 4).
120
Transformants were grown to stationary phase and roughly 10
8
-10
9 cells per strain were
combined into an initial pool, which was used as the initial time point (T0) in the subsequent
fitness assays.
4.3.2 Phenotyping of the segregant-gRNA pool
Plasmids were designed so that after integration, chromosomally-encoded segregant and
plasmid barcodes would be 99 bp apart, enabling their co-amplification by PCR and sequencing
within the same paired-end read (Fig. 4.1A, Supplemental Figure S4.3). Unlike previous
experiments involving single barcodes (Matsui et al. 2022), this double-barcode design allows
both the genotype and its integrated gRNA to be identified in a single sequencing read.
Sequencing of barcode pairs revealed that 169 unique segregants (223 segregant barcodes)
and 8,046 unique gRNAs (107,141 gRNA barcodes) targeting 1,721 distinct genes were present
in T0. Each gRNA appeared in 110 segregants on average, with T0 containing ~2,800,000
double barcodes in total. On average, ~4.5 gRNAs targeted a given gene (Supplemental Data
5).
121
Figure 4.1 Generating and phenotyping a panel of double-barcoded segregants carrying
CRISPRi perturbations. a) Design and generation of double-barcoded haploid segregants
carrying CRISPRi perturbations. Segregant-specific barcodes were first integrated into the
genomic landing pad via a first transformation. Then, barcoded CRISPRi constructs were
integrated into the genomic landing pad in a second transformation. These integrations
produced a double barcode within each cell that uniquely identified its specific segregant-gRNA
combination. b) Pooled fitness assays were used to phenotype all segregant-gRNA
combinations. These fitness assays involved performing double barcode amplicon sequencing
122
at multiple time points and using the resulting frequency data to estimate fitnesses for all double
barcode lineages. c) Density plot showing fitnesses for all lineages in the two replicate
experimental fitness assays with induction of gRNA expression (ATC1 and ATC2). Each data
point represents a double-barcode lineage that appears in both ATC1 and ATC2, with an
independent fitness estimate calculated for each assay. d) Density plot showing fitnesses for all
lineages across one experimental fitness assay with induction of gRNA expression (ATC1) and
one control assay without induction of gRNA expression (CON). e) Average fitness values of 74
gRNAs that directly overlap polymorphisms between BY and 3S, across both experimental and
control assays. gRNAs were designed for the BY genome; segregants with 3S alleles at the
binding site will not perfectly match the gRNA. Asterisks and ‘n.s.’ respectively indicate p-values
<1x10
-5 and >0.05 via t-test. f) Distribution of all gRNA effects as calculated by the mixed effects
linear model. The estimated null distribution is shown by the dashed line. The threshold used for
calling gRNAs with mean effects is three standard deviations below the mean of this null
distribution, as indicated by the vertical red line. The gRNA effect here is a mean calculated by
averaging across all segregants carrying the gRNA. g) Density plot of average fitness values for
all gRNAs with mean effects across the two fitness assays (ATC1 and ATC2). For each gRNA,
replicate barcodes within each genotype were averaged before taking the mean across all
genotypes. h) Density plot of average fitness values for all gRNAs with mean effects across one
experimental fitness assay and one control assay (ATC1 and CON). In all density plots, the data
points–double barcode lineages in (c) and (d) and gRNAs in (g) and (h)–are organized in bins to
generate heatmaps.
The T0 pool was split into three separate fitness assays: two replicate experimental assays
where the gRNA expression was induced with ATC (ATC1 and ATC2, experimental conditions)
and one control assay where no ATC was added and gRNA expression was not induced (CON,
control condition). All assays were done in synthetic complete medium containing glucose as
the fermentable carbon source and lacking uracil, which maintained selection for the URA3
marker generated by integration of CRISPRi plasmids. We performed the assays for 10
generations in serial batch culture, diluting 1:4 roughly every two generations, with a bottleneck
size of ~2.5x10
10 cells. We sequenced each assay at zero, two, four, six, and 10 generations
(T0, T1, T2, T3, and T5). At each of these time points, all double barcodes were extracted,
sequenced and counted. For each double barcode, frequency measurements across all five
time points were combined into a single lineage, using Bartender clustering to account for
sequencing errors and mutations (Zhao et al. 2018). These lineages were used to estimate the
relative fitnesses of segregant-gRNA combinations using PyFitSeq (F. Li, Salit, and Levy 2018)
(Fig. 4.1B, Supplemental Figures S4.7 - S4.10; Supplemental Table S4.1; Supplemental
123
Data 6 - 8). PyFitSeq estimates the fitnesses of lineages relative to the mean fitness of the
whole population. Specifically, PyFitSeq models the fitness of each individual lineage as
constant across time and allows the mean population fitness to vary over time with changes in
the frequencies of different lineages (F. Li, Salit, and Levy 2018). Our fitness estimates were
stable regardless of whether we used all time points or different subsets of time points
(Supplemental Table S4.2), indicating that heterogeneity in the timing of gRNA induction and
gene knockdown did not have a major impact on fitness estimates. Because fitness estimates
are relative to the mean population fitness within an assay, comparing fitnesses between assays
requires a normalization step. To enable comparisons between assays, we normalized the
fitnesses within each assay relative to the fitnesses of control gRNAs, which target noncoding
and intergenic DNA. We also adjusted all three assays such that the mean fitness of all lineages
in the CON assay was zero.
Because all three fitness assays contain the same double barcode lineages, we can directly
compare the fitnesses of lineages between assays. We expected that lineages would have
highly correlated fitnesses between the two replicate ATC assays. In part, we had this
expectation because we previously showed these segregants have different fitnesses (Matsui et
al. 2022; Mullis et al. 2022a), implying that a major source of variance in fitness in this study
should be baseline fitness differences among segregants. Additionally, we expected that
lineages carrying efficacious gRNAs would have different fitnesses in ATC and CON assays.
However, only a subset of the gRNAs have potent biological effects under our assay conditions
(Smith et al. 2016; 2017), so we expected that only some lineages would exhibit differences
between ATC and CON. Most lineages should thus correlate well between ATC and CON due to
the lack of gRNA effects, with the few lineages carrying efficacious gRNAs showing large
deviations. Indeed, we found that fitness estimates for the two replicate ATC assays were highly
correlated (Fig. 4.1C, Spearman’s correlation: 0.8), and the correlation of fitness estimates
124
between CON and ATC1 was only slightly lower (Fig. 4.1D, Spearman’s correlation: 0.683). As
expected, a subset of lineages showed decreased fitness in ATC1 compared to CON,
presumably because these lineages carry gRNAs with significant fitness effects.
To identify specific efficacious gRNAs, we used mixed effects linear models that accounted for
both the baseline fitnesses of segregants and the mean fitness effects of gRNAs across
segregants. For the remainder of this paper, we use the ‘mean effect’ of a gRNA to refer to its
average fitness effect across all segregants, as estimated from a mixed effects linear model.
Similar to a prior study using the same gRNA library (Smith et al. 2016), ~33% of gRNAs (2,290)
had significant mean effects. The specific gRNAs that were efficacious in our experiment
showed substantial overlap with those identified in the prior study; of 731 gRNAs previously
shown to have an effect, 89% (651) also had significant mean effects in our data set. To confirm
that our technique was capable of detecting gRNAs with variable effects across segregants, we
examined gRNAs in our library that directly target polymorphic sites. Because these gRNAs are
all specific to the BY genome, a 3S allele at the binding site is likely to reduce, if not eliminate,
gRNA binding and efficacy. For these gRNAs, we found that segregants with the 3S allele at the
binding site had significantly higher fitness than segregants with the BY allele, implying that
gRNA binding was impacted by the SNP. This difference in fitness was correlated with the
distance between the disrupting SNP and the protospacer adjacent motif, with smaller distances
leading to a larger difference between BY and 3S alleles (Supplemental Figure S4.11).
Additionally, in the CON assay, where gRNAs are not induced, this difference was not significant
(Fig. 4.1E, Supplemental Figure S4.11 and Supplemental Table S4.3). This implies that the
fitness differences observed in our assays were due to the biological effects of the gRNAs, and
that our technique had sufficient precision to detect differences in gRNA activity across
segregants. All gRNAs that bind near polymorphic sites were excluded from further analysis. As
an additional validation, within the same assay, efficacious gRNAs targeting the same gene had
125
correlated effect sizes (Pearson correlation = 0.382, pval = 5.6x10
-19
; Supplemental Figure
S4.12). Such a correlation was never observed across 1,000 permutations of the same data
(Pearson correlations in permutations: mean = 5.4x10
-4
, min = -0.143, max = 0.118;
Supplemental Figure S4.12).
Because all genes targeted in this study have been previously annotated as essential under
fermentative or respiratory growth conditions (Smith et al. 2017), we did not expect gRNAs to
show positive effects. Thus, in addition to the ANOVA model, we employed a conservative effect
size threshold to identify efficacious gRNAs. We only considered a gRNA further if its fitness
effect was at least three standard deviations below the mean of a null distribution inferred from
the data (Fig. 4.1F, Supplemental Data 9). Of the 6,977 CRISPRi perturbations in this study that
had invariant binding sites, 1,536 met all of these criteria, targeting 787 distinct genes
(Supplemental Figure S4.13). To validate this thresholding step, a single fitness estimate was
obtained for each of these efficacious gRNAs by averaging all of its genotype and gRNA
barcode replicates within an experiment. These average fitnesses were highly correlated
between ATC1 and ATC2 (Fig. 4.1G, Spearman’s correlation: 0.986), but not between ATC1
and CON (Fig. 4.1H, Spearman’s correlation: 0.198).
4.3.3 Identification and characterization of background effects
We measured the fitness of each segregant-gRNA combination under both induced and
uninduced conditions, making it possible to explicitly test for interactions between gRNAs and
segregants using mixed effect linear models. At a False Discovery Rate of 0.05, 699 of the
1,536 efficacious gRNAs had significant background effects based on a gRNA-segregant
interaction term (Supplemental Data 9). These results suggest that segregants do not show
major differences in global CRISPRi efficacy, as most efficacious gRNAs (~54%) have
indistinguishable effects across segregants. The minority of gRNAs that show background
126
effects target 460 distinct genes in total (58% of all genes targeted by efficacious gRNAs). Prior
work across model systems has shown that 15%-32% of all genes show background effects
(Goldstein and Ehrenreich 2021). However, excluding genes whose perturbation does not show
measurable effects in any individual has been shown to increase estimates of the prevalence of
background effects to as high as 74%-89% (Goldstein and Ehrenreich 2021). Thus, the
proportion of genes that we identified with background effects is in line with other studies. For
the remainder of this paper, we focus on a reduced data set, selecting a single gRNA for each of
the 460 genes. Selections were made based on the significance of the background effect,
prioritizing gRNAs with mapped loci (as described later).
We next identified factors that caused gRNAs to show variable effects across segregants. To
compare fitness effects across diverse gRNAs, we first estimated the effect of each individual
gRNA in each individual segregant using mixed effects linear models. Then, the mean effect of
a gRNA was subtracted from these segregant-specific values. This assigned a single ‘deviation
value’ to each segregant, representing the difference between the effect of the gRNA in a
specific segregant and its effect on average (Fig. 4.2A). This process was repeated for each
gRNA, assigning a deviation value to every observed combination of gRNA and segregant (Fig
4.2B, Supplemental Data 10). We found a slight negative relationship between the deviation
values (the difference between the effect of a gRNA in an individual segregant and the mean
gRNA effect) and the baseline fitness of that segregant, with the same gRNA having a more
detrimental effect in higher-fitness strains (simple linear regression R
2 = 0.149, pval < 1.0x10
-100
,
95% bootstrap confidence interval: 0.145-0.152; Fig. 4.2C). In addition, the variance of a
gRNA’s deviation values was related to its mean effect size, with more detrimental gRNAs
showing larger variances (simple linear regression R
2 = 0.543, pval = 6.1x10
-80
, 95% bootstrap
confidence interval: 0.459-0.618; Fig. 4.2D, Supplemental Figure S4.14). These results
indicate that the baseline fitness of a segregant (without any perturbation) and the mean effect
127
of the gRNA can both impact background effects. Baseline fitness impacts the direction of a
background effect in an individual segregant and mean gRNA effect impacts the magnitude of
background effects across segregants.
Figure 4.2 Calculating deviation values for all segregant-gRNA combinations. a)
Schematic representation of how deviation values are calculated from fitness. b) Heatmap of all
deviation values. Segregants are shown along the y-axis, ordered by mean fitness, and gRNAs
128
are shown along the x-axis, ordered by mean effect size. Missing combinations are shown in
gray. Only rows and columns with 30% or less missing data are shown. c) Relationship between
baseline segregant fitness and deviation values for that genotype. Dots indicate average
deviation value, and lines indicate 2 standard deviations. The 95% bootstrap confidence interval
for the R
2 value of 0.149 was 0.145-0.152. d) Relationship between mean gRNA effect and the
standard deviation (SD) of deviation values for that gRNA, with the best-fit line shown. The 95%
bootstrap confidence interval for the R
2 value of 0.543 was 0.459-0.618.
We also determined the extent to which genetic factors segregating in the cross explain the
deviations. Using replicated measurements among genotypes and gRNAs, we estimated broad
sense heritability (H
2
), which measures the total genetic contribution to a trait(Lynch and Walsh
1998). H
2 was 0.737 on average (sd = 0.06, min = 0.54, max = 0.88, Fig. 4.3A), implying that
background effects identified here have a mostly genetic basis.
All loci contributing to a background effect show epistasis with a perturbation (pairwise
epistasis). However, some loci may interact not only with a perturbation but also each other
(higher-order epistasis). The relative contribution of pairwise vs. higher-order epistasis
underlying background effects can be estimated by comparing H
2 and narrow sense heritability
(h
2
), which measures only the additive genetic contribution to a trait (Lynch and Walsh 1998).
Using genomic relatedness-based estimation, we found that h
2 was 0.358 on average (sd =
0.193, min = 0, max = 0.775). Thus, 51.4% (1 - 0.358/0.737) of the genetic basis of background
effects could be attributed to higher-order epistasis on average (Fig. 4.3B).
129
130
Figure 4.3 Using deviation values as traits in linkage mapping. a) Histogram of broad sense
heritability (H
2
) values for all gRNAs with background effects. b) Histogram of the ratio between
narrow sense heritability (h
2
) and H
2
for all gRNAs with background effects, shown as (1 - h
2
/H
2
).
c) Linkage mapping on deviation values. Upper panel: Visualization of all 2x-log10
(pval) drops for
the gRNAs with significant peaks. Each row is a unique gRNA. Locations of major fitness loci
are indicated by black bars. Lower panel: Number of overlapping 2x-log10
(pval) drops at each
nucleotide position along the genome. Regions where more intervals overlap than expected by
chance (hubs) are shown in unique colors. Vertical lines indicate chromosome boundaries.
Linkage mapping results for the replicate assay (ATC2) are available as Supplemental Figure
S4.15. d) Histogram of all effect sizes for the detected loci. These values were obtained from the
linear model used for linkage mapping, deviations ~ locus, taking the coefficient of the locus
term. This is roughly equivalent to the difference between the mean deviation value of
segregants carrying the 3S allele and the mean deviation value of segregants carrying the BY
allele. e) Histogram of the proportion of heritability explained by each detected locus, calculated
as R
2
/H
2
. f) Interaction plots connecting gRNA location to the corresponding 2x-log10
(pval) drop
for each of the four hubs.
4.3.4 Hub loci control the phenotypic effects of many gRNAs
We next used linkage mapping to identify interacting loci that cause gRNAs to show variable
effects across segregants. Specifically, we treated the deviations as trait values and mapped
loci contributing to these deviations. By definition, loci identified in such scans show pairwise
genetic interactions with gRNAs and play an additive role in the deviations. Because only 169
segregant genotypes were present in our data set, statistical power was limited, and we were
only able to detect loci for a subset of the gRNAs. Further, genetic variants that modify
chromatin accessibility near gRNA targets may impact CRISPRi efficacy at those sites. Such
variants were a minor contributor to our data set: we found only 21 gRNAs with detected loci
within 10 kb of their binding site and excluded these cases from subsequent analyses
(Supplemental Table S4.4). Across the remaining gRNA-targeted genes with background
effects, we detected 234 loci at a permutation-based significance threshold of -log10
(pval) ≥ 4.21
(Fig. 4.3C, upper panel, Supplemental Data 11). Of the targeted genes, 172 had a single
mapped locus, 28 had two mapped loci, and two had three mapped loci.
Detected loci were not evenly distributed across the genome (Fig. 4.3C, lower panel). Across all
loci, the BY allele produced higher deviation values roughly as often as the 3S allele did (Fig.
131
4.3D). On average, individual loci explained 21% of a deviation’s total genetic basis (sd = 0.055,
Fig. 4.3E). Of the 234 detected loci, 169 (72.2%) were grouped into one of four ‘hubs’ that
interacted with a larger number of genes than expected by chance. Hubs on Chromosomes VII,
X, XIII, and XIV showed linkage to 38, 25, 76, and 18 gRNA-targeted genes, respectively (Fig.
4.3F, Supplemental Table S4.5). Similar results were obtained from the replicate experimental
assay (Supplemental Figure S4.15). The gRNA-targeted genes to which each hub showed
linkage were largely distinct, with only 13 genes (7.7%) connected to multiple hubs. This result
further supports the notion that the segregants do not show major differences in global CRISPRi
efficacy. We also found four other loci that are likely hubs but did not pass our detection
threshold (Supplemental Figure S4.16). Our identification of hubs is consistent with a recent
study in which the fitness effects of transposon insertions were measured in a panel of yeast
segregants and loci were identified that modified the effects of numerous insertions (Johnson et
al. 2019). Further, two hubs (Chr XIII and XIV) and two loci that interacted with multiple gRNAs
but did not pass the threshold to be called hubs (Chr XII and XV) in our study overlapped
multi-hit loci found in this other study.
In addition, we mapped three loci that contribute to the baseline fitness of the segregants (Fig.
4.3C). This low number of detected fitness loci was due to the small number of segregants
tested. The hubs on Chr X and Chr XIII each overlapped one of these fitness loci, implying that
the same locus can affect both baseline segregant fitness and the effects of genetic
perturbations. We found that 43% of all detected loci with effects on deviations were mapped to
these two hubs. However, as we show below, these loci had different impacts on baseline
fitness and CRISPRi perturbations.
132
4.3.5 Properties of hub loci that interact with many genetic perturbations
We found that the effect of each hub shows a different relationship with the fitness effects of the
gRNAs it interacts with. The Chr VII and XIV hubs exhibit the greatest effects in the presence of
gRNAs that cause the largest fitness deficits. These hubs were not detected as significant loci
during the linkage mapping of baseline fitness (Fig. 4.3C), but were instead identified through
their interactions with multiple gRNAs (Figs. 4.4A and B). In the presence of gRNAs interacting
with either the Chr VII or Chr XIV hub, the 3S allele of the locus becomes beneficial relative to
the BY allele. This is true for all 38 of the gRNAs that interact with the Chr VII hub and all 18 of
the gRNAs that interact with the Chr XIV hub. The magnitude of the difference between
genotype classes at these hubs is positively related to the mean effect of a gRNA (Chr VII
simple linear regression R
2 = 0.44, pval = 6.2x10
-06
; Chr XIV simple linear regression R
2 = 0.63,
pval = 7.8x10
-05
).
133
Figure 4.4 Features of and comparisons between hubs. Effects of the Chr VII (a), XIV (b), X
(c), and XIII (d) hubs on mean fitness and deviation value across gRNAs that interact with each
hub. Dot plots in upper panels: Effects of the hub on mean fitness when each interacting gRNA
is induced. Control indicates a subset of 1,294 gRNAs with no phenotypic effect in our data set,
as determined by gRNAs with p-values greater than 0.8 for both the mean effect term and the
background effect term in a mixed effects linear model (see Methods section ‘Identification and
quantification of gRNA effects’). Orange points show the mean fitness of all genotypes with the
3S allele at the peak marker for that gRNA’s interacting locus and green points show genotypes
with the BY allele. The fitnesses of replicate barcodes were averaged before calculating the
mean for each genotype. Vertical bars for each point indicate standard error. The control data
used the center of the hub instead of a peak marker. Barplots in lower panels: Effects of a hub
on deviation values across interacting gRNAs. The change in deviation values was calculated
by taking the mean of all deviation values among genotypes with the 3S allele at the peak
marker and subtracting the mean of all deviations among genotypes with the BY allele. The
control data has no deviation values.
By contrast, the Chr X and XIII hubs show the smallest effects in the presence of gRNAs that
cause the largest fitness deficits. These hubs were both detected as significant loci during the
linkage mapping of baseline fitness (Fig. 4.3C), with the 3S allele again being beneficial.
However, the fitness effects of both hubs are usually attenuated in the presence of interacting
gRNAs (Figs. 4.4C and D). This is true for 23 of 25 of the gRNAs that interact with the Chr X
hub and all 76 of the gRNAs that interact with the Chr XIII hub. Unlike the Chr VII and XIV hubs,
the magnitude of the difference between genotype classes at the these hubs is inversely related
to the mean effect of a gRNA (Chr X simple linear regression R
2 = 0.52, pval = 4.5x10
-05
; Chr
XIII simple linear regression R
2 = 0.57, pval = 4.7x10
-08
). Interactions between gRNAs and hubs
can result in masking of a hub’s effect (both Chr X and XIII) or even sign epistasis (Chr XIII
only). These results demonstrate that genetic perturbations can quantitatively modify the effects
of loci, causing them to show a spectrum of effects in an otherwise genetically identical
population maintained in the same environment.
4.3.6 Insights from the genetic interaction network of the yeast cell
We hypothesized that the interactions between a given hub and different gRNAs were due to a
common underlying mechanism. Using the CellMap genetic interaction network (Usaj et al.
134
2017) and gRNA-targeted genes to which a hub showed linkage, we identified multiple
significant functional enrichments (Fig. 4.5). The Chr XIII and XIV hubs were enriched for
transcription (p-values ≤1.0x10
-04
); the Chr X and XIII were enriched for mitosis (p-values
≤0.05); and the Chr VII and X hubs were enriched for DNA replication and repair (p-values
≤0.05). Several hubs also had unique enrichments: Chr VII for protein turnover and cell polarity
(p-values ≤1.0x10
-06
); Chr X for glycosylation & protein folding and MVB sorting & RIM signaling
(p-values ≤1.0x10
-04
); and Chr XIII for rRNA/ncRNA processing and vesicle traffic (p-values
≤1.0x10
-07
). These data suggest that each hub likely acts through a distinct combination of
cellular processes to influence the phenotypic effects of genetic perturbations. The Chr VII and
XIV hubs have similar effects on mean fitness and deviation values across interacting gRNAs
(Figs. 4.4A and B), but few functional categories are shared between their interaction networks
(Figs. 4.5A and B). The same is true for Chr X and Chr XIII (Figs. 4.4C and D; Figs. 4.5C and
D), indicating that different mechanisms can produce similar allelic relationships.
135
Figure 4.5 Interaction networks for each hub. Cellmap visualization of the interaction
networks for the Chr VII (a), XIV (b), X (c), and XIII (d) hubs. A list of all genes targeted by a
gRNA in this hub were used as a query at thecellmap.org to create these images, at a
significance of p ≤ 0.05.
4.4 Discussion
We found that the effect of a genetic perturbation depends on baseline fitness (i.e. the fitness of
an individual in the absence of a perturbation), the mean fitness impact of the perturbation
136
across individuals, and the specific alleles carried by an individual at interacting loci. Our
findings do not appear to be driven by major differences in global CRISPRi efficacy, as most
efficacious gRNAs do not show background effects and hubs interact with different sets of
gRNAs. Further, variation in chromatin accessibility near gRNA target sites cannot explain our
results because the vast majority of loci that contribute to deviations did not occur near gRNA
target sites. Based on the heritability explained by detected loci, we can infer that there are
likely to be multiple interacting loci contributing to each background effect. The large number of
perturbations in our data allowed us to determine the degree to which segregating loci interact
with different perturbations. While some loci interact with only one perturbation, we identified
several hub loci that interact with many perturbations, implying that responses to many different
perturbations can have a shared genetic basis.
Perturbation-locus genetic interaction networks like the one described here have been
historically difficult to map because they require introducing and quantitatively phenotyping a
large number of genetic perturbations in a large number of individuals. These previously
unmeasured networks shape how individuals respond to different perturbations. Features of
these networks likely impact how genetic background shapes the effects of perturbations,
leading to incomplete penetrance, variable expressivity, and differences in robustness to
perturbations. They might also control which mutations are deleterious or beneficial, thereby
influencing evolutionary trajectories. Similar to more well-described networks between genetic
perturbations alone (Costanzo et al. 2016) or between natural variation alone (Matsui et al.
2022; Ba et al. 2022), perturbation-locus networks appear to contain many nodes with few
interactions and few nodes with many interactions. Further study is needed to determine how
these different types of genetic networks relate to each other.
137
The measurable perturbation-locus network will depend on the genetic variation segregating
among examined individuals, and this will necessarily be a subset of the full perturbation-locus
network within a species. To simplify this problem, we focused on a systematic analysis of the
perturbation-locus network between two haploid strains, BY and 3S. The current study could not
fully map the network of interactions between perturbations and loci in the BYx3S cross due to
the modest number of assayed segregants. This sample size was chosen to facilitate
examination of a genome-scale gRNA library, with the expectation that many gRNAs would not
be efficacious under our assay conditions. These results now enable follow-up studies with
more segregants and fewer gRNAs. Such studies will facilitate the detection of many more loci
that interact with perturbations and thereby comprehensive mapping of the network of
interactions between perturbations and loci in this cross. The ultimate goal of this line of study
would be to predict the outcome of a perturbation in an unseen genetic background with
potential lessons for therapeutic genome editing, crop and livestock improvement, and
engineering of cells and microorganisms for industrial applications.
4.5 Methods
4.5.1 Generation of barcoded haploid segregants
The barcoded BYx3S MATɑ segregants in this study were previously reported (Matsui et al.
2022; Mullis et al. 2022a). In brief, all segregants were tetrad dissected from a cross between
fcy1∆ flo8∆ flo11∆ ura3∆ versions of BY and 3S that both carried the same genomic landing pad
at the neutral YBR209W locus. The fcy1∆ ura3∆ deletions provide counterselectable markers,
while the flo8∆ flo11∆ deletions eliminate cell-cell adhesion phenotypes, such as flocculation.
The genomic landing pad contains multiple partially-crippled loxP sites in close proximity, in
addition to a galactose-inducible Cre recombinase. To barcode segregants, we individually
transformed them with a plasmid library containing 20-nucleotide random barcodes, integrated
plasmids into the landing pad by Cre/loxP-mediated recombination, and obtained random
138
integrants. The barcodes present in the segregants, as well as segregant genotypes, were then
determined via multiple Illumina HiSeq 2500 lanes (Mullis et al. 2022a). Briefly, six HiSeq lanes
were used to perform whole-genome sequencing on each individual segregant, with reads
mapped against the S288c reference genome R64-2-1_20150113 using BWA (H. Li and Durbin
2009). SAMTOOLS (H. Li et al. 2009) was used to convert alignments to pileups, from which
high-confidence SNP lists were generated. Segregant barcodes were identified using a
separate HiSeq 2500 lane, in which the barcode was PCR amplified from each segregant using
specific primers. After being isolated from sequencing reads, barcodes were clustered using
Bartender (Zhao et al. 2018), with clusters making up greater than 5% of all reads from a
segregant being assigned as its true barcode.
4.5.2 CRISPRi plasmid construction
The pE32 CRISPRi plasmid was constructed using plasmids pKR387, pKR482, and pKR919.
An annotated plasmid map of pE32 is available in the supplementary information. To construct
the pE32 CRISPRi plasmid, a C to T synonymous mutation was first introduced at amino acid
position 1319 (Glycine, GGC -> GGT) of dCas9 on pKR387 to remove an AscI recognition site
using the QuikChange II site-directed mutagenesis kit (Agilent #200523). The resulting product
was transformed into NEB 10-beta cells using a standard heat shock protocol and transformants
were selected on LB agar plates with 100 μg/mL of carbenicillin. From this pKR387 plasmid with
the mutated dCas9, a DNA fragment containing a GPM1 promoter, tetR gene, GPM1 terminator,
TEF promoter, dCAS9, MxiI repressor domain, and CYC1 terminator was amplified by PCR. The
resulting product was inserted into linearized pKR482 using NEBuilder HiFi DNA assembly kit
(NEB E5520S) and transformed into 10-beta cells. Transformants were selected on LB agar
plates with carbenicillin, and the resulting plasmid was named pSL25. This pSL25 plasmid was
linearized with AscI and ClaI and ligated with a double-stranded oligonucleotide containing a
partially crippled Lox66 site using T4 DNA ligase (NEB M0202S). The ligation product was
139
transformed into 10-beta cells, selected on LB agar plates with carbenicillin, and named pSL55.
Finally, a DNA fragment containing RPR1 promoter, tetO sequence, and a hammerhead
ribozyme sequence was amplified from pKR919 by PCR. The hammerhead ribozyme was
included because preliminary experiments found that it improves gRNA efficacy in CRISPRi.
The PCR product was inserted into pSL55, which was linearized with SpeI and BspQI, using
NEBuilder HiFi DNA assembly kit. The resulting product was transformed into 10-beta, selected
on LB agar plates with carbenicillin, and named pE32.
4.5.3 Generation of the CRISPRi library
All CRISPRi gRNAs used in this paper were previously generated (Smith et al. 2016; 2017). To
generate the CRISPRi plasmid library for this paper, ~20,000 arrayed yeast strains (Smith et al.
2017), each containing a unique chromosomally-encoded gRNA, were plated onto YPD agar
plates with 300 μg/mL of Hygromycin B in a 384-well format using Singer ROTOR. The yeast
cells were grown overnight at 30°C. Although these yeast strains are genetically identical except
for the 20 bp gRNA, heterogeneity in growth was observed across yeast strains. To minimize
bias in gRNA frequency, equal amounts of each overnight colony were transferred into 200 μL
of water using the ROTOR. Cells containing essential gRNAs targeting essential and
non-essential genes were then pooled separately and spun down at 3000 rpm for 15 minutes
prior to genomic DNA extraction. Genomic DNA was extracted using MasterPure yeast DNA
purification kit (Lucigen MPY80200). A DNA fragment containing a gRNA with the tracrRNA
scaffold sequence, URA3 promoter, 5’ half of URA3, an artificial intron, and an I-SceI cut site
was amplified from the pooled genomic yeast DNA. The reverse primer used to amplify this
region contained a random 20 bp sequence, which was used to mark each gRNA with an
unique barcode. To minimize PCR bias, the amount of DNA required for amplification was
calculated such that each gRNA is represented by ~1,000 DNA molecules. A total of 330 ng and
270 ng of yeast genomic DNA was used as template for amplification of ~11,000 non-essential
140
and ~9,000 essential gRNAs, respectively. The resulting product was inserted into pE32, which
was linearized with BspQI and AscI, using NEBuilder HiFi DNA assembly kit. The resulting
product was transformed into 10-beta cells and selected on LB agar plates with carbenicillin. To
ensure high barcode complexity, ~110,000 and ~90,000 transformation colonies were scraped
and pooled prior to plasmid extraction. While the CRISPRi library used in this study initially had
a library size of ~20,000 gRNAs, a previously unknown design issue with the non-essential
plasmid library affected growth on selective media, resulting in substantial depletion of
non-essential gRNAs in the T0 pool. As a result, all non-essential gRNAs were excluded from
analyses, which had a minor impact on our total data. Raw data from non-essential gRNAs is
included in the Supplementary Information.
4.5.4 Linking barcodes to gRNAs in the CRISPRi library
The CRISPRi library was initially generated using fully randomized barcode sequences,
necessitating a linkage step to connect known gRNAs with their barcodes. To link barcodes and
gRNAs, the full plasmid library was sequenced using a PCR-free library preparation method and
Oxford Nanopore MinION. The SQK-LSK109 protocol was used to prepare this sequencing
library, with elution steps after bead cleanup performed at 37°C for 10 min to increase yield.
Sequencing was performed on R9.4.1 Chemistry MIN-106 flow cells. Basecalling was performed
on the USC Center for Advanced Research Computing Discovery cluster. We ran Guppy v6.0.1
with the configuration dna_r9.4.1_450bps_sup.cfg on a single node with 16 threads and a V100
GPU. After basecalling, we first mapped the gRNA portion of a read to its best match among the
known gRNAs and then mapped the barcode portion of read to its best match among the known
gRNA barcodes from barcode sequencing of the T0 pool. Fuzzy string matching was done in
Python using the process.extractOne() function in the rapidfuzz library (Bachmann [2020] 2023).
4.5.5 Transformation of haploid strains with the CRISPRi library
141
The strains used in this study were randomly selected from the haploid panel described above.
All selected strains were individually transformed with the full CRISPRi library, using a modified
version of the standard lithium acetate protocol (Gietz and Schiestl 2007) that scaled up all
volumes by a factor of 10. Roughly 8.0x10
8 cells and 30 μg of plasmid library were used for
each transformation. Immediately after transformation, cells were grown in 3.4 mL of YP +
Galactose liquid media for 18-20 hours to induce integration of the plasmid while minimizing cell
division. After induction, cultures were grown in SC - URA liquid media in order to select for cells
that successfully integrated the plasmid. Since SC - URA is not immediately lethal to
non-integrated cells, cultures were first grown in 60 mL of selective media for 48 hours. Next, 20
mL of the culture was combined with 40 mL fresh SC - URA to perform a second setback. This
two-step enrichment was done to ensure that as many cells in the cultures were true integration
events and to maintain the barcode and gRNA diversity among yeast transformants. After
enrichment, 12.5 mL of the stationary phase culture was saved and frozen as a freezer stock.
4.5.6 Fitness Assay
One full freezer stock from each transformed segregant (roughly 1.25x10
8 cells) was used for
the fitness assay, regardless of transformation efficiency. This value was chosen assuming a
roughly uniform distribution of gRNA frequency within each segregant, leading to an expectation
of ≥10
4 cells per lineage. All freezer stocks were thawed and grown to stationary phase in SC -
URA liquid media. Then, 10 mL from each culture (roughly 2x10
9 cells) was combined to
generate a 2.5 L T0 pool. Multiple T0 cell pellets were frozen for fitness assays and DNA
extraction. To begin the assay, 250 mL of the T0 pool (roughly 5x10
10 cells) was thawed and
inoculated into 750 mL fresh SC - URA liquid media. To induce gRNA expression, ATC was
added to experimental flasks (assays) at a final concentration of 250 ng/mL. ATC was not added
to the control assay. To generate the first time point (T1), assays were grown for 24 hours at
21°C with shaking. The majority of the culture was harvested and saved as a frozen cell pellet,
142
while the remaining 250 mL was inoculated into 750 mL of fresh media and grown for another
24 hours. This process was repeated to generate the remaining time points (T2 through T7).
The ATC1, ATC2, and CON assays were grown, harvested, and diluted in parallel, under
identical growth conditions.
4.5.7 Library preparation for barcode sequencing
DNA was extracted from frozen cell pellets using the Zymo Research Quick-DNA
Fungal/Bacterial Midiprep Kit, with roughly 120 μg extracted from each time point. In order to
separate the barcode region from the rest of the genome, extracted DNA was digested for 18
hours with the restriction enzyme I-SceI. The ~250 bp barcode region was then isolated from
the genomic DNA via a standard AMPure XP bead cleanup protocol. Briefly, cleanup was
performed by incubating DNA with 0.4X beads for 15 minutes, taking the supernatant and
incubating with 0.8X beads for 15 minutes, then discarding the supernatant and washing DNA
off the beads. 70% Ethanol washes were performed between each step. Total DNA after bead
cleanup ranged from around 20 μg to 30 μg. The resulting DNA was first amplified with a 5-cycle
PCR with Phusion polymerase, using 200 ng of input DNA and primers specific to the double
barcode region. These primers were:
1)AATGATACGGCGACCACCGAGATCTACACNNXXXXNNACACTCTTTCCCTACACGAC
2)CAAGCAGAAGACGGCATACGAGATNNXXXXNNGTGACTGGAGTTCAGACGTGTGCTCTT
CCGATCT
The N’s in the above sequence represent the random unique molecular identifiers (UMIs) used
to account for and eliminate PCR duplicates that could appear in later amplification steps. The
X’s represent Illumina multiplexing indices used to enable eventual pooling of multiple libraries
onto a single sequencing lane. To ensure consistency of the PCR protocol, each library was split
in half on this step, with each aliquot amplified with a different set of multiplexing indices. Final
143
results and data output were similar across all index combinations for each library. All primer
sequences are available in the Supplementary Information.
After the first PCR step, all products were pooled together and purified with Zymo Research
DNA Clean & Concentrator-5 columns, using 150 μL of PCR product per column and eluting
into 30 μL. The purified product was used to set up a second 26-cycle PCR with Phusion
polymerase, using 15 μL of template. Universal Illumina primers P1 and P2 were used for this
reaction. All PCR products from this reaction were pooled and put through another Clean &
Concentrator-5 column, with 375 μL of product per column, and eluted into 35 μL. The cleaned
product was re-pooled, and the 200-300 bp region was extracted via agarose gel
electrophoresis. DNA was recovered from the gel using Zymo Research Zymoclean Gel DNA
Recovery Kits and Clean & Concentrator-5 columns. The purity and concentration of the final
library was assessed using Life Technologies Qubit 2.0 and Thermo Fisher NanoDrop ND-1000
Spectrophotometer. The structure, sequence, and diversity of each library was verified via
Sanger sequencing.
4.5.8 Barcode sequencing
150 bp paired-end sequencing was performed for all time points, on either a NovaSeq 6000 or a
HiSeq 4000 platform. Up to three multiplexed time points were included on each NovaSeq lane,
with each time point being allocated roughly 800 million reads regardless of platform. Each
sequencing lane was performed with 25% PhiX spike-in to ensure nucleotide diversity during
the run, since the majority of the amplicons consisted of fixed bases. Sequencing data was
analyzed with custom-written code in Python and R. For the initial processing of large files,
USC’s Center for Advanced Research Computing Discovery cluster was used. Reads were
sorted based on several multiplexing indices unique to each time point. Forward reads were
used to extract gRNA barcodes and reverse reads were used for genotype barcodes, with reads
144
filtered out if the expected fixed nucleotides did not appear immediately downstream of either
barcode. These downstream sequences are CCCGAGTCGCGATAA and TACCGTTCGTATAGG
for the gRNA and segregant barcodes, respectively. Reads were also filtered out if the first 35
bases of either the forward or reverse read had an average Illumina quality score below 30.
Consistent with previous publications(Levy et al. 2015; Schlecht, Liu, Blundell, St Onge, et al.
2017; Mullis et al. 2022b; Matsui et al. 2022), very similar barcodes were clustered together and
treated as the same sequence in order to minimize the effects of sequencing errors and random
mutations on barcode counts. Clustering was performed using both Bartender (Zhao et al. 2018)
software and the rapidfuzz package (Bachmann [2020] 2023), and was done separately for
genotype and gRNA barcodes. First, a list was generated of every unique barcode detected
across all time points by the extraction method described above. Consensus sequences were
identified from this input using Bartender, with the merging threshold disabled (-z -1) to avoid
frequency-based clustering. Each consensus sequence was then compared against the entire
list of verified reference barcodes using rapidfuzz, with similarity scores of 90 or higher
considered a match. This threshold is roughly equivalent to a 1-2 nucleotide difference between
the Bartender consensus sequence and the known reference barcode. The majority of reads
clustered and matched this way (>95%) were associated with a perfectly-matched reference
barcode (score = 100). Reference lists for genotype barcodes were obtained through separate
HiSeq 4000 lanes on pooled segregant barcodes (Mullis et al. 2022a), in addition to Sanger
sequencing on individual segregants. Reference lists for gRNA barcodes were obtained from
separate Nanopore sequencing lanes (see section “Plasmid Library Preparation” for more
detail).
After clustering, the frequency of each double barcode in each time point was calculated.
Barcode amplicons undergo PCR before being sequenced, and PCR duplicates can cause
145
significant bias if they are not removed when calculating frequencies. To account for this, a
random 4-mer UMI was added to every multiplexing index during the library preparation, for a
total of 8 random nucleotides per fragment. These UMIs, which are added on an early step of
PCR, are distinct from the 20-nucleotide barcodes described above, and are only used to
eliminate PCR duplicates. Because the total number of possible 8-nucleotide combinations (4
8 =
65,536) is significantly higher than the expected number of reads for any individual
double-barcode fragment, it is unlikely that the same DNA template will contain the same UMIs
by chance. This can facilitate estimation of barcode frequencies by counting unique UMIs
detected for each double barcode. However, UMI-containing primers were not fully removed
from the sample by PCR cleanup kits due to their large size (> 50 bp), causing unique UMIs to
be added in all stages of PCR, not just the first few cycles. This meant that not every PCR
duplicate could be identified via UMIs and removed from our frequency measurements.
4.5.9 Removing PCR chimeras
The double barcode constructs in this study consist of two highly variable barcodes separated
by a region of invariant nucleotides, which can enable the production of spurious double
barcodes (chimeras) during PCR (Omelina et al. 2019). To quantify and account for chimeras,
we chose ~10 individual segregants and extracted their barcodes in isolation from each other.
These individually-prepared segregants were sequenced via the methods described above on
the HiSeq 4000 platform. Since segregants were completely isolated during this library
preparation, there was no opportunity for PCR chimeras to be generated. This allowed us to
construct a high-confidence list of the true gRNA barcodes present in those 10 segregants,
which in turn identified double barcodes from the fitness assay data that were likely chimeras.
The number of double barcodes per segregant identified as chimeric based on invalid
combinations of barcodes from this data was consistently <10%. To more broadly correct these
data for chimeras, a linear model was fit for the frequency of each known chimera as a function
146
of the total frequency of the corresponding segregant and gRNA barcodes in that sequencing
lane: (frequency of chimera) ~ (total frequency of segregant barcode) + (total frequency of
gRNA barcode). This linear model was fit separately in four early time points from the fitness
assay, and the coefficients were averaged across the four models. The final model was used to
correct every double barcode’s calculated frequency in each time point for its expected number
of chimeras, with corrected double barcode frequencies set to a minimum of zero.
4.5.10 Fitness estimation
Fitness estimation was performed as previously reported (F. Li, Salit, and Levy 2018), with minor
edits to adapt the method to our fitness assay. Specifically, differences in coverage between
time points (Supplemental Figure S4.7) were first accounted for by normalizing read counts.
This was done by dividing each read count by the total number of reads from that time point. All
read counts were then multiplied by the same arbitrary value (450,000,000, roughly the number
of reads present in T0). PyFitSeq (F. Li, Salit, and Levy 2018) was used to estimate fitness of
each double barcode lineage, which consisted of normalized read counts from five separate
time points. PyFitSeq is commonly used software for fitness estimation in barcode sequencing
studies (F. Li, Salit, and Levy 2018). This software infers the fitnesses of lineages relative to the
whole population based on changes in lineage frequencies over time, to determine the fitness of
each lineage relative to the mean of the population at T0. The mean population fitness is
allowed to vary over time as different lineages proliferate or decline in frequency, but the fitness
of each individual lineage is fixed across all time points. Fitness estimates are used to project
lineage trajectories, these projected trajectories are compared to observed trajectories, and are
adjusted over the course of multiple iterations until an optimum is reached. The maximum
number of iterations was set to 200, and all default PyFitSeq settings were used otherwise.
Lineages were removed from the data set if their log likelihood score was one interquartile
range below the first quartile, or if <5 reads were present at the first time point (T0), as both
147
cases led to inaccurate fitness estimates. Additionally, segregants with <100 associated gRNAs
were removed from the data set entirely, as were gRNAs with <2 associated segregants.
PyFitSeq generates fitness estimates that are relative to the mean population fitness of each
assay at T0, so any small differences between the mean fitness of each assay must be
corrected via normalization. This allows the fitness estimates to be compared between different
assays, and is done by selecting a subset of lineages that is expected to have the same fitness
in all three assays. This normalization was performed using 83 control gRNAs as the common
reference point across the three assays. For each of the control gRNAs, the average fitness of
every lineage carrying that gRNA was determined. Six of the 83 control gRNAs were removed at
this step, as their average fitness was below zero and they had inconsistent effects between the
experimental and control assays. After taking the average fitness of every gRNA, the median of
these values was used to normalize all three assays. Specifically, the difference in these
medians between ATC1 and CON was subtracted from the fitness of every lineage in ATC1, and
this process was repeated for ATC2. Normalization was then performed by taking the difference
in median fitness among the neutral gRNAs between ATC and CON assays, with both ATC
assays normalized to CON.
To ensure the validity of this normalization method, two other common reference points were
chosen. Both of these leveraged the fact that we do not expect any gRNAs to have positive
effects in our data set. The first additional reference point was the set of all gRNAs whose
average fitness was higher in both experimental assays than in the control assay. The second
additional reference point was the set of all gRNAs whose average fitness appeared in the 4th
quartile of all three assays. Normalizing with respect to both of these additional reference points
produced similar values to normalizing via the control gRNAs.
148
Finally, to aid in interpretation of the fitness values, we adjusted all fitness estimates in all
assays by the same value, such that the mean fitness of all lineages in CON was exactly zero.
This was done to ensure that neutral lineages with no major fitness effects were as close as
possible to an intuitive fitness value of zero, which is not produced by PyFitSeq by default.
Because our analysis only concerns comparisons between fitness values, shifting the entire
data set in this way has no impact on the following analysis.
4.5.11 Identification and quantification of gRNA effects
After assays were normalized, mixed effects linear models were used to determine if gRNAs
had a significant mean effect across all segregants, and if any background-specific effects were
present. Here, we use ‘mean effect’ to refer to the impact a gRNA has on fitness on average,
when considering all genotypes together. This is distinct from the genotype-specific deviation
values discussed elsewhere in this study. The mean effect of each gRNA was determined by
first identifying the subset of all lineages carrying that gRNA, then comparing the fitnesses of
these lineages across one experimental assay (ATC1) and the control assay (CON) using mixed
effects models. The mixed effects model fitness ~ segregant + gRNA + error was fit to this
subset of lineages using the lme() function in the nlme package in R (Pinheiro and Bates 2000).
Here, the segregant term was treated as a random effect and the gRNA term was treated as a
fixed effect. All lineages from the control assay were assumed to lack the perturbation (gRNA =
0), and all lineages from the experimental assay were assumed to have the perturbation present
(gRNA = 1). If the addition of a genotype:gRNA interaction term significantly improved model fit,
the model fitness ~ segregant + gRNA + segregant:gRNA + error was used for that gRNA
instead. Regardless of model, if the p-value for the gRNA term was significant after
Benjamini-Hochberg multiple testing correction, the perturbation was considered to have a
potential mean effect. Among the gRNAs with mean effects, if the p-value for the
gRNA:genotype interaction term was also significant after Benjamini-Hochberg multiple testing
149
correction, the gRNA was considered to have a background-dependent effect. This was
repeated for each individual gRNA. After all gRNAs had been tested for effects, this process
was repeated using the second replicate experimental assay (ATC2) rather than ATC1.
Mean gRNA effects were estimated by extracting the gRNA term coefficient from the
appropriate linear model described above. This was done for all gRNAs, regardless of their
effects. Analysis of the control data indicated that some gRNAs had leaky expression, having
similar fitness effects across both experimental and control assays. The linear models described
above also removed these gRNAs from the data set, as the mean gRNA effect was determined
by comparing the same gRNA between ATC and CON assays. We found that the p-value
threshold described above was not sufficiently stringent to identify efficacious gRNAs. To set a
conservative threshold for efficacious gRNAs, we took the subset of gRNAs with neutral or
beneficial effects and simulated a normal distribution from this data. Perturbations were only
considered to have a mean effect if they were also at least three standard deviations below the
median of this simulated distribution.
In the case that a background-specific effect was detected using the method described above,
segregant-specific model coefficients were extracted using the predict() function in R. This
provided an estimated gRNA effect for every segregant in which that gRNA occurs. The overall
mean gRNA effect was then subtracted from these values, giving us segregant-specific
deviation values. These represent the direction and magnitude of each segregant’s response to
a gRNA after correcting out the gRNA’s mean effect. Segregants with insufficient data to obtain
a coefficient were excluded from this calculation. gRNAs with deviation values for less than 35
segregants (~20%) were excluded from further analysis.
4.5.12 Linkage mapping
150
Linkage mapping was performed individually for each gRNA using deviation values as
phenotypes. For each gRNA, we performed genome-wide scans using the lm() function in R,
with the model lm(deviations ~ locus). Significance thresholds were determined by running
1,000 permutations in which a random gRNA was chosen, its deviations were randomly
shuffled, and the lowest p-value was saved. From these 1,000 p-values, the 5th percentile was
used as the significance cutoff for the respective linkage mapping model. Peak detection was
performed by taking 2x-log10
(pval) drops for each locus that was both above the significance
cutoff and at least 100 kb away from any other 2x-log10
(pval) drop. Any loci for which the gRNA’s
binding location was <10 kb away from the peak marker were excluded from analysis. These
could potentially represent polymorphisms that disrupt a gRNA target site or that impact gRNA
binding by modifying local chromatin accessibility. In total, 655 gRNAs were removed this way.
Thus, local variation in chromatin did not have a major impact on our study.
4.5.13 Heritability estimation
Reproducibility of guide effects across ATC1 and CON assays was used to estimate
broad-sense heritability. This was possible due to the replication present in the data set, with
most gRNAs and many segregants each represented by multiple distinct barcodes. First, a
mean fitness estimate for each segregant was obtained from the CON assay and control
gRNAs. Then, for each gRNA, these segregant fitnesses were subtracted from each
corresponding double barcode fitnesses from the ATC1 assay, providing a guide effect estimate
for every individual double barcode. These estimates were used to fit the linear model
guide_effect ~ genotype. Using this model, broad-sense heritability for each gRNA was
determined by taking the sum of squares between genotypes and dividing by the total sum of
squares. Narrow-sense heritability was estimated using the “sommer” package in R. The
A.mat() function was used to generate an additive relationship matrix from the deviation values
151
previously generated for each segregant-gRNA combination, and the mmer() and vpredict()
functions were used to generate the narrow-sense heritability estimate from this matrix.
4.5.14 Analysis of hub loci
A significance threshold for detecting hubs was calculated by assuming a random distribution of
intervals across the genome (Brem et al. 2002). Specifically, the genome was split into 20 kb
bins and a threshold was set at 0.05 / (number of bins). A Poisson distribution was used to
determine the number of overlapping intervals per bin that would result in a p-value below this
threshold. For this Poisson distribution, lambda was set as (total length of all detected
2x-log10
(pval) drops in bp) / (total length of the yeast genome in bp). Hubs were defined as
adjacent loci with counts (number of overlapping intervals) above this threshold. The locus with
the highest count was selected as the marker for that hub. In the case of ties, the genomic
positions of the tied loci were averaged, and the locus closest to the average was used as the
marker. All gRNA-targeted genes whose 2x-log10
(pval) drops overlapped this marker were
considered a part of that hub. The effect of each hub locus on deviation value in the context of
interacting gRNAs was extracted from the linear model used for linkage mapping lm(deviations
~ locus), taking the coefficient of the locus term. Proportion of broad-sense heritability explained
by the hub locus for each gRNA was also calculated this way, by taking the variance explained
by the model and dividing it by the broad-sense heritability estimate for that gRNA. CellMap
analysis was performed by uploading lists of genes comprising each hub to cellmap.org and
downloading enrichment tables and visualizations (Usaj et al. 2017).
152
4.6 Supplementary Material
Figure S4.1: Landing Pad design. a) Genomic landing pad before integration of any plasmids.
LA_71 indicates a partial mutation on the left arm of the Lox site, which has no effect on its own.
RA_66 indicates a similar partial mutation on the right arm. Lox2272/66 should only recombine
with Lox2272/71. b) Expression of Cre recombinase causes the plasmid to be integrated into
the genome in a predictable configuration. c) Recombination at the Lox2272 sites results in the
plasmid contents being integrated into the chromosome. This also places both mutant arms on
the same Lox site, which renders it nonfunctional and limits spontaneous popout of the plasmid.
153
Figure S4.2: Allele frequencies across all genotypes. The proportion of segregants carrying
the 3S allele of a site is shown. The fixed region on Chromosome III is the mating locus, at
which all segregants possess the 3S allele of MATalpha.
154
Figure S4.3: Plasmid design and double-barcoding scheme. a) Genomic landing pad before
integration of any plasmids. b) Integration of segregant barcode plasmid using the Lox2272
sites. A detailed visualization of this step is available in Supplemental Figure 1. c) Integration of
barcoded CRISPRi plasmid library using the LoxP sites. d) Landing pad structure after CRISPRi
plasmid integration. The URA3 gene is only functional after CRISPRi plasmid integration,
allowing for selection in SC - URA liquid media.
155
Figure S4.4: CRISPRi plasmid map.
156
Figure S4.5: Multiple replicate gRNAs target the same gene. a) Number of replicate gRNAs
per gene (mean = 4.14, median = 4, sd = 2.65). b) An example of replicate gRNA positioning
relative to a target gene. The blue box indicates the coding sequence of TFC3/YAL001C, with
all gRNAs positioned around the promoter. Arrows indicate guide strand (plus/minus).
157
Figure S4.6: Transformation efficiency across 58 random segregants. Segregants without
error bars were not replicated. Dots represent means and error bars represent one standard
deviation. The green bar indicates the BY parent, and the orange bar indicates the 3S parent.
158
Figure S4.7: Distribution of data per gRNA. a) Histogram of the number of genotypes per
gRNA for all ~7,000 analyzed gRNAs (mean = 115.98, median = 120, sd = 26.84). b) Number of
double barcodes per gRNA for all analyzed gRNAs (mean = 762.61, median = 674, sd =
474.65). c) Number of genotypes per gRNA for the 460 efficacious gRNAs used in downstream
analyses (mean = 128.85, median = 131, sd = 18.14). d) Number of double barcodes per gRNA
for gRNAs used in downstream analyses (mean = 1008.15, median = 892, sd = 549.26).
159
Figure S4.8: Distribution of fitness values from each assay. a) Distribution of fitness values
in ATC1. b) Distribution of fitness values in ATC2. c) Distribution of fitness values in CON.
160
Figure S4.9: Distribution of barcodes and normalized reads per genotype in each flask.
161
Figure S4.10: Distribution of barcodes and normalized reads per gRNA in each flask.
Normalized read histograms have extreme outliers removed for visibility (~1.5% of all gRNAs).
162
Figure S4.11: XY plot between the effect of SNPs that directly overlap gRNA binding sites
and the distance between the gRNA’s PAM site and that SNP. The effect of the SNP is
calculated as (effect of the guide in segregants with the BY allele) - (effect of the guide in
segregants with the 3S allele). gRNAs perfectly match their targets in segregants with the BY
allele. Simple linear regression R
2 = 0.285, pvalue = 9.9x10
-7
.
163
Figure S4.12: Effects of gRNAs that target the same gene. gRNA effect was estimated by a
mixed effects linear model (see Methods section “Identification and Quantification of Guide
Effects”). If more than two gRNAs targeted the same gene, they were randomly assigned into
pairs. The same subset of gRNAs was used for all following panels. a) X-Y plot showing the
correlation between effects among gRNA pairs in the ATC1 condition (Pearson correlation =
0.382, pval = 5.61x10
-19
). b) X-Y plot showing the correlation between effects among gRNA
pairs in the ATC2 condition (Pearson correlation = 0.378, pval = 5.66x10
-20
). c) Permutations for
the correlation shown in (a). All gRNAs were randomly assigned into pairs, regardless of their
targeted gene, and the same Pearson correlation was calculated. This process was repeated
1000 times, with the resulting distribution of correlations shown in blue. The value from the
original data set is shown by the red line. d) Permutations for the correlation shown in (b).
164
Figure S4.13: Location of all gRNAs across the genome. Numbers show how many gRNAs
in total are present on each chromosome. Gray lines indicate locations of gRNAs without mean
effects. Red lines indicate locations of gRNAs with mean effects only. Green lines indicate
gRNAs with background effects. Blue lines indicate control gRNAs.
165
Figure S4.14: X-Y plot between gRNA effect and all deviation values for that gRNA. Mean
deviation value is shown by a point, with lines indicating two standard deviations. Each point
represents a distinct gRNA-targeted gene with a background effect.
166
Figure S4.15: Linkage mapping results on deviation values for the replicate experimental
flask ATC2. Upper panel: Visualization of all 2x-log10
(pval) drops for the gRNAs with significant
peaks. Each row is a unique gRNA. Locations of major fitness loci are indicated by black bars.
Lower panel: Number of overlapping 2x-log10
(pval) drops at each nucleotide position along the
genome. Regions where more intervals overlap than expected by chance (hubs) are shown in
unique colors. Vertical lines indicate chromosome boundaries.
167
Figure S4.16: Potential hubs below the significance threshold used during analysis. Hubs
were categorized here if they passed the significance threshold of 9 overlapping confidence
intervals in only one of the two conditions (ATC1 or ATC2), or if at least 5 overlapping
confidence intervals were present in at least one condition. a) Effect of a locus near the 5’ end of
chromosome VII, at position 119852 and 39054 bp in width. This locus is distinct from the main
Chr VII hub, and was detected as a significant locus during the linkage mapping on baseline
fitness. This locus was detected only in ATC1. b) Average effect of the locus on chromosome
XII, at position 662627 and 27942 bp in width. c) Average effect of the locus near the 5’ end of
chromosome XV, at position 147540 and 59773 bp in width. This locus was above the
significance threshold only in ATC1. d) Average effect of the second locus on chromosome XV,
at position 1011045 and 11091 in width. This locus was only detected in ATC1.
168
Fitness Assay Time Point Total Raw Reads
ALL 0 460428002
ATC1 1 392520148
ATC1 2 399527009
ATC1 3 311268392
ATC1 5 365532111
ATC2 1 387400026
ATC2 2 354986473
ATC2 3 336666459
ATC2 5 376377113
CON 1 333135885
CON 2 349830935
CON 3 302795126
CON 5 401794794
Table S4.1: Total raw reads used for fitness estimation. Reads were counted just before
performing normalization and running PyFitSeq (see Methods section “Fitness Estimation” for
more detail).
169
Table S4.2: Stability of fitness estimates across different subsets of time points. In order
to test how the choice of time points affects PyFitSeq fitness estimates, the software was
provided with a variable number of input time points from the ATC1 fitness assay. For each
unique set of time points, a fitness estimate was generated for every lineage. The Pearson
correlation was obtained by comparing the fitness estimates for each lineage across two sets of
input time points. The time point set “T0-T1-T2-T3-T5” was the one used in this study. All
p-values were below 2.2x10
-16
.
170
Time Point Set 1 Time Point Set 2 Pearson Correlation
T0-T1-T2-T3-T5 T0-T1-T2 0.691
T0-T1-T2-T3-T5 T0-T1-T2-T3 0.824
T0-T1-T2-T3-T5 T0-T1-T2-T3-T5-T7 0.933
T0-T1-T2-T3-T5 T1-T2-T3-T5 0.888
T0-T1-T2-T3-T5 T2-T3-T5-T7 0.704
T0-T1-T2-T3-T5 T1-T2-T3-T5-T7 0.837
T0-T1-T2 T0-T1-T2-T3 0.822
T0-T1-T2 T0-T1-T2-T3-T5-T7 0.613
T0-T1-T2 T1-T2-T3-T5 0.509
T0-T1-T2 T2-T3-T5-T7 0.244
T0-T1-T2 T1-T2-T3-T5-T7 0.469
T0-T1-T2-T3 T0-T1-T2-T3-T5-T7 0.767
T0-T1-T2-T3 T1-T2-T3-T5 0.678
T0-T1-T2-T3 T2-T3-T5-T7 0.495
T0-T1-T2-T3 T1-T2-T3-T5-T7 0.651
T0-T1-T2-T3-T5-T7 T1-T2-T3-T5 0.833
T0-T1-T2-T3-T5-T7 T2-T3-T5-T7 0.788
T0-T1-T2-T3-T5-T7 T1-T2-T3-T5-T7 0.906
T1-T2-T3-T5 T2-T3-T5-T7 0.729
T1-T2-T3-T5 T1-T2-T3-T5-T7 0.936
T2-T3-T5-T7 T1-T2-T3-T5-T7 0.819
gRNA Sequence Chr gRNA Start gRNA End PAM Position SNP position gRNA Effect
TGATGCATTCACTCTCAGTT 12 805700 805719 805697 805710 -0.0950996
GAACTCTAATACCTTTAGTT 2 691875 691894 691872 691880 -0.0624023
TTATCATAAACCAAGACCGG 7 999243 999262 999240 999249 -0.1315465
AAAAGAGTAACCTAGGAAAT 7 999219 999238 999216 999229 -0.0916477
AATATCACGATAGGACAAAT 12 694205 694224 694202 694211 -0.0356261
AGGCCTGAAAAATTTCAAAG 11 122400 122419 122422 122402 -0.0863247
GGAGAAAACTAAACGGGAAT 7 894494 894513 894516 894512 -0.1013647
AGGTTCCTGGTATCAAAGTG 7 767344 767363 767341 767344 -0.0415362
ATTCACATTCGAGGGGTTGC 4 238808 238827 238830 238826 -0.0738085
AAAACACGTTTGCATTGTTT 16 534884 534903 534906 534894 -0.1368684
TTAATCCGAGGTAATAGATT 2 124958 124977 124955 124959 -0.0926067
GATAATATCACCGTAGTTTC 13 99571 99590 99568 99579 -0.1239932
ATGCGTACAAGGAAGCGGGT 2 186645 186664 186642 186648 -0.0357824
TAATAATTCTCAATAGATAC 15 181592 181611 181614 181607 -0.1913804
GAACATCGATAAGTCGAATG 13 275896 275915 275918 275910 -0.09468
TATTTCTCTGTTAGATGCTC 15 633677 633696 633699 633695 -0.1050648
AGTTCCACTTTTCAGGCAGG 5 196820 196839 196842 196832 -0.0352892
GTTACCCTAATCTATTACCT 2 124950 124969 124972 124966 -0.2212353
TCGTAGGTTCGAATATCTCG 4 1398891 1398910 1398913 1398909 -0.0661702
TTCACATTCGAGGGGTTGCC 4 238809 238828 238831 238826 -0.0751744
CCAACTGCCTCAAAAAGGTA 9 261058 261077 261055 261061 -0.1216244
ACAGCAAAAAATAAGACTGC 15 802393 802412 802390 802400 -0.1444372
CCACTACATTTAGGTAGACG 13 901618 901637 901640 901630 -0.1003644
Table S4.3: Table of gRNAs that are likely to be affected by SNPs near their binding
location. The gRNAs shown all have significant background-dependent effects, a mapped QTL
within 10 kb of the gRNA binding location, and a known BY/3S polymorphism at the gRNA
binding location. More conservative thresholds were used when excluding gRNAs from analysis
(see Methods section “Linkage Mapping”).
171
gRNA Sequence ORF Chr
gRNA
Position
Peak
Start
Peak
End
Peak
Marker -log10(pval) condition
ATTAAAGTCATTTAGAGCAA YBR237W 2 691848 678574 705106 681557 18.46 BOTH
AAAAGCACACCGTATAGAAA YJR040W 10 507680 497086 545773 526954 14.05 BOTH
AAGGTTCCTGGTATCAAAGT YGR138C 7 767345 748014 779852 766068 13.75 BOTH
TGAGTTCTAGCGGAAACCGC YML098W 13 77154 27643 73993 32621 6.38 BOTH
TTTTGGCAAGCACGAATTGA YJR111C 10 636901 642106 677358 660402 5.2 BOTH
GATACCAACCGATCCTTGCT YGL167C 7 190876 154206 226152 175590 5.22 BOTH
TTCGCGTGGCATAACCCGGC YML085C 13 99549 57144 104510 93860 6.95 BOTH
ATTTTGCTGGAAGGGCAATA YLR347C 12 826667 785227 846812 807311 6.66 BOTH
AAGCTAATTAAGTGTAAACT YKL028W 11 385663 367086 420060 394572 5.35 BOTH
ACTATCATAGGTATAAATAC YKL137W 11 185849 164020 193550 184014 8.21 BOTH
ATGAGATTGAAACAATAACA YJR112W 10 636945 641813 683971 655475 5.94 ATC2
TACTGGCCGCCGGCATGCGA YJR002W 10 438744 415069 456781 434839 19.57 BOTH
AGAAGTATGGATGGTTCGAA YGR186W 7 867710 846529 905027 869612 9.62 BOTH
ACCAAGACGCCACTACATTT YMR314W 13 901609 899812 904006 900561 15.2 BOTH
TTTTATGAAGACTGGAGAAC YJR040W 10 507664 496029 560037 545773 4.93 BOTH
GGCGGAAGGAGAAAGTTTCA YDL139C 4 212162 176628 238826 222741 4.71 ATC1
TGTTAGTATTTAAGCTTATT YMR287C 13 845438 824397 894039 855635 5.56 ATC1
TAAAAGCACACCGTATAGAA YJR040W 10 507679 496029 544322 514236 10.77 ATC1
TATGAAGACTGGAGAACAGG YJR040W 10 507661 496029 571066 531791 6.36 ATC1
GTTAAGCATTGTAAAATACC YPL173W 16 222985 208747 244227 222598 4.5 ATC1
AAAGGTAAGAGATAATACTA YOR159C 15 633624 571939 652034 585188 5.96 ATC1
Table S4.4: Table of gRNAs with a mapped QTL near their binding location and without an
overlapping polymorphism. The gRNAs shown all have background-dependent effects and a
mapped QTL within 10 kb of the gRNA binding site. However, they lack any known
polymorphism overlapping the gRNA binding site. All positions (gRNA position, interval start,
interval end, and peak marker) are given in genome position. The width of the 2x-log10
(pval)
confidence interval is shown by the “Interval Start” and “Interval End” columns. The “Condition”
column describes which replicates a mapped QTL was detected in.
172
Hub Chromosome Number of gRNAs
Hub Marker
Position Peak Width (bp)
VII 38 979647 3371
X 25 646356 4918
XIII 76 43096 20469
XIV 18 470844 8643
Table S4.5: Positional information about hubs. For each hub, the peak was defined as the
marker of maximum significance across all gRNA interactions shown by a hub. The peak width
was defined as the region where the highest number of confidence intervals across all gRNA
interactions shown by a hub overlapped.
173
Chapter 5: Conclusion
In this chapter, I will explain the impact of my work, followed by possible future directions and
studies.
5.1 Impact of my work
The work presented here provides a broad, genome-scale view of background effects, both in
terms of their effects on phenotype and their underlying genetic mechanisms. This was enabled
by the combinatorial barcoding scheme used in Chapters 3 and 4, which allowed me to study
background effects at a scale that was not previously feasible. At the time of writing, Chapters 3
and 4 are among the largest-scale S. cerevisiae studies of background effects in the context of
genetic perturbations and diploid strains, respectively. In these studies, I address the following
questions:
1) What patterns or trends of background effects are apparent when taking a genome-scale
view?
In Chapter 3, I found that genetic interactions among standing variants segregating in a cross
tend to cluster around specific hub loci. These hubs were involved in more two- and three-way
interactions than expected by chance, and the majority of detected interactions involved at least
one hub. Additionally, hub loci were more likely to modify additivity in their interactions, with
non-hub loci being more likely to modify dominance. Roughly 4.5 hubs were detected per
environment in this study, with the same locus often identified as a hub in multiple different
environments.
Similar results were found in Chapter 4, where I found that four specific loci interact with
significantly more genetic perturbations than expected by chance. These hubs were distinct
from those identified in Chapter 3, and the majority of loci detected in this study were associated
with a hub. Each hub had a broadly consistent effect on the perturbations it interacted with, but
174
not all hubs had the same effect. For example, two of the hubs tended to reduce variance
among segregants when perturbations were induced, while the other two hubs tended to
increase variance. Additionally, hubs that shared effects tended to be more similar to each other.
Two of the four hubs had effects on baseline segregant fitness, with the 3S allele being
beneficial; these were the same two hubs that reduced variance in the presence of
perturbations.
Overall, these results suggest that hubs of genetic interaction are common, both across
environments and across different types of genetic interactions. The existence of these hubs in
both studies implies that the genetic basis of background effects is not entirely idiosyncratic, and
many interactions could even share similar genetic architectures or underlying molecular
mechanisms. The similarities between hubs observed in both studies further suggests that
some features or qualities of genes may predispose them to a large number of genetic
interactions, although further studies are needed to investigate this hypothesis.
2) What types or categories of loci influence background effects?
In Chapter 3, I found a positive relationship between the fitness effect size of a locus and the
number of genetic interactions it is involved in. Loci that had a higher impact on fitness tended
to be involved in more genetic interactions, and vice versa. However, this was not true in all
cases. In this study, the most common major effect modifier of dominance was the mating type
locus on Chromosome III, which does not have a large impact on fitness itself.
In Chapter 4, I find that baseline segregant fitness has a significant relationship with the
segregant’s response to genetic perturbation. Segregants with high baseline fitness were more
likely to be heavily impacted and experience a larger drop in fitness. This finding is consistent
with another study that introduced genetic perturbations into yeast strains via transposon
mutagenesis (Johnson et al. 2019). Additionally, of the four hub loci identified in this study that
interact with a large number of perturbations, two of them overlap loci that influence baseline
175
fitness. However, the other two hub loci have no detectable effect on fitness, and baseline
segregant fitness explains a relatively small fraction of the variance in response to genetic
perturbation.
Together, these results suggest that loci with very strong effects on baseline fitness are more
likely to be involved in genetic interactions. However, this also appears to be only one of several
possibilities. Half of the hubs detected in Chapter 4 have no effect on baseline fitness, as does
the dominance modifier at the mating type locus found in Chapter 3. Overall, these results imply
that loci affecting baseline fitness are one of multiple categories of loci that are likely to influence
background effects.
3) To what extent does genetic background impact the response to genetic perturbation?
In Chapter 4, I find that genetic background contributes significantly to changes in fitness in
response to genetic perturbation. These background effects are also relatively common,
affecting over 50% of all genes targeted by efficacious perturbations, which is consistent with
other estimates (I. Goldstein and Ehrenreich 2021b). Additionally, the magnitude of these
background effects is often smaller than the effects of the perturbations themselves. Across the
169 mapped loci that were directly investigated, there were no cases where a background effect
fully masked the impact of a perturbation. This indicates that purely qualitative background
effects, such as incomplete penetrance, are rare under the conditions of this experiment.
However, this finding is in contrast to Chapter 2, where the spontaneous mutation mrp20-A105E
exhibits a very high degree of variable expressivity. For some genetic backgrounds, this
mutation is essentially lethal for growth on ethanol, while other backgrounds exhibit near
wild-type levels of growth. While this hypothesis requires further study, this implies that
environment may have a significant impact on the degree to which background effects alter the
response to new genetic perturbations, as Chapter 4 was conducted in fermentative conditions
and Chapter 2 was conducted in respiratory conditions.
176
5.2 Future Directions
5.2.1 Chapters 2 and 4
Chapters 2 and 4 both investigated how background effects shape the response to genetic
perturbation, with Chapter 2 studying a single effect in detail and Chapter 4 taking a broad view
of many such effects. While the scale of Chapter 4 provided valuable insights, the limited
mapping resolution in this study meant that the genetic basis of the background effects could
not be dissected in the same level of depth as Chapter 2. Chapter 4 was also limited by a
technical consideration, since it used CRISPRi to perturb target genes. The optimal gene
targets and gRNA sequences for CRISPRi for this experiment were not yet known when the
study was performed, leading to the inclusion of a large number of replicates and effectively
neutral gRNAs in the data set. While this was necessary in the original study, reducing the
number of gRNAs to include primarily those with strong effects would allow the number of
segregants to be significantly increased, which would lead to much greater mapping resolution.
This could allow the detection of loci at a resolution that allows for identification of causal genes,
which would likely improve our understanding of the mechanisms underlying the background
effects described in Chapter 4.
5.2.2 Chapters 3 and 4
To my knowledge, the two studies described in Chapters 3 and 4 are among the largest-scale
studies of background effects performed to date. However, a significant question arising from
these studies and related work (Johnson et al. 2019) is how the effects of a locus on baseline
fitness relate to the genetic interactions it is involved in. There is significant evidence from
Chapters 2, 3, and 4 that additional environments could provide clarity to this question. For
example, Chapter 4 was performed entirely in the glucose environment, and never identified any
background effect of the same magnitude as the one in Chapter 2, which concerned growth on
177
ethanol. Additionally, Chapter 3 found that hubs of genetic interaction among standing variants
are occasionally found across multiple distinct environments. It is not known if this is true for the
hubs found in Chapter 4, since these are the result of interactions between loci and new genetic
perturbations. An additional study on this question is currently in progress, which I am also
contributing to. This study uses the same library of segregants and perturbations described in
Chapter 4, but uses respiratory growth conditions rather than fermentative.
178
References
Ang, R.M.L., S.-A.A. Chen, A.F. Kern, Y. Xie, and H.B. Fraser. 2023. “Widespread Epistasis
among Beneficial Genetic Variants Revealed by High-Throughput Genome Editing.”
Cell Genomics 3: 100260.
Ba, A.N.N. 2022. “Barcoded Bulk QTL Mapping Reveals Highly Polygenic and Epistatic
Architecture of Complex Traits in Yeast.” eLife, no.
ttps://elifesciences.org/articles/73983. https://doi.org/10.7554/eLife.73983.
Bachmann, M.maxbachmann/RapidFuzz. 2023. “RapidFuzz.”
Bak, G. 2010. “On-off Controllable RNA Hybrid Expression Vector for Yeast Three-Hybrid
System.” BMB Rep 43: 110–14.
Benaglia, T., D. Cahauveau, D.R. Hunter, and D.S. Young. 2009. “Mixtools: An R Package
for Analyzing Mixture Models.” J Stat Softw 32: 1–29.
Billiard, S., and V. Castric. 2011. “Evidence for Fisher’s Dominance Theory: How Many
‘Special Cases’?” Trends Genet 27: 441–45.
Bland, J.M., and D.G. Altman. 1995. “Multiple Significance Tests: The Bonferroni Method.”
BMJ 310 (6973): 10 1136 310 6973 170.
Bloom, J.S. 2015. “Genetic Interactions Contribute Less than Additive Effects to
Quantitative Trait Variation in Yeast.” Nat. Commun 6: 8712.
Bloom, J.S., I.M. Ehrenreich, W.T. Loo, T.-L.V. Lite, and L. Kruglyak. 2013. “Finding the
Sources of Missing Heritability in a Yeast Cross.” Nature 494: 234–37.
Boone, C., H. Bussey, and B.J. Andrews. 2007. “Exploring Genetic Interactions and
Networks with Yeast.” Nat Rev Genet 8 (6): 437–49. https://doi.org/10.1038/nrg2085.
Boyle, E.A., Y.I. Li, and J.K. Pritchard. 2017. “An Expanded View of Complex Traits: From
Polygenic to Omnigenic.” Cell 169: 1177–86.
Brem, R.B., J.D. Storey, J. Whittle, and L. Kruglyak. 2005. “Genetic Interactions between
Polymorphisms That Affect Gene Expression in Yeast.” Nature 436 (7051): 701–3.
https://doi.org/10.1038/nature03865.
Brem, R.B., G. Yvert, R. Clinton, and L. Kruglyak. 2002. “Genetic Dissection of
Transcriptional Regulation in Budding Yeast.” Science 296: 752–55.
Campbell, R.F., P.T. McGrath, and A.B. Paaby. 2018. “Analysis of Epistasis in Natural Traits
Using Model Organisms.” Trends Genet 34: 883–98.
Carlborg, O., and C.S. Haley. 2004. “Epistasis: too often neglected in complex trait
studies?” Nat Rev Genet 5 (8): 618–25. https://doi.org/10.1038/nrg1407.
179
Caudal, E. 2022. “Loss-of-Function Mutation Survey Revealed That Genes with
Background-Dependent Fitness Are Rare and Functionally Related in Yeast.” Proc.
Natl. Acad. Sci 119: 2204206119.
Chandler, C.H., S. Chari, and I. Dworkin. 2013. “Does Your Gene Need a Background
Check? How Genetic Background Impacts the Analysis of Mutations, Genes, and
Evolution.” Trends Genet 29 (6): 358–66. https://doi.org/10.1016/j.tig.2013.01.009.
Chandler, C.H., S. Chari, A. Kowalski, L. Choi, D. Tack, M. DeNieu, W. Pitchers, A.
Sonnenschein, L. Marvin, and K. Hummel. 2017. “How Well Do You Know Your
Mutation? Complex Effects of Genetic Background on Expressivity, Complementation,
and Ordering of Allelic Effects.” PLoS Genet 13 (11).
https://doi.org/10.1371/journal.pgen.1007075.
Chandler, C.H., S. Chari, D. Tack, and I. Dworkin. 2014. “Causes and Consequences of
Genetic Background Effects Illuminated by Integrative Genomic Analysis.” Genetics
196 (4): 1321–36. https://doi.org/10.1534/genetics.113.159426.
Chari, S., and I. Dworkin. 2013. “The Conditional Nature of Genetic Interactions: The
Consequences of Wild-Type Backgrounds on Mutational Interactions in a
Genome-Wide Modifier Screen.” PLoS Genet 9 (8): 10 1371 1003661.
Chen, R., L. Shi, J. Hakenberg, B. Naughton, P. Sklar, J. Zhang, H. Zhou, L. Tian, O.
Prakash, and M. Lemire. 2016. “Analysis of 589,306 Genomes Identifies Individuals
Resilient to Severe Mendelian Childhood Diseases.” Nat Biotechnol 34 (5): 531–38.
https://doi.org/10.1038/nbt.3514.
Cherry, J.M., C. Ball, S. Weng, G. Juvik, R. Schmidt, C. Adler, B. Dunn, S. Dwight, L. Riles,
and R.K. Mortimer. 1997. “Genetic and Physical Maps of Saccharomyces Cerevisiae.”
Nature 387 (6632 Suppl.): 67–73.
Cherry, J.M., E.L. Hong, C. Amundsen, R. Balakrishnan, G. Binkley, E.T. Chan, K.R.
Christie, M.C. Costanzo, S.S. Dwight, and SR Engel. 2012. “Saccharomyces Genome
Database: The Genomics Resource of Budding Yeast.” Nucleic Acids Res
40:D700–D705. https://doi.org/10.1093/nar/gkr1029.
Cheverud, J.M., and E.J. Routman. 1995. “Epistasis and Its Contribution to Genetic
Variance Components.” Genetics 139: 1455–61.
Cheverud, J.M., and E.J. Routman. 1996. “Epistasis as a source of increased additive
genetic variance at population bottlenecks.” Evolution 50: 1042–51.
Churchill, G.A., and R.W. Doerge. 1994. “Empirical Threshold Values for Quantitative Trait
Mapping.” Genetics 138 (3): 963–71.
Churchill, G.A., D.M. Gatti, S.C. Munger, and K.L. Svenson. 2012. “The Diversity Outbred
Mouse Population.” Mamm. Genome 23: 713–18.
Contamine, V., and M. Picard. 2000. “Maintenance and Integrity of the Mitochondrial
Genome: A Plethora of Nuclear Genes in the Budding Yeast.” Microbiol Mol Biol Rev
64 (2): 281–315.
180
Cooper, D.N., M. Krawczak, C. Polychronakos, C. Tyler-Smith, and H. Kehrer-Sawatzki.
2013. “Where Genotype Is Not Predictive of Phenotype: Towards an Understanding of
the Molecular Basis of Reduced Penetrance in Human Inherited Disease.” Hum Genet
132 (10): 1077–1130. https://doi.org/10.1007/s00439-013-1331-2.
Costanzo, M., B. VanderSluis, E.N. Koch, A. Baryshnikova, C. Pons, G. Tan, W. Wang, M.
Usaj, J. Hanchard, and S.D. Lee. 2016. “A Global Genetic Interaction Network Maps a
Wiring Diagram of Cellular Function.” Science 353 (6306).
https://doi.org/10.1126/science.aaf1420.
Covarrubias-Pazaran, G. 2016. “Genome-Assisted Prediction of Quantitative Traits Using
the R Package Sommer.” PLoS ONE 11: 0156744.
Crow, James F. 2010. “On Epistasis: Why It Is Unimportant in Polygenic Directional
Selection.” Philosophical Transactions of the Royal Society of London. Series B,
Biological Sciences 365 (1544): 1241–44. https://doi.org/10.1098/rstb.2009.0275.
DiCarlo, J.E. 2013. “Genome Engineering in Saccharomyces Cerevisiae Using
CRISPR-Cas Systems.” Nucleic Acids Res 41: 4336–43.
Dimitrov, L.N., R.B. Brem, L. Kruglyak, and D.E. Gottschling. 2009. “Polymorphisms in
Multiple Genes Contribute to the Spontaneous Mitochondrial Genome Instability of
Saccharomyces Cerevisiae S288C Strains.” Genetics 183 (1): 365–83.
https://doi.org/10.1534/genetics.109.104497.
Dowell, R.D., O. Ryan, A. Jansen, D. Cheung, S. Agarwala, T. Danford, D.A. Bernstein, P.A.
Rolfe, L.E. Heisler, and B. Chin. 2010. “Genotype to Phenotype: A Complex Problem.”
Science 328 (5977). https://doi.org/10.1126/science.1189015.
Dujon, B. 1981. “Mitochondrial Genetics and Functions.” Edited by J.N. Strathern, E.W.
Jones, and JR Broach. Cold Spring Harbor Laboratory.
Dworkin, I., A. Palsson, K. Birdsall, and G. Gibson. 2003. “Evidence that Egfr contributes to
cryptic genetic variation for photoreceptor determination in natural populations of
Drosophila melanogaster.” Curr Biol 13 (21): 1888–93.
https://doi.org/10.1016/j.cub.2003.10.001.
Ehrenreich, I.M. 2017. “Epistasis: Searching for Interacting Genetic Variants Using
Crosses.” Genetics 206: 531–35.
Endelman, J.B., and J.-L. Jannink. 2012. “Shrinkage Estimation of the Realized
Relationship Matrix.” https://doi.org/10.1534/g3.112.004259.
Ephrussi, B., H. Hottinguer, and Y. Chimenes. 1949. “Action de l’acriflavine sur les levures.
I. La mutation “petite colonie.” Ann Inst Pasteur 76: 351–67.
Ephrussi, B., and P.P. Slonimski. 1955. “Subcellular Units Involved in the Synthesis of
Respiratory Enzymes in Yeast.” Nature 176 (4495): 1207–8.
https://doi.org/10.1038/1761207b0.
181
Falconer, D.S., and T.F.C. Mackay. 1996. “Introduction to Quantitative Genetics.” Longmans
Green, Harlow, Essex, UK.
Farzadfard, F., S.D. Perli, and T.K. Lu. 2013. “Tunable and Multifunctional Eukaryotic
Transcription Factors Based on CRISPR/Cas.” ACS Synth. Biol 2: 604–13.
Fearon, K., and T.L. Mason. 1992. “Structure and Function of MRP20 and MRP49, the
Nuclear Genes for Two Proteins of the 54 S Subunit of the Yeast Mitochondrial
Ribosome.” J Biol Chem 267 (8): 5162–70.
Forsberg, S.K.G., J.S. Bloom, M.J. Sadhu, L. Kruglyak, and Ö. Carlborg. 2017. “Accounting
for Genetic Interactions Improves Modeling of Individual Quantitative Trait Phenotypes
in Yeast.” Nat. Genet 49: 497–503.
Fox, J., and S. Weisberg. 2018. “An R Companion to Applied Regression.” Los Angeles
(CA: Sage Publishing.
Frangoul, H., D. Altshuler, M.D. Cappellini, Y.-S. Chen, J. Domm, B.K. Eustace, J. Foell, J.
Fuente, S. Grupp, and R. Handgretinger. 2021. “CRISPR-Cas9 gene editing for sickle
cell disease and β-Thalassemia.” N Engl J Med 384 (3): 252–60.
https://doi.org/10.1056/NEJMoa2031054.
Galardini, M., B.P. Busby, C. Vieitez, A.S. Dunham, A. Typas, and P. Beltrao. 2019. “The
Impact of the Genetic Background on Gene Deletion Phenotypes in Saccharomyces
Cerevisiae.” Mol Syst Biol 15 (12): 10 15252 20198831.
Geiler-Samerotte, K.A., Y.O. Zhu, B.E. Goulet, D.W. Hall, and M.L. Siegal. 2016. “Selection
Transforms the Landscape of Genetic Variation Interacting with Hsp90.” PLoS Biol 14
(10). https://doi.org/10.1371/journal.pbio.2000465.
Gibson, D.G., L. Young, R.-Y. Chuang, J.C. Venter, C.A. Hutchison, and H.O. Smith. 2009.
“Enzymatic assembly of DNA molecules up to several hundred kilobases.” Nat
Methods 6 (5): 343–45. https://doi.org/10.1038/nmeth.1318.
Gibson, G., and I. Dworkin. 2004. “Uncovering cryptic genetic variation.” Nat Rev Genet 5
(9): 681–90. https://doi.org/10.1038/nrg1426.
Gibson, G., and D.S. Hogness. 1996. “Effect of Polymorphism in the Drosophila Regulatory
Gene Ultrabithorax on Homeotic Stability.” Science 271 (5246): 200–203.
https://doi.org/10.1126/science.271.5246.200.
Gietz, R.D., and R.H. Schiestl. 2007. “High-Efficiency Yeast Transformation Using the
LiAc/SS Carrier DNA/PEG Method.” Nat. Protoc 2: 31–34.
Gietz, R.D., and R.A. Woods. 2002. “Transformation of Yeast by Lithium
Acetate/Single-Stranded Carrier DNA/Polyethylene Glycol Method.” Methods Enzym
350: 87–96. https://doi.org/10.1016/s0076-6879(02)50957-5.
Goldstein, A.L., and J.H. McCusker. 1999. “Three New Dominant Drug Resistance
Cassettes for Gene Disruption in Saccharomyces Cerevisiae.” Yeast 15: 1541–53.
182
Goldstein, I., and I.M. Ehrenreich. 2021. “The Complex Role of Genetic Background in
Shaping the Effects of Spontaneous and Induced Mutations.” Yeast 38 (3): 187–96.
https://doi.org/10.1002/yea.3530.
Griffiths, A.J.F., Peichel Doebley SR, Wasserman JC, and D.A. 2020. “Introduction to
Genetic Analysis.” New York (NY.
Haber, J.E. 2012. “Mating-Type Genes and MAT Switching in Saccharomyces Cerevisiae.”
Genetics 191: 33–64.
Hallin, J. 2016. “Powerful Decomposition of Complex Traits in a Diploid Model.” Nat.
Commun 7: 13311.
Henderson, C.R. 1975. “Best linear unbiased estimation and prediction under a selection
model.” Biometrics 31 (2): 423–47. https://doi.org/10.2307/2529430.
Herskowitz, I., and R. E. Jensen. 1991a. “Methods in Enzymology.” In Methods in
Enzymology, 194:132–46. Elsevier.
Herskowitz, I., and R.E. Jensen. 1991b. “Putting the HO Gene to Work: Practical Uses for
Mating-Type Switching.” In Methods in Enzymology, edited by C. Guthrie and G.R.
Fink, 132–46. Cambridge, MA: Academic Press.
Himmelmann, L. 2015. “Package ‘HMM.’” https://cran.r-project.org/web/packages/.
Hou, J., G. Tan, G.R. Fink, B.J. Andrews, and C. Boone. 2019. “Complex Modifier
Landscape Underlying Genetic Background Effects.” Proc Natl Acad Sci USA 116 (11):
5045–54. https://doi.org/10.1073/pnas.1820915116.
Huang, W. 2012. “Epistasis dominates the genetic architecture of Drosophila quantitative
traits.” Proc. Natl Acad. Sci 109: 15553–59.
Huang, W., and T.F.C. Mackay. 2016. “The Genetic Architecture of Quantitative Traits
Cannot Be Inferred from Variance Component Analysis.” PLoS Genet 12: 1006421.
Jarosz, D.F., and S. Lindquist. 2010. “Hsp90 and Environmental Stress Transform the
Adaptive Value of Natural Genetic Variation.” Science 330 (6012): 1820–24.
https://doi.org/10.1126/science.1195487.
Jarosz, D.F., M. Taipale, and S. Lindquist. 2010. “Protein Homeostasis and the Phenotypic
Manifestation of Genetic Diversity: Principles and Mechanisms.” Annu Rev Genet 44:
189–216. https://doi.org/10.1146/annurev.genet.40.110405.090412.
Johnson, M.S., A. Martsul, S. Kryazhimskiy, and M.M. Desai. 2019. “Higher-Fitness Yeast
Genotypes Are Less Robust to Deleterious Mutations.” Science 366 (6464): 490–93.
https://doi.org/10.1126/science.aay4199.
Julius, D., L. Blair, A. Brake, G. Sprague, and J. Thorner. 1983. “Yeast Ar Factor Is
Processed from a Larger Precursor Polypeptide: The Essential Role of a
Membrane-Bound Dipeptidyl Aminopeptidase.” Cell 32: 839–52.
183
Kang, H.M. 2008. “Efficient Control of Population Structure in Model Organism Association
Mapping.” Genetics 178: 1709–23.
Kang, H.M. 2010. “Variance Component Model to Account for Sample Structure in
Genome-Wide Association Studies.” Nat. Genet 42: 348–54.
Kannan, K., B. Tsvetanova, R.-Y. Chuang, V.N. Noskov, N. Assad-Garcia, L. Ma, C.A.
Hutchison Iii, H.O. Smith, J.I. Glass, and C. Merryman. 2016. “One Step Engineering of
the Small-Subunit Ribosomal RNA Using CRISPR/Cas9.” Sci Rep 6 (30714).
https://doi.org/10.1038/srep30714.
Kobayashi, O., H. Suda, T. Ohtani, and H. Sone. 1996a. “Molecular Cloning and Analysis of
the Dominant Flocculation Gene FLO8 from Saccharomyces Cerevisiae.” Mol. Gen.
Genet. MGG 251: 707–15.
Kobayashi, O., H. Suda, T. Ohtani, and H. Sone. 1996. “Molecular Cloning and Analysis of
the Dominant Flocculation geneFLO8 fromSaccharomyces Cerevisiae.” Mol. Gen.
Genet. MGG 251: 707–15.
Koc, E.C., W. Burkhart, K. Blackburn, M.B. Moyer, D.M. Schlatzer, A. Moseley, and L.L.
Spremulli. 2001. “The Large Subunit of the Mammalian Mitochondrial Ribosome.
Analysis of the Complement of Ribosomal Proteins Present.” J Biol Chem 276 (47):
43958–69. https://doi.org/10.1074/jbc.M106510200.
Kryazhimskiy, S., D.P. Rice, E.R. Jerison, and M.M. Desai. 2014. “Global Epistasis Makes
Adaptation Predictable despite Sequence-Level Stochasticity.” Science 344 (6191):
1519–22. https://doi.org/10.1126/science.1250939.
Kucejova, B., L. Li, X. Wang, S. Giannattasio, and X.J. Chen. 2008. “Pleiotropic Effects of
the Yeast Sal1 and Aac2 Carriers on Mitochondrial Function via an Activity Distinct
from Adenine Nucleotide Transport.” Mol Genet Genomics 280 (1): 25–39.
https://doi.org/10.1007/s00438-008-0342-5.
Lander, E.S., and D. Botstein. 1989. “Mapping Mendelian Factors Underlying Quantitative
Traits Using RFLP Linkage Maps.” Genetics 121 (1): 185–99.
Lang, G.I., D. Botstein, and M.M. Desai. 2011. “Genetic Variation and the Fate of Beneficial
Mutations in Asexual Populations.” Genetics 188 (3): 647–61.
https://doi.org/10.1534/genetics.111.128942.
Laughery, M.F., T. Hunter, A. Brown, J. Hoopes, T. Ostbye, T. Shumaker, and J.J. Wyrick.
2015. “New Vectors for Simple and Streamlined CRISPR-Cas9 Genome Editing in
Saccharomyces Cerevisiae.” Yeast 32 (12): 711–20. https://doi.org/10.1002/yea.3098.
Lee, G., and I. Saito. 1998. “Role of Nucleotide Sequences of loxP Spacer Region in CreMediated Recombination.” Gene 216: 55–65.
Lee, J.T., A.L.V. Coradini, A. Shen, and I.M. Ehrenreich. 2019. “Layers of Cryptic Genetic
Variation Underlie a Yeast Complex Trait.” Genetics 211 (4): 1469–82.
https://doi.org/10.1534/genetics.119.301907.
184
Lee, J.T., M.B. Taylor, A. Shen, and I.M. Ehrenreich. 2016. “Multi-locus genotypes
underlying temperature sensitivity in a mutationally induced trait.” PLoS Genet 12 (3):
10 1371 1005929.
Levy, S.F. 2015. “Quantitative evolutionary dynamics using high-resolution lineage
tracking.” Nature 519: 181–86.
Li, F., M.L. Salit, and S.F. Levy. 2018. “Unbiased Fitness Estimation of Pooled Barcode or
Amplicon Sequencing Studies.” Cell Syst 7: 521-525 4.
Li, H. 2009a. “1000 Genome Project Data Processing Subgroup, The Sequence
Alignment/Map Format and SAMtools.” Bioinformatics 25: 2078–79.
Li, H. 2009. “The Sequence Alignment/Map Format and SAMtools.” Bioinformatics 25:
2078–79.
Li, H., and R. Durbin. 2009. “Fast and Accurate Short Read Alignment with BurrowsWheeler Transform.” Bioinformatics 25: 1754–60.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis,
and R. Durbin. 1000. “Genome Project Data Processing Subgroup. The Sequence
Alignment/Map Format and SAMtools.” Bioinformatics 2009;25(16):2078–2079.
https://doi.org/10.1093/bioinformatics/btp352.
Lipinski, K.A., A. Kaniak-Golik, and P. Golik. 2010. “Maintenance and Expression of the S.
Cerevisiae Mitochondrial Genome—from Genetics to Evolution and Systems Biology.”
Biochim Biophys Acta 1797 (6–7): 1086–98.
https://doi.org/10.1016/j.bbabio.2009.12.019.
Lippert, C. 2011. “FaST Linear Mixed Models for Genome-Wide Association Studies.” Nat.
Methods 8: 833–35.
Liu, X. 2019. “iSeq 2.0: A Modular and Interchangeable Toolkit for Interaction Screening in
Yeast.” Cell Syst 8: 338-344 8.
Lo, W.S., and A.M. Dranginis. 1998. “The Cell Surface Flocculin Flo11 Is Required for
Pseudohyphae Formation and Invasion by Saccharomyces Cerevisiae.” Mol. Biol. Cell
9: 161–71.
Lynch, M., and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sinauer
Associates.
Mackay, T.F.C. 2014. “Epistasis and Quantitative Traits: Using Model Organisms to Study
Gene–Gene Interactions.” Nat. Rev. Genet 15: 22–33.
Mackay, T.F.C., E.A. Stone, and J.F. Ayroles. 2009. “The Genetics of Quantitative Traits:
Challenges and Prospects.” Nat. Rev. Genet 10: 565–77.
Magwene, P.M. 2011. “Outcrossing, Mitotic Recombination, and Life-History Trade-Offs
Shape Genome Evolution in Saccharomyces Cerevisiae.” Proc. Natl Acad. Sci. USA
108: 1987–92.
185
Mancera, E., R. Bourgon, A. Brozzi, W. Huber, and L.M. Steinmetz. 2008. “High-Resolution
Mapping of Meiotic Crossovers and Non-Crossovers in Yeast.” Nature 454 (7203):
479–85. https://doi.org/10.1038/nature07135.
Märtens, K., J. Hallin, J. Warringer, G. Liti, and L. Parts. 2016. “Predicting Quantitative
Traits from Genome and Phenome with near Perfect Accuracy.” Nat. Commun 7:
11512.
Matsui, T. 2022. “The Interplay of Additivity, Dominance, and Epistasis on Fitness in a
Diploid Yeast Cross.” Nat. Commun 13: 1463.
Matsui, T., and I.M. Ehrenreich. 2016. “Gene-Environment Interactions in Stress Response
Contribute Additively to a Genotype-Environment Interaction.” PLoS Genet 12 (7).
Mullis, M.N. 2022. “Complex Genetics Cause and Constrain Fungal Persistence in Different
Parts of the Mammalian Body.” Genetics 222: 138.
Mullis, M.N., T. Matsui, R. Schell, R. Foree, and I.M. Ehrenreich. 2018. “The Complex
Underpinnings of Genetic Background Effects.” Nat. Commun 9: 3548.
Nadeau, J.H. 2001. “Modifier genes in mice and humans.” Nat Rev Genet 2 (3): 165–74.
https://doi.org/10.1038/35056009.
Narasimhan, V.M., K.A. Hunt, D. Mason, C.L. Baker, K.J. Karczewski, M.R. Barnes, A.H.
Barnett, C. Bates, S. Bellary, and N.A. Bockett. 2016. “Health and Population Effects of
Rare Gene Knockouts in Adult Humans with Related Parents.” Science 352 (6284):
474–77. https://doi.org/10.1126/science.aac8624.
Nasti, R.A., and D.F. Voytas. n.d. “Attaining the Promise of Plant Gene Editing at Scale.”
Proc Natl Acad Sci USA. https://doi.org/10.1073/pnas.2004846117.
Omelina, E.S., A.V. Ivankin, A.E. Letiagina, and A.V. Pindyurin. 2019. “Optimized PCR
Conditions Minimizing the Formation of Chimeric DNA Molecules from MPRA Plasmid
Libraries.” BMC Genomics 20: 536.
Paaby, A.B., and M.V. Rockman. 2014. “Cryptic genetic variation: evolution’s hidden
substrate.” Nat Rev Genet 15 (4): 247–58. https://doi.org/10.1038/nrg3688.
Paaby, A.B., A.G. White, D.D. Riccardi, K.C. Gunsalus, F. Piano, and M.V. Rockman. 2015.
“Wild Worm Embryogenesis Harbors Ubiquitous Polygenic Modifier Variation.” Elife
4:e09178. https://doi.org/10.7554/eLife.09178.
Parts, L., A. Batté, M. Lopes, M.W. Yuen, M. Laver, B.-J. San Luis, J.-X. Yue, C. Pons, E.
Eray, and P. Aloy. 2021. “Natural Variants Suppress Mutations in Hundreds of Essential
Genes.” Mol Syst Biol 17 (5). https://doi.org/10.15252/msb.202010138.
Peterson, R.A., and J.E. Cavanaugh. 2020. “Ordered Quantile Normalization: A
Semiparametric Transformation Built for the Cross-Validation Era.” J. Appl. Stat 47:
2312–27.
186
Phillips, P.C. 2008. “Epistasis—the Essential Role of Gene Interactions in the Structure and
Evolution of Genetic Systems.” Nat. Rev. Genet 9: 855–67.
Pinheiro, J.C., and D.B. Bates. 2000. Mixed-Effects Models in S and S-PLUS.
Springer-Verlag. https://doi.org/10.1007/b98882.
Pritchard, J.K., M. Stephens, N.A. Rosenberg, and P. Donnelly. 2000. “Association Mapping
in Structured Populations.” Am. J. Hum. Genet 67: 170–81.
Purcell, S. 2007. “PLINK: A Tool Set for Whole-Genome Association and Population-Based
Linkage Analyses.” Am. J. Hum. Genet 81: 559–75.
Qi, L.S. 2013. “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific
Control of Gene Expression.” Cell 152: 1173–83.
Queitsch, C., T.A. Sangster, and S. Lindquist. 2002. “Hsp90 as a Capacitor of Phenotypic
Variation.” Nature 417 (7): 618–24.
Rabiner, L.R. 1989. “A Tutorial on Hidden Markov Models and Selected Applications in
Speech Recognition.” Proc IEEE 77 (2): 257–86.
Raj, A., S.A. Rifkin, E. Andersen, and A. Oudenaarden. 2010. “Variability in Gene
Expression Underlies Incomplete Penetrance.” Nature 463 (7283): 913–18.
https://doi.org/10.1038/nature08781.
Riordan, J.D., and J.H. Nadeau. 2017. “From Peas to Disease: Modifier Genes, Network
Resilience, and the Genetics of Health.” Am J Hum Genet 101 (2): 177–91.
https://doi.org/10.1016/j.ajhg.2017.06.004.
Rowe, H.C., B.G. Hansen, B.A. Halkier, and D.J. Kliebenstein. 2008. “Biochemical
Networks and Epistasis Shape the Arabidopsis Thaliana Metabolome.” Plant Cell 20:
1199–1216.
Rutherford, S., and S. Lindquist. 1998. “HSP90 as a capacitor for morphological evolution.”
Nature 396 (6709): 336–42. https://doi.org/10.1038/24550.
Sauer, B. 1987. “Functional Expression of the Cre-Lox Site-Specific Recombination System
in the Yeast Saccharomyces Cerevisiae.” Mol. Cell Biol 7: 2087–96.
Schell, R. 2022. “Genetic basis of a spontaneous mutation’s expressivity.” Genetics 220,
iyac013.
Schell, R., M. Mullis, and I.M. Ehrenreich. 2016. “Modifiers of the Genotype–Phenotype
Map: Hsp90 and Beyond.” PLoS Biol 14 (11): 10 1371 2001015.
Schindler, D. 2020. “Genetic Engineering and Synthetic Genomics in Yeast to Understand
Life and Boost Biotechnology.” Bioengineering 7 (4): 10 3390 7040137.
Schlecht, U., Z. Liu, J.R. Blundell, R.P. Onge, and S.F. Levy. 2017. “A Scalable
Double-Barcode Sequencing Platform for Characterization of Dynamic Protein- Protein
Interactions.” Nat. Commun 8: 15586.
187
Shadel, G.S. 1999. “Yeast as a Model for Human mtDNA Replication.” Am J Hum Genet 65
(5): 1230–37. https://doi.org/10.1086/302630.
Shao, H. 2008. “Genetic Architecture of Complex Traits: Large Phenotypic Effects and
Pervasive Epistasis.” Proc. Natl Acad. Sci. USA 105: 19910–14.
Sherman, F. 2002. “Getting Started with Yeast.” Edited by C. Guthrie and G.R. Fink.
Cambridge, MA: Academic Press.
Siegal, M.L., and J.Y. Leu. 2014. “On the Nature and Evolutionary Impact of Phenotypic
Robustness Mechanisms.” Annu Rev Ecol Evol Syst 45: 496–517.
https://doi.org/10.1146/annurev-ecolsys-120213-091705.
Sikorski, R.S., and P. Hieter. 1989. “A System of Shuttle Vectors and Yeast Host Strains
Designed for Efficient Manipulation of DNA in Saccharomyces Cerevisiae.” Genetics
122 (1): 19–27.
Singh, A.P., R. Salvatori, W. Aftab, A. Aufschnaiter, A. Carlström, I. Forne, A. Imhof, and M.
Ott. 2020. “Molecular Connectivity of Mitochondrial Gene Expression and OXPHOS
Biogenesis.” Mol Cell 79 (6): 1051–65. https://doi.org/10.1016/j.molcel.2020.07.024.
Smith, J.D. 2016. “Quantitative CRISPR Interference Screens in Yeast Identify
Chemical-Genetic Interactions and New Rules for Guide RNA Design.” Genome Biol
17: 45.
Smith, J.D. 2017. “A Method for High-Throughput Production of Sequence-Verified DNA
Libraries and Strain Collections.” Mol. Syst. Biol 13: 913.
Steinmetz, L.M., H. Sinha, D.R. Richards, J.I. Spiegelman, P.J. Oefner, J.H. McCusker, and
R.W. Davis. 2002. “Dissecting the architecture of a quantitative trait locus in yeast.”
Nature 416 (6878): 326–30. https://doi.org/10.1038/416326a.
Storey, J.D., J.M. Akey, and L. Kruglyak. 2005. “Multiple Locus Linkage Analysis of
Genomewide Expression in Yeast.” PLoS Biol 3 (8).
https://doi.org/10.1371/journal.pbio.0030267.
Tarutani, Y. 2010. “Trans-Acting Small RNA Determines Dominance Relationships in
Brassica Self-Incompatibility.” Nature 466: 983–86.
Taylor, M.B., and I.M. Ehrenreich. 2014. “Genetic Interactions Involving Five or More Genes
Contribute to a Complex Trait in Yeast.” PLoS Genet 10: 1004324.
Taylor, M.B., and I.M. Ehrenreich. 2015. “Higher-Order Genetic Interactions and Their
Contribution to Complex Traits.” Trends Genet 31: 34–40.
Taylor, M.B., and I.M. Ehrenreich. n.d. “Transcriptional Derepression Uncovers Cryptic
Higher-Order Genetic Interactions.” PLoS Genet 1005606 (doi): 10 1371 1005606.
188
Taylor, M.B., J. Phan, J.T. Lee, M. McCadden, and I.M. Ehrenreich. 2016. “Diverse Genetic
Architectures Lead to the Same Cryptic Phenotype in a Yeast Cross.” Nat. Commun 7:
11669.
Team, The R.Core. 2013. “R: A Language and Environment for Statistical Computing.”
“The 1000 Genomes Project Consortium. A Map of Human Genome Variation from
Population-Scale Sequencing.” 2010. Nature 467: 1061–73.
Thompson, Register, JR, Curotto E, Kurtz J, Kelly M, and R. 1998. “An Improved Protocol
for the Preparation of Yeast Cells for Transformation by Electroporation.” Yeast 14 (6):
565–71. https://doi.org/10.1002/(SICI)1097-0061(19980430)14:6.
Tong, A.H., and C. Boone. 2006. “Synthetic Genetic Array Analysis in Saccharomyces
Cerevisiae.” Methods Mol Biol 313: 171–92.
https://doi.org/10.1385/1-59259-958-3:171.
Usaj, M. 2017. “TheCellMap.Org: A Web-Accessible Database for Visualizing and Mining
the Global Yeast Genetic Interaction Network.” G3 GenesGenomesGenetics 7:
1539–49.
Verwaal, R., N. Buiting-Wiessenhaan, S. Dalhuijsen, and J.A. Roubos. 2018. “CRISPR/
Cpf1 Enables Fast and Simple Genome Editing of Saccharomyces Cerevisiae:
CRISPR/Cpf1-Mediated Genome Editing of Saccharomyces Cerevisiae.” Yeast 35:
201–11.
Visser, J.A.G.M., T.F. Cooper, and S.F. Elena. 2011. “The causes of epistasis.” Proc. R.
Soc. B Biol. Sci 278: 3617–24.
Vu, V., A.J. Verster, M. Schertzberg, T. Chuluunbaatar, M. Spensley, D. Pajkic, G.T. Hart, J.
Moffat, and A.G. Fraser. 2015. “Natural Variation in Gene Expression Modulates the
Severity of Mutant Phenotypes.” Cell 162 (2): 391–402.
https://doi.org/10.1016/j.cell.2015.06.037.
Wach, A., A. Brachat, R. Pöhlmann, and P. Philippsen. 1994. “New Heterologous Modules
for Classical or PCR-Based Gene Disruptions in Saccharomyces Cerevisiae.” Yeast
10: 1793–1808.
Wagner, M.R., C. Tang, F. Salvato, K.M. Clouse, A. Bartlett, S. Vintila, L. Phillips, S.
Sermons, M. Hoffmann, and P.J. Balint-Kurti. 2021. “Microbe-Dependent Heterosis in
Maize.” Proc Natl Acad Sci USA 118:e2021965118.
https://doi.org/10.1073/pnas.2021965118.
Wei, W.-H., G. Hemani, and C.S. Haley. 2014. “Detecting epistasis in human complex
traits.” Nat. Rev. Genet 15: 722–33.
Weinreich, D.M., N.F. Delaney, M.A. DePristo, and D.L. Hartl. 2006. “Darwinian Evolution
Can Follow Only Very Few Mutational Paths to Fitter Proteins.” Science 312: 111–14.
189
Weinreich, D.M., R.A. Watson, and L. Chao. 2005. “Perspective: sign epistasis and genetic
constraint on evolutionary trajectories.” Evolution 59 (6): 1165–74.
https://doi.org/10.1554/04-272.
Widmer, C. 2015. “Further Improvements to Linear Mixed Models for Genome- Wide
Association Studies.” Sci. Rep 4: 6874.
Young, A.I., S. Benonisdottir, M. Przeworski, and A. Kong. 2019. “Deconstructing the
Sources of Genotype-Phenotype Associations in Humans.” Science 365: 1396–1400.
Yu, S.B. 1997. “Importance of Epistasis as the Genetic Basis of Heterosis in an Elite Rice
Hybrid.” Proc. Natl Acad. Sci. USA 94: 9226–31.
Zhao, K. 2005. “An Arabidopsis Example of Association Mapping in Structured Samples.”
PLoS Genet 3: 4.
Zhao, L., Z. Liu, S.F. Levy, and S. Wu. 2018. “Bartender: A Fast and Accurate Clustering
Algorithm to Count Barcode Reads.” Bioinformatics 34: 739–47.
Zhou, X., and M. Stephens. 2012. “Genome-Wide Efficient Mixed Model Analysis for
Association Studies.” Nat. Genet 44: 821–24.
190
Abstract (if available)
Abstract
A major goal of modern genetics is to fully understand how genetic variation influences phenotypic variation. A great deal of our knowledge on the genetic basis of phenotype comes from large-scale studies of many individuals. However, this does not always tell the entire story. The effects of genetic variants can be highly contextual, with some individuals exhibiting dramatically different responses to the same genetic variant. This phenomenon, known as a genetic background effect, means that simply understanding the average effect of a variant across an entire population may not translate to predictive power for individuals. Recent advances in DNA barcoding, sequencing, and CRISPR interference have made it possible to study these effects on an unprecedented scale, potentially enabling general, genome-scale insights into how genetic background effects play a role in determining phenotype.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Genetic architectures of phenotypic capacitance
PDF
Exploring the genetic basis of quantitative traits
PDF
Genetic and molecular insights into the genotype-phenotype relationship
PDF
Understanding the genetic architecture of complex traits
PDF
Complex mechanisms of cryptic genetic variation
PDF
The complex genetic and molecular basis of oxidative stress tolerance
PDF
Investigating the role of ploidy on genome stability in fission yeast
PDF
The function of Rpd3 in balancing the replicaton initiation of different genomic regions
PDF
Cellular level bottlenecks: genetic diversity, population dynamics, and technology development
PDF
Genomic and physiological characterization of Escherichia coli evolving in long-term batch culture
PDF
Understanding genetics of traits critical to the domestication of crops using Mixed Linear Models
PDF
Understanding the genetics, evolutionary history, and biomechanics of the mammalian penis bone
PDF
Robustness and stochasticity in Drosophila development
PDF
Ultra rapid identity-by-descent mapping in massive genetic datasets
PDF
Genome sequencing and transcriptome analysis of the phenotypically plastic spadefoot toads
PDF
Distinct mechanisms of DDK recruitment to Fkh-activated and CEN-proximal origins control replication timing program in S. cerevisiae
PDF
Ecologically responsible domestication of kelp facilitated by genomic tools
PDF
Genetic diversity and bacterial death in the context of adaptive evolution
PDF
Genetic architecture underlying variation in different traits in the Pacific oyster Crassostrea gigas
PDF
Phenotypic plasticity and its ecological and evolutionary significance for reef building coral
Asset Metadata
Creator
Hale, Joseph James
(author)
Core Title
Genome-scale insights into the underlying genetics of background effects
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Degree Conferral Date
2024-05
Publication Date
03/13/2024
Defense Date
01/31/2024
Publisher
Los Angeles, California
(original),
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
background effects,complex traits,DNA barcoding,genetics,genomics,OAI-PMH Harvest
Format
theses
(aat)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Ehrenreich, Ian (
committee chair
), Aparicio, Oscar (
committee member
), Dean, Matthew (
committee member
), Mooney, Jazlyn (
committee member
)
Creator Email
halejj@usc.edu,josephjameshale@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113851029
Unique identifier
UC113851029
Identifier
etd-HaleJoseph-12694.pdf (filename)
Legacy Identifier
etd-HaleJoseph-12694
Document Type
Dissertation
Format
theses (aat)
Rights
Hale, Joseph James
Internet Media Type
application/pdf
Type
texts
Source
20240319-usctheses-batch-1129
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu
Tags
background effects
complex traits
DNA barcoding
genetics
genomics