CELLULAR LEVEL BOTTLENECKS: GENETIC
DIVERSITY, POPULATION DYNAMICS, AND
TECHNOLOGY DEVELOPMENT
by
Joseph Paul Dunham
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR, EVOLUTION, GENOMICS,
DEVELOPMENTAL BIOLOGY)
May 2015
Copyright 2014 Joseph Paul Dunham
“For a scientist must indeed be freely imaginative and yet skeptical,
creative and yet a critic. There is a sense in which he must be free,
but another in which his thought must be very precisely regimented;
there is poetry in science, but also a lot of bookkeeping.”
Sir Peter B. Medawar, English Biologist 1915-1987
“Just because something doesn’t do what you planned it to do
doesn’t mean it’s useless.”
Thomas A. Edison, American Inventor and Businessman 1847-1931
DEDICATION
This dissertation is dedicated to my incredible family, for all of their endless
support and constant love throughout all of my life. To my late grandmother,
Bernadine Dunham, my mother, Sharon Dunham,
my brother, Chris Angermayer, and my
partner in crime, Bao-Tran Vo:
Thank you for being my rock, my reality check,
and my constant source of laughter.
ACKNOWLEDGMENTS
First and foremost, it is with great appreciation and honor that I thank my
mentor, Dr. Sergey Nuzhdin, for his constant professional and personal support,
guidance, and patience throughout my graduate study. He has provided a crucial
environment of growth and independent thinking that has provided me with
opportunities to pursue high-risk projects. Throughout my studies, Dr. Nuzhdin
taught me how to approach research, how to think critically about my own work,
and how to better anticipate the ever-changing nature of science. I admire the
many great qualities he possesses: his brilliant and abstract ideas, intelligence,
enormous breadth of knowledge, passion for science, appreciation of his
‘comrades’, and recognition of each student’s unique circumstances and
perspectives. Dr. Nuzhdin shows great care for his graduate students,
postdocs, visiting scholars, and undergraduates and takes great interest in the
steps they choose to achieve their long-term goals.
Second, I am grateful to my thesis committee members. Dr. Steven Finkel
always kept me on my toes and provided insightful and realistic
suggestions. Dr. Norman Arnheim always had an open door; more often
than not, however, we met in the hallway, where our conversations were full of
intense technological discussion, problem solving, and exciting next steps. Dr.
Kimberly Siegmund has been a fantastic member of my committee and it has
been a pleasure to discuss the multiple projects that I was and still am pursuing.
Third, I am also thankful for all my current and previous lab members.
Specifically, I would like to thank Dr. Aaron Tarone, Dr. Maren Friesen, Dr.
Matthew Salomon, Dr. Julia Saltz, Dr. Brad Foley, Dr. Daniel Falgueras, and Dr.
Peter Chang for having challenged me along the way as well as teaching me
new technical, analytical and theoretical approaches.
Fourth, I am indebted to Dr. Matthew Dean, Brent Young and the rest of
the Dean lab for always being a great source of scientific ideas and good BBQ
laughs. I would also like to thank Dr. Ian Ehrenreich, Dr. Xuelin Wu, and Dr.
Andrew Smith for the great scientific conversations over the years. I would like to
thank the two sequencing centers at USC and their personnel: specifically, at the
HSC Epigenome Center Dr. Charles Nicolet, Selene Tyndale, and Helen Truong,
and at the UPC Genome and Cytometry Core Michael Mesa and Dr. Daniel
Falgueras. Without your technical support and excellent data production, my
projects would have been severely stalled.
Fifth, I would like to acknowledge all of the peers with whom I started this Ph.D.
adventure: Dr. Brad Main, Dr. Michael Philips, Dr. Anna Skylar, Dr. Andrew
Pickering, Dr. Bhairavi Vivi Tolani, Varuzhan Balasanyan, and Pei Zhang. We
struggled through difficult times together and you all were a great support. I
would also like to acknowledge all of the hard work by the department and school
administration, in particular Linda Bazilian, Dawn Burke, Eleni Yokas, Christina
Tasulis, Laura Cajero, and Kathleen Boeck.
Sixth, I would like to acknowledge my collaborators. I would like to thank
my close friend and collaborator Dr. Heather Simmons for the constant support
and, when times were tough, the extra push to keep the projects moving, the
ideas expanding, and the writing flowing. I couldn’t have made it this far without
you. Also, my best friends Kon Xiong and Ben Farkas, thank you for always
being there. I would like to thank all of my collaborators throughout these years
who have taught me so many different areas of research. Special thanks to Dr.
Patrick Phillips, Dr. Richard Jovelin, Dr. William Cresko, Dr. Molly Burke, Dr.
Parvin Shahrestani, Dr. Kevin Thornton, Dr. Michael Rose, Dr. Anthony Long, Dr.
Michelle Arbeitman, Dr. Aaron Tarone, Dr. Maren Friesen, Dr. Ryan Bickel, Dr.
Jennifer Brisson, Dr. Benjamin Dickins, Dr. Israel Pagán, Dr. Gary Munkvold, Dr.
Andy Stephenson, Dr. Edward Holmes, and Dr. Lin Chao. I would also like to
acknowledge my incredible undergraduates that have helped me over the years.
Thank you Jessica Kuo, Jae Lee, Kasey Cheng, Clara Park, Anish Parekh, Chris
Shafer, and Sabra Villa.
Last but not least, I am indebted to my family: my late grandmother,
Bernadine Dunham, mother, Sharon Dunham, brother, Christopher Angermayer,
and my partner, Bao-Tran Vo. I would not have made it this far in my life without
them. I am deeply thankful, from the bottom of my heart, for the many life lessons
and the self-sacrifice that both my grandmother and mother have made over the
years. I admire their incredible strength, work ethic, talents, and intelligence. My
grandmother taught me that “if you can’t say anything nice, you shouldn’t say
anything at all” and reinforced it with a bar of soap. My mother taught me to fight
through adversity, to always quickly pick myself up after defeat, and that as a family
we can make it through the rough times. I am deeply grateful to my brother Chris,
who has taught me to keep fighting, to never take no for an answer, and that we will
achieve our goals. I admire his intelligence, aspiration, entrepreneurial spirit, and
tenacity. I am extremely thankful for my loving girlfriend and partner in crime,
Bao-Tran Vo. This dissertation has been a roller coaster; however, you have taken
care of me like no other, through unending support, comfort during difficult times,
and the constant laughs we share together every day.
CURRICULUM VITAE
Joseph P. Dunham
Address: Department of Molecular and Computational Biology
University of Southern California
1050 Childs Way RRI 201B
Los Angeles, CA 90089
Email: jpdunham@usc.edu
Honors:
2014 Spring – Phi Kappa Phi. University of Southern California, Los Angeles, CA
2009 Spring – Teaching Assistant Award, 2008 Fall BISC 325 Genetics. University of Southern California, Los Angeles, CA
Co-wrote NIH EUREKA ROI GM0998741
Dr. Nuzhdin awarded: $1.2 million
Teaching Experience:
2014 Feb-March: Guest teaching planarian lab section for BISC 480. University of Southern California, Los Angeles, CA
2010 Aug – Dec: Teaching Assistant: BISC 406L Biotechnology. University of Southern California, Los Angeles, CA
2008 Aug – Dec: Teaching Assistant: BISC 325 Genetics. University of Southern California, Los Angeles, CA
Professional Preparation:
Ph.D., August 20th, 2014
Molecular and Computational Biology
University of Southern California
Advisor: Dr. Sergey Nuzhdin
B.S. in Biological Sciences:
University of Oregon: June 2005
Biological Sciences Emphasis Areas:
Molecular, Cellular, and Developmental
Neuroscience and Behavior
Minors:
Organic Chemistry
Business
Mentoring:
Since 2010 I have mentored seven undergraduate students (including Jessica Kuo,
Kasey Cheng, and Myungjae Lee). From 2010-2012 I trained and oversaw the
molecular work performed by Jessica Kuo pertaining to sample barcoding for
Illumina next generation sequencing. During this period she received
three undergraduate research support awards: the Rose Hills
Foundation summer science and engineering fellowship, USC Women in Science
and Engineering (WiSE), and the USC Provost’s Undergraduate Research
Fellowship. Kasey Cheng also received the USC Provost’s Undergraduate Research
Fellowship while working on the isolation of bacteria associated with planarian
stem cells.
In Spring 2013 I began to mentor two additional undergraduate students, Chris
Shafer and Anish Parekh. Our work is focused upon applying transposable
element (TE) targeted sequencing to assay TE jumping dynamics within and
between individuals across different dog breeds. Currently the bulldog, collie, pug,
and basenji breeds have been used for genomic testing. Anish Parekh has
received the USC Provost's Undergraduate Research Fellowship to continue this
research through summer 2013.
Beginning June 25th, 2013, visiting undergraduate student Sabra Villa, from the
University of Santa Cruz, conducted research throughout the summer to apply
the TE targeting to freshwater planarian lineages to understand TE dynamics
during population expansion and regeneration. Since Spring 2014, Clara Park
has been working to understand regeneration deficiency in different planaria
species. She received the Summer Undergraduate Research Fellowship (SURF)
to perform this work.
Peer-Reviewed Research Publications:
2014
Dunham JP*, Simmons HE*, Holmes EC, Stephenson AG. Analysis of viral
(zucchini yellow mosaic virus) genetic diversity during systemic movement
through a Cucurbita pepo vine. Virus Research doi:
10.1016/j.virusres.2014.07.030
* Authors contributed equally.
2013
Simmons HE, Dunham JP, Zinn KE, Munkvold GP, Holmes EC, Stephenson AG.
Zucchini yellow mosaic virus (ZYMV, Potyvirus): Vertical
transmission, seed infection and cryptic infections. Virus Research 2013
Sep;176(1-2):259-64. doi: 10.1016/j.virusres.2013.06.016.
Dunham JP and Friesen ML. A cost-effective method for high-throughput
construction of Illumina sequencing libraries. Cold Spring Harb Protoc; doi:
10.1101/pdb.prot074187.
Bickel RD, Dunham JP, Brisson JA. Widespread Selection Across Coding and
Noncoding DNA in the Pea Aphid Genome. G3 04/2013;
DOI:10.1534/g3.113.005793.
2012
Simmons HE, Dunham JP, Stack JC, Dickins BJA, Pagán I, Holmes EC,
Stephenson AG. Deep sequencing reveals persistence of intra- and inter-host
genetic diversity in natural and greenhouse populations of zucchini yellow
mosaic virus. Journal of General Virology 93(Pt 8):1831-40. 2012.
Sze S-H, Dunham JP, Carey B, Chang PL, Li F, Edman RM, Fjeldsted C, Scott
MJ, Nuzhdin SV, Tarone AM. A de novo transcriptome assembly of Lucilia
sericata (Diptera: Calliphoridae) with predicted alternative splices, single
nucleotide polymorphisms and transcript expression estimates. Insect
Molecular Biology 21(2):205-21. 2012.
2011
Chang PL*, Dunham JP*, Nuzhdin SV, Arbeitman MN. Somatic sex-specific
transcriptome differences in Drosophila revealed by whole transcriptome
sequencing. BMC Genomics 12:364. 2011.
* Authors contributed equally.
2010
Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD.
Genome-wide analysis of a long-term evolution experiment with Drosophila.
Nature 467(7315): 587-590. 2010.
2009
Jovelin R, Dunham JP, Sung FS, Phillips PC. High nucleotide divergence in
developmental regulatory genes contrasts with the structural elements of
olfactory pathways in Caenorhabditis. Genetics vol. 181 (4) pp. 1387-97.
2009.
2007
Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. Rapid and cost-
effective polymorphism identification and genotyping using restriction site
associated (RAD) markers. Genome Research 17:240-248. 2007.
Publications in Review:
Dunham JP, Nguyen DH, Simmons HE, Lee M, Chao L, Nuzhdin SV. Temporal
and Geographical Genomics of the Asexual Planarian Body: Mitochondria.
Submitted to Mol. Biol. Evol.
Simmons HE, Prendeville HR, Dunham JP, Ferrari MJ, Earnest JD, Pilson D,
Munkvold GP, Holmes EC, Stephenson AG. Seed transmission of Zucchini
yellow mosaic virus (a potyvirus) in wild Cucurbita pepo with transgenic virus-
resistance. Submitted to Annals of Applied Biology.
Saltz J, Dunham JP, Chang PL, Cheng K, Nuzhdin S. Genetic variation in social
preference predicts male behavioural development and fitness in Drosophila
melanogaster. Submitted to Proc. R. Soc. B.
Talks:
USC Molecular and Computational departmental retreat symposium. November
2009.
Posters:
Dunham JP, Nguyen DH, Lee M, Chao L, Nuzhdin SV. A survey of mitochondrial
genome heterogeneity as a function of relatedness in asexual freshwater
planarians. January 10-15, 2014. Plant and Animal Genome XXII Conference.
Dunham JP and Friesen ML. A Cost-Effective Method for High-Throughput
Construction of Illumina Sequencing Libraries. January 10-15, 2014. Plant
and Animal Genome XXII Conference.
Simmons HE, Dunham JP, Stack C, Dickins B, Pagan I, Holmes E, Stephenson
A. Deep sequencing reveals persistence of intra- and inter-host genetic
diversity in natural and greenhouse populations of zucchini yellow mosaic
virus. October 26, 2012. Ecological Genomics 10th Anniversary Symposium.
Dunham JP and Nuzhdin SV. Novel Tagging for High Throughput Cost Effective
Illumina 96 Sample RNAseq Library Construction and Normalization. March
19, 2011. SMBE Symposium on Molecular and Genomic Evolution.
Tarone AM, Sze S-H, Dunham JP, Nuzhdin SV. The Lucilia sericata
Transcriptome: Forensic Entomology for the 21st Century. July 8,
2010. North American Forensic Entomology Association Conference.
Dunham JP, Sung FS, Kao JY, Elhadary M, Arbeitman MN, Nuzhdin SV.
Comprehensive solexa based annotation of alternatively spliced transcripts in
Drosophila melanogaster male, female, and tra mutant transcriptome. USC
Biology Interdisciplinary Graduate Symposium. Los Angeles, CA. May 2009.
Dunham JP, Sung FS, Kao JY, Elhadary M, Arbeitman MN, Nuzhdin SV.
Comprehensive solexa based annotation of alternatively spliced transcripts in
Drosophila melanogaster male, female, and tra mutant transcriptome. 50th
Annual Drosophila Research Conference. Chicago, IL. Mar 2009.
Tarone A, Dunham JP, Hahn M, Genissel A, Nuzhdin SV. Understanding the
effects of cis-polymorphisms on gene expression variation. 49th Annual
Drosophila Research Conference. San Diego, CA. Mar 2008.
Jovelin R, Dunham JP, Sung FS, Phillips PC. Molecular evolution of the AWA
and AWC pathways in Caenorhabditis. Society for the Study of Evolution
Annual Meeting. New York, NY. June 2006.
Jovelin R, Dunham JP, Sung FS, Phillips PC. Genetic variation and selection
in Caenorhabditis chemosensory pathways. UO & IU joint NSF-IGERT
training program. Bloomington, IN. October 2006.
Dunham JP, Miller MR, Johnson EA, Cresko WA. Rapid genomic clone
chromosomal walking using RAD microarray analysis. Fifth International
Conference on Stickleback Behavior and Evolution. Anchorage, AK. July 30-
August 4, 2006.
Jovelin R, Dunham JP, Sung FS, Phillips PC. Molecular evolution of the AWA
and AWC pathways in Caenorhabditis. Poster presented for the EMBO
workshop on the study of Evolutionary Biology with Caenorhabditis elegans
and closely related species. Oeiras, Portugal. May, 2006.
Dunham JP, Jovelin R, Phillips PC. Analysis of Inter and Intra Specific
Polymorphism of osm-9 within the Chemosensory Pathway. Forum on
Undergraduate Research Poster Competition. UO, Eugene, OR. Spring 2004.
TABLE OF CONTENTS
Epigraph
Dedication
Acknowledgments
Curriculum vitae
List of Tables
List of Figures
Abstract
Chapter One: Introduction
1.1 Multicellularity and Development
1.2 Genetic Diversity: Population Size, Selection and Drift
1.3 Study Systems
1.4 Hosts
1.5 Technology – Next Generation Sequencing
References
Chapter Two: Temporal and Spatial Genomics of the Asexual Planarian Body: Mitochondria
2.1 Abstract
2.2 Introduction
2.3 Results
2.4 Discussion
2.5 Materials, methods, and analysis
References
Chapter Three: Analysis of viral (zucchini yellow mosaic virus) genetic diversity during systemic movement through a Cucurbita pepo vine
(as seen in Dunham et al. 2014. Virus Research 191:172-9)
3.1 Abstract
3.2 Introduction
3.3 Results
3.4 Discussion
3.5 Materials, methods, and analysis
References
Chapter Four: A Cost-Effective Method for High-Throughput Construction of Illumina Sequencing Libraries
(as seen in Dunham and Friesen. 2013. Cold Spring Harb. Protoc. 9:820-34)
4.1 Abstract
4.2 Materials and methods
4.3 Troubleshooting
4.4 Discussion
References
Chapter Five: Conclusions and Future Studies
5.1 Conclusions and future studies
References
Bibliography
Appendix: Abbreviations
LIST OF TABLES
Table 2.1: Planarian sample information and variant results
Table 2.2: Gene annotation
Table 2.3: Variant effects
Table 2.4: Pairwise comparisons between related samples from posterior to anterior
Table 2.5: Related samples shared variants changing in frequency
Table 2.6: Primers spanning the mitochondrial genome
Table 3.1: ZYMV variants, variant effects, and coverage per sample
Table 3.2: ZYMV shared variants and effects
Table 4.1: Oligonucleotides for library construction
Table 4.2A: Procedure A: 100 sample preparation costs
Table 4.2B: Procedure A: 200 sample preparation costs
Table 4.2C: Procedure A: 400 sample preparation costs
Table 4.3A: Procedure B: 100 sample preparation costs
Table 4.3B: Procedure B: 200 sample preparation costs
Table 4.3C: Procedure B: 400 sample preparation costs
Table 4.4: DNA extraction costs
LIST OF FIGURES
Figure 1.1: Cucurbita pepo vine infected with zucchini yellow mosaic virus
Figure 1.2: S. mediterranea undergoing transverse fission
Figure 2.1: A Schematic of related and unrelated planarian
Figure 2.2: Minor allele frequency spectrum
Figure 2.3A: Synonymous and non-synonymous minor allele frequency spectrum in related planaria
Figure 2.3B: Synonymous and non-synonymous minor allele frequency spectrum in unrelated planaria
Figure 2.4: Average number of substitution types
Figure 2.5: Venn diagram of shared and unique variants
Figure 2.5: Plot of significant allele frequency changes in related planaria
Figure 2.6A: Mitochondrial haplotypes
Figure 2.6B: Variant frequency (5-50%) across the mitochondrial genome
Figure 2.7A: Schematic of mutation accumulation in tail fragments
Figure 2.7B: Proposed cellular model of cellular dynamics affecting mitochondrial spatial distribution
Figure 2.8: Pairwise coverage plots (all samples)
Figure 3.1: Plot illustrating mutation position and type
Figure 3.2: A plot of the number of variants found in each sample
Figure 3.3: The concentration of virus and number of variants found in each sample
Figure 3.4A: Cylindrical Inclusion (CI) protein wild type predicted structure
Figure 3.4B: Cylindrical Inclusion (CI) protein mutant predicted structure
Figure 3.5: Rainfall plot was generated to illustrate mutation clustering within the genome
Figure 3.6: Minimum spanning trees depicting ZYMV population structure
Figure 3.7: Schematic of the plant depicting the leaves that were harvested from a growing vine during infection
Figure 4.1: Barcoding and indexing schematic
Figure 4.2: Sequence and genome assembly quality
ABSTRACT
This thesis was motivated by the underlying role that genetic variation
plays in disease and by how this genetic
variation is modulated by evolution, specifically genetic drift and selection. I have
sought to understand how genetic diversity is modulated by population bottleneck
events at the cellular level and the effects of evolutionary forces on genetic
diversity. I chose two very distinct systems, mitochondria and viruses, to address
these topics and I also developed a new inexpensive and high-throughput
sequencing sample preparation procedure for these research systems.
Mitochondria are central for cellular energy, and for signaling pathways
involved in stem cell proliferation, migration, and apoptosis. Mitochondrial
heteroplasmy (intra- and inter-cellular genetic diversity) is known to cause stem
cell dysfunction in self-renewal and long-term proliferation. Thus, metazoans
develop from a single cell bottleneck event that acts to homogenize the
mitochondrial population, which significantly minimizes the risk of heteroplasmy.
The freshwater asexual planarian Schmidtea mediterranea does not undergo a
single cell bottleneck event, but rather reproduces by transverse fission or
multicellular inheritance. We sought to understand the effects of lacking a single
cell bottleneck event and found that high levels of genetic diversity and
haplotypes exist within individual planaria, and that mitochondrial populations
varied spatially across the planarian body. As a result we have proposed a
cellular model incorporating regeneration events to explain these patterns of
intra-individual genetic diversity.
Unlike the common homogeneous genomic state of mitochondria, the
population of zucchini yellow mosaic virus contains high levels of genetic
diversity associated with the lack of proofreading activity of RNA polymerase,
high replication rates, and large population size. It is commonly assumed that the
viral population will be subjected to substantial genetic bottlenecks as it moves
systemically through the plant. Thus we sequenced to deep coverage 23 leaves
growing sequentially along a vine, and one side branch leaf. It was determined
that bottleneck events associated with systemic movement are not severe
enough to reduce the viral population and that viral diversity is circulated in the
phloem sap during infection. In addition, we found variants clustered within the
cylindrical inclusion (CI) protein, which has implications for viral movement and
infection, such as populating new niches during systemic movement or evading
host immunity.
While pursuing the mitochondrial and viral studies, I optimized and
developed a new sample preparation procedure for next generation sequencing.
This sample preparation can be applied to difficult samples that would normally
yield very little DNA. In addition, the procedure is high-throughput and very
cost-effective.
These studies, although very different in nature, determined that the
extent of evolutionary pressures affecting genetic diversity as a result of cellular
level bottleneck events is not straightforward. These studies described the
patterns and amount of intra-individual genetic diversity and elucidated how this
diversity is affected by the absence of a population bottleneck and regeneration
in planaria or the presence of population bottlenecks imposed by the host plant
during systemic movement. Furthermore, these data offer a deeper understanding
of the effects of population movement on genetic diversity, whether across the
planarian body or during viral invasion of tissues and cells. In both systems,
intra-individual genetic diversity is one of the contributing factors to disease and
dysfunction; understanding the patterns and extent of genetic diversity
through sequencing is therefore important to aid in the development of disease
management and system-specific coping mechanisms.
CHAPTER ONE:
INTRODUCTION
1.1 Multicellularity and Development
A plethora of phenotypic variation exists in the world across the three
domains of life. This extensive variation presents itself in many different forms,
and within the Eukaryota, this variation is attributed to cellular specialization as
well as endosymbiotic events (Sloan et al. 2014). Specialization is considered to
be a product of the evolution of multicellularity, which is one of the most
significant (Ratcliff et al. 2012) and transformative events in organismal life
history (Smith and Szathmáry, 1995). Larger cell size, cell adhesion, different
reproductive strategies and new biological complexity through the formation of
distinct functional structures are a direct result of multicellularity (Ratcliff et al.
2012). Furthermore, the emergence of novel biological features and functions in
multicellular organisms is attributed to increased cellular cooperation and
behavior (Ratcliff et al. 2012; Simpson, 2012; Kirk, 2005). Despite the
dissimilarities in the development of animals and land plants, cooperation is
nonetheless a critical component in both groups, as their development is
composed of coordinated cellular processes.
The development of nearly all metazoans, whether sexual or
asexual, shares a critical event, the single-celled stage of
development. This critical event is conserved across taxa and was recognized
over 120 years ago by Haeckel (Fairclough et al. 2010) when it was proposed that
animals evolved through:
"repeated self-division of [a] primary cell"
This has been supported by observations that development across multicellular
taxa is mediated through a single cell (zygote) and successive rounds of cell
division (Wolpert and Szathmáry, 2002). This single-celled stage of development
is believed to act as a homogenizing bottleneck event, in which development
begins from identical nuclear and mitochondrial genomes, thus reducing the
potential for conflict via the generation of genomic uniformity (Wolpert and
Szathmáry, 2002).
The developmental norm across metazoans is the progression from a
single cell to a multicellular organism. This progression is composed
of several critical stages: cellular population expansion, cellular
differentiation, and population reduction through apoptosis (programmed cell
death) for tissue-specific sculpting (morphogenesis) (Fuchs and Steller, 2011;
Lockshin and Williams, 1964). Furthermore, cellular differentiation into a specific
tissue type also acts as a bottleneck event in which a subset of cells proceeds
through a different cellular trajectory than that of the founding population, and
those populations develop into different tissue types. In metazoans the single cell
(zygote) is totipotent, such that it possesses the developmental potential to
produce all specialized (differentiated) cell types within the organism.
Although development initiated by a single cell is conserved across multicellular
taxa, a few exceptions exist, e.g. asexual freshwater planarians (Schmidtea
mediterranea) and hydra. These systems are unusual in that they reproduce by
transverse fission or by budding segments of their body. In planaria,
regeneration re-establishes the adult form as well as tissue function through the
coordinated processes of apoptosis and proliferation of totipotent stem cells or
“neoblasts”. Both of these processes are mediated through mitochondrial
pathways, and are essential for regeneration in this system. One might expect
that in such a system mutation accumulation, and thus separate cellular lineages,
could occur; however, the consequence of multicellular reproduction or
inheritance has never been examined. Mutation accumulation
within and between cell populations, or “heteroplasmy”, can result in a
breakdown in uniformity between cells. Heteroplasmy is typically associated with
many disease phenotypes and is known to affect cellular proliferation and
maintenance. Although rare, mitochondrial heteroplasmy is thought to occur as a
result of the high replication and mutation rates in mitochondria, which in turn are
attributed to DNA damage caused by reactive oxygen species during energy production
(Ahlqvist et al. 2012). Fortunately, heteroplasmy is typically circumvented by
single cell development, in which the cellular bottleneck acts to homogenize the
mitochondrial population. However, given the nature of multicellular inheritance in
asexual planaria, cellular dysfunction due to heteroplasmy is
expected. Thus, understanding the population dynamics of mitochondria
that have not undergone a single cell bottleneck, and the effect of regeneration in
shaping these populations, is critical for understanding the maintenance of
heteroplasmy in a system with multicellular inheritance.
Although mitochondria have high mutation rates and large population
sizes, population bottlenecks are known to occur during the establishment of
germline cells early in development, which act to homogenize the mitochondrial
population. Mitochondria are not limited to their roles in energy production, but
are also active in signaling pathways attributed to stem cell proliferation, cell
death, cellular differentiation and inter-cellular signaling (McBride et al.
2006). Mitochondrial heteroplasmy has known disease phenotypes and it has
been shown that heteroplasmy specifically affects stem cell proliferation and
long-term maintenance (Ahlqvist et al. 2012). Understanding the level of
genetic diversity, and the effects of genetic drift and selection on this genetic
diversity in mitochondrial populations, is therefore critical. Thus the first study
(chapter two) sought to determine if multicellular inheritance results in
mutation accumulation and to examine the effects of regeneration on the
mitochondrial population dynamics across a planarian body.
1.2 Genetic Diversity: Population Size, Selection and Drift
Cellular or pathogenic populations can evolve within an organism, as a
result of mutations that elicit differential responses due to environmental and
individual interactions. These interactions are not unlike more traditional
ecological interactions of organismal competition, resource allocation, and
parasitism (Michod and Roze, 2000), which result in a varying environment that
requires adaptation to maximize fitness (Michod, 1999).
RNA viruses, in particular, exhibit extremely high levels of genetic diversity
(Elena and Sanjuán, 2005), which is a result of their large population sizes, high
substitution rates ($10^{-2}$ to $10^{-5}$ substitutions/site/year), and rapid
replication rates (Duffy et al. 2008). This intra-host genetic diversity offers the
potential for adaptation to changing environments, such as evasion of host immune
responses or movement into different host tissues and organs (Elena and Sanjuán,
2005). These viral populations will also experience population bottleneck events
(Slatkin, 1987; Cavalli-Sforza et al. 1993; Futuyma, 1998), which in turn will
influence the fitness trajectory of the population and its ability to populate a
new niche (Roossinck, 1997). As in
Waddington’s epigenetic landscape, the genetic composition of the starting
population, whether it be cells or virus, and the inherent interactions between
individuals and the environment will result in different evolutionary pressures and
thus different trajectories of the population.
Although populations experience both selection and genetic drift, the
strength of these pressures is dependent upon the population size. A population
is composed of both reproducing and non-reproducing individuals, and those
alleles contributing to the next generation represent the effective population
size (N_e). This composition reflects a dynamic process of changing allele
frequencies where chance and non-random events alter the amount of variation
that is passed on to the next generation. The random chance of an allele being
either fixed or lost is higher in a small population than in a larger one.
Therefore, if N_e is small, random genetic drift will have a stronger effect;
conversely, if N_e is large, the effect of drift will be much smaller.
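The dependence of drift on N_e can be illustrated with a minimal sketch. It assumes an idealized haploid Wright-Fisher population, in which each generation is formed by sampling N_e genomes at random from the previous one; the function names and the example values are illustrative assumptions and are not taken from the studies cited above.

```python
def drift_variance(p: float, n_e: int) -> float:
    """Variance in next-generation allele frequency caused by random sampling.

    Assumes an idealized haploid Wright-Fisher population of n_e genomes
    (an illustrative model, not one fitted to any data in this chapter).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if n_e < 1:
        raise ValueError("effective population size must be at least 1")
    return p * (1.0 - p) / n_e


def expected_heterozygosity(h0: float, n_e: int, generations: int) -> float:
    """Expected heterozygosity after drift alone (no mutation or selection)."""
    if n_e < 1:
        raise ValueError("effective population size must be at least 1")
    return h0 * (1.0 - 1.0 / n_e) ** generations


if __name__ == "__main__":
    # Hypothetical population sizes chosen only to contrast small and large N_e.
    for n_e in (10, 1_000, 100_000):
        print(n_e, drift_variance(0.5, n_e), expected_heterozygosity(0.5, n_e, 50))
```

Under these assumptions the sampling variance scales as 1/N_e, so a population of 10 genomes experiences a per-generation variance 100 times larger than a population of 1,000 genomes.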
Viral population size is often extremely large (a Tobacco mosaic virus
infected leaf contains 10^11 to 10^12 virions) (Garcia-Arenal et al. 2003);
however, as a result of bottleneck events occurring at both the intra- and
inter-host level, the N_e will be significantly lower (Garcia-Arenal et al.
2001) and thus random genetic drift is expected to be the primary evolutionary
force operating on these populations. RNA viruses undergo population
bottlenecks during systemic movement within hosts, which will alter the
evolutionary trajectories of the population (Dunham et al. 2014). The severity
of the bottlenecks can be extreme: as few as 2-20 virions have been recorded in
systemic (i.e. leaf-to-leaf) movement (Fabre et al. 2012; French and Stenger,
2003; Li and Roossinck, 2004; Sacristan et al. 2003), and even lower values in
cell-to-cell movement (six virions in Soil-borne wheat mosaic virus)
(Miyashita and Kishino, 2010). The effect of the population bottlenecks
as well as the amount and patterns of genetic diversity will directly influence the
dynamics of selection and drift on RNA virus populations within a host. This
genetic diversity provides the basic resources for pathogens to adapt to novel
environments (as well as hosts), making it an important biological aspect to
understand. The second topic (third chapter) of this dissertation is to
ascertain the effects of systemic movement on viral (zucchini yellow mosaic
virus) genetic diversity.
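To make the consequence of such severe bottlenecks concrete, the sketch below computes the chance that a variant is lost entirely when a new population is founded by a handful of virions. It assumes that founders are drawn independently and at random from a very large source population and that selection plays no role; the function name and the 10% variant frequency are hypothetical choices for illustration only.

```python
def prob_variant_lost(freq: float, founders: int) -> float:
    """Probability that none of the founders carries a variant at frequency freq.

    Assumes independent random sampling of founders from a large source
    population (simple binomial sampling, no selection).
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("variant frequency must lie between 0 and 1")
    if founders < 1:
        raise ValueError("at least one founder is required")
    return (1.0 - freq) ** founders


if __name__ == "__main__":
    # Founder numbers span the range of bottleneck sizes quoted in the text.
    for founders in (2, 6, 20):
        print(founders, prob_variant_lost(0.10, founders))
```

Under these assumptions a variant present in 10% of the source population would be lost in most two-virion bottlenecks, but in only a small minority of 20-virion bottlenecks.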
1.3 Study Systems
Mitochondria
Mitochondria are membrane-bound cellular organelles found in the
cytoplasm of most eukaryotic cells and are typically inherited from a single parent
(generally the female) through the passage of only a few mitochondria (Wiesner et
al. 1992). The number of mitochondria per cell tends to be organism and tissue
dependent, ranging from a few to a thousand (Voet et al. 2006), and
the number of mitochondrial genomes per mitochondrion also varies. They are
referred to as the “cellular power plants” as they produce the
necessary chemical energy in the form of adenosine triphosphate (ATP) for
cellular function (Henze and Martin, 2003). A by-product of mitochondrial energy
production is reactive oxygen species (ROS), which are believed to contribute to
higher mutation rates. Mutation rates can vary across taxa; for instance, in D.
melanogaster the mutation rate is 7.8 × 10^-8 per site per generation, whereas in
Daphnia the rate is 1.37 × 10^-7 to 1.73 × 10^-7 per nucleotide per generation
(5-10 days) (Xu et al. 2011). High mitochondrial genetic diversity within and between
cells, or heteroplasmy, has been characterized as leading to many diseases and
disorders (Nunnari and Suomalainen, 2012), and is believed to be
circumvented by homogenization of the mitochondrial population at the
single-cell bottleneck that initiates organismal development.
The mitochondrial genome can also vary across taxa but is generally a
circular single chromosome of approximately 16 kilobases (kb) in animals or
a multichromosomal genome of up to 11.3 megabases (Mb) in flowering plants
(Sloan et al. 2012). The genome in animals is typically intron free with minimal
intergenic sequence and generally encodes 37 genes: two encode rRNAs, 22
encode tRNAs, and 13 encode polypeptides (Taanman, 1999). The
polypeptides are enzyme subunits that form the complexes that carry out
oxidative phosphorylation. The genetic code of mitochondria differs from that
of the nuclear genome, and can vary between eukaryotes (Jukes and
Osawa, 1990). Mitochondrial gene expression is mediated by cis-acting
mitochondrial and trans-acting nuclear factors (Taanman, 1999) that function to
initiate replication, transcription and protein synthesis.
Zucchini yellow mosaic virus
The single-stranded, positive-sense RNA virus, Zucchini yellow mosaic
virus (ZYMV), belongs to the family Potyviridae, and was first discovered in 1973
in Italy (Lisa et al. 1981). Remarkably this virus achieved a worldwide distribution
within two decades of its discovery and for this reason is considered to be an
emerging virus (Desbiez and Lecoq, 1997). ZYMV typically infects cucurbits
(squash, melon, and cucumber) and causes stunting of the plant that is
associated with blistering, leaf laceration marks and yellow mottling (fig. 1.1)
(Desbiez and Lecoq, 1997). The marketability of fruit for human consumption can
be reduced by up to 94% due to infection, and given that the estimated value of
cucurbits is 1.5 billion per annum (Cantliffe et al. 2007) it is clear why this is
considered to be such an important crop pathogen.
Fig. 1.1. The photo shows a Cucurbita pepo vine infected with ZYMV. Typical
disease symptoms such as yellow mottling are shown.
The genome is ~9600 nt long with a 5’ covalently bound viral protein (VPg)
and a polyadenylated 3’ end. The genome encodes 11 putative proteins that are
critical for viral replication, protein splicing, and viral movement. The genes are
arranged from 5’ to 3’ along the genome. The first and third genes are P1 and P3
and both encode proteinase subunits. The second gene, HC-Pro, has proteinase
functionality that cleaves the HC-P3 junction and is required for aphid
transmission (Shukla et al. 1991; Urcuqui-Inchima et al. 2001) as well as
mediating exit and entry into the host vascular system (Urcuqui-Inchima et al.
2001). Furthermore, it has also been shown to promote viral amplification, and
acts as a post-transcriptional gene-silencing (PTGS) suppressor (Gal-On, 2007).
The third gene, P3, due to its low sequence homology, is not well characterized
(Shukla et al. 1991; Urcuqui-Inchima et al. 2001); however, virus amplification and
host pathogenicity are suggested functions (Urcuqui-Inchima et al. 2001). The
fourth gene, P3N-PIPO, was recently identified and was shown to modulate the
plasmodesmatal localization of the CI, which aids in cell-to-cell movement
through interactions with the plasmodesmatal structure (Wei et al. 2010). The
gene 6K1 has been shown to form a complex with P3 and may be involved in
pathogenicity (Urcuqui-Inchima et al. 2001).
The cylindrical inclusion (CI) protein acts as an RNA helicase and is
believed to be involved in cell-to-cell movement (Shukla et al. 1991;
Urcuqui-Inchima et al. 2001). The function of 6K2 has not been determined; however,
when mutated in tobacco etch virus (TEV), it leads to viral inviability. It has also
been suggested that 6K2 anchors the replication machinery to ER-like
membranes (Urcuqui-Inchima et al. 2001). The protein NIa, or small nuclear
inclusion protein, is a proteinase (Shukla et al. 1991; Urcuqui-Inchima et al. 2001)
and could also act in nuclear localization (Urcuqui-Inchima et al. 2001). The VPg
likely acts as a primer during viral synthesis and protects the mRNA from
degradation (Shukla et al. 1991). The NIb protein is the viral RNA-dependent
RNA polymerase. The coat protein (CP) encapsidates the viral genome
and aids in vector transmission (Shukla et al. 1991; Urcuqui-Inchima et al. 2001),
host specificity, RNA amplification, and systemic movement (Urcuqui-Inchima et
al. 2001).
Transmission is primarily mediated through an aphid vector in a
non-persistent manner (Lisa et al. 1981), and the seed transmission rate is 1.6%
(Simmons et al. 2011). Replication of ZYMV proceeds through two phases.
First, the positive-sense genome is copied to produce a negative-sense
complementary strand intermediate. Second, the negative-sense strand acts as
template for repeated copying into positive-sense viral strands (Wang et al.
2000). The substitution rate is 10^-2 to 10^-5 substitutions/site/year, which is a
function of high replication rates (Duffy et al. 2008) and the use of an
error-prone RNA polymerase. The mean rate of evolutionary change for the ZYMV CP was
estimated to be 5.0 × 10^-4 nucleotide substitutions per site, per year (Simmons et
al. 2008).
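As a rough worked example of what these rates imply, the sketch below converts a per-site rate into an expected number of substitutions per genome per year. Applying the CP estimate to the entire ~9600 nt genome is an assumption made purely for illustration (the rate was estimated for the coat protein only), and the function name is hypothetical.

```python
def expected_substitutions(rate_per_site_per_year: float, sites: int,
                           years: float = 1.0) -> float:
    """Expected substitutions accumulated across `sites` over `years`."""
    if rate_per_site_per_year < 0 or sites < 0 or years < 0:
        raise ValueError("rate, sites and years must be non-negative")
    return rate_per_site_per_year * sites * years


if __name__ == "__main__":
    genome_length = 9600  # approximate ZYMV genome length in nt, from the text
    # CP rate from the text, applied genome-wide as an illustrative assumption.
    print(expected_substitutions(5.0e-4, genome_length))
    # Bounds of the general RNA virus range quoted in the text.
    print(expected_substitutions(1.0e-5, genome_length),
          expected_substitutions(1.0e-2, genome_length))
```

Under that assumption, the CP rate corresponds to roughly five substitutions per genome per year, with the general RNA virus range spanning roughly 0.1 to 100.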
1.4 Hosts
Schmidtea mediterranea
Planarians are triploblastic (i.e. containing three primitive germ layers) and
are bilaterally symmetric. Planarians belong to the Lophotrochozoa, the
phylogenetic sister group to the Ecdysozoa (D. melanogaster and C. elegans)
and Deuterostomes (vertebrates) (Sánchez Alvarado and Kang, 2005). The
thousands of planarian species that have been
identified (Brusca and Brusca, 1990) have a worldwide distribution and have
been found in saltwater, freshwater and terrestrial environments. In particular, the
freshwater planarian species Schmidtea mediterranea has become a model
system for studying the molecular basis of regeneration. This is due to its robust
regenerative ability, where it can be cut in any axial orientation; its diploid (2n=8)
genome in both sexual and asexual (reproduction by transverse fission (fig. 1.2))
strains (Newmark and Sánchez Alvarado, 2002); its small genome size (4.8 × 10^8 bp)
compared to other planarian species (Sánchez Alvarado and Kang, 2005); and
its large stem cell population.
The regenerative ability of Schmidtea mediterranea has been attributed to a
population of totipotent, mitotically active somatic stem cells, called neoblasts,
which constitute 30% of the organism. They are the smallest cells in the
organism (~5-10 µm), and are randomly distributed throughout the parenchyma,
with the exception of the area surrounding the pharynx and the most anterior
area above the photoreceptors (Forsthoefel and Newmark, 2009). These areas
were shown to be incapable of regenerative activity (Morgan, 1902). Neoblasts
are the only mitotically active somatic cells (Newmark and Sánchez Alvarado,
2000), are self-renewing, and their progeny have been identified to differentiate
into approximately 40 different cell types.
Fig. 1.2. Image of S. mediterranea undergoing transverse fission. The planarian
attaches its posterior to the substrate surface and, through muscle contraction,
tears its body at its skinniest portion to remove its posterior body segment.
Several lines of evidence suggest that neoblasts have stem cell-like
characteristics. First, electron microscopy studies have identified cell morphology
characteristics common to stem cells (decondensed DNA with large nuclei)
(Newmark and Sánchez Alvarado, 2002). Second, injection of neoblasts into an
irradiated (destroyed neoblast population) background recovered regenerative
abilities and thereby enabled the long-term survival of the planarian (Baguñà et
al. 1989). Third, given that neoblasts are the only mitotically active cells, BrdU
(thymidine analog) has been successfully integrated into these mitotically active
cells and it was shown that their daughter cells differentiated into different tissue
types. Furthermore, using BrdU pulsing and whole-mount fixation, it has been
shown that neoblasts migrate towards wounds and into areas devoid of
neoblasts (Newmark and Sánchez Alvarado, 2000).
After amputation or transverse fission, regeneration is initiated first by
localized apoptotic activity around the wound site (3-4 hours post-wound
formation) followed by localized neoblast mitotic activity (4 hours post-wound
formation), which generates the regenerative blastema at the wound site. These
events are followed by systemic apoptosis and neoblast proliferation 24-72 hours
post-wound formation, which promotes body remodeling or “morphallaxis”
(Pellettieri et al. 2010; Saló and Baguñà, 1989). Full regeneration can be
achieved within seven days. During normal cellular turnover, neoblasts replace
lost cells and after transverse fission or amputation, form the regenerative
blastema at the wound site, which is the site for the regeneration of missing
structures (Sánchez Alvarado and Kang, 2005).
Cucurbita pepo ssp. texana
A native of Texas, Northern Mexico, and states flanking the Mississippi
River south of southern Illinois, Cucurbita pepo ssp. texana, a free-living squash,
is an annual monoecious vine with indeterminate growth and reproduction. Its
origins are unknown; however, it is believed to be either the progenitor of
cultivated squash or an early escape from the cultivated squashes (Decker and
Wilson, 1987; Decker-Walters, 1990; Decker-Walters et al. 2002; Lira et al.
1995). This system is thought to be the optimal host for Zucchini yellow mosaic
virus maintenance (Gal-On, 2007).
1.5 Technology - Next Generation Sequencing
Technological advances in sequencing, specifically Illumina sequencing,
have emerged over the past seven years and now provide opportunities to
sequence natural populations of model and non-model organisms at the genomic
and transcriptome levels. Illumina sequencing enables the sequencing of millions
of molecules of DNA by sequencing the molecules in parallel
(Morozova and Marra, 2008). Traditional sample preparation or
“library construction” for Illumina sequencing occurs through several phases of
molecular modifications to DNA. After DNA extraction, random fragmentation
of DNA is performed using chemical, enzymatic, or physical fragmentation
approaches. After fragmentation, the DNA ends are blunt-end repaired,
a single adenine nucleotide is added to the 3’ ends of the double-stranded DNA, and
an AT ligation event adds known Illumina adaptors to each end of the
double-stranded DNA molecules. Between each molecular modification step, each
reaction must be cleaned to remove enzymes, salts, and other impurities. The
ligation event incorporates a known sequence for adaptor-ligated PCR, which
also subsequently results in the addition of unique sequences to the 5’ and 3’
ends of the DNA that are essential for sequencing. Depending upon the version
of the machine and the current chemistry, up to approximately 200 million
sequences of varying sequence length can be produced within a few weeks.
Illumina technology and others are generalized as Next Generation
Sequencing (NGS) and have been consistently dropping in cost over the past
seven years. Unfortunately, there has not been a concurrent drop in cost in the
associated preparation of samples (Dunham and Friesen, 2013). Thus, the third
study (fourth chapter) of this dissertation is to describe a newly optimized,
inexpensive, and high-throughput procedure for sample preparation.
References
Ahlqvist KJ, et al. 2012. Somatic progenitor cell vulnerability to mitochondrial
DNA mutagenesis underlies progeroid phenotypes in Polg mutator mice.
Cell Metab. 15:100-9.
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. 2006. Molecular
biology of the cell (4th ed.). (New York: Garland Science; 2002).
Baguñà, J, Saló E, Auladell C. 1989. Regeneration and pattern formation in
planarians. III. Evidence that neoblasts are totipotent stem cells and the
source of blastema cells. Development 107:77-86.
Brusca RC, Brusca GJ. 1990. Invertebrates (Sunderland, MA: Sinauer
Associates).
Cantliffe DJ, Shaw NL, Stoffella PJ. 2007. Current trends in cucurbit production in
the U.S. Acta. Hort. 731:473-478.
Cavalli-Sforza LL, Menozzi P, Piazza A. 1993. Demic expansions and human
evolution. Science 259:639-46.
De Smet I, Lau S, Mayer U, Jürgens G. 2010. Embryogenesis – the humble
beginnings of plant life. Plant J. 61:959-70.
Decker DS, Wilson HD. 1987. Allozyme variation in Cucurbita pepo complex: C.
pepo var. ovifera vs. C. texana. Syst Bot 12:263–273.
Decker-Walters DS. 1990. Evidence for multiple domestication of Cucurbita
pepo. In Biology and Utilization of the Cucurbitaceae, pp. 96–101. Edited
by D. M. Bates, R. W. Robinson and C. Jeffrey. Ithaca, NY: Cornell
University Press.
Decker-Walters DS, Straub JE, Chung SM, Nakata E, Quemada HD. 2002.
Diversity in free-living populations of Cucurbita pepo Cucurbitaceae as
assessed by random amplified polymorphic DNA. Syst Bot. 27:19-28.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction
of the tree of life. Nature Reviews Genetics 6:361-375.
Desbiez C, Lecoq H. 1997. Zucchini yellow mosaic virus. Plant Path. 46:809-829.
Duffy, S., Shackelton, L.A. and Holmes, E.C. 2008. Rates of evolutionary change
in viruses: patterns and determinants. Nat. Rev. Genet. 9:267-276.
Dunham JP and Friesen ML. 2013. A Cost-Effective Method for High-Throughput
Construction of Illumina Sequencing Libraries. Cold Spring Harb Protoc
9:820-34.
Dunham JP, Simmons HE, Holmes EC, Stephenson AG. 2014. Analysis of viral
(zucchini yellow mosaic virus) genetic diversity during systemic movement
through a Cucurbita pepo vine. Virus Research 191:172-9.
Elena SF and Sanjuán R. 2005. Adaptive Value of High Mutation Rates of RNA
Viruses: Separating Causes from Consequences. J. Virol. 79:11555-8.
Fabre F, Montarry J, Coville J, Senoussi R, Simon V, Moury B. 2012. Modelling
the Evolutionary Dynamics of Viruses within Their Hosts: A Case Study
Using High-Throughput Sequencing. PLoS Pathog 8:e1002654.
Fairclough SR, Dayel MJ, King N. 2010. Multicellular development in a
choanoflagellate. Current Bio. 20:R875-876.
Forsthoefel, D.J., Newmark, P.A. 2009. Emerging patterns in planarian
regeneration. Current Opinion in Genetics & Development 19:412-420.
French R, Stenger DC. 2003. Evolution of wheat streak mosaic virus: Dynamics
of population growth within plants may explain limited variation. Annual
Review of Phytopathol 41:199-214.
Fuchs Y and Steller H. 2011. Programmed Cell Death in Animal Development
and Disease. Cell 147:742-758.
Futuyma DJ. 1998. Evolutionary Biology (3rd Edit.). (Sinauer Associates, Inc.,
Sunderland, Massachusetts).
Gal-On A. 2007. Zucchini yellow mosaic virus: insect transmission and
pathogenicity – the tails of two proteins. Mol. Plant Pathol. 8:139-150.
García-Arenal, F., Fraile, A. and Malpica, J.M. 2003. Variation and evolution of
plant virus populations. Int. Microbiol. 6:225-232.
Haeckel E. 1874. Memoirs: The Gastraea-Theory, the Phylogenetic Classification
of the Animal Kingdom and the Homology of the Germ-Lamellæ. Quarterly
Journal of Microscopical Science 14:142-165.
Henze K, Martin W. 2003. Essence of mitochondria. Nature 426:127-8.
Jukes TH, Osawa S. 1990. The genetic code in mitochondria and chloroplasts.
Experientia. 46:1117–26.
Kircher, M. and Kelso, J. 2010. High-throughput DNA sequencing – concepts and
limitations. BioEssays 32:524-536.
Kirk DL. 2005. A twelve-step program for evolving multicellularity and a division
of labor. Bioessays 27:299-310.
Li H, Roossinck MJ. 2004. Genetic Bottlenecks Reduce Population Variation in
an Experimental RNA Virus Population. J. Virol. 78:10582-10587.
Lira R, Andres TC, Nee M. 1995. Cucurbita. Pages 1-115 in R. Lira, (ed).
Systematic and ecogeographic studies on crop genepools. Volume 9.
International Plant Genetic Resources Institute. Mexico City and Rome.
Lisa V, Boccardo G, D’Agostino G, Dellavalle G, D’Aquilio M. 1981.
Characterization of a potyvirus that causes Zucchini yellow mosaic.
Phytopathology 71:667-672.
Lockshin RA and Williams CM. 1964. Programmed cell death – II. Endocrine
potentiation of the breakdown of the intersegmental muscles of silkmoths.
Journal of Insect Physiology 10:643-649.
McBride HM, Neuspiel M, Wasiak S. 2006. Mitochondria: more than just a
powerhouse. Current Biol. 16:R551-60.
Michod RE. 1999. Darwinian Dynamics, Evolutionary Transitions in Fitness and
Individuality. (Princeton University Press, Princeton, NJ).
Michod RE and Roze D. 2000. Cooperation and conflict in the evolution of
multicellularity. Heredity 86:1-7.
Miyashita S, Kishino H. 2010. Estimation of the Size of Genetic Bottlenecks in
Cell-to-Cell Movement of Soil-Borne Wheat Mosaic Virus and the Possible
Role of the Bottlenecks in Speeding Up Selection of Variations in trans-
Acting Genes or Elements. J Virol 84:1828-1837.
Morgan TH. 1902. Growth and regeneration in Planaria lugubris. Arch. Ent.
mech. Org. 13:179-212.
Morgan TH. 1898. Experimental studies of the regeneration of Planaria maculata.
Arch. Entwm. Org. 7:364-397.
Morozova O and Marra MA. 2008. Applications of next-generation sequencing
technologies in functional genomics. Genomics 92:255-64.
Newmark PA and Sánchez Alvarado A. 2000. Bromodeoxyuridine specifically
labels the regenerative stem cells of planarians. Dev. Biol. 220:142-153.
Newmark PA, Sánchez Alvarado A. 2002. Not your father's planarian: a classic
model enters the era of functional genomics. Nature Reviews Genetics
3:210-219.
Nunnari J, Suomalainen A. 2012. Mitochondria: In Sickness and in Health. Cell
148:1145-1159.
Pellettieri J, Fitzgerald P, Watanabe S, Mancuso J, Green DR, Sánchez Alvarado
A. 2010. Cell Death and tissue remodeling in planarian regeneration. Dev
Biol. 338:76-85.
Peris CI, Rademacher EH, Weijers D. 2010. Green beginnings – pattern
formation in the early plant embryo. Curr. Top. Dev. Biol. 91:1-27.
Quint M, Drost H, Gabel A, Ullrich KK, Bön M, Grosse I. 2012. A transcriptomic
hourglass in plant embryogenesis. Nature 490:98-101.
Randolph H. 1897. Observations and experiments on regeneration in planarians.
Arch. Entwm. Org. 5:352-372.
Ratcliff WC, Denison RF, Borrello M, Travisano M. 2012. Experimental evolution
of multicellularity. Proc Natl Acad Sci USA 109:1595-1600
Raven PH, Evert RF, Eichhorn SE. 2005. Biology of Plants. (W.H. Freeman &
Company).
Reddien PW, Sánchez Alvarado A. 2004. Fundamentals of planarian
regeneration. An Rev of Cell and Dev Biol 20:725-757.
Roossinck MJ. 1997. Mechanisms of plant virus evolution. Annu. Rev.
Phytopathol. 35:191-209.
Sacristan S, Malpica JM, Fraile A, Garcia-Arenal F. 2003. Estimation of
population bottlenecks during systemic movement of Tobacco mosaic
virus in tobacco plants. J. Virol. 77:9906-9911.
Saló E, Baguñà J. 1989. Regeneration and pattern formation in planarians II.
Local origin and role of cell movements in blastema formation.
Development 107:69-76.
Sánchez Alvarado A, Kang H. 2005. Multicellularity, stem cells, and the
neoblasts of the planarian Schmidtea mediterranea. Experimental Cell
Research 306:299-308.
Shukla, D.D., Frenkel, M.J. and Ward, C.W. 1991. Structure and function of the
potyvirus genome with special reference to the coat protein coding region.
Canadian J. Plant Path. 13:178-191.
Simmons HE, Holmes EC, Stephenson AG. 2008. Rapid evolutionary
dynamics of Zucchini yellow mosaic virus. J. Gen. Virol. 89:1081-5.
Simmons HE, Holmes EC, Gildow FE, Bothe-Goralczyk MA, Stephenson AG. 2011.
Experimental verification of seed transmission in Zucchini
yellow mosaic virus. Plant Dis. 95:751-4.
Simpson C. 2012. The evolutionary history of division of labour. Proc. Biol. Sci.
279:116–121.
Slatkin M. 1987. Gene flow and the geographic structure of natural populations.
Science 236:787-792.
Sloan DB, Alverson AJ, Chuckalovcak JP, Wu M, McCauley DE, Palmer
JD, Taylor DR. 2012. Evolution of Enormous, Multichromosomal Genomes
in Flowering Plant Mitochondria with Exceptionally High Mutation Rates.
PLoS Biology 10:e1001241.
Sloan DB, Triant DA, Wu M, Taylor DR. 2014. Cytonuclear Interactions and
Relaxed Selection Accelerate Sequence Evolution in Organelle Ribosomes. Mol
Bio Evol 31:673-682.
Smith MJ and Szathmáry E. 1995. The Major Transitions in Evolution. (Oxford
University Press, New York).
Taanman J. 1999. The mitochondrial genome: structure, transcription, translation
and replication. Biochimica et Biophysica Acta 1410:103-123.
Urcuqui-Inchima, S., Haenni, A. and Bernardi, F. 2001. Potyvirus proteins: a
wealth of functions. Virus Res. 74:157-175.
Voet D, Voet JG, Pratt CW. 2006. Fundamentals of Biochemistry, 2nd Edition.
John Wiley and Sons, Inc. p. 547.
Wang X, Ullah Z, Grumet R. 2000. Interaction between Zucchini Yellow Mosaic
Potyvirus RNA-Dependent RNA Polymerase and Host Poly-(A) Binding
Protein. Virology 275:433-443.
Waddington CH. 1957. The Strategy of the Genes. (Geo Allen & Unwin, London,
1957).
Wei T, Zhang C, Hong J, Xiong R, Kasschau KD, Zhou X, Carrington JC, Wang
A. 2010. Formation of Complexes at Plasmodesmata for Potyvirus Intercellular
Movement Is Mediated by the Viral Protein P3N-PIPO. PLoS Pathogens
6:e1000962.
Wiesner RJ, Ruegg JC, Morano I. 1992. Counting target molecules by
exponential polymerase chain reaction, copy number of mitochondrial
DNA in rat tissues. Biochim. Biophys. Acta. 183:553–559.
Wobus AM and Boheler KR. 2005. Embryonic Stem Cells: Prospects for
Developmental Biology and Cell Therapy. Physiol. Rev. 85:636-678.
Wolpert L and Szathmáry E. 2002. Evolution and the egg. Nature 420:745.
Xu S, Schaack S, Seyfert A, Choi E, Lynch M, Cristescu ME. High Mutation
Rates in the Mitochondrial Genomes of Daphnia pulex. Mol. Biol. Evol.
29:763-769.
CHAPTER TWO:
TEMPORAL AND SPATIAL GENOMICS OF THE ASEXUAL
PLANARIAN BODY: MITOCHONDRIA
2.1 Abstract
A single mitochondrial haplotype, produced at the single cell stage of
development, is a common metazoan developmental strategy that curbs the
production of multiple haplotypes or "heteroplasmy" that would otherwise lead to
disease development. The exception is the asexual freshwater planarian,
Schmidtea mediterranea, which propagates through transverse fission (multi-
cellular reproduction), and is thus predicted to accumulate mutations over time.
Whether this occurs has yet to be ascertained; therefore, using Illumina
sequencing, we investigated the consequences of lacking a single cell bottleneck
at the mitochondrial genomic level. Related (different body regions of a founder)
and unrelated (different founders) samples were used to assay mitochondrial
diversity. We determined the extent of nucleotide diversity and haplotypes within
and between S. mediterranea individuals. We ascertained that mutation
accumulation was age dependent, and that within related individuals shared
variants were geographically structured by distance. Furthermore, the tail region
experienced a cellular bottleneck such that variation was reduced when compared
to the middle and anterior regions. The expected consequences of the cellular
population reduction, expansion and migration as a result of regeneration are
mutation accumulation and an asymmetric spatial distribution of these variants.
Interestingly,
we observed higher mutation accumulations in the sample corresponding to the
middle segment of the founder compared to other areas. This is the first study
identifying the consequences of multi-cellular reproduction on mitochondrial
genomic variation within and between asexual S. mediterranea, and illustrates
the implications of regeneration on the evolutionary trajectories experienced by
cellular lineages.
2.2 Introduction
Polymorphic perturbations in functional pathways are known to lead to
many adult onset diseases. Heteroplasmy, which is the accumulation of
mutations within a cell’s mitochondrial population, may contribute to the
physiological aging of cells (Harman, 1972) and mitochondrial dysfunction.
Heteroplasmy is curbed early in development, when the founding mitochondrial
population is restricted at the single-cell stage of development. This event acts
as a bottleneck, limiting the mitochondrial population size, which effectively
homogenizes the starting population (Dumollard et al. 2007). Mitochondria
exhibit high mutation rates compared to nuclear DNA. Thus, in rare instances
heteroplasmy can exist within an individual. Generally, a homogeneous
mitochondrial founder population is thought to be critical for coordinated activity
within and between cells during the developmental processes of
apoptosis/autophagy, proliferation, cell migration, and cell maintenance. Mouse
models have demonstrated that homogeneous mitochondrial populations are
more efficient at buffering against abnormalities such that heterogeneous mice
had severe behavioral differences in response to stress and also demonstrated
reduced learning ability (Sharpley et al. 2012). At the cellular level, neural stem
cell specific abnormalities pertaining to self-renewal and long-term proliferation
capability were associated with reactive oxygen species (ROS) that produced
heteroplasmic mitochondrial mutations (Ahlqvist et al. 2012). The negative
effects that heteroplasmy has on development may explain, at least in part, why
single cell reproduction tends to be the predominant form of reproduction in
metazoans.
Freshwater asexual planaria, including Schmidtea mediterranea, are an
exception to this, as they do not develop from a single cell. While generally
capable of sexual reproduction, asexual populations are in existence where
reproduction is mediated through transverse fission or ‘fissiparity’ (Lázaro et al.
2011) (multi-cellular reproduction) and regeneration from a pool of potentially
thousands of cells. Regeneration is dependent upon mitochondrial pathways,
and is initiated by a combination of two major events. The first is apoptosis of
differentiated cells and the second is stem cell or "neoblast" proliferation and
migration (Pellettieri et al. 2009; Newmark and Sánchez Alvarado, 2000). These
events occur in a biphasic manner. The first phase, 3-4 hours post fission,
consists of localized apoptosis at the wound site followed by enhanced neoblast
proliferation over the next 24 hours (Saló and Baguñà, 1989). Furthermore,
neoblasts located within 500 µm of the fission wound migrate more rapidly, up to
two to three times faster, than neoblasts located farther from the wound
site (Newmark and Sánchez Alvarado, 2000; Saló and Baguñà, 1989). The
second phase occurs 24 hours after fission and consists of global apoptosis and
proliferation, which promotes body remodeling or “morphallaxis” (Newmark and
Sánchez Alvarado, 2002; Morgan, 1901). A likely consequence of multi-cellular
reproduction is cell lineage specific mutation accumulation within an individual
planarian, however this remains to be demonstrated at either the nuclear or
mitochondrial genomic levels. Also, as a result of cellular proliferation (population
expansion) and apoptosis (population reduction) during regeneration, neoblast
lineages are likely to experience different evolutionary trajectories across the
body of a planarian.
Understanding the cellular adaptive responses for the maintenance of heteroplasmy
associated with asexual propagation, and the prevention of mitochondrial
disease phenotypes and death, therefore requires an understanding of the
relative contributions of selection, mutation, and drift in these cell populations.
Consider, for example, the role of mitochondria in adaptation: if selection is the
driving evolutionary force then adaptation is expected to be fairly rapid, whereas
if genetic drift is the predominant force one would expect adaptation to
be slower. Which evolutionary force is expected to be the main driver of evolution
is at least partially dependent on the effective population size (N_e). Given that
the effect of genetic drift is inversely proportional to N_e, one might expect
genetic drift to be the main evolutionary force acting on the mitochondrial genomes in
organisms that develop from a single cell. However, in S. mediterranea, given
that a large number of mitochondrial genomes are passed on to their progeny,
one might expect selection to be the main evolutionary force acting on these
populations.
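The contrast drawn above can be sketched with a common population-genetic rule of thumb: selection tends to outweigh drift when the product of N_e and the selection coefficient s is well above 1, and drift tends to dominate when it is well below 1. The haploid form of the heuristic, the hard cut-off at 1, the function name, and the values of s and N_e used here are all illustrative assumptions rather than estimates from this study.

```python
def dominant_force(n_e: float, s: float) -> str:
    """Classify the expected dominant force using the heuristic N_e * |s| vs 1.

    This is a coarse rule of thumb for haploid genomes; the boundary is
    gradual in reality and the exact threshold is an assumption.
    """
    if n_e <= 0:
        raise ValueError("effective population size must be positive")
    return "selection" if n_e * abs(s) > 1.0 else "drift"


if __name__ == "__main__":
    s = 0.001  # hypothetical selection coefficient
    # A small founding population versus a large inherited pool of genomes.
    for n_e in (10, 100_000):
        print(n_e, dominant_force(n_e, s))
```

With these hypothetical values, a founding pool of 10 genomes falls in the drift-dominated regime, whereas a pool of 100,000 genomes falls in the selection-dominated regime.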
In order to determine the mitochondrial genomic consequences of multi-
cell inheritance we have undertaken next generation sequencing of six asexual
S. mediterranea, three related (derived from the same founder) and three
unrelated (fig. 2.1). In addition, in order to assess the spatial component of
transverse fission, related samples derived from different regions of a single
founder planarian were used to determine the changes in the mitochondrial
population post regeneration. To our knowledge, this is the first study in S.
mediterranea that examines the existence of highly variable mitochondrial
genomes due to multi-cell inheritance.
Fig. 2.1. A schematic of related and unrelated planaria. Each sample is a
planarian that was derived a specific number of days (or generations) from the
founder after the founder’s first fission event. Related planarians were derived on
the same day in sequential order from tail segment, middle segment, and then
anterior segment (not including the founder head).
2.3 Results
Sequencing and Genome Structure
The total DNA extract of each of six S. mediterranea planarians, three
unrelated and three related, was deep sequenced (fig. 2.1). The mitochondrial
genomes were subsequently assembled and had an average coverage depth of 3,011x
(SD +/- 277x) (table 2.1). The analysis revealed three low coverage regions,
which were subsequently determined to be repetitive via PCR amplification and
Sanger sequencing. The length of the six S. mediterranea mitochondrial
genomes was 16,562 bp, which is comparable to those of the planarian species
Dugesia japonica and D. ryukyuensis, which are 17,799 and 17,015 bp,
respectively. In addition, the AT content of the genome was similar
(71.6-75.8%) among all three species. The correct assembly and directionality
of the mitochondrial genome were determined via PCR amplification, as well as
alignment against D. japonica and D. ryukyuensis. A comparison of the three
genomes revealed that no rearrangements had occurred; however, the level of
nucleotide diversity was high. Genome annotation identified 40 known
mitochondrial genes. Interestingly, four genes (cox1, atp6, nad4, and nad5)
were partially duplicated within the genome (table 2.2). Gene annotation
predicted that all genes were transcribed on the (+) strand with the exception
of atp8.
[Fig. 2.1 schematic (see caption above): a founder undergoes repeated fission
and regeneration. Unrelated sequenced individuals are Samples 1-3; related
sequenced individuals are Sample 4 (Tail), Sample 5 (Middle), and Sample 6
(Anterior). Ages and generation numbers are listed in table 2.1.]
Table 2.1: Sample information and variant results
                                       Sample 1  Sample 2  Sample 3  Sample 4 (T)  Sample 5 (M)  Sample 6 (A)
Age (days after founder first split)   411       373       441       428           428           428
Sample generation (fission number)     13        7         19        19            20            21
Average coverage                       2776      2802      3379      3324          2999          2784
Filtered SNPs/indels                   1656      1862      2723      2052          2183          2118
Note - T=Tail, M=Middle, and A=Anterior samples.
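As a quick check, the average coverage and its standard deviation can be recomputed from the per-sample values in table 2.1. This is only an arithmetic sketch. It assumes the reported SD is the sample standard deviation (n - 1 denominator) of the six per-sample averages, which the text does not state explicitly.

```python
from statistics import mean, stdev

# Per-sample average coverage, copied from table 2.1 (samples 1-6).
coverage = [2776, 2802, 3379, 3324, 2999, 2784]

avg = mean(coverage)   # arithmetic mean across the six samples
sd = stdev(coverage)   # sample standard deviation (n - 1 denominator)

print(f"mean coverage = {avg:.0f}x, SD = {sd:.0f}x")
```

Under that assumption the values round to 3,011x and 277x, so the SD is a depth of coverage rather than a length in base pairs.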
Table 2.2: Gene Annotation
Start Stop DNA Strand Name
1*** 33 intergenic
34 96 + tRNA-MET
97 100 intergenic
101 166 + tRNA-HIS
167 169 intergenic
170 235 + tRNA-PHE
202 951 + s-rRNA
945 1006 + tRNA-LEU1
1007 1011 intergenic
1012 1077 + tRNA-TYR
1078 1082 intergenic
1083 1151 + tRNA-GLY
1152 1521 intergenic
1522* 2058 + l-rRNA
2059 2066 intergenic
2067 2130 + tRNA-LEU2
2131 2184 + tRNA-THR
2185 2198 intergenic
2199 2258 + tRNA-CYS
2259 2266 intergenic
2267 2330 + tRNA-ASN
2331 2427 intergenic
2428 2733 + NADH dehydrogenase subunit 4L
2734*** 2969 intergenic
2970 3036 + tRNA-SER1
3029 3076 + NADH dehydrogenase subunit 4 (partial duplicate)
3070 4119 + cytochrome b
4120 4470 intergenic
4471 5619 + NADH dehydrogenase subunit 4
5618** 6121 intergenic
6122 6271 - ATP synthase F0 subunit 8
6272 6447 intergenic
6446*** 7846 + cytochrome c oxidase subunit 1
7845 7860 intergenic
7861 8097 + cytochrome c oxidase subunit 1 (partial duplicate)
8188 8249 + tRNA-GLU
8260 8673 + NADH dehydrogenase subunit 6
8796 9875 + NADH dehydrogenase subunit 5
10389 10446 + tRNA-SER2
10447 10508 + tRNA-ASP
10507 10570 + tRNA-ARG
10588 11376 + cytochrome c oxidase subunit 3
11390 11453 + tRNA-ILE
11460 11512 + tRNA-GLN
11513 11581 + tRNA-LYS
11849 12322 + ATP synthase F0 subunit 6
12327 12389 + tRNA-VAL
12386 13222 + NADH dehydrogenase subunit 1
13280 13345 + tRNA-TRP
13373 14107 + cytochrome c oxidase subunit 2
14325*** 14395 + tRNA-PRO
14420 14743 + NADH dehydrogenase subunit 3
14754 14822 + tRNA-ALA
14819*** 15724 + NADH dehydrogenase subunit 5 (partial duplicate)
15766 16164 + ATP synthase F0 subunit 6 (partial duplicate)
Note. Asterisks indicate regions associated with more or fewer variants (5-50%
frequency) than expected by chance: *** (chi-square test P<0.001), ** (chi-square
test P<0.025), and * (chi-square test P<0.05).
Variants Across Samples
Each S. mediterranea mitochondrial genome contained a large number of
variants: the mean number of SNPs was 2,089 (SD +/- 361.6) and the mean
number of indels was 10.5 (SD +/- 3.02). A positive correlation (Spearman r =
0.884; two-tailed p = 0.033) was found between sample age and total variants,
and approximately 80% of the variants were shared by two or more samples. Low
frequency variants (1-5%) constituted approximately 85% of the variants in all
samples, while the remainder ranged from 5-50% in frequency (fig. 2.2), and the
average variant coverage was 109.21 (SD +/- 9.13). Nuclear mitochondrial
pseudogenes (Bensasson et al. 2001) can yield a false signal of mitochondrial
heteroplasmy when reads are mapped to the mitochondrial genome (Parr et al.
2006). Therefore, samples were mapped to the nuclear genome (Robb et al.
2007), and it was determined that the coverage did not exceed 20x. Given the
high coverage of the mitochondrial genome (3,011x), it is unlikely that
pseudogenes are biasing our heteroplasmic calls. Nonsynonymous variants were
more abundant than synonymous variants (sample average 1,111:237), particularly
at low frequency sites (fig. 2.3A-B), and each sample contained one frameshift,
11-20 gained stop codons, and 2-7 lost stop codons (table 2.3). Transversions
were nearly five times more abundant than transitions. The most frequent
transversion was A>T, comprising 36.5% of substitutions, whereas the only
abundant transition was C>T, which was found in all six samples (fig. 2.4).
Fig. 2.2. The minor allele frequency spectrum of unrelated (black lines) and
related (grey lines) samples. The inset shows the allele frequency spectrum of
variants at 1-5% frequency. The numbers of low frequency sites appear to be
correlated with sample age.
Fig. 2.3. Minor allele frequency spectrum of unrelated (A) and related (B)
synonymous and non-synonymous mutations. Grey = synonymous and black =
non-synonymous.
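The rank correlation between sample age and variant count can be sketched as follows. This is an illustration rather than the original analysis, which was run in GraphPad Prism. As stand-in inputs it uses the ages and filtered SNP/indel totals from table 2.1, because the exact inputs to the reported test are not listed. The helper names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Stand-in inputs from table 2.1 (samples 1-6).
age_days = [411, 373, 441, 428, 428, 428]
variants = [1656, 1862, 2723, 2052, 2183, 2118]
rho = spearman_rho(age_days, variants)
print(round(rho, 3))
```

With these stand-in inputs the coefficient comes out at about 0.88, close to but not identical to the reported 0.884, which may reflect differences in the exact inputs used.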
Table 2.3: Variant Effects
Sample   Frameshift  Non-synonymous  Synonymous  Stop Codon Gained  Stop Codon Lost
1        1           862             192         11                 2
2        1           956             206         14                 7
3        1           1468            307         20                 6
4 (T)    1           1073            227         16                 3
5 (M)    1           1176            252         18                 6
6 (A)    1           1129            237         16                 4
Note. T=Tail, M=Middle, A=Anterior
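The non-synonymous to synonymous (NS:S) sample averages quoted in the text can be recomputed directly from table 2.3. The sketch below is an arithmetic check only. The per-sample ratios it prints are derived here for illustration and are not reported in the original.

```python
# Per-sample counts copied from table 2.3 (samples 1-6).
nonsynonymous = [862, 956, 1468, 1073, 1176, 1129]
synonymous = [192, 206, 307, 227, 252, 237]

mean_ns = sum(nonsynonymous) / len(nonsynonymous)
mean_s = sum(synonymous) / len(synonymous)

# Ratio of non-synonymous to synonymous variants in each sample.
ratios = [ns / s for ns, s in zip(nonsynonymous, synonymous)]

print(round(mean_ns), round(mean_s))    # sample averages
print([round(r, 2) for r in ratios])    # per-sample NS:S ratios
```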
Fig. 2.4. Substitution types within planaria. Bar plot showing the average
substitutions within related and unrelated planarians. The x-axis represents the
substitution type. The grey and black substitution color of the x-axis text follows
the color of the bar plot. The y-axis is the fraction of the total sites that are a
particular substitution type.
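For readers less familiar with the terminology, the twelve substitution types in fig. 2.4 divide into transitions and transversions as sketched below. The function name is hypothetical, and the classification rule is the standard one: purine to purine or pyrimidine to pyrimidine is a transition, and anything else is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ref, alt):
    """Classify a single-nucleotide substitution.

    Transitions stay within a base class (purine<->purine or
    pyrimidine<->pyrimidine); transversions switch class.
    """
    if ref == alt:
        raise ValueError("ref and alt must differ")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"

# The twelve substitution types on the x-axis of fig. 2.4.
types = ["A>C", "T>G", "T>C", "A>G", "T>A", "A>T",
         "C>A", "G>T", "G>C", "C>G", "G>A", "C>T"]
for t in types:
    ref, alt = t.split(">")
    print(t, substitution_class(ref, alt))
```

Eight of the twelve types are transversions, so a transversion excess is only informative relative to that 2:1 baseline of possible changes.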
Unique and Shared Variants
Unique variants were least numerous in the tail segment (213) and more
numerous in the middle (326) and anterior (292) segments (fig. 2.5A). Shared
variants also showed a pattern consistent with sample location: sample 4 (Tail)
shared more variants (283) with its flanking segment (sample 5, Middle) than
with the most distant segment (252; sample 6, Anterior). The number of variants
shared between samples 5 (M) and 6 (A) (271) was intermediate relative to
the other comparisons (fig. 2.5A).
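The unique and pairwise-shared counts correspond to regions of a Venn diagram, which can be computed with set operations. The sketch below uses toy variant positions, not the thesis data. The function name and the dictionary layout are assumptions made for illustration.

```python
def venn_regions(samples):
    """Given {sample_name: set of variant positions}, return the variants
    unique to each sample and the variants shared by exactly one pair."""
    names = list(samples)
    unique, pair_only = {}, {}
    for name in names:
        others = set().union(*(samples[o] for o in names if o != name))
        unique[name] = samples[name] - others
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rest = set().union(*(samples[o] for o in names if o not in (a, b)))
            # Shared by a and b but absent from every other sample.
            pair_only[(a, b)] = (samples[a] & samples[b]) - rest
    return unique, pair_only

# Toy positions for illustration only; these are not the thesis data.
toy = {
    "tail": {101, 202, 303, 404},
    "middle": {202, 303, 505, 606},
    "anterior": {303, 606, 707},
}
unique, pair_only = venn_regions(toy)
print({k: sorted(v) for k, v in unique.items()})
print({k: sorted(v) for k, v in pair_only.items()})
```

Position 303 is present in all three toy samples, so it is excluded from every pairwise region, matching how a three-set Venn diagram separates the central overlap.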
[Fig. 2.4 plot. x-axis: Substitution Type (A>C, T>G, T>C, A>G, T>A, A>T, C>A,
G>T, G>C, C>G, G>A, C>T); y-axis: Fraction of Total Variants (0.0-0.4).]
Fig. 2.5. Venn diagram and shared variant frequency plot. (A) Venn diagrams
illustrating the unique and shared variants between unrelated (left) and related
(right) planaria. (B) Plot of shared variant frequency changes among all three
related samples (q<=0.05; FDR corrected). The x-axis represents related
samples (left to right: Tail, Middle, Anterior) and the frequency of each variant
across samples is represented by the y-axis.
[Fig. 2.5B plot. x-axis: Sample (Sample 4 (T), Sample 5 (M), Sample 6 (A));
y-axis: Variant Frequency (0-30).]
We tested whether the shared variants changed significantly in frequency
between related samples. More variants increased in frequency when sample 4 (T)
was compared to sample 5 (M) than when it was compared to sample 6 (A).
Interestingly, an order of magnitude fewer variants changed frequency
between sample 5 (M) and sample 6 (A) (table 2.4). Furthermore, among the
1,298 variants shared by the three related samples, 5.3% had a significant
change in frequency (q<0.05; FDR corrected). Only one variant (position 3078)
was found to increase in frequency across all three samples (frequency
increased from tail to anterior) (fig. 2.5B and table 2.5).
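The q-values referred to here come from a false discovery rate correction, described in the methods as Benjamini-Hochberg via R's p.adjust. The Python sketch below shows the same procedure with toy p-values. It is an illustration only, not the script used in the thesis, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the
    original order of the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues

# Toy p-values for illustration only; these are not the thesis data.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_q = benjamini_hochberg(toy_p)
print([round(q, 4) for q in toy_q])
```

In this toy case the third and fourth tests have raw p-values below 0.05 but q-values above it, which is why the number of significant variants depends on the correction being applied across all shared variants at once.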
Table 2.4: Pairwise comparisons between related samples from posterior to
anterior
               Sample 5 (M)         Sample 6 (A)
               Frequency Increase   Frequency Increase
Sample 4 (T)   43                   23
Sample 5 (M)                        3
Note. T=Tail, M=Middle, A=Anterior. Significance q<0.05; FDR corrected.
Table 2.5: Related Samples Shared Variants Changing Frequency
Position p-value q-value Number of Samples with Changing Variant Frequency
629 6.32E-05 9.64E-03 1
2412 2.90E-04 2.11E-02 1
3078 1.93E-46 1.47E-43 3
7.04E-07 2.68E-04
9.91E-05 1.26E-02
3329 9.87E-04 4.29E-02 1
3331 1.68E-04 1.71E-02 1
3500 1.91E-04 1.82E-02 1
3513 1.16E-03 4.42E-02 1
3913 1.01E-03 4.29E-02 1
3924 9.95E-53 1.52E-49 1
3925 3.95E-04 2.62E-02 1
3937 1.14E-03 4.42E-02 1
4251 6.65E-04 3.38E-02 1
4631 8.95E-07 2.73E-04 2
4.63E-04 2.82E-02
4930 1.27E-04 1.49E-02 1
5099 6.13E-04 3.22E-02 1
5133 1.06E-03 4.36E-02 1
5630 5.40E-04 3.00E-02 1
6189 1.13E-03 4.42E-02 1
6204 3.53E-04 2.45E-02 1
6213 8.40E-05 1.16E-02 1
6234 1.38E-03 4.88E-02 1
6242 1.56E-04 1.70E-02 1
6247 3.74E-05 6.40E-03 1
6600 2.06E-04 1.85E-02 1
7105 3.07E-07 1.56E-04 1
9193 3.78E-05 6.40E-03 1
9950 1.29E-03 4.72E-02 1
10002 8.39E-04 4.00E-02 1
10102 2.37E-04 1.90E-02 1
10107 2.67E-05 5.82E-03 1
10695 1.59E-05 4.04E-03 1
10977 6.90E-04 3.40E-02 1
12826 1.43E-03 4.96E-02 1
12850 2.21E-04 1.87E-02 1
13454 5.50E-04 3.00E-02 1
13457 5.23E-04 3.00E-02 1
13468 4.61E-04 2.82E-02 1
13674 2.79E-04 2.11E-02 1
13680 8.96E-04 4.06E-02 1
13717 1.30E-03 4.72E-02 2
9.04E-04 4.06E-02
13737 1.01E-04 0.015048768 1
13745 2.66E-04 0.024870004 1
14116 6.40E-04 0.043490495 1
14120 2.09E-06 0.003126749 1
14126 9.82E-06 0.003666654 1
14135 1.99E-04 0.024278437 1
14277 2.66E-04 0.024870004 2
2.38E-05 0.007116827
14363 2.39E-04 0.024870004 1
14413 6.88E-04 0.044671104 1
14555 7.86E-05 0.015048768 1
14562 5.89E-04 0.043490495 1
14567 5.97E-04 0.043490495 1
14579 8.22E-06 0.003666654 1
14680 5.11E-06 0.003666654 1
15498 1.82E-04 0.024278437 1
15915 8.75E-05 0.015048768 1
15918 5.45E-04 0.043490495 2
9.42E-05 0.015048768
15924 3.93E-04 0.034556808 1
15927 2.11E-04 0.024278437 1
15931 2.86E-05 0.007116827 1
16211 6.32E-04 0.043490495 1
16352 8.75E-23 6.56E-20 1
16476 4.88E-40 7.31E-37 1
16482 3.52E-06 1.76E-03 1
Haplotypes
Given the high genetic diversity, haplotypes were determined using 105
bp windows. We observed up to 26 haplotypes with a posterior probability
greater than or equal to 0.9 (fig. 2.6A). Increasing the stringency of the
filtering parameters did not produce any significant change in the number of
haplotypes observed. We did not observe any difference in the number of local
haplotypes between related and unrelated groups. The number of haplotypes
observed was correlated with the genomic regions containing variants at
frequencies of 5% or greater (fig. 2.6A-B, rings 1-6). The greatest density of
high frequency variants and the greatest number of predicted haplotypes were
found primarily in the non-coding or partially duplicated gene regions. The NADH
subunits and cytochrome b also had fewer haplotypes than non-coding
regions and contained moderate frequency variants (5.1-15.56%). We
determined that seven coding and non-coding regions contained significantly
(p<0.05) more or fewer variants (5-50% frequency) than expected by chance
alone (see table 2.2). Specifically, two intergenic regions (bases 1-33 and
2734-2969) and tRNA-PRO contained more variants than expected, whereas l-rRNA,
the intergenic region at bases 5618-6121, cytochrome c oxidase subunit 1, and
the NADH subunit 5 partial duplicate contained fewer variants than expected by
chance.
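The text does not spell out how the per-region chi-square tests were constructed. One plausible construction, shown here purely as an assumption, compares the number of variants observed in a region with the number expected if variants were spread uniformly along the genome in proportion to region length. The counts in the example are hypothetical, and the function name is illustrative.

```python
import math

def region_chi_square(obs_in_region, total_variants, region_len, genome_len):
    """One-degree-of-freedom chi-square test of whether a region holds
    more or fewer variants than expected from its share of the genome.

    Assumes variants are otherwise uniformly distributed along the genome.
    """
    frac = region_len / genome_len
    exp_in = total_variants * frac
    exp_out = total_variants - exp_in
    obs_out = total_variants - obs_in_region
    chi2 = ((obs_in_region - exp_in) ** 2 / exp_in
            + (obs_out - exp_out) ** 2 / exp_out)
    # Survival function of chi-square with 1 df: erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    direction = "more" if obs_in_region > exp_in else "fewer"
    return chi2, p_value, direction

# Hypothetical counts; the region length matches the 2734-2969 intergenic span.
chi2, p, direction = region_chi_square(
    obs_in_region=20, total_variants=500,
    region_len=2969 - 2734 + 1, genome_len=16562)
print(round(chi2, 2), p, direction)
```

A length-proportional expectation is the simplest null model. It ignores differences in coverage or base composition between regions, which a fuller analysis might need to account for.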
Fig. 2.6. Haplotypes and variant frequencies across the mitochondrial genome.
(A) Ring plot of the number of haplotypes within 105 base pair windows across
the genome in each sample. Inner rings 1-3 represent unrelated samples 1-3.
Rings 4-6 represent related samples 4-6. The outermost black ring is the gene
annotation ring. (B) Ring plot illustrates the genomic location of 5-50% frequency
variants in each sample. Inner rings 1-3 represent unrelated samples 1-3. Rings
4-6 represent related samples 4-6. The outermost black ring is the gene
annotation ring.
2.4 Discussion
Sequencing of freshwater asexual planarians revealed that heteroplasmy
is abundant within and between planaria, presumably as a result of their
reproductive mode of transverse fission (multi-cellular reproduction). Using
related samples derived from different regions of a single founder planarian, we
determined the population parameters of the spatial distribution of heteroplasmy,
and a positive correlation was found between the founder age when the sample
was produced and the total number of variants found within the sample.
We examined the processes of regeneration, e.g. apoptosis and
proliferation (plus neoblast migration), to understand how cell populations
change over regenerative time and space. As the tail is the first segment that
is used for re-growth during transverse fission (Newmark and Sánchez Alvarado,
2002), one could hypothesize that it would accumulate the greatest number of
variants (fig. 2.7A). However, this was not the case: the greatest number of
variants was observed in the middle sample. This is likely the result of
regeneration, during which cellular lineages undergo population reduction,
rapid expansion, and migration, producing regions of high and low mutation
accumulation.
As could be expected, related samples in close proximity to one
another share more variants, and thus are more related to one another, than
those separated by a greater distance. Since neoblast migration from a
wound site is distance dependent, migrating neoblast lineages should vary in
abundance from posterior to anterior over the regenerative time of a fission
product. This is reflected in our data: the number of unique variants is lowest
in the tail and higher in the middle and anterior samples, and the middle
sample shares more variants with each flanking sample than the tail and
anterior samples share with each other. Thus sample 5 (Middle) appears
to be a ‘hybrid’ zone that is populated by local and flanking cell populations
(T, M and A cell lineages). Therefore, after morphallaxis and growth to adult
size, the cellular heterogeneity will be unevenly distributed across the body
(see fig. 2.7B).
Interestingly, across eukaryotes a constant genomic equilibrium level of
heterozygosity (π_s) is observed and is directly influenced by the inverse
relationship between mutation rate (u) and N_e (Lynch, 2010). This inverse
relationship also leads to a consistent invariance of mitochondrial π_s
(heteroplasmy) across populations of organisms (Lynch, 2010; Bazin et al. 2006;
Nabholz et al. 2008; Nabholz et al. 2009). We observed that heteroplasmy was
abundant within individual planarians’ cell populations and is spatially variable
across the body. Due to this spatial heterogeneity, N_e is likely to be variable
across the planarian body as the number and density of cell lineages could vary.
However, in order to further validate this, additional work such as the
quantification of neoblast migration across the regenerating fission product,
estimates of fission product size reduction, and sequencing of planarian cell
lineages is required.
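The argument that an inverse relationship between u and N_e yields a roughly constant level of diversity can be made concrete with a small numerical sketch. It assumes the standard neutral approximation for a haploid genome, in which silent-site diversity is about 2 x N_e x u. That formula and the constant K below are assumptions introduced for illustration and are not taken from the thesis.

```python
def expected_pi_s(n_e, u):
    """Neutral expectation for silent-site diversity in a haploid genome,
    using the approximation pi_s = 2 * N_e * u (an assumption here)."""
    return 2 * n_e * u

K = 0.01  # hypothetical constant so that u = K / N_e

for n_e in (1e3, 1e5, 1e7):
    u = K / n_e  # mutation rate inversely proportional to N_e
    print(int(n_e), u, expected_pi_s(n_e, u))
```

Because u falls exactly as N_e rises in this toy setting, the product is the same for every population size, which is the invariance the paragraph describes.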
Fig. 2.7. (A) Illustrates localized mutation accumulation within the most posterior
region (Tail) of a planarian lineage over reproductive time. Although spurious
mutations occur across the fission fragment, this hypothesis assumes that
mutations are localized at the most posterior end of the fission product. (B) A
graphical representation of the effects of neoblast (N) (white, light grey, and dark
grey circles) proliferation/migration and apoptosis on cell lineage/mitochondrial
population dynamics over regenerative time. After a fission event, the fission
product experiences apoptosis (cell death) of differentiated cells and N
proliferation/migration activity. Apoptosis is localized near the fission site and
results in a gradient of cell death centralized at the anterior end, which decreases
towards the posterior end of the fission product (apoptotic heat map:
black=greatest apoptosis, white=absence of apoptosis). As a result, the distance
of N recognition of the fission site is reduced. Migration of N is directed towards
the fission site and is dependent upon the distance to the fission site. A greater
number of N from the posterior section (T) of the fission product will migrate into
the middle (M) section than the most anterior section (A). Likewise more M
derived N will migrate towards A given its close proximity to the fission site. Due
to the apoptotic events within A, A specific N proliferation repopulates this region.
The result of these events is a mixing of mitochondrial populations across the
body of the fission product. Specifically, M represents a ‘hybrid’ zone of T, M, and
A. After regeneration to the adult size, populations will subsequently be mixed in
varying degrees across the body of the planarian and results in varying degrees
of relatedness between fission products.
[Fig. 2.7 panel labels. (A): Fission Events Over Time; Fission; Regeneration;
Fission Site; T, M, A; Pre-migration; Post-migration. (B): Fission Events Over
Time; Fission; Regeneration; Fission Site; T, M, A; Pre-migration, Pre-apoptosis;
Post-migration, Post-apoptosis; Migration; Apoptosis; Migration & Apoptosis.]
It was observed that non-synonymous (NS) variants outnumbered
synonymous (S) variants (sample average 1,111:237). This is
in direct contrast to natural populations of Drosophila and C. elegans, in which
mitochondrial S mutations have been found to outnumber NS mutations, 36:10
and 35:3, respectively. The excess of S mutations in those species is believed
to be the result of purifying selection removing deleterious genotypes
(Haag-Liautard et al. 2008; Denver et al. 2000). Additionally, A:T>T:A
substitutions predominated, with changes favoring T (72.7%) and
then G (21.8%) nucleotides, suggesting a unique mutation-selection balance in
this system. A mutation-selection balance such as this is observed across taxa
and is often found to be taxon specific (Montooth and Rand, 2008). Our results
suggest that deleterious mutations have not been completely purged from the
population, a likely consequence of multi-cellular inheritance, as independent cell
lineages will accumulate mutations. Furthermore, given that stem cells have
been shown to actively transfer mitochondria to differentiated cells (Ahmad et al.
2014; Prockop, 2012; Islam et al. 2012), a question that needs to be addressed
is whether neoblasts and differentiated daughter cells of the same cell lineage
contain identical or different levels of mitochondrial heterogeneity. Determining
whether variation at the nuclear level is coupled with mitochondrial genome
variation would contribute towards understanding the spatial distribution of
neoblast lineages, whether asymmetric distribution of mitochondria exists, and
how this potential asymmetry pertains to mitochondrial-nuclear genomic conflicts.
The abundant heteroplasmy identified within a single planarian was
greater than that observed in some human population studies, in which
upwards of 40 haplotypes were found across ten individuals (He et al. 2010).
Interestingly, the insightful study by Lázaro et al. (2011) found low haplotype
and nucleotide diversity within asexual populations of S. mediterranea by
assaying partial mitochondrial gene sequences of cytochrome oxidase I (COI) and
cytochrome b (CYB). Our results differ from theirs, likely because the
Lázaro study used Sanger sequencing to uncover genetic variation, and it has
been demonstrated that Sanger sequencing is less effective at detecting minor
variants than next generation sequencing (NGS) (Simmons et al. 2012).
Furthermore, in our dataset COI and CYB variant frequencies were between 1%
and 15% and were located in areas that were not assayed by Lázaro et al. (2011)
(fig. 2.6A-B). It was also found that COI contained significantly fewer variants
(frequency 5-50%) than expected, which is suggestive of positive selection and
is consistent with the observation that highly expressed oxidative
phosphorylation genes are under strong positive selection (Sloan et al. 2014).
It is highly probable that the end result of multi-cellular reproduction
(fission) in this system is mitochondrial heteroplasmy, which is likely to result in
cellular conflicts. These conflicts can involve a broad range of interactions, for
instance between the nuclear and mitochondrial genomes or between different
cell lineages. Thus understanding mutation accumulation, cell lineage evolution,
and the effects of regeneration in S. mediterranea will provide insight into the
mechanisms of heteroplasmy maintenance and evolution in other systems. To
this end, future NGS of independent cell lineages, as well as assays of nuclear
genome heterogeneity in this system, may help to elucidate the degree to which
different evolutionary pressures act on cellular trajectories and survival.
2.5 Materials, Methods, and Analysis
Materials and Methods
Sample Rearing
Individual planarians were maintained under standard conditions as
described in Newmark and Sánchez Alvarado (2000), and six were selected
for Illumina sequencing (fig. 2.1). Samples 1-3 were derived from split tail
segments of different founders and are thus “unrelated”. Three additional
samples were derived from a single founder by sequential fission over the course
of a day. These are considered to be “related” samples and correspond to
sample 4 (Tail), sample 5 (Middle), and sample 6 (Anterior).
DNA Extraction, Library Construction, and Sequencing
DNA extraction and genomic library construction followed the protocol of
Dunham and Friesen (2013), with the following modifications: 1) DNA was
extracted using 80 µl of Tissue and Cell Lysis Solution (Epicentre) and 20 µl of
Proteinase K (Qiagen). A single planarian was added to this mixture, aspirated
in the mixture using a 200 µl pipet tip, incubated at 55°C for 15-30 min, and
then cooled on ice; 2) 200 µl of MPC Protein Precipitation Reagent
(Epicentre) was added to the cooled sample, which was mixed thoroughly and spun
at maximum speed for 15 min. The supernatant was removed and brought to 800 µl
with Zymo Genomic Lysis solution following the manufacturer’s protocol. Prior to
genomic library construction, samples were quantified using a Qubit
fluorometer (Life Technologies), and 300-500 ng of DNA was used for library
construction (Dunham and Friesen 2013) (Procedure A). Each sample was
sequenced on a single lane of an Illumina HiSeq 2000 at the University of
Southern California Epigenome Center, HSC campus, producing 105 x 105 bp
paired-end reads. After genome assembly, regions across the genome as
well as those represented by low coverage were targeted with primers (table
2.6).
Table 2.6: Primers spanning the mitochondrial genome
Primer Name Target Region Primer Sequence
Primer 1F 36-770 TCTATATAAGGCCGAATAAGTTGC
Primer 1R TATTTATTGGCTTAATTGGTTCATAC
Primer 2F 1961-2625 AAAACAATTCAAAAGGCATCTG
Primer 2R CCTGATAGTGCTGGCCGTA
Primer 3F 3046-3185 CGACAACGACGTACAATATGA
Primer 3R TTGTTCTTTCCTCTGTCTGTTTTG
Primer 4F 3921-4795 GACCAACCAGAACCCAACAC
Primer 4R TTGGTTTATGGTCCATTCCTG
Primer 5F 5181-5845 CAGAGTAACCTACTTAACGCAGTCC
Primer 5R CTTCTTATGCTTTATTGGTTTTGGT
Primer 6F 6476-6615 TTTAAAAATGGGCCGAAACC
Primer 6R TCTTCTTTATTTGGGGTTCCTTT
Primer 7F 7141-7315 CCCAAAGAATTCAACAAACGA
Primer 7R AGAGTGATGTTTGTAGCTATGTTGTT
Primer 8F 7771-8190 AGTCAATCTTGAACCACAACCA
Primer 8R TGGTTGATTTGATCTTCTAGTGTTT
Primer 9F 8331-8680 ACTTTACCTGGAAAGAGAAAACTTCC
Primer 9R CACAGTGATCCGTACCTGTGTTA
Primer 10F 9311-9415 TGGCTAACCACCTAAATACCTTT
Primer 10R TTGTTTAATTACAGCTCATGGTTT
Primer 11F 10816-10920 AATGACTTACGAAAGCCCAATC
Primer 11R CTTATACTTTTATGGTGCTTAATGTCA
Primer 12F 11446-11900 GAATCAAAGCTGAAAACAAAGACA
Primer 12R TTTGCTTTTATTGTTTAATAATTCGTC
Primer 13F 12496-13335 CAAACCACCCAACACAGTCA
Primer 13R TCGGTTGATTGTTTTTATCGTG
Primer 14F 13651-14105 AAAAAGAAACCAGAAACAAACTGA
Primer 14R GAAGGTGTTGATGGAAGTTACG
Primer 15F 15296-15435 TCAATAAACCCATTCAAACAATTC
Primer 15R TAGTACACACCGCCCGTCA
Primer 16F 15856-16345 TTTAGTCCTGATCAACACCAATTAT
Primer 16R TGGTGTTGTTGTTTTGTGCAG
Genome assembly, comparison, mapping, and variant calls
The default parameters of SSPACE basic v1.0 (Boetzer et al. 2010) were
used for the de novo assembly of the mitochondrial genomes. The S.
mediterranea genome was aligned to the assembled Dugesia japonica and
D. ryukyuensis mitochondrial genomes (Sakai and Sakaizumi, 2012)
using the progressiveMauve algorithm (Darling et al. 2010).
Sample reads were uniquely mapped to each assembled mitochondrial genome
using Burrows-Wheeler Aligner (BWA) version 0.6.2 (Li and Durbin 2009). Default
parameters were used with one modification: the mismatch value was
changed to 15. Samtools version 0.1.18 (Li et al. 2009) was used to sort the files
and for the SAM to BAM file conversion. Pairwise coverage plots were generated
and the coverage was found to be similar between all six samples
(R^2=0.91-0.99) (fig. 2.8). Next we simulated the false positive and false
negative rates, following Goto et al. 2011, to determine the appropriate low
frequency cutoff (which was 1%). Variant calls were made using VarScan
(version 2.3.5) (Koboldt et al. 2012) with the filtering set at a base quality
score of 30, a minimum coverage of 100, and a minimum variant coverage of 10.
Based on Goto et al. 2011, a minimum variant frequency of 1% was established as
a baseline and strand bias filters followed Li et al. 2010.
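The thresholds listed above can be summarized as a simple per-site filter. The sketch below is not VarScan's interface or implementation. The record layout, field names, and toy values are hypothetical, and base quality is simplified to a per-site mean, whereas a real variant caller applies it per read. Strand bias filtering is omitted.

```python
from dataclasses import dataclass

@dataclass
class VariantCall:
    position: int
    depth: int                # total reads covering the site
    variant_reads: int        # reads supporting the minor allele
    mean_base_quality: float  # simplification: per-site mean quality

def passes_filters(call, min_depth=100, min_variant_reads=10,
                   min_frequency=0.01, min_base_quality=30):
    """Apply the thresholds stated in the text to one candidate variant."""
    if call.depth < min_depth:
        return False
    if call.variant_reads < min_variant_reads:
        return False
    if call.mean_base_quality < min_base_quality:
        return False
    return call.variant_reads / call.depth >= min_frequency

# Toy records for illustration only.
calls = [
    VariantCall(629, 3000, 45, 36.0),   # 1.5% frequency: kept
    VariantCall(700, 3000, 12, 36.0),   # 0.4% frequency: dropped
    VariantCall(800, 80, 20, 36.0),     # depth below 100: dropped
]
kept = [c.position for c in calls if passes_filters(c)]
print(kept)
```

At roughly 3,000x coverage the 1% frequency threshold is the binding constraint, since it requires about 30 supporting reads, more than the minimum variant coverage of 10.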
Fig. 2.8. Pairwise coverage plots. The x- and y-axes of each coverage plot each
represent a different sample and its average depth of coverage in
non-overlapping 100 base pair windows.
[Fig. 2.8 panels (both axes 0-12,000): Samples 1 and 2, R^2=0.99; Samples 3
and 1, R^2=0.98; Samples 3 and 2, R^2=0.98; Samples 5 and 4, R^2=0.91;
Samples 6 and 4, R^2=0.94; Samples 5 and 6, R^2=0.98.]
The consensus sequence generated here was identical for all six samples
and has been submitted to GenBank (BankIt1738860) and assigned accession
number KM115583.
Genome Annotation, Variant Effects and Haplotype Estimation
The web-based program, MITOS (Bernt et al. 2012), was used to annotate
the mitochondrial genome using genetic code 14 – Alternative Flatworm. SnpEff
(version 3_1) (Cingolani et al. 2012) was used to determine the SNP and indel
effects, the genome-wide synonymous and non-synonymous count, as well as
the transition/transversion rate. Haplotype estimation was performed with
the current version of ShoRAH (Zagordi et al. 2011) using the default parameters
and a window size of 105 bp across the entire genome, which corresponds to the
length of a sequencing read. Filtering was performed using custom
scripts (provided upon request), and haplotypes with a posterior probability
greater than or equal to 0.9 and coverage of at least 1 were retained.
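The haplotype retention rule can be sketched as follows. The custom scripts used in the thesis are not reproduced here. The tuple layout of the records is a hypothetical stand-in rather than ShoRAH's actual output format, and the toy values are illustrative.

```python
def filter_haplotypes(records, min_posterior=0.9, min_coverage=1):
    """Keep local haplotypes that meet both thresholds and count the
    retained haplotypes per window.

    Each record is (window_start, posterior, coverage); this tuple layout
    is a hypothetical stand-in, not ShoRAH's output format.
    """
    counts = {}
    for window_start, posterior, coverage in records:
        if posterior >= min_posterior and coverage >= min_coverage:
            counts[window_start] = counts.get(window_start, 0) + 1
    return counts

# Toy records for illustration only.
toy_records = [(1, 0.99, 40.0), (1, 0.95, 3.0), (1, 0.42, 12.0),
               (106, 0.91, 0.5), (106, 1.0, 800.0)]
hap_counts = filter_haplotypes(toy_records)
print(hap_counts)
```

Counting retained haplotypes per window in this way gives the kind of per-window totals that a ring plot such as fig. 2.6A displays.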
Simulations, Linear Trends and Plots
We used the simulation scripts available in Goto et al. (2011) to estimate the
false positive and false negative rates given varying parameters of error rate and
depth of coverage (2000x) to determine the appropriate minor allele frequency
cutoff. Correlations (Spearman r) and associated plots were generated
using GraphPad Prism version 6.0b for Mac OS X, GraphPad Software (La Jolla,
California, USA). R Studio (version 2.14.0) was used for variant frequency density
plots, transition/transversion frequency density plots, and FDR correction
(Benjamini and Hochberg, 1995) using p.adjust.
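The logic behind choosing a minor allele frequency cutoff can be illustrated with a simplified calculation. This is not the Goto et al. simulation. It assumes that sequencing errors at a site are independent and occur toward a given base at a fixed per-base rate, and it asks how often errors alone would reach a 1% frequency at 2000x depth. The error rates below are hypothetical.

```python
import math

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability.

    Requires 0 < p < 1.
    """
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return total

depth = 2000                  # depth of coverage used in the simulations
cutoff_reads = depth // 100   # reads needed to reach a 1% frequency
for error_rate in (0.001, 0.005):   # hypothetical per-base error rates
    print(error_rate, binomial_tail(depth, cutoff_reads, error_rate))
```

Under these assumptions the false positive probability at a 1% cutoff is negligible when the error rate is well below 1% and rises quickly as the error rate approaches the cutoff, which is the trade-off such simulations are designed to quantify.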
References
Ahlqvist KJ, et al. 2012. Somatic progenitor cell vulnerability to mitochondrial
DNA mutagenesis underlies progeroid phenotypes in Polg mutator mice.
Cell Metab. 15:100-109.
Ahmad T, Mukherjee S, Pattnaik B, Kumar M, Singh S, Kumar M, Rehman R,
Tiwari BK, Jha KA, Barhanpurkar AP, Wani MR, Roy SS, Mabalirajan U,
Ghosh B, Agrawal A. 2014. Miro1 regulates intercellular mitochondrial
transport & enhances mesenchymal stem cell rescue efficacy. The EMBO
Journal 33:994-1010.
Alikhan NF, Petty NK, Ben Zakour NL, Beatson SA. 2011. BLAST Ring Image
Generator (BRIG): simple prokaryote genome comparisons. BMC
Genomics 12:402.
Bazin E, Glémin S, Galtier N. 2006. Population Size Does Not Influence
Mitochondrial Genetic Diversity in Animals. Science 312:570-572.
Benjamini Y and Hochberg Y. 1995. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. Journal of the Royal
Statistical Society Series B 57:289-300.
Bensasson D, Zhang D, Hartl DL, Hewitt GM. 2001. Mitochondrial pseudogenes:
evolution’s misplaced witnesses. Trends in Ecology and Evolution 16:314-
321.
Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Pütz J,
Middendorf M, Stadler PF. 2012. MITOS: Improved de novo Metazoan
Mitochondrial Genome Annotation. Molecular Phylogenetics and Evolution
69:313-319.
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2010. Scaffolding pre-
assembled contigs using SSPACE. Bioinformatics 27:578-579.
Cingolani P, Platts A, Wang L, Coon M, Nguyen T, Wang L, Land S, Lu X, Ruden
D. 2012. A program for annotating and predicting the effects of single
nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila
melanogaster strain w1118; iso-2; iso-3. Fly 6:80-92.
Darling AE, Mau B, Perna NT. 2010. progressiveMauve: Multiple Genome
Alignment with Gene Gain, Loss, and Rearrangement. PLoS One
5:e11147.
Dunham JP and Friesen ML. 2013. A Cost-Effective Method for High-Throughput
Construction of Illumina Sequencing Libraries. Cold Spring Harb Protoc
9:820-34.
Dumollard R, Duchen M, Carroll J. 2007. The Role of Mitochondrial Function in
the Oocyte and Embryo. Curr Top Dev Biol. 77:21-49.
Goto H, Dickins B, Afgan E, Paul IM, Taylor J, Makova KD, Nekrutenko A. 2011.
Dynamics of mitochondrial heteroplasmy in families investigated via a
repeatable re-sequencing study. Genome Biol, 12:R59.
Haag-Liautard C, Coffey N, Houle D, Lynch M, Charlesworth B, Keightley PD.
2008. Direct Estimation of the Mitochondrial DNA Mutation Rate in
Drosophila melanogaster. PLoS Biology 6:e204.
He Y, Wu J, Dressman DC, Iacobuzio-Donahue C, Markowitz SD, Velculescu
VE, Diaz LA, Kinzler KW, Vogelstein B, Papadopoulos N. 2010.
Heteroplasmic mitochondrial DNA mutations in normal and tumour cells.
Nature 464:610-614.
Islam MN, Das SR, Emin MT, Wei M, Sun L, Westphalen K, Rowlands DJ,
Quadri SK, Bhattacharya S, Bhattacharya J. 2012. Mitochondrial transfer
from bone marrow-derived stromal cells to pulmonary alveoli protects
against acute lung injury. Nat. Med. 18:759-765.
Koboldt D, Zhang Q, Larson D, Shen D, McLellan M, Lin L, Miller C, Mardis E,
Ding L, Wilson R. 2012. VarScan 2: Somatic mutation and copy number
alteration discovery in cancer by exome sequencing. Genome Research
22:568-76.
Lázaro EM, Harrath AH, Stocchino GA, Pala M, Baguñà J, Riutort M. 2011.
Schmidtea mediterranea phylogeography: an old species surviving on a
few Mediterranean islands? BMC Evolutionary Biology 11:274.
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-
Wheeler transform. Bioinformatics 25:1754-1760.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis
G, Durbin R. 2009. 1000 Genome Project Data Processing Subgroup. The
Sequence alignment/map (SAM) format and SAMtools. Bioinformatics
25:2078-2079.
Li M, Schönberg A, Schaefer M, Schroeder R, Nasidze I, Stoneking M. 2010.
Detecting heteroplasmy from high-throughput sequencing of complete
human mitochondrial DNA genomes. Am. J. Hum. Genet. 87:237-249.
Lynch M. 2010. Evolution of mutation rate. Trends in Genetics 26:345-352.
Montooth KL, Rand DM. 2008. The Spectrum of Mitochondrial Mutation Differs
across Species. PLoS Biol 6:e213.
Morgan TH. 1901. Regeneration 316 (The Macmillan Co., New York).
Nabholz B, Mauffrey J, Bazin E, Galtier N, Glémin S. 2008. Determination of
Mitochondrial Genetic Diversity in Mammals. Genetics 178:351-361.
Nabholz B, Glémin S, Galtier N. 2009. The erratic mitochondrial clock: variations
of mutation rate, not population size, affect mtDNA diversity across birds
and mammals. BMC Evol. Biol. 10:54.
Newmark PA, Alvarado AS. 2000. Bromodeoxyuridine Specifically Labels the
Regenerative Stem Cells of Planarians. Developmental Biology 220:142-
153.
Newmark PA, Alvarado AS. 2002. Not your father's planarian: a classic model
enters the era of functional genomics. Nature Reviews Genetics 3:210-
219.
Parr RL, Maki J, Reguly B, Dakubo GD, Aguirre A, Wittock R, Robinson K,
Jakupciak JP, Thayer RE. 2006. The pseudo-mitochondrial genome
influences mistakes in heteroplasmy interpretation. BMC Genomics 7:185.
Pellettieri J, Fitzgerald P, Watanabe S, Mancuso J, Green DR, Sánchez Alvarado
A. 2010. Cell Death and tissue remodeling in planarian regeneration. Dev
Biol. 338:76-85.
Prockop DJ. 2012. Mitochondria to the rescue. Nat. Med. 18:653-654.
Robb SMC, Ross E, Alvarado AS. 2007. SmedGD: the Schmidtea mediterranea
Genome Database. Nucleic Acids Research 36:D599-D606.
Sakai M and Sakaizumi M. 2012. The Complete Mitochondrial Genome of
Dugesia japonica (Platyhelminthes; Order Tricladida). Zoological Science
29:672-680.
Saló E, Baguñà J. 1989. Regeneration and pattern formation in planarians II.
Local origin and role of cell movements in blastema formation.
Development 107:69-76.
Sloan DB, Triant DA, Wu M, Taylor DR. 2014. Cytonuclear Interactions and
Relaxed Selection Accelerate Sequence Evolution in Organelle Ribosomes. Mol.
Biol. Evol. 31:673-682.
Sharpley MS, et al. 2012. Heteroplasmy of mouse mtDNA is genetically unstable
and results in altered behavior and cognition. Cell 151:333-343.
Simmons HE, Dunham JP, Stack JC, Dickins BJ, Pagán I, Holmes EC,
Stephenson AG. 2012. Deep sequencing reveals persistence of intra- and
inter-host genetic diversity in natural and greenhouse populations of
zucchini yellow mosaic virus. J. Gen. Virol. 93:1831-40.
CHAPTER THREE:
ANALYSIS OF VIRAL (ZUCCHINI YELLOW MOSAIC VIRUS)
GENETIC DIVERSITY DURING SYSTEMIC MOVEMENT
THROUGH A CUCURBITA PEPO VINE
(as seen in Dunham et al. 2014. Virus Research 191:172-9)
3.1 Abstract
Determining the extent and structure of intra-host genetic diversity and the
magnitude and impact of population bottlenecks is central to understanding the
mechanisms of viral evolution. To determine the nature of viral evolution
following systemic movement through a plant, we performed deep sequencing of
23 leaves that grew sequentially along a single Cucurbita pepo vine that was
infected with zucchini yellow mosaic virus (ZYMV), and on a leaf that grew on a
side branch. Strikingly, of 112 genetic (i.e. sub-consensus) variants observed in
the data set as a whole, only 22 were found in multiple leaves. Similarly, only
three of the 13 variants present in the inoculating population were found in the
subsequent leaves on the vine. Hence, it appears that systemic movement is
characterized by sequential population bottlenecks, although these are not severe
enough to reduce the population to a single virion, as multiple variants were consistently
transmitted between leaves. In addition, the number of variants within a leaf
increases as a function of distance from the inoculated (source) leaf, suggesting
that the circulating sap may serve as a continual source of virus. Notably,
multiple mutational variants were observed in the cylindrical inclusion (CI) protein
(known to be involved in both cell-to-cell and systemic movement of the virus)
that were present in multiple (19/24) leaf samples. These mutations resulted in a
predicted conformational change, suggesting that they might confer a selective advantage
in systemic movement within the vine. Overall, these data reveal that bottlenecks
occur during systemic movement, that variants circulate in the phloem sap
throughout the infection process, and that important conformational changes in
CI protein may arise during individual infections.
3.2 Introduction
RNA viruses tend to harbor abundant genetic diversity through a
combination of highly error-prone replication, short generation times and large
population sizes (Duffy et al. 2008). This genetic diversity has been associated
with the ability of RNA viruses to switch hosts (Jerzak et al. 2008; Holmes, 2009),
overcome host resistance mechanisms (Feuer et al. 1999; Lech et al. 1996), and
alter virulence (Acosta-Leal et al. 2011). However, viral populations also
experience important bottlenecks within individual hosts and at inter-host
transmission, which likely reduce standing genetic diversity and hence impact the
patterns and dynamics of viral evolution. In the case of plant viruses, dramatic
population bottlenecks (down to 2-20 virions) have been reported during both
systemic (Fabre et al. 2012; French and Stenger, 2003; Li and Roossinck, 2004;
Sacristan et al. 2003) and cell-to-cell movement (Gonzalez-Jara et al. 2009;
Miyashita and Kishino, 2010; Tromas et al. 2014). Hence, although plant virus
populations are expected to be large (for example, 10^11 – 10^12 virions per infected
leaf have been reported in tobacco mosaic virus (TMV)) (Sacristan et al. 2003),
population bottlenecks will mean that effective population sizes will be
significantly smaller (Garcia-Arenal et al. 2001). This, in turn, will influence the
respective strengths of natural selection and genetic drift, with the latter expected
to dominate in populations with small effective sizes. Determining how genetic
diversity changes as the virus moves systemically through individual plants
therefore provides important insights into the strength of population bottlenecks
and hence the nature of viral evolution.
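The effect of a transient bottleneck on effective population size can be illustrated with a short sketch. It assumes the standard population-genetic approximation that the long-term effective size is the harmonic mean of the per-generation sizes; this approximation, the function name and the example trajectory are illustrative assumptions and are not taken from the analyses in this chapter.

```python
def harmonic_mean_ne(census_sizes):
    """Approximate long-term effective size as the harmonic mean of per-generation sizes."""
    if not census_sizes or any(n <= 0 for n in census_sizes):
        raise ValueError("census sizes must be a non-empty list of positive numbers")
    return len(census_sizes) / sum(1.0 / n for n in census_sizes)


# Hypothetical trajectory (an assumption for illustration only): a large
# within-leaf population, a bottleneck of 10 virions during movement,
# and recovery to the large size in the next leaf.
sizes = [1e11, 10, 1e11]
ne = harmonic_mean_ne(sizes)
print(f"census sizes: {sizes}")
print(f"approximate effective size: {ne:.1f}")
```

In this hypothetical case the single bottleneck generation dominates the result, which is why a population that is very large in most leaves can still behave, in terms of drift, like a very small one.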
Unlike animal viruses, in which receptor-mediated mechanisms allow use
of the extracellular environment to move throughout the host, plant viruses are
restricted to the intra-cellular environment, and cell-to-cell movement is restricted
to the plasmodesmata (Maule and Wang, 1996). Systemic infection of the plant
necessitates movement of the viral population both cell-to-cell through the
plasmodesmata, as well as organ-to-organ through the phloem. For this process
to occur successfully the virus must enter the vascular tissue. This requires
movement from the mesophyll cells and through a series of cells: the
perivascular parenchyma, the phloem parenchyma, the companion cells, and
finally into the sieve tube elements (Niehl and Heinlein, 2011). Hence, movement
of the virus through the plant follows the same path as the photoassimilates
(Maule and Wang, 1996).
The Potyviridae is the largest family of plant viruses and comprises
roughly 30% of known plant viruses (Ward and Shukla, 1991). This family,
particularly the aphid-transmitted members, is among the most successful plant
pathogens (Rybicki and Pietersen, 1999). Potyviruses possess single-stranded
positive-sense RNA genomes (Berger, 2001), and harbour at least five proteins
known to be involved in viral movement: the coat protein (CP), the helper
component protein (HC-Pro), the cylindrical inclusion protein (CI), the P3N-PIPO
and the viral genome-linked protein (VPg). The CP is necessary for long-distance
spread within the plant and may also facilitate cell-to-cell movement (Dolja et al.
1994; Dolja et al. 1995) via binding to the viral RNA and altering the exclusion
size limit of the plasmodesmata. This is thought to follow the infection front in a
transient fashion (Heinlein et al. 1995; Oparka et al. 1997). Mutation analysis with
wheat streak mosaic virus suggests that the C terminus of the CP may be
required for cell-to-cell movement (Tatineni et al. 2014). The HC-Pro is thought to
function in viral movement by increasing plasmodesmatal permeability (Rojas et
al. 1997). The CI protein is also involved in cell-to-cell
movement (Shukla et al. 1991; Urcuqui-Inchima et al. 2001), and is believed
to guide the CP-RNA complex to the plasmodesmata (Roberts et al. 1998;
Rodriguez-Cerezo et al. 1997). Mutation analysis has revealed that mutants with changes
in the N-terminal region of the CI were defective in cell-to-cell movement
in both tobacco etch virus (TEV) (Carrington et al. 1998) and plum pox virus
(PPV) (Gomez de Cedron et al. 2006). The P3N-PIPO modulates the
plasmodesmatal localization of the CI, and the CI-P3N-PIPO complex is thought
to be responsible for the plasmodesmata-associated structures that assist cell-to-cell
movement (Wei et al., 2010). Although the role of the VPg is currently
unknown, it has been demonstrated that mutated VPgs in turnip mosaic virus
reduce both cell-to-cell and systemic movement (Dunoyer et al. 2004).
Zucchini yellow mosaic virus (ZYMV) is a member of the Potyviridae that
infects Cucurbitaceae (squash, melon and cucumber) globally. ZYMV is
considered to be an emerging virus as it achieved a worldwide distribution within
two decades of its discovery (Desbiez and Lecoq, 1997). The symptoms of ZYMV
include severe stunting of the plant and fruits as well as a distinctive yellow
mottling of the leaves (Desbiez and Lecoq, 1997). Fruits harvested from ZYMV-infected
plants are often mottled and deformed and thus tend to be unmarketable,
and ZYMV can reduce agricultural yields by up to 94% (Blua and Perring, 1994). In
the United States Cucurbitaceae production is estimated to be 1.5 billion per
annum and, given that these are among the 15 most important agricultural crops
in the United States (Cantliffe et al. 2007), it is clear that ZYMV is a significant
crop pathogen. ZYMV is primarily a vector-borne pathogen and is non-
persistently transmitted by aphids. Experimentally, 26 aphid species have been
shown to transmit ZYMV (Katis et al. 2006), and we previously determined a
seed to seedling (vertical) transmission rate of 1.6% (Simmons et al. 2011).
To ascertain the extent and pattern of viral genetic diversity as it moves
systemically through the plant, and how this might be impacted by population
bottlenecks (as measured by changes in genetic diversity), we undertook deep
sequencing of 23 sequential leaves on a Cucurbita pepo ssp. texana vine as well
as an additional leaf that grew on a side branch. C. pepo is believed to be the
progenitor of domestic squash (Decker and Wilson, 1987), and the optimal host
for the maintenance of ZYMV (Gal-On, 2007).
3.3 Results
The average depth of sequencing coverage of ZYMV was 27,150X (range
14,671X – 42,764X) and the average amount of genome sequenced was 99.7%.
In total, we observed 112 variants (excluding the 5’ untranslated region), 90 of
which were found in a single sample only (table 3.1) and 22 of which were
shared between two or more samples (table 3.2). 109 of the variants were SNPs
Table 3.1 – Variants, Variant Effects, and Coverage Per Sample
Nucleotide   Leaf         Syn/nonsyn   Total Coverage    Total Coverage
position     population                at Variant Site   of Variant
142 G-A      5            NS           3660              46
178 G-A      15           NS           21754             403
244 A-T      6            NS           15015             238
261 G-A      15           S            27265             728
314 C-T      16           NS           32130             4864
327 T-C      17           S            34767             1552
418 A-G      15           NS           54664             1736
431 G-A      7            S            22952             446
563 T-A      SB           NS           12727             234
595 C-T      6            NS           26689             832
638 C-T      21           NS           18181             334
649 A-G      11           NS           19251             621
710 A-G      7            NS           8633              428
816 A-G      22           NS           8978              90
846 G-T      SB           NS           6546              234
1219 G-T     13           NS           19411             218
1266 C-T     SB           S            5190              161
1404 G-A     6            S            12512             880
1500 G-A     6            S            208977            12085
1622 G-A     15           NS           65506             5611
1701 A-T     21           S            52495             3747
1855 G-A     2            NS           50200             676
1989 C-T     15           S            65428             8365
2058 G-A     7            S            16141             190
2112 C-T     8            S            17583             178
2148 G-A     22           S            51513             1223
2160 A-G     15           S            94684             8587
2414 A-G     17           NS           20264             422
2521 G-A     13           NS           32525             439
2563 A-G     23           NS           14906             884
2790 G-A     SB           S            15822             1856
2963 T-C     7            NS           7084              125
3143 A-G     19           NS           16714             6472
3279 G-A     8            NS           94833             8676
3364 C-A     22           NS           20597             225
3695 C-T     23           NS           35934             674
3810 A-G     5            S            31218             336
3816 C-T     19           S            42207             694
3846 A-T     5            NS           26761             717
4186 T-C     15           NS           25423             442
4251 A-G     3            S            5854              141
4302 G-A     5            S            11410             300
4305 C-T     10           S            17498             407
4341 G-A     6            S            10356             121
4347 T-C     21           S            7671              338
4462 G-A     6            NS           12601             764
4578 T-C     19           S            14290             714
4632 T-C     9            S            7230              89
4713 T-C     SB           S            7317              2739
4730 T-A     6            NS           6052              62
4855 T-G     7            NS           4403              48
4878 T-C     18           S            8577              125
4917 T-C     3            S            3513              47
5047 C-A     9            NS           2200              130
5087 A-T     4            NS           6426              68
5172 G-A     22           S            11160             941
5193 A-G     SB           S            9333              252
5371 G-A     3            NS           29473             479
5397 C-T     12           S            38703             474
5628 A-T     10           NS           49237             5150
5641 T-G     10           NS           44656             3427
5656 T-C     8            NS           34421             6260
5959 C-T     5            NS           11817             140
6033 G-A     SB           S            15834             509
6084 G-A     7            NS           9547              389
6156 T-C     23           S            5249              100
6351 G-A     17           S            16850             177
6363 T-G     1            S            5638              62
6402 C-T     20           S            22907             238
6882 T-C     6            S            5764              73
6962 A-G     17           NS           11794             277
6977 G-A     22           NS           6526              202
7116 A-G     3            S            32593             633
7136 G-A     18           NS           32760             973
7161 A-C     18           S            35060             537
7218 C-T     18           S            51809             1503
7347 C-T     SB           S            68737             6873
7771 C-T     SB           S            53641             1443
7860 G-A     22           S            14305             277
8319 G-A     15           S            20041             1399
8364 T-C     21           S            13865             194
8365 G-A     SB           NS           6614              90
8375 A-G     23           NS           7345              903
8415 C-T     10           S            44524             992
8478 A-G     20           S            33620             855
8508 C-T     12           S            19120             291
8557 A-G     6            NS           9465              233
8658 T-C     12           S            19957             442
8665 A-G     20           NS           34817             2243
8899 G-A     23           NS           6451              3557
Table 3.1. Nucleotide positions of the variants observed in one sample only from
all 24 viral populations. The first column designates the nucleotide position at
which the variant occurred, as well as the nucleotide change. The second column
designates in which leaf the variants were found. The third column indicates
whether the mutation was S: synonymous, or NS: non-synonymous. The
penultimate column indicates the read depth at the variant site, and the last
column gives the number of reads for the minor variant.
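As a reading aid for Table 3.1, the frequency of each minor variant can be recovered by dividing the last column by the penultimate column. The sketch below does this for three rows transcribed from the table; the function name and the choice of rows are illustrative assumptions, not part of the original analysis pipeline.

```python
def variant_frequency(variant_reads, total_reads):
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total read depth must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant reads must lie between 0 and the total depth")
    return variant_reads / total_reads


# (position, change, leaf, total coverage at site, coverage of variant),
# transcribed from three rows of Table 3.1.
rows = [
    (142, "G-A", "5", 3660, 46),
    (314, "C-T", "16", 32130, 4864),
    (1500, "G-A", "6", 208977, 12085),
]

for position, change, leaf, depth, variant_reads in rows:
    freq = variant_frequency(variant_reads, depth)
    print(f"site {position} ({change}), leaf {leaf}: {100 * freq:.2f}% of {depth} reads")
```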
Fig. 3.1: Plot illustrating mutation position and type. The NS (circle) represents
non-synonymous mutations, the S (square) synonymous mutations, and the FS
(triangle) frame-shift mutations. The x-axis is the variant position and the y-axis is
the number of samples that contained each mutation, and below the x-axis is
a schematic of a potyvirus genome.
[Fig. 3.1 plot: x-axis, Genome Position (0 to 8000); y-axis, Number of Samples (0 to 30); symbols, NS, S and FS. Genome schematic labels: 5' UTR, P1, HC-Pro, P3, P3N-PIPO/PIPO (3079, 3258), 6K1, CI, 6K2, NIa, NIb, Coat, 3' UTR]
Fig. 3.2: A plot of the number of variants found in each sample. Variants in each
column are organized into blocks that correspond to specific group sizes (number
of samples that share the variants within each group). The actual number of leaf
samples that share a variant is denoted by the color of the block. The y-axis
denotes the total number of variants found in each sample and the samples are
denoted in the x-axis.
and three were frameshifts (fig. 3.1). No stop codon mutations were observed. Of
the 109 SNPs, 54 were synonymous and 56 non-synonymous. Four of the
shared variants approached fixation (i.e. were at a frequency greater than 99%),
at sites 3954, 3969, 7477 and 7688, and one other shared variant occurred at a
frequency greater than 50% in some of the samples (position 1071 in leaves 17,
21 and 23). All other shared mutations were minor variants (i.e. occurred in less
[Fig. 3.2 plot, "Total Variants Per Sample": x-axis, Sample (Leaf 1 to Leaf 15, Side Branch, Leaf 16 to Leaf 23); y-axis, Total Variants (0 to 30); color legend for the number of samples sharing a variant: 1, 2, 3, 4, 9, 10, 11, 14, 19, 22, 24]
than 50% of the reads) (fig. 3.2). The variant that occurred at position 1071, in
combination with a variant that occurred in one sample only (position 8899 in
Leaf 17), resulted in three different consensus sequences. One of the variants
that approached fixation was synonymous (3969), while the other three variants
were non-synonymous (3954 (Q->K), 7477 (Y->H) and 7688 (R->K)). Interestingly,
the last two mutations were found in the NIb gene that encodes the viral
polymerase. Also of note was that all the frameshift mutations were shared
among multiple samples, albeit at low frequency (~1-2%; table 3.2). The
exception was the frameshift at site 5633, which attained a frequency of 16.64%
in sample 2, 9.3% in sample 10, and 14.34% in sample 12.
Table 3.2 – Shared Variants and Effects
Nucleotide   Leaf populations                              NS/S/FS   Average Variant Coverage (Min, Max)
541 G-T      19, 20, 22 & 23                               NS        1702 (494, 5333)
896 G-A      7, 15 & SB                                    NS        555 (291, 900)
1071 G-A     10, 15, 17, 18, 19, 20, 21, 22 & 23           S         6359 (1503, 21660)
1086 T-C     10 & 15                                       S         1551 (561, 2541)
1890 C-T     18 & 20                                       S         15600 (1151, 30048)
2737 T-C     15 & 22                                       NS        1508 (335, 2681)
2891 FS      In all 24 samples                             FS        233 (52, 552)
3750 T-A     2, 3, 12, 13, 14, 15, 16, 17, 18, 19 & 20     NS        1102 (536, 1638)
3751 C-A     2, 3, 4, 12, 13, 14, 15, 16, 17, 18, 19, 20,  S         1229 (534, 2374)
             21 & SB
3754 C-A     2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16, 17,    NS        1322 (465, 2711)
             18, 19, 20, 21, 22, 23 & SB
3758 T-A     2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15,   NS        1391 (472, 2750)
             16, 17, 18, 19, 20, 21, 22, 23 & SB
3954 T-C     In all 24 samples                             S         17352 (6269, 37440)
3969 C-T     In all 24 samples                             S         18371 (6607, 38430)
5631 FS      21 & SB                                       FS        624 (573, 675)
5633 FS      2, 10, 12                                     FS        5828 (4310, 7345)
5635 G-T     2, 10, 11, 12, 14, 15, 17, 18, 19 & 20        NS        7527 (4612, 10638)
5733 G-A     9 & 17                                        NS        1212 (112, 2312)
6543 T-C     18 & 20                                       S         3191 (357, 6025)
7477 T-C     In all 24 samples                             NS        48161 (29807, 82198)
7688 G-A     In all 24 samples                             NS        28038 (10963, 69449)
9426 T-A     16 & 22                                       NS        443 (344, 541)
Table 3.2. Nucleotide positions of the variants observed that were shared among
leaves. The first column designates the nucleotide position at which the variant
occurred, as well as the nucleotide change. The second column designates in
which leaf samples the variants were found. The third column indicates whether
the mutation was S: synonymous, NS: non-synonymous, or FS: a frameshift. The
final column indicates the average read depth for that position as well as the
minimum and maximum read depths across all samples.
There was an average of 9.2 mutations per sample (range 2-18). Only one
minor variant (2891) was found in all 24 samples. The starting inoculant
population had a total of 13 variants, only three of which were present in the leaf-
to-leaf samples: that at site 2891 and two that approached fixation (3954 and
7688). Interestingly, a significant positive correlation was found between the
increase in the number of variants and time (r = 0.442; p = 0.031), and a
significant linear trend was identified between the average variant frequency and
Fig. 3.3: The concentration of virus and number of variants found in each sample.
The y-axes denote the concentration of virus per leaf sample as determined via
qPCR (left hand side) and the number of variants per sample (right hand side),
with the leaf samples given on the x-axis.
time (p = 0.021). In addition, as the number of variants increased over time we
observed an overall increase in viral concentration (determined via qPCR) (fig.
3.3).
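The relationship between a correlation coefficient and its p-value can be made concrete with a small worked sketch. It assumes that the reported r is a Pearson correlation computed over n = 24 samples and tested two-sided with a Student's t statistic on n - 2 degrees of freedom; the text does not state which test was used, so these are assumptions, and the function names are hypothetical. Only the standard library is used, with the t distribution integrated numerically.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def correlation_p_value(r, n, steps=2000):
    """Return (t, two-sided p) for a Pearson r computed from n paired observations."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps is even) for the area under the density from 0 to t.
    h = t / steps
    area = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    # The density is symmetric, so the two tails together hold 1 - 2 * area.
    return t, 1 - 2 * area


# Assumed inputs: r as reported in the text, n = 24 leaf samples.
t_stat, p_value = correlation_p_value(0.442, 24)
print(f"t = {t_stat:.3f} on 22 df, two-sided p = {p_value:.3f}")
```

Under these assumptions the computed p-value should fall close to the reported p = 0.031.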
Four variants were in close proximity to each other in the CI protein:
positions 3750, 3751, 3754 and 3758. Of these, position 3751 was a
synonymous change (C->A), while the other three resulted in amino acid
[Fig. 3.3 plot: x-axis, Sample (1 to 15, SB, 16 to 23); left y-axis, ng/ul per sample (0 to 5000); right y-axis, Variants/Sample (0 to 20); series, Variants and Concentration]
changes: position 3750 (N->K), 3754 (Q->K) and 3758 (L->Q). Eleven of the 24
leaf samples sequenced in this study exhibited all four changes, and an
additional four exhibited three of the four variants. To determine if these variants
might have resulted in a conformational change, we used two protein prediction
platforms, Phyre2 (Kelley and Sternberg, 2009) and I-TASSER (Roy et al. 2010;
Zhang, 2008). Interestingly, two of the three changes (positions 3750 and 3754)
resulted in the same conformational change (fig. 3.4A-B), in which an alpha helix
contracted inwards towards the protein in the mutated form relative to the wild-
type conformation. Nineteen of 24 samples had either one or both of these
mutations that resulted in the conformational change.
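The sample counts in this paragraph can be checked against the leaf lists in Table 3.2 with simple set operations. The sketch below transcribes the leaf lists for the four CI variants from that table; the helper name and the use of strings as leaf labels are illustrative assumptions.

```python
def leaf_set(numbers, side_branch=False):
    """Build a set of leaf labels; the side branch sample is labelled 'SB'."""
    labels = {str(n) for n in numbers}
    if side_branch:
        labels.add("SB")
    return labels


# Leaf samples carrying each CI variant, transcribed from Table 3.2.
carriers = {
    3750: leaf_set([2, 3, *range(12, 21)]),
    3751: leaf_set([2, 3, 4, *range(12, 22)], side_branch=True),
    3754: leaf_set([2, 3, 4, 5, 6, *range(11, 24)], side_branch=True),
    3758: leaf_set([*range(2, 10), *range(11, 24)], side_branch=True),
}

# Samples carrying all four variants.
all_four = set.intersection(*carriers.values())
# Samples carrying at least one of the two variants linked to the
# predicted conformational change (3750 and 3754).
either_change = carriers[3750] | carriers[3754]

print(f"samples with all four variants: {len(all_four)}")
print(f"samples with 3750 and/or 3754: {len(either_change)}")
```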
Fig. 3.4: The cylindrical inclusion (CI) protein as predicted by Phyre2. (A) is the
wild-type conformation and (B) depicts the three amino acid substitutions that
resulted in the conformational change. The yellow arrows highlight the alpha
helix that was altered as a result of the three mutations.
The spatial distribution of mutations did not differ significantly from that
expected by chance alone in 10 of the 11 genes. The exception was 6K2 (a
ZYMV protein that is believed to anchor the replication apparatus to ER-like
membranes (Urcuqui-Inchima et al. 2001)), which harbored more mutations than
expected by chance alone (p=0.05). To determine whether variants were
spatially clustered within the genome, we first calculated the average distance between
variants, which was 604.1 base pairs. We then defined a cluster as a genomic region
containing at least three variants separated by less than 50 bp and
shared by at least five samples. A single region met these
criteria – bases 3750-3758 of the CI gene (fig. 3.5). Two other regions, which did
not meet all of our clustering criteria (i.e. that the region contain three or more
variants and that at least five samples share these variants), were identified
(regions 1071-1086 and 5631-5641) and these regions fell in the HC-Pro and
6K2, respectively.
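The clustering criteria can be sketched as a small filter over the shared variants. The version below is one possible reading of the definition, stated here as an assumption: variants found in fewer than five samples are discarded first, and the remaining variants are then grouped into runs in which consecutive positions lie less than 50 bp apart. The positions and sample counts are transcribed from Table 3.2; the function name and this exact ordering of steps are not taken from the original analysis.

```python
# (genome position, number of leaf samples sharing the variant), from Table 3.2.
SHARED_VARIANTS = [
    (541, 4), (896, 3), (1071, 9), (1086, 2), (1890, 2), (2737, 2),
    (2891, 24), (3750, 11), (3751, 14), (3754, 19), (3758, 22),
    (3954, 24), (3969, 24), (5631, 2), (5633, 3), (5635, 10),
    (5733, 2), (6543, 2), (7477, 24), (7688, 24), (9426, 2),
]


def find_clusters(variants, max_gap=50, min_variants=3, min_samples=5):
    """Return (start, end) for runs of widely shared variants that sit close together."""
    # Keep only variants present in at least min_samples samples.
    kept = sorted(pos for pos, n_samples in variants if n_samples >= min_samples)
    clusters, run = [], []
    for pos in kept:
        # A gap of max_gap or more closes the current run.
        if run and pos - run[-1] >= max_gap:
            if len(run) >= min_variants:
                clusters.append((run[0], run[-1]))
            run = []
        run.append(pos)
    if len(run) >= min_variants:
        clusters.append((run[0], run[-1]))
    return clusters


print(find_clusters(SHARED_VARIANTS))
```

Under this reading, the region at 5631-5635 fails because two of its three shared variants occur in fewer than five samples, and the pair at 1071-1086 fails because it contains only two variants.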
Fig. 3.5: A rainfall plot was generated to illustrate mutation clustering. Each dot
corresponds to an individual mutation, with the x-axis reflecting the genome
position of each mutation, and the y-axis reflecting the distance between each
mutation and its preceding mutation. The grey dotted line is the average distance
between variants across all samples, with the cluster in the CI gene highlighted in
the gray shaded box.
[Fig. 3.5 plot: x-axis, Genome Position (0 to 8000); y-axis, Base pair distance between variants (log scale, 0.1 to 10000)]
3.4 Discussion
Given that the bulk (80%) of the intra-plant mutational variants that we
found were in one sample only, it is clear that the majority of mutants were not
maintained in the viral population as it moved systemically through the plant.
Whether this is due to selection (i.e. removing deleterious mutations) or genetic
drift, including the action of population bottlenecks, or a combination of both is
unclear. It is possible that the majority of the variants were deleterious and as
such are expected to be purged from the population. Indeed, previous studies
have shown that approximately 70% of the mutations found in RNA viruses are
deleterious or slightly so (Carrasco et al. 2007; Domingo-Calap et al. 2009;
Sanjuan et al. 2004). However, the fact that many of the unique (one-sample) variants
were synonymous suggests that their loss is more likely associated with
neutral evolutionary processes such as population bottlenecks. In this context it
is notable that only three of the 13 variants in the inoculant were found in the subsequent
leaves, and only 22 (of the total of 112) variants were shared amongst the samples.
Hence, these data are compatible with the action of sequential population
bottlenecks during systemic movement, although not so severe as to reduce the
virus population to only a single virion as mutations were consistently shared
among samples. Indeed, the presence of likely deleterious frameshift mutations
in multiple samples, and at frequencies >10%, strongly suggests that
complementation has occurred, which obviously requires the co-transmission of
multiple viral genomes (Aaskov et al. 2006).
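The link between bottleneck size and the loss of minor variants can be illustrated with a simple sampling calculation. The sketch assumes that the founders of each new leaf population are drawn independently and at random from the source population, so that a variant at frequency f is carried by at least one of N founders with probability 1 - (1 - f)^N. This model, the function name and the example values are assumptions for illustration; they ignore selection, complementation and spatial structure.

```python
def p_transmitted(freq, n_founders):
    """Probability that at least one of n_founders randomly drawn genomes carries the variant."""
    if not 0 <= freq <= 1:
        raise ValueError("frequency must be between 0 and 1")
    if n_founders < 0:
        raise ValueError("number of founders must be non-negative")
    return 1.0 - (1.0 - freq) ** n_founders


# Hypothetical minor variant at 2% frequency, passed through bottlenecks of
# different sizes (2 and 20 bracket the range cited in the Introduction).
for n in (1, 2, 20, 200):
    print(f"N = {n:>3}: P(variant transmitted) = {p_transmitted(0.02, n):.3f}")
```

Under this model a low-frequency variant is rarely carried through a bottleneck of a few virions, so the repeated sharing of low-frequency variants between leaves is easier to explain if the bottlenecks are relatively wide.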
There have been variable estimates of the size of the population
bottleneck following intra-host movement within plants, covering one to hundreds
of virions, and it is likely that the extent and influence of population bottlenecks is
not uniform across viral-host combinations. Indeed, previous work has shown
that the magnitude of the genetic bottleneck is dependent on the size of the
inoculant dose (Zwart et al. 2011). In addition, work with PPV in Prunus suggests
that although the viral population within a host may harbour extensive genetic
diversity, this diversity will be differentiated into sub-populations that reflect the
physical structure of the tree (Jridi et al. 2006), and which will in turn influence
the impact of any population bottleneck. The existence of relatively wide
population bottlenecks, as noted here, has been reported in the case of some
other plant viruses (Fabre et al. 2014; Gutierrez et al. 2012) and may be of
sufficient size to allow natural selection to proceed efficiently (Bergstrom et al.
1999).
One of the most notable observations of our study was that the number of
variants increased as a function of distance from the source inoculated leaf. This
is in accord with previous studies with cauliflower mosaic virus (CaMV), at least
in the early stages of infection (Gutierrez et al. 2012). However, contradictory
data have also been reported. For instance, an examination of the movement of
12 experimental cucumber mosaic virus mutants in tobacco found that the
number of mutants in successive leaves decreased as a function of distance
from the source: an average of seven mutants was found in the eighth leaf and
an average of five in the 15th leaf (Li and Roossinck, 2004). Furthermore, the side
branch leaf grew in several weeks after the nearest leaves (between leaves 9
and 10) had been harvested, and it appears to be more closely related to leaves
further along the vine. This is supported by the minimum spanning tree in which
the side branch clusters with leaves 21, 22, and 23 (fig. 3.6). Hence, these data
suggest that the viral population in the phloem sap may serve as a constant
source of genetic diversity as has been found in CaMV (Gutierrez et al., 2012).
Of particular interest was the observation that four mutations in the CI
protein found in 11 of the 24 samples resulted in a conformational change in the
predicted protein structure. The CI is known to be involved in viral movement in
planta, and two mutations in the N-terminal region of the TEV CI resulted in
defects in cell-to-cell movement, although the mutants were still able to replicate at wild-type levels
(Carrington et al. 1998). Similar results were found with PPV mutations located in
the first 125 amino acids of the CI (Gomez de Cedron et al. 2006). Interestingly,
two of the three amino acid changes (positions 3750 and 3754) resulted in the same
predicted conformational change, such that 19 of the 24 leaf samples contained virions
carrying this conformational change (fig. 3.4B). Although tentative,
these results suggest that there may be a selective advantage to these variants
in systemic movement. Functional work is needed to determine the effects of
these variants in systemic movement within the host and whether they are able
to persist between hosts.
Fig. 3.6: Minimum spanning trees depicting the population structure of ZYMV in
each leaf. Each oval represents the viral population from each leaf. Each dot on
the lines linking the ovals represents one mutation.
We had previously conducted a transmission experiment in which we
inoculated a C. pepo plant and completed a series of eight mechanical
inoculations between plants, after which we deep-sequenced these populations
to determine the extent and pattern of viral genetic diversity at the inter-host
scale (Simmons et al. 2012). As we used the same inoculant population for both
experiments, the fifth leaf of the first plant of the serial inoculation corresponds to
the leaf four population sequenced in this study (the first true leaf was not
removed from the plant but rather was retained as a viral source). In the previous
experiment there were 16 variants in the 5
th
true leaf and in the current
experiment there were nine variants in the 5
th
true leaf and only one in common
between the two (7688; one of the two variants that approached fixation in this
study and the predominant variant in the inoculum). This suggests that either
there is a substantial stochastic element to the viral population as it moves along
the vine, or that there is structuring of the viral population within the same leaf, or
a combination of both. Either way, it is possible that the portion of the leaf used to
infect the plant in the previous experiment may have contained a vastly different
population than the leaf portion that was used to inoculate in this experiment.
However, additional work is needed to ascertain the spatial structuring of the viral
population within the leaf.
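The degree of overlap between the two 5th-leaf populations can be expressed as a Jaccard index computed from the counts given above (16 variants, nine variants, one shared). The Jaccard index is used here only as an illustrative summary and was not part of the original analysis; the function name is hypothetical.

```python
def jaccard_from_counts(n_first, n_second, n_shared):
    """Jaccard index: shared items divided by the size of the union of both sets."""
    if n_shared > min(n_first, n_second):
        raise ValueError("shared count cannot exceed either set size")
    union = n_first + n_second - n_shared
    if union == 0:
        raise ValueError("both sets are empty")
    return n_shared / union


# Counts from the text: 16 variants in the previous experiment, nine in the
# current experiment, one variant (7688) in common.
overlap = jaccard_from_counts(16, 9, 1)
print(f"Jaccard index between the two 5th-leaf populations: {overlap:.3f}")
```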
Evidently, the extent of the population bottleneck imposed on the viral
population as it moves through the plant is not straightforward to determine, and
is influenced by a variety of factors, including the size of the inoculum dose, the
population structure of the virus, the timing of infection, as well as the host
genotype/viral strain specific interactions. However, what is clear is that despite
the marked population structure within leaves, the population bottlenecks
imposed on the virus as it moves systemically through the plant are not
particularly severe. This is shown by the multiple variants shared between leaf
samples, including those likely producing conformational changes in the CI
protein, as well as the frameshift mutations. Moreover, because the number of
variants increased as a function of distance from the inoculum source, it is likely
that the sap serves as a continual source of circulating virus.
3.5 Materials, Methods, and Analysis
Materials and Methods
Greenhouse experiment
The first true leaf of a C. pepo plant was mechanically inoculated in a
greenhouse at The Pennsylvania State University in April 2011 with a ZYMV
sample taken from the first diseased individual from an experimental field during
the 2007 season. The inoculum used here was the same inoculum source as that
used in Simmons et al. 2012, and was prepared from infected leaf tissue diluted
in a phosphate buffer (0.1 M Na2H/KH2PO4 buffer) in a 1:3 v/v ratio.
Carborundum powder (500gm) was then rubbed on the surface of the first true
leaf, and the inoculum subsequently applied to the leaf surface with a pestle.
Leaves along one vine were collected over a two-month period (fig. 3.7). It
appears that removing the inoculated leaf, particularly early in infection, can
affect systemic infection (unpublished data in Zwart et al. 2012). Accordingly,
the first true leaf (i.e. the inoculated leaf) was not removed but was retained to
serve as a “source” of viral material. Hence, all the leaves collected are in
reference to the inoculated leaf, such that leaf one is the second true leaf.
Fig. 3.7: Schematic of the plant depicting the leaves that were harvested from a
growing vine over time. A total of 23 leaves were collected sequentially, while the
24th leaf (side branch) grew in between where leaves nine and ten had grown
prior to harvesting. The side branch was harvested 21 days after leaf nine and 17
days after leaf 10 had been harvested.
A total of 24 leaves were collected. Of these, 1-23 were collected in
sequential order as each leaf attained full size. The exception was the side
branch leaf, which grew at the same time as leaf 16, but was physically located
between where leaf nine and ten had grown with the effect that this leaf was
harvested several weeks after its nearest neighbors were collected (21 days after
leaf nine and 17 days after leaf 10).
RNA isolation, RT-PCR, qPCR and sequencing of seedlings
Fifty mg of leaf tissue was used for analysis. Frozen leaf samples were
used with the E.Z.N.A.® RNA isolation kits (Omega bio-tek, GA) for the isolation
of total RNA. First-strand synthesis of cDNA was generated from the total RNA
using genome-specific primers. The RT-PCR was conducted using Superscript III
First-Strand Synthesis kit (Life Technologies, CA) following the manufacturer's
protocol. The single stranded cDNA was then used as template for PCR
amplification using Phusion High-Fidelity polymerase (New England Biolabs,
MA). The manufacturer's protocols were followed using HF PCR buffer, and 15 µl of
first-strand product was used in a 50 µl total reaction. The PCR conditions were:
98°C for 1 min, followed by 20 cycles of 98°C for 10 s, 58°C for 20 s, and 72°C for
1 min 20 s, with a final 5 min extension at 72°C and a hold at 4°C. Five primer pairs
were synthesized (Integrated DNA Technologies, IA) with 787, 886, 759, and 842
bp overlap between adjacent amplicons across the genome. Primers (nt positions
refer to the reference strain NC_003224.1):
ZYMV_1/5 F1: (nt 27-50) AGAAATCAACGAACAAGCAGACGA
ZYMV_1/5 R1: (nt 2199-2219) GCAACATCCATCAACGAAGGC
ZYMV_2/5 F1: (nt 1432-1453) CAGAACCATACAAGCACTCACA
ZYMV_2/5 R1: (nt 4072-4096) GAAAAGCAAATCCACTCGTCATC
ZYMV_3/5 F1: (nt 3210-3233) CGTGTGTTTGGTATTCTCCTTGGT
ZYMV_3/5 R1: (nt 5824-5847) TCCTTTTGTGCTGTTGTTTCCTTT
ZYMV_4/5 F1: (nt 5088-5108) TGAGAGCACACGCATACCTTT
ZYMV_4/5 R1: (nt 7803-7823) CGACCCACCAATCCTCCATA
ZYMV_5/5 F1: (nt 6981-7001) TGAAACACAGAGCAAGCGAGA
ZYMV_5/5 R5: (nt 9516-9534) CCGACAGGACTACGGCATT
The lengths of the primer products were 2193, 2665, 2638, 2730, and 2553 bp,
respectively. The primer products were pooled per sample and gel extracted
using Zymoclean Gel Recovery kit (Zymo Research, CA) for removal of non-specific
amplification product. The library construction method used was as
outlined in (Dunham and Friesen, 2013).
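The overlaps between adjacent amplicons follow directly from the primer coordinates. The sketch below recomputes them from the positions as printed above; the function names are illustrative only, and the overlap is taken as the simple coordinate difference (end of one amplicon minus start of the next), which is an assumption about the counting convention. Inclusive lengths computed this way need not agree exactly with every reported product size.

```python
# Primer coordinates (nt, on NC_003224.1) exactly as printed in the text.
# Each tuple: (amplicon name, forward primer start, reverse primer end).
amplicons = [
    ("ZYMV_1/5", 27, 2219),
    ("ZYMV_2/5", 1432, 4096),
    ("ZYMV_3/5", 3210, 5847),
    ("ZYMV_4/5", 5088, 7823),
    ("ZYMV_5/5", 6981, 9534),
]


def amplicon_length(start, end):
    """Inclusive length of an amplicon spanning start..end."""
    return end - start + 1


def adjacent_overlaps(amps):
    """Coordinate difference between each amplicon's end and the next one's start."""
    return [prev[2] - nxt[1] for prev, nxt in zip(amps, amps[1:])]


lengths = [amplicon_length(start, end) for _, start, end in amplicons]
overlaps = adjacent_overlaps(amplicons)

for (name, _, _), length in zip(amplicons, lengths):
    print(name, length, "bp")
print("overlaps:", overlaps)
```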
The primers used for qPCR were those of Simmons et al. (2013); they were
designed from the reference strain (GenBank accession no. NC_003224.1) using
Primer Express® Software version 3.0 (Applied Biosystems) and fall within the
CI protein: forward primer 5’-GGACAGTGCGACTATAGCTTCAA-3’ and reverse
primer 5’-TTTAACCGCGAATTGCGTATC-3’. The PCR reaction mix
contained the following: 10.0 µl SYBR Green, 0.2 µl of each primer, 7.6 µl of PCR
grade water, and 2.0 µl of template, for a total reaction volume of 20 µl. The PCR
was carried out in triplicate in an Applied Biosystems StepOnePlus™ Real-Time
PCR System and the cycling conditions were as follows: holding temperature of
95°C for 5 minutes, followed by 40 cycles of 95°C for 15 seconds and 60°C for 1
minute. The standard curve was produced by creating a dilution series of cDNA
stock of ZYMV.
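A standard curve of this kind is conventionally fitted as a straight line of Ct against log10 of the template quantity, from which the amplification efficiency and unknown quantities are derived. The following is a minimal sketch of that general calculation, not the analysis performed by the StepOnePlus software; the dilution series and Ct values are invented for illustration.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 corresponds to perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Interpolate a relative quantity for a sample Ct from the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series (relative units) with invented Ct values.
dilution_log10 = [0, -1, -2, -3, -4]
ct = [15.0, 18.4, 21.8, 25.2, 28.6]

slope, intercept = fit_standard_curve(dilution_log10, ct)
efficiency = amplification_efficiency(slope)
print(round(slope, 2), round(intercept, 2), round(efficiency, 3))
```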
Analysis
Alignment of raw reads and variant calling
Alignments were performed using the Burrows Wheeler aligner (BWA)
version 0.6.2 allowing 10 mismatches (Li and Durbin, 2009). BAM to SAM file
conversion and filtering were performed with Samtools version 0.1.18 (Li et al.
2009). The inoculum sequence from Simmons et al. (2012) was used as a
reference sequence for read alignment and variant calls. VarScan (version 2.3.2;
Koboldt et al. 2012) was used to call the minor mutational variants. To
conservatively eliminate false positive intra-host mutations we only retained
variants that occurred at greater than 100X coverage (although all the variant
sites observed here had a coverage greater than 1000X; Table S1), had a
frequency of 1% or greater, and had a quality score of 30 or greater. In addition,
we simulated the false positive and false negative rates following the protocol of
Goto et al. (2011), to determine the appropriate low frequency cut-off, which was
1%. The strand filter was also applied to eliminate any strand bias.
Conformational protein changes were determined using Phyre 2 (Kelley and
Sternberg, 2009) and confirmed using I-TASSER (Roy et al. 2010; Zhang, 2008).
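The coverage, frequency, and quality thresholds described above can be expressed as a simple post-filter on variant calls. The sketch below assumes each call has already been parsed into a dictionary with hypothetical keys (`coverage`, `frequency`, `quality`); it does not reproduce VarScan's own options, and the strand filter is not modelled. The example records are invented.

```python
MIN_COVERAGE = 100     # retain sites with coverage greater than 100X
MIN_FREQUENCY = 0.01   # variant frequency of 1% or greater
MIN_QUALITY = 30       # quality score of 30 or greater


def passes_filters(variant):
    """Return True if a variant call meets all three thresholds."""
    return (variant["coverage"] > MIN_COVERAGE
            and variant["frequency"] >= MIN_FREQUENCY
            and variant["quality"] >= MIN_QUALITY)


# Invented example calls; positions and values are for illustration only.
calls = [
    {"pos": 1200, "coverage": 5400, "frequency": 0.032, "quality": 36},
    {"pos": 3050, "coverage": 90, "frequency": 0.050, "quality": 38},
    {"pos": 4471, "coverage": 2100, "frequency": 0.004, "quality": 35},
    {"pos": 8123, "coverage": 1500, "frequency": 0.010, "quality": 30},
]

retained = [v["pos"] for v in calls if passes_filters(v)]
print(retained)
```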
All the variant consensus nucleotide sequences generated here (three in
total) have been submitted to GenBank and assigned accession numbers
KJ923767-69.
Statistical Analysis:
Correlation analysis and a one-way ANOVA followed by post testing for a linear
trend were performed on all 24 samples using GraphPad Prism version 6.0b for
Mac OS X, GraphPad Software (La Jolla California USA). Additionally, figures
depicting the number of mutations across the genome as well as the rain plot
were generated using GraphPad Prism. All ZYMV sequences were manually
aligned using Se-Al (2.0a11; kindly provided by Andrew Rambaut, University of
Edinburgh) and a minimum spanning tree among them was estimated using the
statistical parsimony approach available in the TCS 1.21 program (Clement et al.
2000).
References
Aaskov, J., Buzacott, K., Thu, H.M., Lowry, K., Holmes, E.C., 2006. Long-term
transmission of defective RNA viruses in humans and Aedes mosquitoes.
Science 311, 236-238.
Acosta-Leal, R., Duffy, S., Xiong, Z., Hammond, R.W., Elena, S.F., 2011.
Advances in plant virus evolution: translating evolutionary insights into
better disease management. Phytopathol 101, 1136-1148.
Berger, P.H., 2001. Potyviridae, eLS. John Wiley & Sons, Ltd.
Bergstrom, C.T., McElhany, P., Real, L.A., 1999. Transmission bottlenecks as
determinants of virulence in rapidly evolving pathogens. Proc Natl Acad
Sci U S A 96(9), 5095-5100.
Blua, M.J., Perring, T.M., 1989. Effect of Zucchini Yellow Mosaic-Virus on
Development and Yield of Cantaloupe (Cucumis-Melo). Plant Dis 73(4),
317-320.
Cantliffe, D.J., Shaw, N.L., Stoffella, P.J., 2007. Current trends in cucurbit
production in the US. Proceedings of the IIIrd International Symposium on
Cucurbits(731), 473-478.
Carrasco, P., de la Iglesia, F., Elena, S.F., 2007. Distribution of fitness and
virulence effects caused by single-nucleotide substitutions in tobacco etch
virus. J Virol 81(23), 12979-12984.
Carrington, J.C., Jensen, P.E., Schaad, M.C., 1998. Genetic evidence for an
essential role for potyvirus CI protein in cell-to-cell movement. The Plant J
14(4), 393-400.
Clement, M., Posada, D., Crandall, K.A., 2000. TCS: a computer program to
estimate gene genealogies. Mol Ecol 9(10), 1657-1659.
Decker, D.S., Wilson, H.D., 1987. Allozyme variation in Cucurbita pepo complex:
C. pepo var. ovifera vs. C. texana. Syst Bot 12, 263-273.
Desbiez, C., Lecoq, H., 1997. Zucchini yellow mosaic virus. Plant Pathol 46(6),
809-829.
Dolja, V.V., Haldeman, R., Robertson, N.L., Dougherty, W.G., Carrington, J.C.,
1994. Distinct Functions of Capsid Protein in Assembly and Movement of
Tobacco Etch Potyvirus in Plants. Embo J 13(6), 1482-1491.
Dolja, V.V., Haldeman-Cahill, R., Montgomery, A.E., Vandenbosch, K.A.,
Carrington, J.C., 1995. Capsid protein determinants involved in cell-to-cell
and long distance movement of tobacco etch potyvirus. Virology 206(2),
1007-1016.
Domingo-Calap, P., Cuevas, J.M., Sanjuan, R., 2009. The Fitness Effects of
Random Mutations in Single-Stranded DNA and RNA Bacteriophages.
Plos Genet 5(11).
Duffy, S., Shackelton, L.A., Holmes, E.C., 2008. Rates of evolutionary change in
viruses: patterns and determinants. Nat Rev Genet 9(4), 267-276.
Dunham, J.P., Friesen, M.L., 2013. A cost-effective method for high-throughput
construction of illumina sequencing libraries. Cold Spring Harbor protocols
2013(9), 820-834.
Dunoyer, P., Thomas, C., Harrison, S., Revers, F., Maule, A., 2004. A cysteine-
rich plant protein potentiates Potyvirus movement through an interaction
with the virus genome-linked protein VPg. J Virol 78(5), 2301-2309.
Fabre, F., Montarry, J., Coville, J., Senoussi, R., Simon, V., Moury, B., 2012.
Modelling the Evolutionary Dynamics of Viruses within Their Hosts: A
Case Study Using High-Throughput Sequencing. PLoS Pathog 8(4).
Fabre, F., Moury, B., Johansen, E.I., Simon, V., Jacquemond, M., Senoussi, R.,
2014. Narrow Bottlenecks Affect Pea Seedborne Mosaic Virus
Populations during Vertical Seed Transmission but not during Leaf
Colonization. PLoS Pathog 10(1), e1003833.
Feuer, R., Boone, J.D., Netski, D., Morzunov, S.P., St Jeor, S.C., 1999.
Temporal and spatial analysis of Sin Nombre virus quasispecies in
naturally infected rodents. J Virol 73(11), 9544-9554.
French, R., Stenger, D.C., 2003. Evolution of wheat streak mosaic virus:
Dynamics of population growth within plants may explain limited variation.
Annual Review of Phytopathol 41, 199-214.
Gal-On, A., 2007. Zucchini yellow mosaic virus: insect transmission and
pathogenicity - the tails of two proteins. Mol Plant Pathol 8(2), 139-150.
Garcia-Arenal, F., Fraile, A., Malpica, J.M., 2001. Variability and genetic structure
of plant virus populations. Annual Review of Phytopathol 39, 157-186.
Gómez de Cedrón, M., Osaba, L., López, L., García, J.A., 2006. Genetic analysis
of the function of the plum pox virus CI RNA helicase in virus movement.
Virus Res 116(1–2), 136-145.
Gonzalez-Jara, P., Fraile, A., Canto, T., Garcia-Arenal, F., 2009. The multiplicity
of infection of a plant virus varies during colonization of its eukaryotic host.
J Virol 83(15), 7487-7494.
Gutierrez, S., Yvon, M., Pirolles, E., Garzo, E., Fereres, A., Michalakis, Y., Blanc,
S., 2012. Circulating Virus Load Determines the Size of Bottlenecks in
Viral Populations Progressing within a Host. PLoS Pathog 8(11),
e1003009.
Holmes, E.C., 2009. The evolutionary genetics of emerging viruses.
Annu Rev Ecol Evol Syst 40, 353-372.
Heinlein, M., Epel, B.L., Padgett, H.S., Beachy, R.N., 1995. Interaction of
tobamovirus movement proteins with the plant cytoskeleton. Science
270(5244), 1983-1985.
Jerzak, G.V., Brown, I., Shi, P.Y., Kramer, L.D., Ebel, G.D., 2008. Genetic
diversity and purifying selection in West Nile virus populations are
maintained during host switching. Virology 374(2), 256-260.
Jridi, C., Martin, J.F., Marie-Jeanne, V., Labonne, G., Blanc, S., 2006. Distinct
viral Populations differentiate and evolve independently in a single
perennial host plant. J Virol 80(5), 2349-2357.
Katis, N.I., Tsitsipis, J.A., Lykouressis, D.P., Papapanayotou, A.,
Margaritopoulos, J.T., Kokinis, G.M., Perdikis, D.C., Manoussopoulos,
I.N., 2006. Transmission of Zucchini yellow mosaic virus by colonizing and
non-colonizing aphids in Greece and new aphid species vectors of the
virus. Journal of Phytopathol 154(5), 293-302.
Kelley, L.A., Sternberg, M.J., 2009. Protein structure prediction on the Web: a
case study using the Phyre server. Nat protoc 4(3), 363-371.
Koboldt, D., Zhang, Q., Larson, D., Shen, D., McLellan, M., Lin, L., Miller, C.,
Mardis, E., Ding, L., Wilson, R., 2012. VarScan 2: Somatic mutation and
copy number alteration discovery in cancer by exome sequencing.
Genome Res DOI: 10.1101/gr.129684.111
Lech, W.J., Wang, G., Yang, Y.L., Chee, Y., Dorman, K., McCrae, D., Lazzeroni,
L.C., Erickson, J.W., Sinsheimer, J.S., Kaplan, A.H., 1996. In vivo
sequence diversity of the protease of human immunodeficiency virus type
1: presence of protease inhibitor-resistant variants in untreated subjects. J
Virol 70(3), 2038-2043.
Li, H., Durbin, R., 2009. Fast and accurate short read alignment with Burrows-
Wheeler transform. Bioinformatics 25(14), 1754-1760.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G.,
Abecasis, G., Durbin, R., 2009. 1000 Genome Project Data Processing
Subgroup. The Sequence alignment/map (SAM) format and SAMtools.
Bioinformatics 25, 2078-2079.
Li, H., Roossinck, M.J., 2004. Genetic Bottlenecks Reduce Population Variation
in an Experimental RNA Virus Population. J Virol 78(19), 10582-10587.
Maule, A.J., Wang, D., 1996. Seed transmission of plant viruses: a lesson in
biological complexity. Trends Microbiol 4(4), 153-158.
Miyashita, S., Kishino, H., 2010. Estimation of the Size of Genetic Bottlenecks in
Cell-to-Cell Movement of Soil-Borne Wheat Mosaic Virus and the Possible
Role of the Bottlenecks in Speeding Up Selection of Variations in trans-
Acting Genes or Elements. J Virol 84(4), 1828-1837.
Niehl, A., Heinlein, M., 2011. Cellular pathways for viral transport through
plasmodesmata. Protoplasma 248(1), 75-99.
Oparka, K.J., Prior, D.A., Santa Cruz, S., Padgett, H.S., Beachy, R.N., 1997.
Gating of epidermal plasmodesmata is restricted to the leading edge of
expanding infection sites of tobacco mosaic virus (TMV). Plant J 12(4),
781-789.
Roberts, I.M., Wang, D., Findlay, K., Maule, A.J., 1998. Ultrastructural and
Temporal Observations of the Potyvirus Cylindrical Inclusions (CIs) Show
That the CI Protein Acts Transiently in Aiding Virus Movement. Virology
245(1), 173-181.
Rodriguez-Cerezo, E., Findlay, K., Shaw, J.G., Lomonossoff, G.P., Qiu, S.G.,
Linstead, P., Shanks, M., Risco, C., 1997. The coat and cylindrical
inclusion proteins of a potyvirus are associated with connections between
plant cells. Virology 236(2), 296-306.
Rojas, M.R., Zerbini, F.M., Allison, R.F., Gilbertson, R.L., Lucas, W.J., 1997.
Capsid protein and helper component proteinase function as potyvirus
cell-to-cell movement proteins. Virology 237(2), 283-295.
Roy, A., Kucukural, A., Zhang, Y., 2010. I-TASSER: a unified platform for
automated protein structure and function prediction. Nat. Protocols 5(4),
725-738.
Rybicki, E.P., Pietersen, G., 1999. Plant virus disease problems in the
developing world. Adv Virus Res 53, 127-175.
Sacristan, S., Malpica, J.M., Fraile, A., Garcia-Arenal, F., 2003. Estimation of
population bottlenecks during systemic movement of Tobacco mosaic
virus in tobacco plants. J Virol 77(18), 9906-9911.
Sanjuan, R., Moya, A., Elena, S.F., 2004. The distribution of fitness effects
caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad
Sci U S A 101(22), 8396-8401.
Shukla, D.D., Frenkel, M.J., Ward, C.W., 1991. Structure and Function of the
Potyvirus Genome with Special Reference to the Coat Protein Coding
Region. Can. J. Plant Pathol.-Rev. Can. Phytopathol 13(2), 178-191.
Simmons, H.E., Dunham, J.P., Zinn, K.E., Munkvold, G.P., Holmes, E.C.,
Stephenson, A.G., 2013. Zucchini yellow mosaic virus (ZYMV, Potyvirus):
Vertical transmission, seed infection and cryptic infections. Virus Res
176, 259-264.
Simmons, H.E., Dunham, J.P., Stack, J.C., Dickins, B.J.A., Pagan, I., Holmes,
E.C., Stephenson, A.G. 2012. Deep sequencing reveals persistence of
intra- and inter-host genetic diversity in natural and greenhouse
populations of zucchini yellow mosaic virus. J Gen Virol 93, 1831-1840.
Simmons, H.E., Holmes, E.C., Gildow, F.E., Bothe-Goralczyk, M.A., Stephenson,
A.G., 2011. Experimental Verification of Seed Transmission of Zucchini
yellow mosaic virus. Plant Dis 95(6), 751-754.
Tatineni, S., Kovacs, F., French, R., 2014. Wheat streak mosaic virus infects
systemically despite extensive coat protein deletions: Identification of
virion assembly and cell-to-cell movement determinants. J Virol 88(2),
1366-1380.
Tromas, N., Zwart, M.P., Lafforgue, G., Elena, S.F. 2014. Within-host
spatiotemporal dynamics of plant virus infection at the cellular level. Plos
Genetics 10(2): e1004186 doi:10.1371/journal.pgen.1004186
Urcuqui-Inchima, S., Haenni, A.L., Bernardi, F., 2001. Potyvirus proteins: a
wealth of functions. Virus Res 74(1-2), 157-175.
Ward, C.W., Shukla, D.D., 1991. Taxonomy of potyviruses: current problems and
some solutions. Intervirology 32(5), 269-296.
Wei, T., Zhang, C., Hong, J., Xiong, R., Kasschau, K.D., Zhou, X., Carrington,
J.C., Wang, A., 2010. Formation of Complexes at Plasmodesmata for
Potyvirus Intercellular Movement Is Mediated by the Viral Protein P3N-
PIPO. PLoS Pathog 6(6), e1000962.
Zhang, Y., 2008. I-TASSER server for protein 3D structure prediction. BMC
Bioinformatics 9, 40.
Zwart, M.P., Daros, J.A., Elena, S.F., 2011. One Is Enough: In Vivo Effective
Population Size Is Dose-Dependent for a Plant RNA Virus. PLoS Pathog
7(7).
Zwart, M.P., Daros, J.A., Elena, S.F., 2012. Effects of Potyvirus Effective
Population Size in Inoculated Leaves on Viral Accumulation and the Onset
of Symptoms. J Virol 86(18), 9737-9747.
CHAPTER FOUR:
A COST-EFFECTIVE METHOD FOR HIGH-THROUGHPUT
CONSTRUCTION OF ILLUMINA SEQUENCING LIBRARIES
(as seen in Dunham and Friesen. 2013. Cold Spring Harb. Protoc.
9:820-34)
4.1 Abstract
Despite the plummeting cost of next-generation sequencing, the
preparation of sequencing libraries using commercially available kits still remains
expensive. This can be prohibitive for large-scale comparative or experimental
studies, where hundreds to thousands of samples need to be analyzed. The
increasing use of multiplexing dozens to hundreds of samples underscores the
urgent need to develop a cost-effective and time-efficient high-throughput method
for library preparation. By optimizing and scaling down the steps in library
construction and using commonly available reagents, the protocol described here
allows for the preparation of DNA libraries in a 96 well format, using no
specialized equipment, at a substantial savings in both reagent cost and
personnel hours. Utilizing this optimized high-throughput format results in a
10-fold cost reduction compared to commercially available kits, bringing the cost
per library to ~$12.6-14.9 for individually prepared libraries and ~$8.6-10.6
for pooled libraries with individual barcodes; both techniques allow for up to
144 samples to be pooled on a single lane with the barcodes tested herein.
4.2 Materials and Methods
Materials
Reagents
Agarose I (Amresco 0710-100G)
Bovine serum albumin (BSA)
Buffer EB (QIAGEN)
dATP (10 mM)
dNTP mix (10 mM)
DNA Clean & Concentrator-5 kit (Zymo Research D4004)
EDTA (0.5 M)
GeneRuler 100 bp Plus DNA Ladder (ThermoScientific SM0323)
Klenow fragment (New England Biolabs M0212L)
Liquid N2
MPC Protein Precipitation Solution (Epicentre MMP03750)
NEBNext dsDNA Fragmentase (New England Biolabs M0348S)
NEBuffer 2 (New England Biolabs B7002S)
Phusion High-Fidelity DNA Polymerase (New England Biolabs M0530S)
Proteinase K (QIAGEN 19133)
Qubit dsDNA High-Sensitivity Assay Kit (Life Technologies Q32851)
Quick Blunting Kit (New England Biolabs E0542S)
Quick Ligation Kit (New England Biolabs M2200S)
SafeView (nucleic acid stain) (abm G108)
Tissue and Cell Lysis Solution (Epicentre MTC096H)
Tween-20 (0.1%)
ZR-96 DNA Clean & Concentrator-5 kit (Zymo Research; D4023 or D4024)
Alternatively, Agencourt AMPure XP beads (Beckman Coulter A63880,
A63881, or A63882) can be substituted. See the Method section
(Alternative Procedures) for instructions. Library construction costs per
sample can increase depending upon bead volume purchased.
ZR-96 Quick-gDNA kit (Zymo Research D3010)
Zymoclean Gel DNA Recovery Kit (Zymo Research D4001)
Alternatively, Agencourt AMPure XP beads (Beckman Coulter A63880,
A63881, or A63882) can be substituted. See the Method section
(Alternative Procedures) for instructions.
Equipment
Caps (8-cap strip for 1.1-mL racked tubes) (USA Scientific 1294-0800)
Centrifuge (with rotors for tubes and 96-well plates)
Collection plates (96 wells) (Zymo Research C2002)
Gel electrophoresis apparatus
Falcon tubes (15 mL)
PCR machine
Qubit fluorometer (Life Technologies)
Alternatively, the KAPA quantification kit (Kapa Biosystems KK4824)
can be used to quantify each sample for pooling at Step 31 of the
protocol.
Stainless steel ball bearings (1/8 in diameter)
TempPlate PCR plates (96-wells, nonskirted, 0.2 mL, natural) (USA Scientific
1402-9596)
TempPlate sealing mat (for 96-well PCR plates) (USA Scientific 1400-9605)
TissueLyser II (QIAGEN)
Transilluminator (Clare Chemical Research)
Tubes (1.1 mL; in strips of 8; to refill racks) (USA Scientific 1212-8000)
Method
Interruption of the protocol can occur after completion of Steps 5, 6, 7, 8, 9,
10, 11, 12, and 13; however, greater yields have been observed if libraries are
constructed without interruption. We suggest that master mixes or individual
reagents are first pipetted into 12-tube strip tubes and a 12-channel
multipipettor used for pipetting into wells of a 96-well plate(s). All mixes are
given for a single sample in the following steps. For all sample cleaning steps
using the ZR-96 DNA Clean & Concentrator-5 kit, add an additional 4
µl of elution buffer or PCR grade H2O, since 4 µl is typically lost during the
elution spin. For steps using the DNA Clean & Concentrator-5 kit, add an
additional 0.3 µl of elution buffer or H2O. The volumes in the protocol do not
reflect the expected loss of H2O.
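Because all mixes are given per sample, a master mix for a full plate is obtained by scaling each volume. The sketch below scales the per-sample dATP-addition mix of Step 25 (excluding the sample-specific eluted DNA) to 96 wells. The 10% pipetting overage and the function name are assumptions for illustration and are not part of the protocol.

```python
def scale_master_mix(per_sample_ul, n_samples, overage=0.10):
    """Scale per-sample volumes (µl) to a master mix with a fractional overage."""
    factor = n_samples * (1.0 + overage)
    return {reagent: round(volume * factor, 2)
            for reagent, volume in per_sample_ul.items()}


# Per-sample volumes from the dATP-addition step, without the 15 µl of eluted DNA.
datp_mix = {
    "NEBuffer 2": 2.5,
    "dATP (10 mM)": 0.5,
    "Klenow exo-": 1.5,
    "Sterile dH2O": 5.5,
}

plate_mix = scale_master_mix(datp_mix, n_samples=96)
per_well_ul = sum(datp_mix.values())  # volume of master mix to dispense per well

for reagent, volume in plate_mix.items():
    print(f"{reagent}: {volume} µl")
print("dispense per well:", per_well_ul, "µl")
```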
Adaptor Formation:
1) Combine the following:
Reagent Volume (µl)
Multi/Std Adaptor 1 (100 µM) 45
Multiplexing Adaptor 2 (100 µM) 45
NEBuffer 2 10
dH2O 0
Final Volume 100
See table 4.1 for oligonucleotide sequences. For barcoded adaptor
formation the barcoded adaptor oligos are substituted for the Multi/Std
Adaptor 1 and Multiplexing Adaptor 2.
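The adaptor mix in Step 1 dilutes each 100 µM oligo into a 100 µl final volume, which gives the 45 µM adaptor concentration used in the ligation steps. A minimal check of that arithmetic (C1V1 = C2V2) is sketched below; the function name is illustrative.

```python
def final_concentration_uM(stock_uM, stock_volume_ul, final_volume_ul):
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock_uM * stock_volume_ul / final_volume_ul


# Volumes (µl) from the adaptor formation mix.
volumes_ul = {
    "Multi/Std Adaptor 1": 45,
    "Multiplexing Adaptor 2": 45,
    "NEBuffer 2": 10,
    "dH2O": 0,
}

total_ul = sum(volumes_ul.values())
oligo_uM = final_concentration_uM(100, volumes_ul["Multi/Std Adaptor 1"], total_ul)
print(total_ul, "µl total;", oligo_uM, "µM of each oligo")
```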
Table 4.1: Oligonucleotides
Oligo ID Sequence^a
Non-barcoded Multiplexing (Indexing) Adaptor:
Multi/Std Adapter 1 ACACTCTTTCCCTACACGACGCTCTTCCGATCT
Multiplexing Adapter 2 P-GATCGGAAGAGCACACGTCT
Barcoded Multiplexing (Indexing) Adaptor:
PE A1-ACTG P- CAGTAGATCGGAAGAGCACACGTCT
PE A2-ACTG ACACTCTTTCCCTACACGACGCTCTTCCGATCTACTGT
PE A1-ATGC P- GCATAGATCGGAAGAGCACACGTCT
PE A2-ATGC ACACTCTTTCCCTACACGACGCTCTTCCGATCTATGCT
PE A1-AGCT P- AGCTAGATCGGAAGAGCACACGTCT
PE A2-AGCT ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGCTT
PE A1-TACG P- CGTAAGATCGGAAGAGCACACGTCT
PE A2-TACG ACACTCTTTCCCTACACGACGCTCTTCCGATCTTACGT
PE A1-TCGA P- TCGAAGATCGGAAGAGCACACGTCT
PE A2-TCGA ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGAT
PE A1-TGAC P- GTCAAGATCGGAAGAGCACACGTCT
PE A2-TGAC ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACT
PE A1-CAGT P- ACTGAGATCGGAAGAGCACACGTCT
PE A2-CAGT ACACTCTTTCCCTACACGACGCTCTTCCGATCTCAGTT
PE A1-CTAG P- CTAGAGATCGGAAGAGCACACGTCT
PE A2-CTAG ACACTCTTTCCCTACACGACGCTCTTCCGATCTCTAGT
PE A1-CGTA P- TACGAGATCGGAAGAGCACACGTCT
PE A2-CGTA ACACTCTTTCCCTACACGACGCTCTTCCGATCTCGTAT
PE A1-GATC P- GATCAGATCGGAAGAGCACACGTCT
PE A2-GATC ACACTCTTTCCCTACACGACGCTCTTCCGATCTGATCT
PE A1-GCAT P- ATGCAGATCGGAAGAGCACACGTCT
PE A2-GCAT ACACTCTTTCCCTACACGACGCTCTTCCGATCTGCATT
PE A1-GTCA P- TGACAGATCGGAAGAGCACACGTCT
PE A2-GTCA ACACTCTTTCCCTACACGACGCTCTTCCGATCTGTCAT
Multiplexing Primers:
Multi/Std PCR Primer 1.0 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T
Multiplexing PCR Primer 2.0 GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T
Index primers:
Primer Index 1 CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTT*C
Primer Index 2 CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTT*C
Primer Index 3 CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTT*C
Primer Index 4 CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTT*C
Primer Index 5 CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTT*C
Primer Index 6 CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTT*C
Primer Index 7 CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTT*C
Primer Index 8 CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTT*C
Primer Index 9 CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTT*C
Primer Index 10 CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTT*C
Primer Index 11 CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTT*C
Primer Index 12 CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTT*C
^a 5’-3’
* Represents PTO bond. The use of this modification is necessary for long-term stability of the oligo.
P- Corresponds to a 5’-Phosphate modification
2) Mix and cycle using the following temperatures:
Temperature Time
95°C 5 min
90°C 1 min
85°C 1 min
80°C 1 min
75°C 30 sec
Go to step 5 for 69 cycles. Drop 1°C every cycle.
4°C hold
After cycling program is finished, incubate at 4°C for 30 minutes (or
place on ice). Store at -20°C. Thaw adaptor on ice or at 4°C.
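The annealing program in Step 2 can be written out as an explicit temperature schedule. The sketch below assumes one reading of the program: the fifth step (75°C for 30 sec) is repeated 69 more times, dropping 1°C each time, before the 4°C hold. Under that assumption the ramp ends at 6°C.

```python
def annealing_schedule():
    """Return (temperature in °C, seconds) pairs for the ramp, excluding the 4°C hold."""
    steps = [(95, 300), (90, 60), (85, 60), (80, 60), (75, 30)]
    # Assumed reading: repeat the 75°C step 69 more times, 1°C cooler each cycle.
    steps += [(75 - k, 30) for k in range(1, 70)]
    return steps


schedule = annealing_schedule()
total_minutes = sum(seconds for _, seconds in schedule) / 60
print(len(schedule), "steps; final step", schedule[-1], "; about", total_minutes, "min")
```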
Extraction of gDNA from Cell Culture Samples:
For extraction of gDNA from tissue samples, see Steps 9-15.
3) Grow bacteria in 15-mL Falcon tubes. Centrifuge at 3000 rpm and
remove the supernatant.
4) Add 280 µL of Tissue and Cell Lysis Solution and 20 µL of proteinase K
to each sample. Resuspend the pellets and incubate for 30 min at
55°C.
5) Allow the samples to cool and add 200 µL of MPC Protein Precipitation
Solution. Mix by inverting several times and centrifuge at maximum
speed for 15 min.
A cloudy white precipitate should be seen after mixing. If a precipitate
is not observed, additional MPC Protein Precipitation Solution can be
added following the manufacturer’s suggestions.
6) Without disrupting the pellet, transfer the supernatant to a new 96-well
collection plate.
7) Extract the DNA using the ZR-96 Quick-gDNA kit. Follow the
manufacturer’s instructions.
It is preferable to use a large concentration of DNA per microliter
(e.g., 100 ng/µL). Although the Quick-gDNA MiniPrep kit requires 50
µL for elution, we suggest 25 µL or less to guarantee high DNA
concentrations. If tissue quantity is limiting, use the Quick-gDNA
MicroPrep kit, although column clogging is sometimes observed.
Depending upon the sample size, single column or 96-well based
DNA extraction kits are available. The use of β-mercaptoethanol is
highly recommended. The addition of MPC Protein Precipitation
Solution (Step 5) can be omitted if β-mercaptoethanol is used as
described in the ZR-96 Quick-gDNA protocol. Otherwise, use equal
volumes (not 4 volumes) of Zymo Lysis Buffer if excessive sample is
not used and the MPC step is performed.
8) Quantify using a Qubit fluorometer.
Proceed to Step 16
Extraction of gDNA from Tissue Samples:
Tissue can be stored long-term at -80°C before processing.
9) Place tissue samples in 1.1-mL 8-strip tubes with caps and freeze in
liquid N2. Crush the tissue by adding a single ⅛-in. diameter stainless
steel ball bearing to each tube. Process the samples in a TissueLyser
II at 30 Hz for 30 sec. Re-freeze the samples in liquid N2 if additional
crushing is needed. Spin the samples down immediately after
crushing.
10) Combine 280 µL of Tissue and Cell Lysis Solution (Epicentre) and 20
µL proteinase K. Add 100 µL of this mix to each sample and repeat
bead beating.
11) Centrifuge the samples at maximum speed for 1 min. Add the
remaining 200 µL of the Tissue and Cell Lysis Solution/proteinase K
mix to each sample and incubate for 30 min at 55°C, periodically
inverting or flicking to mix.
12) Centrifuge the samples at maximum speed for 1 min. Add 200 µL of
MPC Protein Precipitation Solution, mix, and centrifuge at maximum
speed for 15 min.
If sample size is large, more than 200 µL of MPC Protein
Precipitation Solution can be used. A cloudy white precipitate should
be seen after mixing. If a precipitate is not observed, additional MPC
Protein Precipitation Solution can be added following the
manufacturer’s suggestions.
13) Without disrupting the pellet, transfer the supernatant to a new 96-well
collection plate.
14) Extract the DNA using the ZR-96 Quick-gDNA kit. Follow the
manufacturer’s instructions.
The use of β-mercaptoethanol is highly recommended. The addition
of MPC Protein Precipitation Solution (Step 12) can be omitted if β-
mercaptoethanol is used as described in the ZR-96 Quick-gDNA
protocol. Otherwise, use equal volumes (not 4 volumes) of Zymo
Lysis Buffer if excessive sample is not used and the MPC step is
performed.
15) Quantify using a Qubit fluorometer.
Fragmentation of gDNA
It is necessary to perform several sample fragmentation trials to optimize
the fragment size range for any specific sample. A starting DNA
quantity of 1 µg or greater can be used; however, this protocol has
been optimized for 500 ng.
16) Set up the digestion reaction in 96-well format.
Reagent Volume (µl)
gDNA (500 ng) Variable
Fragmentase Buffer (10x) (supplied with the Fragmentase) 1
BSA (100X) 0.1
NEBNext dsDNA Fragmentase 1 (Do not add until Step 18)
Sterile dH2O Variable
Final Volume 10
Mix thoroughly.
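The two variable volumes in the digestion reaction depend on the measured gDNA concentration. The sketch below computes them for a 500 ng input in a 10 µl reaction; the function name and the example concentration of 100 ng/µl are illustrative. Samples too dilute to fit 500 ng into the available volume are rejected.

```python
def fragmentation_volumes(dna_ng_per_ul, dna_ng=500.0, final_ul=10.0):
    """Return (gDNA µl, water µl) for the Fragmentase digestion reaction."""
    # Fixed components of the reaction, in µl.
    fixed = {
        "Fragmentase Buffer (10x)": 1.0,
        "BSA (100X)": 0.1,
        "NEBNext dsDNA Fragmentase": 1.0,
    }
    dna_ul = dna_ng / dna_ng_per_ul
    water_ul = final_ul - sum(fixed.values()) - dna_ul
    if water_ul < 0:
        raise ValueError("gDNA is too dilute to fit the requested mass in the reaction")
    return round(dna_ul, 2), round(water_ul, 2)


dna_ul, water_ul = fragmentation_volumes(100.0)
print(dna_ul, "µl gDNA,", water_ul, "µl dH2O")
```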
17) Incubate on ice for 5 min.
18) Vortex NEBNext dsDNA Fragmentase and add to the reaction. Mix
thoroughly.
Fragmentase is an enzyme mix and must be vortexed. Given the
minimal reaction volume, ensuring the addition of exactly 1 µl is
critical to the success of this reaction. This is a common step
contributing to poor library construction.
19) Incubate at 37°C according to the recommended times in
manufacturer’s protocol.
We routinely find an incubation time of ~ 20 min to be ideal. If
multiple samples are to be fragmented, it is highly recommended to
stagger the samples with 20 sec increments, or to perform
fragmentation in an eight- or 12- strip tube format for multichannel
use.
20) After incubation, immediately add 6 µL of 0.5 M EDTA to stop the
reaction.
As the reaction reaches the optimal fragmentation range, any time
beyond the predetermined incubation time significantly alters the
success of library construction and the variability between samples.
21) Purify the DNA with ZR-96 DNA Clean & Concentrator-5 kit. Elute in
7.5 µL H2O.
Quick Blunting Kit (NEB)
22) Using the Quick Blunting Kit, prepare the following reaction mix:
Reagent Volume (µL)
Eluted DNA (from Step 21) 7.5
Blunt buffer (10x) 1.25
dNTP mix (1 mM) 1.25
Blunt Enzyme Mix 0.5
E. coli DNA ligase for Fragmentase (from NEBNext Fragmentase Kit) 0.4
Sterile dH2O 1.6
Final Volume 12.5
Addition of E. coli DNA ligase for Fragmentase is critical for library
construction.
23) Incubate the sample for 30 min at 25°C. Hold at 4°C.
Although this kit can be heat inactivated, it is not recommended in
order to retain AT-rich genomic fragments.
24) Purify the DNA with ZR-96 DNA Clean & Concentrator-5 kit. Elute in
15 µl H2O.
Addition of dATP
25) Prepare the following reaction mix:
Reagent Volume (µl)
Eluted DNA (from Step 24) 15
NEBuffer 2 2.5
dATP (10 mM) 0.5
Klenow exo- 1.5
Sterile dH2O 5.5
Final Volume 25
26) Incubate the sample for 30 min at 37°C. Hold at 4°C.
27) Purify the DNA with ZR-96 DNA Clean & Concentrator-5 kit. Elute in 6
µl H2O.
Adaptor ligation (NEB Quick Ligation Kit)
28) Using the Quick Ligation Kit, follow either Procedure A or Procedure
B.
Procedure A: 96-Well, Nonpooled, Single-Sample Approach
i. Prepare the following reaction mix:
Reagent Volume (µl)
Eluted DNA (from Step 27) 6
Quick Ligation Buffer (2X) 7.5
Multiplexing (Indexing) Adaptor (45 µM) 1.2
Quick T4 DNA Ligase 0.6
Final Volume 15.3
The adaptor should be added directly to each sample prior to
making ligase/buffer mix. Ligase and buffer should be combined and
vortexed thoroughly. The high viscosity of both the ligase and 2×
buffer can result in poor mixing of these two components. Ligase is
added to a tube and then the buffer. The ligase is aspirated from the
bottom of the tube into the buffer and then vortexed to thoroughly mix.
See Table 4.1 for oligonucleotide sequences.
29) Incubate for 15 min at 25°C.
Although this kit can be heat inactivated, it is not recommended in
order to retain AT-rich genomic fragments.
30) Purify with ZR-96 DNA Clean & Concentrator-5 kit. Elute in 6 µL H2O.
Cleaning after ligation is critical. Excess adaptor-adaptor product will
cause a greater propensity for adaptor-adaptor product to co-migrate
with desired product during gel purification and reduce PCR
efficiency.
Procedure B: 96-well, Pooled-Sample Approach
i) Prepare the following reaction mix using one barcoded adaptor per
column (see fig. 4.1):
Reagent Volume (µL)
Eluted DNA 6
2X Quick Ligation Buffer 7.5
Barcoded Adaptor (45 µM) 1.2
Quick T4 DNA Ligase 0.6
Final Volume 15.3
Fig. 4.1. Barcoding and indexing schematic. Example 96-well format
combinatorial barcoded adaptors/indexing primers. Each well contains a unique
sample. Each column has one of 12 barcoded adaptors ligated to it. Each row is
then pooled for eight indexing PCR reactions.
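The combinatorial scheme in fig. 4.1 can be sketched as a mapping from each well to a barcode and index pair. The barcode names are those of Table 4.1, but their assignment to columns in the listed order, and the assignment of Primer Index 1-8 to rows A-H, are assumptions made for illustration. With all 12 index primers, the 12 barcodes give 144 unique combinations.

```python
from itertools import product

# Barcode names as listed in Table 4.1; the column order is assumed.
BARCODES = ["ACTG", "ATGC", "AGCT", "TACG", "TCGA", "TGAC",
            "CAGT", "CTAG", "CGTA", "GATC", "GCAT", "GTCA"]
ROWS = "ABCDEFGH"
N_INDEX_PRIMERS = 12  # Primer Index 1-12 in Table 4.1


def plate_layout():
    """Map each well of a 96-well plate to a (barcode, index number) pair."""
    return {f"{row}{col}": (BARCODES[col - 1], ROWS.index(row) + 1)
            for row, col in product(ROWS, range(1, 13))}


layout = plate_layout()
max_combinations = len(BARCODES) * N_INDEX_PRIMERS
print(layout["A1"], layout["H12"], max_combinations)
```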
The adaptor should be added directly to each sample before the
ligase/buffer mix is prepared. Ligase and buffer should be combined
and vortexed well, because the high viscosity of both the ligase and
the 2× buffer can result in poor mixing. Add the ligase to a tube
first, followed by the buffer; aspirate the ligase from the bottom of
the tube up into the buffer and then vortex to mix thoroughly. This
mixture is added to the sample and adaptor mix. See Table 4.1 for
oligonucleotide sequences.
ii) Incubate for 15 min at 25°C.
iii) Retain 50% of the ligation reaction from each sample for long-term
storage at -80°C.
iv) Purify the remaining 50% cDNA with DNA Clean & Concentrator-5
kit. Elute in 6 µL of H₂O.
Before this post-ligation purification, pool the samples as follows:
pool samples A1–A6, A7–A12, B1–B6, B7–B12, and so on for each row,
for a total of 16 pooled samples (Fig. 4.1). Use these 16 samples to
load on a gel. Pooling samples down a column is also an option;
however, never pool more than six samples, because pooling more
than six ligation reactions at a time clogs the purification column.
Cleaning after ligation is critical. Excess adaptor–adaptor product
is prone to co-migrate with the desired product during gel
purification, which results in variable and reduced amplification
efficiency in the final indexing PCR step.
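The combinatorial layout described in Procedure B (one barcoded adaptor per column, one index primer per row, and gel pools of at most six wells) can be written down explicitly. The sketch below is illustrative only: the labels `barcode_N` and `index_N` are placeholders rather than the oligonucleotide names in Table 4.1, and the function names are assumptions.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # plate rows A-H
COLUMNS = range(1, 13)       # plate columns 1-12


def plate_layout():
    """Map each well to a (barcode, index) pair.

    Each column shares one barcoded adaptor and each row shares one indexing
    primer, so every well receives a unique combination.
    """
    return {
        f"{row}{col}": (f"barcode_{col}", f"index_{r + 1}")
        for r, row in enumerate(ROWS)
        for col in COLUMNS
    }


def gel_pools(max_pool_size=6):
    """Split each row into pools of at most max_pool_size wells (A1-A6, A7-A12, ...)."""
    pools = []
    for row in ROWS:
        wells = [f"{row}{col}" for col in COLUMNS]
        for start in range(0, len(wells), max_pool_size):
            pools.append(wells[start:start + max_pool_size])
    return pools


if __name__ == "__main__":
    layout = plate_layout()
    pools = gel_pools()
    print(len(layout), "wells;", len(pools), "gel pools")
    print(pools[0], pools[1])
```

Because the two pools from a row carry the same index, they can be recombined after gel purification for a single indexing PCR per row.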
Gel purification of cDNA Templates
29) Purify the DNA using the Zymoclean Gel DNA Recovery Kit, following
either Procedure A or Procedure B.
Procedure A: 96-Well, Nonpooled, Single-Sample Approach
i) Prepare a 1.5-2% 1X TBE agarose gel.
Use a small width comb size because the product will be barely
visible. The volume of gel in the cast should not be excessive, but
sufficient for the sample to be loaded. Excess gel thickness
increases incubation times and leads to greater product loss. Load
individual samples with a ladder between each sample. Include
blank lanes between each sample and ladder to prevent cross-
contamination.
ii) Load sample and loading dye with a 100 bp ladder in the flanking
well.
iii) Run gel for 30-60 min at 120V until sufficient separation of the 100-
bp and 200-bp bands of the DNA ladder occurs.
iv) Cut a gel slice at 350/400 bp (+/- 20 bp) to 800 bp (depending upon
sequencing length) and purify the DNA with Zymo Research
Zymoclean Gel DNA Recovery Kit. Elute the DNA into 7.5 µL of
H₂O.
Use a transilluminator to prevent UV-mediated DNA damage during
gel excision. Incubate the gel in Agarose Dissolving Buffer
(supplied with the kit) at no greater than 30°C. The Zymoclean Gel
kit typically results in a loss of 0.3 µL during the elution step.
Procedure B: 96-well Pooled-Sample Approach
i) Prepare a 1.5-2% 1X TBE agarose gel.
Use a small width comb size because product will be barely visible.
The volume of gel in the cast should not be excessive, but sufficient
for the sample to be loaded.
ii) Load sample and loading dye with a 100 bp ladder in the flanking
well.
Load the row pools (two per row of a 96-well plate; see Step 28
Procedure B iv) adjacent to one another, that is, ladder, A1–A6,
A7–A12, ladder, B1–B6, B7–B12, ladder, etc. When cutting the
samples from the gel, cut both pooled row samples in the same
slice in order to gel purify an entire row in a single column. Do not
separate the two samples between each row.
iii) Run gel for 30-60 min at 120V until sufficient separation of the 100-
bp and 200-bp bands of the DNA ladder occurs.
iv) Cut a gel slice at 350/400 bp (+/- 20 bp) to 800 bp (depending upon
sequencing length). Purify with Zymo Research Zymoclean Gel DNA
Recovery Kit. Elute the DNA into 15 µL of H₂O.
Use a transilluminator to prevent UV-mediated DNA damage during
gel excision. Incubate the gel in Agarose Dissolving Buffer
(supplied with the kit) at no greater than 30°C. The Zymoclean Gel
kit typically results in a loss of 0.3 µL during the elution step.
See Troubleshooting.
Indexing PCR
30) Perform PCR using either Procedure A or Procedure B.
Procedure A: 96-Well, Nonpooled, Single-Sample Approach
i) Prepare the following reaction mix. Use one index primer per
sample. See Table 4.1 for oligonucleotide sequences.
Reagent Volume (µL)
Phusion buffer (5X) (supplied with polymerase) 5
Multi/Std PCR Primer 1.0 (25 µM) 0.5
Multiplexing PCR Primer 2.0 (0.5 µM) 0.5
PCR Primer Index 1 (or others) (25 µM) 0.5
dNTP mix (10 mM) 0.75
Phusion DNA polymerase 0.5
Gel Extracted Sample 7.5
Sterile dH₂O 9.75
Final Volume 25
ii) Run the following PCR cycling program for a total of 18 cycles.
Temperature Time
98°C 1 min
98°C 10 s
65°C 30 s
72°C 30 s
Cycle to Step 2 for 17 rounds
72°C 5 min
4°C hold
iii) Purify the DNA with ZR-96 DNA Clean & Concentrator-5 kit. Elute
in 16-26 µL Buffer EB.
After indexing PCR, single sample libraries can be cleaned either
as pools and then quantified or cleaned as single samples,
quantified, and then pooled.
Procedure B: 96-well, Pooled Approach
i) Prepare the following reaction mix. Use one index primer per pooled
row (Fig. 4.1). See Table 4.1 for oligonucleotide sequences.
Reagent Volume (µL)
Phusion buffer (5X) (supplied with polymerase) 10
Multi/Std PCR Primer 1.0 (25 µM) 1
Multiplexing PCR Primer 2.0 (0.5 µM) 1
PCR Primer Index 1 (or others) (25 µM) 1
dNTP mix (10 mM) 1.5
Phusion DNA polymerase 1
Gel Extracted Sample 15
Sterile dH₂O 19.5
Final Volume 50
ii) Run the following PCR cycling program for a total of 18 cycles.
Temperature Time
98°C 1 min
98°C 10 s
65°C 30 s
72°C 30 s
Cycle to Step 2 for 17 rounds
72°C 5 min
4°C hold
iii) Purify the DNA with DNA Clean & Concentrator-5 kit. Elute in 16-26
µl Buffer EB.
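The Procedure B indexing PCR mix is the Procedure A mix doubled, and both use the same cycling program. A short sketch, assuming nothing beyond the two tables above (the dictionary keys and function names are illustrative), checks the volumes and totals the programmed hold times; ramp times between temperatures are not included.

```python
# Procedure A indexing PCR mix (µL per 25 µL reaction), from the table above.
PCR_MIX_25UL = {
    "phusion_buffer_5x": 5.0,
    "multi_std_pcr_primer_1_0": 0.5,
    "multiplexing_pcr_primer_2_0": 0.5,
    "pcr_primer_index": 0.5,
    "dntp_mix_10mM": 0.75,
    "phusion_dna_polymerase": 0.5,
    "gel_extracted_sample": 7.5,
    "sterile_water": 9.75,
}

# Cycling program: (temperature in °C, seconds).
INITIAL_DENATURATION = (98, 60)
CYCLE_STEPS = [(98, 10), (65, 30), (72, 30)]
N_CYCLES = 18                     # first pass plus 17 repeats
FINAL_EXTENSION = (72, 300)


def scale_mix(mix, final_volume_ul):
    """Scale every component so the reaction totals final_volume_ul."""
    factor = final_volume_ul / sum(mix.values())
    return {name: volume * factor for name, volume in mix.items()}


def programmed_hold_seconds():
    """Sum of programmed hold times, excluding ramping and the final 4°C hold."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


if __name__ == "__main__":
    print(scale_mix(PCR_MIX_25UL, 50))        # reproduces the Procedure B volumes
    print(programmed_hold_seconds() / 60)     # minutes of programmed hold time
```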
Library Quantification and Dilution
31) Quantify with the Qubit dsDNA High-Sensitivity Assay kit and
fluorometer. Dilute the samples with Buffer EB to 10 nM in the
presence of 0.1% Tween-20. Pool the desired number of samples per
lane of sequencing.
Alternatively, the KAPA library quantification kit can be used to
quantify each sample for pooling, but per sample costs will increase.
32) Sequence the libraries on any Illumina platform or store at -80°C.
The final quantified library is the best indication of a successful
library construction. Final values of 5 ng/µL in 16–26 µL following
Procedure A or 10 ng/µL in 16–26 µL following Procedure B
routinely result in high-quality sequence with minimal adaptor
contamination. Values of <2 ng/µL indicate that the library
preparation was not efficient and will generally result in a poor
sequencing run, as a result of excessive adaptor sequence and/or
low cluster densities.
See Troubleshooting.
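Diluting to 10 nM requires converting the fluorometer reading (ng/µL) to molarity. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the mean fragment length is an assumption the user must supply (for example from the gel slice range), and the function names are illustrative rather than part of the protocol.

```python
AVG_BP_MASS_G_PER_MOL = 660.0    # approximate mass of one dsDNA base pair
LOW_YIELD_NG_PER_UL = 2.0        # below this, the text flags inefficient library prep


def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration to nM for a given mean fragment length (bp)."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def diluent_volume_ul(library_ul, library_nM, target_nM=10.0):
    """Volume of Buffer EB to add so that library_ul of library reaches target_nM."""
    if library_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    return library_ul * (library_nM / target_nM - 1.0)


def is_low_yield(conc_ng_per_ul):
    """True when the final library falls below the 2 ng/µL warning level."""
    return conc_ng_per_ul < LOW_YIELD_NG_PER_UL


if __name__ == "__main__":
    conc_nM = ng_per_ul_to_nM(5.0, 500)     # 500 bp is an assumed mean fragment size
    print(round(conc_nM, 2), "nM")
    print(round(diluent_volume_ul(2.0, conc_nM), 2), "µL Buffer EB per 2 µL library")
```

The 0.1% Tween-20 called for in Step 31 is assumed to be present in the diluent and is not modeled here.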
Alternative Procedures
AMPure XP bead-based sample cleanup
This is an alternative to the Zymo Research DNA Clean & Concentrator
kits. Prepare fresh 70% ethanol prior to each use of beads. If using a
robot, bring samples up to 40 µL volume with H₂O.
1. Resuspend beads and add 1 volume of beads per volume of sample
and mix well.
The beads are viscous. Watch for residual beads retained on the
outside of pipette tip or inside pipette tip after adding to sample.
2. Incubate the mix for 5 min at room temperature.
3. Place samples on a magnet for 1 min. Once beads have been pulled to
the side of the tube, remove the supernatant.
4. While samples are on magnet, add 100 µl of 70% ethanol. Wait 30 sec
and remove the ethanol. Repeat this step one additional time. Be
certain that all ethanol is removed prior to elution step.
5. Remove from magnet and add desired volume of H₂O. Incubate at
room temperature for 2 minutes.
6. Place on magnet for 1 min and remove supernatant containing purified
sample.
AMPure XP bead-based sample size selection
This is an alternative to gel-size selection and purification using the
Zymoclean Gel DNA Recovery Kit. Prepare fresh 70% ethanol prior to
each use of beads. If using a robot, bring samples up to 40 µL volume with
H₂O. For size selection after adaptor ligation, bring samples up to 40 µL.
This protocol will select for fragments ≥ 150–200 bp.
1. Resuspend beads and add 0.7x volume of beads per volume of
sample and mix well.
The beads are viscous. Watch for residual beads retained on the
outside of pipette tip or inside pipette tip after adding to sample.
2. Incubate the mix for 5 min at room temperature.
3. Place samples on a magnet for 1 min. Once beads have been pulled to
the side of the tube, remove the supernatant.
4. While samples are on magnet, add 100 µl of 70% ethanol. Wait 30 sec
and remove the ethanol. Repeat this step one additional time. Be
certain that all ethanol is removed prior to elution step.
5. Remove from magnet and add desired volume of H₂O. Incubate at
room temperature for 2 minutes.
6. Place on magnet for 1 min and remove supernatant containing purified
sample.
4.3 Troubleshooting
Problem (Step 29): The fragmentation smear observed on a 1.5%
agarose gel deviates from those pictured in the manufacturer’s
protocol.
Solution: Likely due to incomplete fragmentation (a high-molecular-weight
DNA band with a smear and limited DNA in the target range) or
excessive fragmentation (a faint 50- to 1000-bp smear). Prior to
starting this library construction protocol, the fragmentation step should
always be optimized with a subset of the samples for desired smear
size and intensity. We have determined that the two major causes of
fragmentation failure are (i) the quality of extracted DNA and (ii) a
NEBNext dsDNA Fragmentase pipetting error. DNA quality is primarily
a function of the kit used for extraction. We have provided two
extraction methods that select for high-molecular-weight DNA, remove
RNA, and consistently yield the desired fragmentation size range and
intensity after fragmentation. The second issue occurs as a result of
error while pipetting small volumes. Because the reaction volume is
only 10 µL, the volume of NEBNext dsDNA Fragmentase needed is
minuscule; any excess reagent added to the reaction therefore
contributes significantly to the overall reaction dynamics. Careful
attention to detail (e.g., specifically pipetting off the surface of
the reagent rather than submerging the tip) and proper pipetting
technique will ensure that the desired fragmentation range is
achieved.
Problem (Step 32): The gel of the fragmented sample looks normal, but
the library concentration is <1 ng/µL.
Solution: Due to the presence of nicked DNA after fragmentation, it is
critical to add E. coli DNA ligase (supplied with the NEBNext dsDNA
Fragmentase kit) prior to downstream steps to ensure that the DNA is
repaired. Failure to add this enzyme will result in poor amplification
from adaptors during the indexing PCR step.
4.4 Discussion
Genomic sequencing is becoming increasingly popular in both model and
non-model systems. This is primarily due to a continuously decreasing cost per
base sequenced across sequencing platforms (Stein 2010). With Illumina
sequencing in particular, it is the result of greater cluster densities, increased
sequencing lengths, sample pooling per lane, and paired-end sequencing.
Unfortunately, the cost of library construction using commercially
available kits, $50-$110 per sample, has to date not fallen in parallel with the
reduction in sequencing cost.
Procedures A and B of our method expand upon standard single sample
library construction approaches (Margulies et al. 2005, Mortazavi et al. 2008), by
protocol optimization that reduces construction costs by an order of magnitude in
a flexible high-throughput manner. This reduction in cost is directly attributable to
the optimization of reagent usage, a 50% reduction in starting material, and the
incorporation of unique pooling stages, while utilizing commonly used and highly
trusted reagents. Procedure A was tested for whole genome sequencing in
eukaryotes (Drosophila, the flatworm Schmidtea mediterranea, and the model
legume Medicago truncatula) as well as in prokaryotes (Mesorhizobium and
Ensifer bacteria), ranging from a few to 96 individual samples simultaneously
processed in our high-throughput manner. For example, fig. 4.2A, C, and E
show summary statistics from sequencing and de novo assembly of a collection
of 48 Mesorhizobium strains. These strains were individually processed following
Procedure A and sequenced to ~20x coverage in 76-bp paired-end format, and
draft genomes were assembled using the a5 pipeline (Tritt et al. 2012); 190,810,672
reads (95.1%) were used out of 200,550,351 total reads (S. S. Porter, P. L.
Chang, C. A. Conow, J. P. Dunham, M. L. Friesen, in preparation). To
demonstrate the flexibility and reduction in cost and time achieved by pooled
library construction (Procedure B) as compared to individual library construction
(Procedure A), 18 Ensifer medicae Illumina libraries were constructed (fig. 4.2B,
D, F). De novo assembly was performed using the a5 pipeline (Tritt et al. 2012),
and it was determined that each sample contained <1% adaptor sequence and
>90% of the bases had a quality score of 20 or greater. Specifically, fig. 4.2A
shows the total number of sequencing reads produced per individual sample.
Twelve samples were pooled per lane based upon the final quantification of each
library, and consistently resulted in equal coverage across all 48 samples. By
comparison, Procedure B also shows equal coverage after sequencing even
when samples are pooled at early stages, thus indicating no obvious source of
biased library construction or biased amplification within pooled libraries, fig.
4.2B. Amongst the sequence produced per sample using either Procedure A or
B, >95% of the sequence used for de novo assembly was of high quality, fig.
4.2C and D. Figures 4.2E and F illustrate that both procedures consistently
produce long contigs across samples, suggesting little genome
coverage bias with either Procedure A or B. Furthermore, the number of de
novo contigs per sample as well as the maximum contig length were consistent
across samples, fig. 4.2E and F.
Procedures A and B are both high-throughput options that can be used to
construct 96 (8 pools of 12 indexes) or 144 (12 indexes, 12 barcodes per index
pool) libraries with user flexibility in mind. The first deviation from standard
protocols occurs after the barcoded adaptor ligation, step 9, in which
approximately 50% of each ligation product is removed for long-term storage
(Procedure B). For instance, after removal of 50% of each of the 18 Ensifer
samples, they were pooled into three groups of six samples each, size selected,
and subsequently index PCR amplified for final library construction. Sample
pooling can consist of any size, i.e. 12, eight, or even fewer samples. Variation in
library construction efficiency between samples can result in competition for
coverage during sequencing within pools. Therefore, in the event of poor
coverage the stored ligation product, step 9, can be used for a second round of
library construction and sequencing. Either Procedure A or B can be used to
complete the libraries for these samples.

Fig. 4.2. Sequence and assembly quality. Read outputs and assembly statistics
for bacterial genomes sequenced with Procedures A and B. (A,B) Total number
of reads from libraries prepared using the individual sample protocol (A: 48
strains) and pooled protocol (B: 18 strains). The fraction of reads used in the de
novo assembly was above 95% for both experiments (C,D). Both protocols
yielded high quality draft genomes (E,F) with total assemblies ranging from 5.8 to
7.8 Mb and 5.8 to 7.3 Mb (blue), maximum contigs ranging from 297 to 2511 kb
and 42 to 697 kb (gold), and N50 values ranging from 129 to 911 kb and 5.4 to
262 kb (pink).
[Figure panels: (A, B) Total Reads per Sample; (C, D) Frequency versus Fraction
Reads Used, with 0.95 marked; (E, F) bp per Sample for Total Assembly, Max
Contig, and N50.]
It has become common to use barcoded adaptors or Illumina’s primer-based
indexing approaches to differentiate samples (Craig et al. 2008); however,
these methods have not been routinely used in combination. Procedure B uses
unique pooling stages, made possible by the method’s combinatorial scheme of
barcoding and indexing for sample differentiation (fig. 4.1). Utilizing these pooling
stages drastically reduces the number of samples for size selection, which
remains one of the most tedious and time-consuming steps of library
construction. Using Procedure B, 96 samples are reduced to a manageable 16
samples for size selection and further reduced to eight gel extractions. This
assumes that the 12 samples in each row will be combined for indexing PCR (two
pools per row for gel extraction, six samples per pool). This example
demonstrates pooling across a row; however, columns can also be pooled
according to the needs of the user (fig. 4.1). Pool size, either Steps 10 and 11 of
Procedure B or Step 14 of Procedure A, is dependent upon the scale of the
project and the desired coverage per sample in each lane. The user must
consider two options to obtain the maximum per sample coverage given the
number of pooled samples and genome size: 1) pool a small number of samples
to obtain the maximum coverage within a single lane, or 2) distribute a large pool
of samples across multiple lanes to achieve the desired per sample coverage.
This further demonstrates the flexibility, scalability, and cost reduction of
Procedures A and B.
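The trade-off between pool size and per-sample coverage can be estimated before sequencing. The sketch below assumes reads are divided evenly among pooled samples (which the text notes is not guaranteed when library efficiency varies); the lane yield, read length, and genome size in the example are hypothetical inputs, not values reported here.

```python
import math


def per_sample_coverage(lane_reads, read_length_bp, genome_size_bp, samples_per_lane):
    """Expected fold coverage per sample if a lane's reads are split evenly.

    lane_reads counts individual reads, so both mates of a pair are included.
    """
    return lane_reads * read_length_bp / (samples_per_lane * genome_size_bp)


def max_samples_per_lane(lane_reads, read_length_bp, genome_size_bp, target_coverage):
    """Largest pool size that still reaches target_coverage on one lane (option 1)."""
    return math.floor(lane_reads * read_length_bp / (target_coverage * genome_size_bp))


def lanes_needed(n_samples, samples_per_lane):
    """Number of lanes required to spread n_samples at a fixed pool size (option 2)."""
    return math.ceil(n_samples / samples_per_lane)


if __name__ == "__main__":
    # Hypothetical lane: 100 million 100-bp reads, 7-Mb bacterial genome.
    print(per_sample_coverage(100_000_000, 100, 7_000_000, 12))
    print(max_samples_per_lane(100_000_000, 100, 7_000_000, 20))
    print(lanes_needed(96, 12))
```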
In addition to protocol flexibility, the cost per sample can pose a significant
barrier to the desired scale of an experiment. The library construction costs vary
between the two procedures and were determined in two ways: 1) total cost of reagents
for a given number of samples; or 2) per unit enzyme cost (the number of
samples prepared per kit). Estimating sample preparation costs from the total
cost of reagents purchased results in a higher overall cost per sample. However,
since each kit within the protocol can process a different number of samples,
total reagent cost is an inaccurate estimate of per sample cost. Furthermore,
these two approaches of sample cost calculation can vary greatly depending
upon sample size. Tables 4.2A-C and 4.3A-C display the cost breakdown for
Procedures A and B, respectively, reflecting both total kit costs and per-unit
enzymatic costs for sample sizes of 100, 200, and 400. Using Procedure A, the user
can easily prepare 96 or more libraries given a sequencing pool size of eight or
12 samples per lane at an enzymatic cost of ~$12.6-$14.9 per sample, depending upon
sample size. The total kit cost per sample, by comparison, is
~$13.4-$15.86. Alternatively, due to the combinatorial approach of Procedure B,
per-unit enzymatic costs are further reduced to ~$8.5-$10.5, while total kit costs are
~$9.3-$12.35. Our estimation of per sample cost for all sample sizes was
determined to minimize both per sample enzymatic cost as well as total reagent
cost. Our protocol uses commonly used and trusted reagents, and although
using other reagents could potentially reduce costs further, the reproducibility
and efficiency of these reagents relative to this protocol would require additional
testing and optimization. Also, it is common for library construction cost estimates
to omit the cost of DNA extraction. Although our cost estimates are
based solely upon library construction, we also provide our DNA extraction
methods and associated costs, table 4.4, as library construction efficiency is
highly dependent upon DNA quality.
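The two ways of estimating cost can be made concrete with the Procedure A, 100-sample figures from Table 4.2A. This is a sketch of the arithmetic as it appears to be applied in that table, not the authors' own calculation: the assumption is that purification columns are divided by the number of samples, while enzymes are divided by the number of preps each kit supports.

```python
# (reagent, total price in USD, preps or samples the price is spread over)
# Values are taken from Table 4.2A; the two ZR-96 kits are combined.
TABLE_4_2A = [
    ("Fragmentase", 96, 100),
    ("ZR-96 purification kits (D4024 + D4023)", 387 + 199, 100),
    ("Gel Recovery", 152, 100),
    ("Quick Blunt", 231, 120),
    ("Klenow exo-", 228, 133.3),
    ("Quick Ligation", 190, 100),
    ("Phusion", 103, 100),
]


def kit_cost_per_sample(rows, n_samples):
    """Method 1: total price of everything purchased, divided by sample count."""
    return sum(price for _, price, _ in rows) / n_samples


def unit_cost_per_sample(rows):
    """Method 2: sum of each reagent's price divided by the preps it supports."""
    return sum(price / preps for _, price, preps in rows)


if __name__ == "__main__":
    print(round(kit_cost_per_sample(TABLE_4_2A, 100), 2))
    print(round(unit_cost_per_sample(TABLE_4_2A), 2))
```

Method 1 is higher whenever a kit supports more preps than the experiment uses, which is why the text treats total reagent cost as the less accurate per-sample estimate.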
Table 4.2A: Procedure A: 100 Samples
Reagent         Total Samples  Company  Total Preps  Catalog #  Quantity  Price  Total Price  Per Unit Cost
Fragmentase     100            NEB      100          M0348S     1         96     96           0.96
ZR-96 Pur Kit   100            Zymo     4x96         D4024      1         387    387          -
ZR-96 Pur Kit   100            Zymo     2x96         D4023      1         199    199          5.86
Gel Recovery    100            Zymo     100          D4001      2         76     152          1.52
Quick Blunt     100            NEB      120          E1201S     3         77     231          1.92
Klenow exo-     100            NEB      133.3        M0212L     1         228    228          1.71
Quick Ligation  100            NEB      100          M2200S     2         95     190          1.9
Phusion         100            NEB      100          M0530S     1         103    103          1.03
Total Cost: 1586               Enzyme Total: 7.52
Per Sample: 15.86              Column Total: 7.38
Total Unit Cost Per Sample: 14.90
Table 4.2B: Procedure A: 200 Samples
Reagent         Total Samples  Company  Total Preps  Catalog #  Quantity  Price  Total Price  Per Unit Cost
Fragmentase     200            NEB      200          M0348S     2         96     192          0.96
ZR-96 Pur Kit   200            Zymo     4x96         D4024      3         387    1161         5.80
Gel Recovery    200            Zymo     200          D4002      1         278    278          1.39
Quick Blunt     200            NEB      200          E1201L     1         308    308          1.54
Klenow exo-     200            NEB      266.6        M0212L     2         228    456          1.71
Quick Ligation  200            NEB      250          M2200L     1         380    380          1.52
Phusion         200            NEB      200          M0530S     2         103    206          1.03
Total Cost: 2981               Enzyme Total: 6.76
Per Sample: 14.90              Column Total: 7.19
Total Unit Cost Per Sample: 13.95
Table 4.2C: Procedure A: 400 Samples
Reagent         Total Samples  Company  Total Preps  Catalog #  Quantity  Price  Total Price  Per Unit Cost
Fragmentase     400            NEB      500          M0348L     1         384    384          0.76
ZR-96 Pur Kit   400            Zymo     4x96         D4024      5         387    1935         4.84
Gel Recovery    400            Zymo     400          D4002      2         278    556          1.39
Quick Blunt     400            NEB      400          E1201L     2         308    616          1.54
Klenow exo-     400            NEB      399.9        M0212L     3         228    684          1.71
Quick Ligation  400            NEB      500          M2200L     2         380    760          1.52
Phusion         400            NEB      500          M0530L     1         412    412          0.82
Total Cost: 5347               Enzyme Total: 6.36
Per Sample: 13.36              Column Total: 6.23
Total Unit Cost Per Sample: 12.59
The protocols described herein are applicable to small to moderate sized
labs, as typified by most research labs. We therefore did not include robot-based
automation as a source of labor cost reduction. However, the
determination of labor costs will be dependent upon the construction time of the
method, hourly pay of the personnel, and the ability to automate the protocol.
The time to complete our procedures is 8-13 hours for two 96-well plates, and is
dependent upon the experience of the user. Procedure A does not deviate
drastically from traditional library construction protocols and therefore would not
significantly increase the labor costs associated with the per sample cost of
library construction. Procedure B, in contrast, increases the rate of library construction
by allowing for pooling stages that decrease the time spent on gel-based size
selection. AMPure XP beads are an alternative approach for size selection that
typically reduces handling time by 90%; however, this procedure was not
considered here because the associated per sample costs can be prohibitive for smaller
labs, as the maximum bead volume must be purchased to make beads cost
effective.
To our knowledge this is the first optimized cost effective high-throughput
protocol for library construction. The uniqueness of this approach lies in the
combinatorial barcode and indexing, which not only allows for flexibility in library
construction, but also allows for unique pooling stages. This in turn leads to a
cost effective method of library construction in which per sample library
construction costs are ~$8.5-10.5.
Table 4.3A: Procedure B: 100 Samples
Reagent          Total Samples  Company  Total Preps  Catalog #  Quantity  Price  Total Price  Per Unit Cost
Fragmentase      100            NEB      100          M0348S     1         96     96           0.96
ZR-96 Pur Kit    100            Zymo     4x96         D4024      1         387    387          3.87
Quick Blunt      100            NEB      120          E1201S     3         77     231          1.92
Klenow exo-      100            NEB      133.3        M0212L     1         228    228          1.71
Quick Ligation   100            NEB      100          M2200S     2         95     190          1.9
Phusion          100            NEB      50           M0530S     1         103    103          2.06
Phusion-Pool 8   -              -        -            -          -         -      -            0.25
Phusion-Pool 12  -              -        -            -          -         -      -            0.17
                                 Pool of 8  Pool of 12
Enzyme Cost:                     6.75       6.66
Column Cost:                     3.87       3.87
Total Unit Cost Per Sample (a):  10.62      10.53
Total Cost: 1235
Per Sample: 12.35
(a) The cost of gel extraction columns and polymerase costs are not included in the cost estimation
of Procedure B because the enzymatic unit cost is negligible when applied across 96 or more
samples.
Table 4.3B: Procedure B: 200 Samples
Reagent          Total Samples  Company  Total Preps  Catalog #  Quantity  Price  Total Price  Per Unit Cost
Fragmentase      200            NEB      200          M0348S     2         96     192          0.96
ZR-96 Pur Kit    200            Zymo     4x96         D4024      1         387    387          -
ZR-96 Pur Kit    200            Zymo     2x96         D4023      1         199    199          2.93
Quick Blunt      200            NEB      200          E1201L     1         308    308          1.54
Klenow exo-      200            NEB      266.6        M0212L     2         228    456          1.26
Quick Ligation   200            NEB      250          M2200L     1         380    380          1.71
Phusion          200            NEB      50           M0530S     1         103    103          2.06
Phusion-Pool 8   -              -        -            -          -         -      -            0.25
Phusion-Pool 12  -              -        -            -          -         -      -            0.17
                                 Pool of 8  Pool of 12
Enzyme Cost:                     5.7345     5.64
Column Cost:                     2.93       2.93
Total Unit Cost Per Sample (a):  8.66       8.57
Total Cost: 1826
Per Sample: 9.13
(a) The cost of gel extraction columns and polymerase costs are not included in the cost estimation
of Procedure B because the enzymatic unit cost is negligible when applied across 96 or more
samples.
Table 4.3C: Procedure B: 400 Samples
Reagent          Total Samples  Company  Total Preps  Catalog #  Quantity  Price  Total Price  Per Unit Cost
Fragmentase      400            NEB      500          M0348L     1         384    384          0.768
ZR-96 Pur Kit    400            Zymo     4x96         D4024      3         387    1161         2.90
Quick Blunt      400            NEB      400          E1201L     2         308    616          1.54
Klenow exo-      400            NEB      399.9        M0212L     3         228    684          1.71
Quick Ligation   400            NEB      500          M2200L     2         380    760          1.52
Phusion          400            NEB      50           M0530S     1         103    103          2.06
Phusion-Pool 8   -              -        -            -          -         -      -            0.12
Phusion-Pool 12  -              -        -            -          -         -      -            0.08
                                 Pool of 8  Pool of 12
Enzyme Cost:                     5.66       5.62
Column Cost:                     2.90       2.90
Total Unit Cost Per Sample (a):  8.56       8.52
Total Cost: 4482
Per Sample: 11.20
(a) The cost of gel extraction columns and polymerase costs are not included in the cost estimation
of Procedure B because the enzymatic unit cost is negligible when applied across 96 or more
samples.
Table 4.4: DNA Extraction for 96 samples
Reagent                       Company    Preps/vol (a)  Catalog #  Quantity  Price  Total Price  Volume (µL)/sample  Per sample cost
Cell Lysis                    Epicentre  600 mL         MTC096H    1         213    213          290                 0.09
MPC                           Epicentre  50 mL          MMP03750   1         59     59           200                 0.23
Proteinase K                  Qiagen     10 mL          19133      1         267    267          10                  0.26
Zymo Quick-gDNA MiniPrep kit  Zymo       50 preps       D3006      2         76     152          -                   1.52
(a) Volumes of each reagent for DNA purification can vary depending upon the amount of tissue.
References
Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, Corneveaux JJ,
Pawlowski TL, Laub T, Nunn G, Stephan DA, et al. 2008. Identification of
genetic variants using bar-coded multiplexed sequencing. Nat. Methods
5:887–893.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J,
Braverman MS, Chen YJ, Chen Z, et al. 2005. Genome sequencing in
microfabricated high-density picolitre reactors. Nature 437:376–380.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and
quantifying mammalian transcriptomes by RNA-seq. Nat. Methods 5:621–628.
Stein LD. 2010. The case for cloud computing in genome informatics. Genome
Biology 11:207.
Tritt A, Eisen JA, Facciotti MT, Darling AE. 2012. An Integrated Pipeline for de
Novo Assembly of Microbial Genomes. PLoS ONE 7:e42304.
CHAPTER FIVE:
CONCLUSIONS AND FUTURE STUDIES
5.1 Conclusions and Future Studies
The evolutionary forces of genetic drift and selection have direct
effects on genetic diversity and population dynamics. Genetic diversity is a
critical driver of adaptation at the organismal level; however, it can contribute to
disease at the cellular level in either mitochondria or a viral pathogen. The two
studies, of mitochondria and of ZYMV, although very different in nature, are similar in
their findings: bottleneck events that influence the strength of
evolutionary pressures affecting genetic diversity are not as severe as would be
predicted, and population movement, either across the planarian body or through viral
invasion between tissues and cells, modulates this genetic diversity.
The first study (chapter two) analyzed the consequences of lacking
the single-cell bottleneck event that initiates development. This event
occurs in nearly all metazoans; however, the freshwater asexual planarian
Schmidtea mediterranea is an exception. We predicted that, because planaria develop
through multicellular inheritance and regeneration, high mitochondrial genetic
diversity would be found. In other metazoans, high levels of diversity are
deleterious, as they cause stem cell renewal dysfunction, proliferation failure, and,
in some cases, death. We found an enormous abundance of
variants, ranging in frequency from 1-50%, within planaria, and
mutation accumulation was significantly correlated with sample age.
Non-synonymous variation greatly outnumbered synonymous variation, and a
unique substitution pattern of A:T>T:A predominated. One hypothesis is that, to
compensate for mutation accumulation, the heteroplasmy would be
distributed into offspring, i.e. the tail segment, which is removed during
fission-based reproduction. Notably, related samples showed a pattern of genetic
relatedness defined by body spatial position in the founder planarian, and the
middle sample contained the greatest number of variants, which was contrary to
our original expectation. We propose a cellular based model incorporating the
processes of regeneration in planaria to explain the unique spatial distribution of
variants and discuss the ramifications of these events. The process of
regeneration and specifically the movement of cells appear to be critical in
establishing gradients of diversity, which likely acts as a mechanism for
distributing mitochondrial diversity across the planarian body for these segments
to become offspring. This has implications not only for understanding the roles of
stem cell populations in regeneration, but whether the nuclear genome contains
genetic diversity, whether nuclear diversity creates nuclear-mitochondrial
conflicts, and how this high level of genetic diversity is allocated between
neoblasts and daughter cells.
An abundance of genetic variation was observed; therefore it is likely that
different neoblast lineages exist within single planaria. However, to address this,
characterization of intra-individual genetic variation at the nuclear level is needed
in order to track neoblasts and daughter cells within a lineage. In this manner,
questions can be addressed pertaining to mitochondrial maintenance, such as
whether asymmetric distribution of mitochondria occurs between neoblasts and
daughter cells. Deep sequencing at the individual cellular level would yield insight
into whether genetic diversity is also found at the nuclear level. Given that the
nuclear genome will also experience the consequences of a lack of a cellular
bottleneck, we expect that heterogeneity would exist at the nuclear level as was
observed at the mitochondrial level. Information regarding nuclear and
mitochondrial diversity with quantitative measurements of apoptosis as well as
neoblast proliferation and migration would yield evidence to ascertain if the
proposed cellular model is true. Thus planaria can be used as a model system to
address how regenerative events and evolutionary forces influence genetic
diversity as a result of multicellular inheritance, and how intra- and inter-cellular
conflicts develop over time.
The second study (chapter three) determined the magnitude and impact of
population bottlenecks on the extent and structure of intra-host genetic diversity
in ZYMV. Although drastic bottlenecks are believed to be the result of systemic
movement (Sacristan et al. 2003), we found that mutations persist from leaf to
leaf across the leaf samples, which is suggestive that the bottleneck itself did not
drastically reduce the number of virions moving systemically along the vine.
Interestingly, genetic diversity increased per leaf as the vine grew over time. The
likely viral source is the sap circulating in the phloem.
Notably, variants were found to cluster within the cylindrical
inclusion (CI) protein, shared by most (19/24) leaves. As a result of these non-
synonymous mutations a conformational change in the protein was predicted,
which is suggestive that these variants may offer a selective advantage for cell-
to-cell movement, although additional functional work is needed to validate this.
However, what is clear is that the bottlenecks that occur during systemic
movement are not sufficient to restrict viral movement, and viral genetic diversity
appears to circulate throughout the phloem sap during infection. Notably the
clustering of the variants within the CI protein could have implications for viral
movement and genetic diversity during infection.
Because the bottlenecks associated with systemic movement did not appear
to be extreme, variants likely clustered within the CI protein in nearly all
samples, which is suggestive of host-by-virus interactions. Thus, inoculating
different host genotypes with clonal viral lines could be used to investigate
the extent of virus-host interactions and their effects on viral genetic
diversity and disease severity. Furthermore, using FACS to sort individual
cells (Tromas et al. 2014), followed by sequencing of the viral population,
would be greatly informative for understanding the spatial distribution
of the viral population across cells in leaves and for ascertaining the stochasticity
associated with systemic movement, vector transmission, and disease severity.
The third study (chapter four) outlines a set of procedures that optimizes
sample preparation (single as well as high-throughput) for samples with
limited DNA input (200-500 ng). These protocols significantly reduce
reagent costs for sample preparation to ~$12.6-14.9 for individually prepared
libraries and ~$8.6-10.6 for pooled samples. Despite this improvement, there are
still areas of the process where advancements could be made. Specifically,
reducing the time to process libraries for sequencing would help to further reduce
the costs associated with library construction. Furthermore, development of
protocols that reduce the number of molecular steps, remove the PCR
component, and/or are amenable to strand-specific sequencing would offer an
opportunity for researchers to pursue more advanced sequencing technologies,
such as single-cell sequencing.
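As a rough illustration of the reagent cost figures above, the per-library ranges can be scaled to a batch of libraries. The following is a minimal Python sketch and is not part of the protocols described in chapter four: the function names and the batch size of 96 are assumptions chosen for illustration, and only the per-library cost ranges are taken from the text.

```python
# Per-library reagent cost ranges in USD, as quoted in the text above.
INDIVIDUAL_COST = (12.6, 14.9)  # individually prepared libraries
POOLED_COST = (8.6, 10.6)       # pooled samples


def batch_cost(n_libraries, per_library_range):
    """Scale a (low, high) per-library cost range to n_libraries."""
    low, high = per_library_range
    return (round(n_libraries * low, 2), round(n_libraries * high, 2))


def pooling_savings(n_libraries):
    """Return the (smallest, largest) possible saving from pooling.

    The smallest saving pairs the cheapest individual preparation with the
    most expensive pooled one; the largest saving pairs the opposite ends.
    """
    ind_low, ind_high = INDIVIDUAL_COST
    pool_low, pool_high = POOLED_COST
    smallest = n_libraries * (ind_low - pool_high)
    largest = n_libraries * (ind_high - pool_low)
    return (round(smallest, 2), round(largest, 2))


if __name__ == "__main__":
    n = 96  # hypothetical batch size, not a figure from the dissertation
    print("individual:", batch_cost(n, INDIVIDUAL_COST))
    print("pooled:    ", batch_cost(n, POOLED_COST))
    print("savings:   ", pooling_savings(n))
```

The sketch covers reagent costs only; labor and processing time, which the text identifies as remaining areas for improvement, are not modeled.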
This dissertation provides insight into the genetic variation within
mitochondrial and viral populations within their respective hosts, and the use of
next generation sequencing (NGS) to investigate these systems was critical. The
utility of NGS is broad, particularly as it becomes more affordable, and will
extend to single-cell sequencing with further development of sample
preparation. Whether it is stem cell dysfunction due to mitochondrial
heteroplasmy or RNA virus evolution, understanding the extent of genetic
diversity in these systems through sequencing, and how genetic diversity
contributes to disease, will provide insight into management strategies for disease
prevention.
REFERENCES
Sacristan, S., Malpica, J.M., Fraile, A., Garcia-Arenal, F., 2003. Estimation of
population bottlenecks during systemic movement of Tobacco mosaic
virus in tobacco plants. J. Virol. 77: 9906-9911.
Tromas N, Zwart MP, Lafforgue G, Elena SF. 2014. Within-host spatiotemporal
dynamics of plant virus infection at the cellular level. PLoS Genetics
10:e1004186.
BIBLIOGRAPHY
Aaskov, J., Buzacott, K., Thu, H.M., Lowry, K., Holmes, E.C., 2006. Long-term
transmission of defective RNA viruses in humans and Aedes mosquitoes.
Science 311:236-238.
Acosta-Leal, R., Duffy, S., Xiong, Z., Hammond, R.W., Elena, S.F., 2011.
Advances in plant virus evolution: translating evolutionary insights into
better disease management. Phytopathol 101:1136-1148.
Ahlqvist KJ, et al. 2012. Somatic progenitor cell vulnerability to mitochondrial
DNA mutagenesis underlies progeroid phenotypes in Polg mutator mice.
Cell Metab. 15:100–109.
Ahmed T, Mukherjee S, Pattnaik B, Kumar M, Singh S, Kumar M, Rehman R,
Tiwari BK, Jha KA, Barhanpurkar AP, Wani MR, Roy SS, Mabalirajan U,
Ghosh B, Agrawal A. 2014. Miro1 regulates intercellular mitochondrial
transport & enhances mesenchymal stem cell rescue efficacy. The EMBO
Journal 33:994-1010.
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. 2006. Molecular
biology of the cell (4th ed.). (New York: Garland Science; 2002).
Alikhan NF, Petty NK, Ben Zakour NL, Beatson SA. 2011. BLAST Ring Image
Generator (BRIG): simple prokaryote genome comparisons. BMC
Genomics 12:402.
Baguñà, J, Saló E, Auladell C. 1989. Regeneration and pattern formation in
planarians. III. Evidence that neoblasts are totipotent stem cells and the
source of blastema cells. Development 107:77-86.
Bazin E, Glémin S, Galtier N. 2006. Population Size Does Not Influence
Mitochondrial Genetic Diversity in Animals. Science 312:570-572.
Benjamini Y and Hochberg Y. 1995. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. Journal of the Royal
Statistical Society Series B 57:289-300.
Bensasson D, Zhang D, Hartl DL, Hewitt GM. 2001. Mitochondrial pseudogenes:
evolution’s misplaced witnesses. Trends in Ecology and Evolution 16:314-
321.
Berger, P.H., 2001. Potyviridae, eLS. John Wiley & Sons, Ltd.
Bergstrom, C.T., McElhany, P., Real, L.A., 1999. Transmission bottlenecks as
determinants of virulence in rapidly evolving pathogens. Proc Natl Acad
Sci USA 96:5095-5100.
Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Pütz J,
Middendorf M, Stadler PF. 2012. MITOS: Improved de novo Metazoan
Mitochondrial Genome Annotation. Molecular Phylogenetics and Evolution
69:313-319.
Blua, M.J., Perring, T.M., 1989. Effect of Zucchini Yellow Mosaic-Virus on
Development and Yield of Cantaloupe (Cucumis-Melo). Plant Dis 73:317-
320.
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2010. Scaffolding pre-
assembled contigs using SSPACE. Bioinformatics 27:578-579.
Brusca RC, Brusca GJ. 1990. Invertebrates (Sunderland, MA: Sinauer
Associates).
Cantliffe, D.J., Shaw, N.L., Stoffella, P.J., 2007. Current trends in cucurbit
production in the US. Proceedings of the IIIrd International Symposium on
Cucurbits 731:473-478.
Carrasco, P., de la Iglesia, F., Elena, S.F., 2007. Distribution of fitness and
virulence effects caused by single-nucleotide substitutions in tobacco etch
virus. J. Virol. 81:12979-12984.
Carrington, J.C., Jensen, P.E., Schaad, M.C., 1998. Genetic evidence for an
essential role for potyvirus CI protein in cell-to-cell movement. The Plant J.
14:393-400.
Cavalli-Sforza LL, Menozzi P, Piazza A. 1993. Demic expansions and human
evolution. Science 259:639-646.
Cingolani P, Platts A, Wang L, Coon M, Nguyen T, Wang L, Land S, Lu X, Ruden
D. 2012. A program for annotating and predicting the effects of single
nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila
melanogaster strain w1118; iso-2; iso-3. Fly 6:80-92.
Clement, M., Posada, D., Crandall, K.A., 2000. TCS: a computer program to
estimate gene genealogies. Mol Ecol 9:1657-1659.
Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, Corneveaux JJ,
Pawlowski TL, Laub T, Nunn G, Stephan DA, et al. 2008. Identification of
genetic variants using bar-coded multiplexed sequencing. Nat. Methods 5:
887–893.
Darling AE, Mau B, Perna NT. 2010. progressiveMauve: Multiple Genome
Alignment with Gene Gain, Loss, and Rearrangement. PLoS One
5:e11147.
De Smet I, Lau S, Mayer U, Jürgens G. 2010. Embryogenesis – the humble
beginnings of plant life. Plant J. 61:959-970.
Decker DS, Wilson HD. 1987. Allozyme variation in Cucurbita pepo complex: C.
pepo var. ovifera vs. C. texana. Syst. Bot. 12:263–273.
Decker-Walters DS. 1990. Evidence for multiple domestication of Cucurbita
pepo. In Biology and Utilization of the Cucurbitaceae, pp. 96–101. Edited
by D. M. Bates, R. W. Robinson and C. Jeffrey. Ithaca, NY: Cornell
University Press.
Decker-Walters DS, Straub JE, Chung SM, Nakata E, Quemada HD. 2002.
Diversity in free-living populations of Cucurbita pepo Cucurbitaceae as
assessed by random amplified polymorphic DNA. Syst. Bot. 27:19–28.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction
of the tree of life. Nature Reviews Genetics 6:361-375.
Desbiez, C., Lecoq, H., 1997. Zucchini yellow mosaic virus. Plant Pathol. 46:809-
829.
Dolja, V.V., Haldeman, R., Robertson, N.L., Dougherty, W.G., Carrington, J.C.,
1994. Distinct Functions of Capsid Protein in Assembly and Movement of
Tobacco Etch Potyvirus in Plants. Embo J. 13:1482-1491.
Dolja, V.V., Haldeman-Cahill, R., Montgomery, A.E., Vandenbosch, K.A.,
Carrington, J.C., 1995. Capsid protein determinants involved in cell-to-cell
and long distance movement of tobacco etch potyvirus. Virology
206:1007-1016.
Domingo-Calap, P., Cuevas, J.M., Sanjuan, R., 2009. The Fitness Effects of
Random Mutations in Single-Stranded DNA and RNA Bacteriophages.
PLoS Genet. 5.
Duffy, S., Shackelton, L.A. and Holmes, E.C. 2008. Rates of evolutionary change
in viruses: patterns and determinants. Nat. Rev. Genet. 9:267-276.
Dumollard R, Duchen M, Carroll J. 2007. The Role of Mitochondrial Function in
the Oocyte and Embryo. Curr. Top. Dev. Biol. 77:21-49.
Dunham JP and Friesen ML. 2013. A Cost-Effective Method for High-Throughput
Construction of Illumina Sequencing Libraries. Cold Spring Harb Protoc
9:820-34.
Dunham JP, Simmons HE, Holmes EC, Stephenson AG. 2014. Analysis of viral
(zucchini yellow mosaic virus) genetic diversity during systemic movement
through a Cucurbita pepo vine. Virus Research 191:172-9.
Dunoyer, P., Thomas, C., Harrison, S., Revers, F., Maule, A., 2004. A cysteine-
rich plant protein potentiates Potyvirus movement through an interaction
with the virus genome-linked protein VPg. J. Virol. 78:2301-2309.
Elena SF and Sanjuán R. 2005. Adaptive Value of High Mutation Rates of RNA
Viruses: Separating Causes from Consequences. J. Virol. 79:11555-11558.
Fabre F, Montarry J, Coville J, Senoussi R, Simon V, Moury B. 2012. Modelling
the Evolutionary Dynamics of Viruses within Their Hosts: A Case Study
Using High-Throughput Sequencing. PLoS Pathog. 8.
Fabre, F., Moury, B., Johansen, E.I., Simon, V., Jacquemond, M., Senoussi, R.,
2014. Narrow Bottlenecks Affect Pea Seedborne Mosaic Virus
Populations during Vertical Seed Transmission but not during Leaf
Colonization. PLoS Pathog. 10:e1003833.
Fairclough SR, Dayel MJ, King N. 2010. Curr. Bio. 20:R875-876.
Feuer, R., Boone, J.D., Netski, D., Morzunov, S.P., St Jeor, S.C., 1999.
Temporal and spatial analysis of Sin Nombre virus quasispecies in
naturally infected rodents. J Virol 73:9544-9554.
Forsthoefel, D.J., Newmark, P.A. 2009. Emerging patterns in planarian
regeneration. Curr. Opinion in Genetics & Development 19:412-420.
French R, Stenger DC. 2003. Evolution of wheat streak mosaic virus: Dynamics
of population growth within plants may explain limited variation. Annual
Review of Phytopathol. 41:199-214.
Fuchs Y and Steller H. 2011. Programmed Cell Death in Animal Development
and Disease. Cell 147:742-758.
Futuyma DJ. 1998. Evolutionary Biology (3rd ed.). (Sinauer Associates, Inc.,
Sunderland, Massachusetts).
Gal-On A. 2007. Zucchini yellow mosaic virus: insect transmission and
pathogenicity – the tails of two proteins. Mol. Plant Pathol. 8:139-150.
García-Arenal, F., Fraile, A., Malpica, J.M., 2001. Variability and genetic structure
of plant virus populations. An. Rev. of Phytopathol. 39:157-186.
García-Arenal, F., Fraile, A. and Malpica, J.M. 2003. Variation and evolution of
plant virus populations. Int. Microbiol. 6:225-232.
Gómez de Cedrón, M., Osaba, L., López, L., García, J.A., 2006. Genetic analysis
of the function of the plum pox virus CI RNA helicase in virus movement.
Virus Res. 116:136-145.
Gonzalez-Jara P, Fraile A, Canto T, Garcia-Arenal F. 2009. The multiplicity of
infection of a plant virus varies during colonization of its eukaryotic host. J.
Virol. 83:7487-7494.
Goto H, Dickins B, Afgan E, Paul IM, Taylor J, Makova KD, Nekrutenko A. 2011.
Dynamics of mitochondrial heteroplasmy in families investigated via a
repeatable re-sequencing study. Genome Biol, 12:R59.
Gutierrez, S., Yvon, M., Pirolles, E., Garzo, E., Fereres, A., Michalakis, Y., Blanc,
S., 2012. Circulating Virus Load Determines the Size of Bottlenecks in
Viral Populations Progressing within a Host. PLoS Pathog. 8:e1003009.
Haag-Liautard C, Coffey N, Houle D, Lynch M, Charlesworth B, Keightley PD.
2008. Direct Estimation of the Mitochondrial DNA Mutation Rate in
Drosophila melanogaster. PLoS Biology 6:e204.
Haeckel, E. 1874. Memoirs: The Gastraea-Theory, the Phylogenetic
Classification of the Animal Kingdom and the Homology of the Germ-
Lamellæ. Quarterly Journal of Microscopical Science 14:142-165.
He Y, Wu J, Dressman DC, Iacobuzio-Donahue C, Markowitz SD, Velculescu
VE, Diaz LA, Kinzler KW, Vogelstein B, Papadopoulos N. 2010.
Heteroplasmic mitochondrial DNA mutations in normal and tumour cells.
Nature 464:610-614.
Heinlein, M., Epel, B.L., Padgett, H.S., Beachy, R.N., 1995. Interaction of
tobamovirus movement proteins with the plant cytoskeleton. Science
270:1983-1985.
Henze K, Martin W. 2003. Essence of mitochondria. Nature 426:127-8.
Holmes EC. 2009. The evolutionary genetics of emerging viruses. Annu. Rev.
Ecol. Evol. Syst. 40:353-372.
Islam MN, Das SR, Emin MT, Wei M, Sun L, Westphalen K, Rowlands DJ,
Quadri SK, Bhattacharya S, Bhattacharya J. 2012. Mitochondrial transfer
from bone marrow-derived stromal cells to pulmonary alveoli protects
against acute lung injury. Nat. Med. 18:759-765.
Jerzak, G.V., Brown, I., Shi, P.Y., Kramer, L.D., Ebel, G.D., 2008. Genetic
diversity and purifying selection in West Nile virus populations are
maintained during host switching. Virology 374:256-260.
Jridi, C., Martin, J.F., Marie-Jeanne, V., Labonne, G., Blanc, S., 2006. Distinct
viral Populations differentiate and evolve independently in a single
perennial host plant. J. Virol. 80:2349-2357.
Jukes TH, Osawa S. 1990. The genetic code in mitochondria and chloroplasts.
Experientia. 46:1117-26.
Katis, N.I., Tsitsipis, J.A., Lykouressis, D.P., Papapanayotou, A.,
Margaritopoulos, J.T., Kokinis, G.M., Perdikis, D.C., Manoussopoulos,
I.N., 2006. Transmission of Zucchini yellow mosaic virus by colonizing and
non-colonizing aphids in Greece and new aphid species vectors of the
virus. J. of Phytopathol. 154:293-302.
Kelley, L.A., Sternberg, M.J., 2009. Protein structure prediction on the Web: a
case study using the Phyre server. Nat. protocols 4:363-371.
Kircher, M. and Kelso, J. 2010. High-throughput DNA sequencing – concepts and
limitations. BioEssays. 32:524-536.
Kirk DL. 2005. A twelve-step program for evolving multicellularity and a division
of labor. BioEssays 27:299-310.
Koboldt D, Zhang Q, Larson D, Shen D, McLellan M, Lin L, Miller C, Mardis E,
Ding L, Wilson R. 2012. VarScan 2: Somatic mutation and copy number
alteration discovery in cancer by exome sequencing. Genome Research
22:568-76.
Lázaro EM, Harrath AH, Stocchino GA, Pala M, Baguñà J, Riutort M. 2011.
Schmidtea mediterranea phylogeography: an old species surviving on a
few Mediterranean islands? BMC Evolutionary Biology 11:274.
Lech, W.J., Wang, G., Yang, Y.L., Chee, Y., Dorman, K., McCrae, D., Lazzeroni,
L.C., Erickson, J.W., Sinsheimer, J.S., Kaplan, A.H., 1996. In vivo
sequence diversity of the protease of human immunodeficiency virus type
1: presence of protease inhibitor-resistant variants in untreated subjects.
J. Virol. 70:2038-2043.
Li, H., Roossinck, M.J., 2004. Genetic Bottlenecks Reduce Population Variation
in an Experimental RNA Virus Population. J. Virol. 78:10582-10587.
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-
Wheeler transform. Bioinformatics 25:1754-1760.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis
G, Durbin R. 2009. 1000 Genome Project Data Processing Subgroup. The
Sequence alignment/map (SAM) format and SAMtools. Bioinformatics
25:2078-2079.
Li M, Schönberg A, Schaefer M, Schroeder R, Nasidze I, Stoneking M. 2010.
Detecting heteroplasmy from high-throughput sequencing of complete
human mitochondrial DNA genomes. Am. J. Hum. Genet. 87:237-249.
Lira R, Andres TC, Nee M. 1995. Cucurbita. Pages 1-115 in R. Lira, (ed).
Systematic and ecogeographic studies on crop genepools. Volume 9.
International Plant Genetic Resources Institute. Mexico City and Rome.
Lisa V, Boccardo G, D’Agostino G, Dellavalle G, D’Aquilio M. 1981.
Characterization of a potyvirus that causes Zucchini yellow mosaic.
Phytopathology. 71:667–672.
Lockshin RA and Williams CM. 1964. Programmed cell death – II. Endocrine
potentiation of the breakdown of the intersegmental muscles of silkmoths.
Journal of Insect Physiology 10:643-649.
Lynch M. 2010. Evolution of mutation rate. Trends in Genetics 26:345-352.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J,
Braverman MS, Chen YJ, Chen Z, et al. 2005. Genome sequencing in
microfabricated high-density picolitre reactors. Nature 437:376–380.
Maule, A.J., Wang, D., 1996. Seed transmission of plant viruses: a lesson in
biological complexity. Trends Microbiol 4:153-158.
McBride HM, Neuspiel M, Wasiak S. 2006. Mitochondria: more than just a
powerhouse. Curr. Biol. 16:R551–60.
Michod RE. 1999. Darwinian Dynamics, Evolutionary Transitions in Fitness and
Individuality. (Princeton University Press, Princeton, NJ).
Michod RE and Roze D. 2000. Cooperation and conflict in the evolution of
multicellularity. Heredity 86:1-7.
Miyashita S, Kishino H. 2010. Estimation of the Size of Genetic Bottlenecks in
Cell-to-Cell Movement of Soil-Borne Wheat Mosaic Virus and the Possible
Role of the Bottlenecks in Speeding Up Selection of Variations in trans-
Acting Genes or Elements. J. Virol. 84:1828-1837.
Montooth KL, Rand DM. 2008. The Spectrum of Mitochondrial Mutation Differs
across Species. PLoS Biol. 6:e213.
Morgan TH. 1898. Experimental studies of the regeneration of Planaria maculata.
Arch. Entwm. Org. 7:364–397.
Morgan TH. 1901. Regeneration 316 (The Macmillan Co., New York).
Morgan TH. 1902. Growth and regeneration in Planaria lugubris. Arch. Ent.
mech. Org. 13:179-212.
Morozova O and Marra MA. 2008. Applications of next-generation sequencing
technologies in functional genomics. Genomics 92:255–64.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and
quantifying mammalian transcriptomes by RNA-seq. Nat. Methods 5:621-
628.
Nabholz B, Mauffrey J, Bazin E, Galtier N, Glémin S. 2008. Determination of
Mitochondrial Genetic Diversity in Mammals. Genetics 178:351-361.
Nabholz B, Glémin S, Galtier N. 2009. The erratic mitochondrial clock: variations
of mutation rate, not population size, affect mtDNA diversity across birds
and mammals. BMC Evol. Biol. 10:54.
Newmark PA and Sánchez Alvarado A. 2000. Bromodeoxyuridine specifically
labels the regenerative stem cells of planarians. Dev. Biol. 220:142-153.
Newmark PA, Sánchez Alvarado A. 2002. Not your father's planarian: a classic
model enters the era of functional genomics. Nature Reviews Genetics
3:210-219.
Niehl, A., Heinlein, M., 2011. Cellular pathways for viral transport through
plasmodesmata. Protoplasma 248:75-99.
Nunnari J, Suomalainen A. 2012. Mitochondria: In Sickness and in Health. Cell
148:1145-1159.
Oparka, K.J., Prior, D.A., Santa Cruz, S., Padgett, H.S., Beachy, R.N., 1997.
Gating of epidermal plasmodesmata is restricted to the leading edge of
expanding infection sites of tobacco mosaic virus (TMV). Plant J. 12:781-
789.
Parr RL, Maki J, Reguly B, Dakubo GD, Aguirre A, Wittock R, Robinson K,
Jakupciak JP, Thayer RE. 2006. The pseudo-mitochondrial genome
influences mistakes in heteroplasmy interpretation. BMC Genomics 7:185.
Pellettieri J, Fitzgerald P, Watanabe S, Mancuso J, Green DR, Sánchez Alvarado
A. 2010. Cell Death and tissue remodeling in planarian regeneration. Dev.
Biol. 338:76-85.
Peris CI, Rademacher EH, Weijers D. 2010. Green beginnings – pattern
formation in the early plant embryo. Curr. Top. Dev. Biol. 91:1-27.
Prockop DJ. 2012. Mitochondria to the rescue. Nat. Med. 18:653-654.
Quint M, Drost H, Gabel A, Ullrich KK, Bönn M, Grosse I. 2012. A transcriptomic
hourglass in plant embryogenesis. Nature 490:98-101.
Randolph H. 1897. Observations and experiments on regeneration in planarians.
Arch. Entwm. Org. 5:352–372.
Ratcliff WC, Denison RF, Borrello M, Travisano M. 2012. Experimental evolution
of multicellularity. Proc Natl Acad Sci USA 109:1595-1600.
Raven PH, Evert RF, Eichhorn SE. 2005. Biology of Plants. (W.H. Freeman &
Company).
Reddien PW, Sánchez Alvarado A. 2004. Fundamentals of planarian
regeneration. An. Rev. of Cell and Dev. Biol. 20:725-757.
Robb SMC, Ross E, Alvarado AS. 2007. SmedGD: the Schmidtea mediterranea
Genome Database. Nucleic Acids Research 36:D599-D606.
Roberts, I.M., Wang, D., Findlay, K., Maule, A.J., 1998. Ultrastructural and
Temporal Observations of the Potyvirus Cylindrical Inclusions (CIs) Show
That the CI Protein Acts Transiently in Aiding Virus Movement. Virology
245:173-181.
Rodriguez-Cerezo, E., Findlay, K., Shaw, J.G., Lomonossoff, G.P., Qiu, S.G.,
Linstead, P., Shanks, M., Risco, C., 1997. The coat and cylindrical
inclusion proteins of a potyvirus are associated with connections between
plant cells. Virology 236:296-306.
Rojas, M.R., Zerbini, F.M., Allison, R.F., Gilbertson, R.L., Lucas, W.J., 1997.
Capsid protein and helper component proteinase function as potyvirus
cell-to-cell movement proteins. Virology 237:283-295.
Roossinck MJ. 2007. Mechanisms of plant virus evolution. Annu. Rev.
Phytopathol. 35:191-209.
Roy, A., Kucukural, A., Zhang, Y., 2010. I-TASSER: a unified platform for
automated protein structure and function prediction. Nat. Protocols 5:725-
738.
Rybicki, E.P., Pietersen, G., 1999. Plant virus disease problems in the
developing world. Adv. Virus Res. 53:127-175.
Sacristan S, Malpica JM, Fraile A, Garcia-Arenal F. 2003. Estimation of
population bottlenecks during systemic movement of Tobacco mosaic
virus in tobacco plants. J. Virol. 77:9906-9911.
Saló E, Baguñà J. 1989. Regeneration and pattern formation in planarians II.
Local origin and role of cell movements in blastema formation.
Development 107:69-76.
Sakai M and Sakaizumi M. 2012. The Complete Mitochondrial Genome of
Dugesia japonica (Platyhelminthes; Order Tricladida). Zoological Science
29:672-680.
Sánchez Alvarado A, Kang H. 2005. Multicellularity, stem cells, and the
neoblasts of the planarian Schmidtea mediterranea. Experimental Cell
Research 306:299-308.
Sanjuan, R., Moya, A., Elena, S.F., 2004. The distribution of fitness effects
caused by single-nucleotide substitutions in an RNA virus. Proc. Natl.
Acad. Sci. USA 101:8396-8401.
Sharpley MS, et al. 2012. Heteroplasmy of mouse mtDNA is genetically unstable
and results in altered behavior and cognition. Cell 151:333-343.
Shukla, D.D., Frenkel, M.J. and Ward, C.W. 1991. Structure and function of the
potyvirus genome with special reference to the coat protein coding region.
Canadian J. Plant Path. 13:178-191.
Simmons H.E., Holmes E.C., Gildow, F.E., Bothe-Goralczyk, M.A. and
Stephenson, A.G. 2011. Experimental verification of seed transmission in
Zucchini yellow mosaic virus. Plant Dis. 95:751-4.
Simmons HE, Dunham JP, Stack JC, Dickins BJ, Pagán I, Holmes EC,
Stephenson AG. 2012. Deep sequencing reveals persistence of intra- and
inter-host genetic diversity in natural and greenhouse populations of
zucchini yellow mosaic virus. J. Gen. Virol. 93:1831-40.
Simmons H.E., Dunham, J.P., Zinn, K.E., Munkvold, G.P., Holmes E.C., &
Stephenson, A.G. 2013. Zucchini yellow mosaic virus (ZYMV, Potyvirus):
Vertical transmission, seed infection and cryptic infections. Virus Research
176:259-264.
Simpson C. 2012. The evolutionary history of division of labour. Proc. Biol. Sci.
279:116–121.
Slatkin M. 1987. Gene flow and the geographic structure of natural populations.
Science 236:787-792.
Sloan DB, Alverson AJ, Chuckalovcak JP, Wu M, McCauley DE, Palmer
JD, Taylor DR. 2012. Evolution of Enormous, Multichromosomal Genomes
in Flowering Plant Mitochondria with Exceptionally High Mutation Rates.
PLoS Biology 10:e1001241.
Sloan DB, Triant DA, Wu M, Taylor DR. 2014. Cytonuclear Interactions and
Relaxed Selection Accelerate Sequence Evolution in Organelle Ribosomes. Mol.
Bio. Evol. 31:673-682.
Smith MJ and Szathmáry E. 1995. The Major Transitions in Evolution. (Oxford
University Press, New York).
Stein LD. 2010. The case for cloud computing in genome informatics. Genome
Biology 11:207.
Taanman J. 1999. The mitochondrial genome: structure, transcription, translation
and replication. Biochimica et Biophysica Acta 1410:103-123.
Tatineni, S., Kovacs, F., French, R., 2014. Wheat streak mosaic virus infects
systemically despite extensive coat protein deletions: Identification of
virion assembly and cell-to-cell movement determinants. J. Virol. 88:1366-
1380.
Tritt A, Eisen JA, Facciotti MT, Darling AE. 2012. An Integrated Pipeline for de
Novo Assembly of Microbial Genomes. PLoS ONE 7:e42304.
Tromas, N., Zwart, M.P., Lafforgue, G., Elena, S.F. 2014. Within-host
spatiotemporal dynamics of plant virus infection at the cellular level. PLoS
Genetics 10:e1004186.
Urcuqui-Inchima, S., Haenni, A. and Bernardi, F. 2001. Potyvirus proteins: a
wealth of functions. Virus Res. 74:157-175.
Voet D, Voet JG, Pratt CW. 2006. Fundamentals of Biochemistry, 2nd Edition.
John Wiley and Sons, Inc. p. 547.
Waddington CH. 1957. The Strategy of the Genes. (Geo Allen & Unwin, London,
1957).
Wang X, Ullah Z, Grumet R. 2000. Interaction between Zucchini Yellow Mosaic
Potyvirus RNA-Dependent RNA Polymerase and Host Poly-(A) Binding
Protein. Virology 275:433-443.
Ward, C.W., Shukla, D.D., 1991. Taxonomy of potyviruses: current problems and
some solutions. Intervirology 32:269-296.
Wei T, Zhang C, Hong J, Xiong R, Kasschau KD, Zhou X, Carrington JC, Wang
A. 2010. Formation of Complexes at Plasmodesmata for Potyvirus Intercellular
Movement Is Mediated by the Viral Protein P3N-PIPO. PLoS Pathogens
6:e1000962.
Wiesner RJ, Ruegg JC, Morano I. 1992. Counting target molecules by
exponential polymerase chain reaction, copy number of mitochondrial
DNA in rat tissues. Biochim Biophys Acta 183:553–559.
Wobus AM and Boheler KR. 2005. Embryonic Stem Cells: Prospects for
Developmental Biology and Cell Therapy. Physiol. Rev. 85:636-678.
Wolpert L and Szathmáry E. 2002. Evolution and the egg. Nature 420:745.
Xu S, Schaack S, Seyfert A, Choi E, Lynch M, Cristescu ME. 2012. High Mutation
Rates in the Mitochondrial Genomes of Daphnia pulex. Mol. Biol. Evol.
29:763-769.
Zhang, Y., 2008. I-TASSER server for protein 3D structure prediction. BMC
Bioinformatics 9:40.
Zwart, M.P., Daros, J.A., Elena, S.F., 2011. One Is Enough: In Vivo Effective
Population Size Is Dose-Dependent for a Plant RNA Virus. PLoS Pathog.
7.
Zwart, M.P., Daros, J.A., Elena, S.F., 2012. Effects of Potyvirus Effective
Population Size in Inoculated Leaves on Viral Accumulation and the Onset
of Symptoms. J. Virol. 86:9737-9747.
APPENDIX: ABBREVIATIONS
A – Anterior
ATP – Adenosine triphosphate
bp – Base pair
BrdU – Bromodeoxyuridine
CI – Cylindrical inclusion
CP – Coat protein
DNA – Deoxyribonucleic acid
ER – Endoplasmic reticulum
h – Hour
kb – Kilobase
M – Middle
Mb – Megabase
min – Minute
N – Neoblast
Ne – Effective population size
ng – Nanogram
NGS – Next generation sequencing
NS – Non-synonymous
PTGS – Post-transcriptional gene-silencing suppressor
RNA – Ribonucleic acid
ROS – Reactive oxygen species
s – Second
S – Synonymous
SNP – Single nucleotide polymorphism
T – Tail
TEV – Tobacco etch virus
µg – Microgram
µm – Micrometer
µl – Microliter
ZYMV – Zucchini yellow mosaic virus
Abstract
This thesis was motivated by the recognition of the importance of the underlying role that genetic variation plays in disease and how this genetic variation is modulated by evolution
Asset Metadata
Creator
Dunham, Joseph Paul
(author)
Core Title
Cellular level bottlenecks: genetic diversity, population dynamics, and technology development
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Publication Date
02/10/2015
Defense Date
08/20/2014
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
cell‐to‐cell movement,genetic diversity,genomics,mitochondria,next generation sequencing,OAI-PMH Harvest,planaria,population bottlenecks,ZYMV
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Nuzhdin, Sergey V. (committee chair), Arnheim, Norman (committee member), Finkel, Steven E. (committee member), Siegmund, Kimberly D. (committee member)
Creator Email
josephpdnhm@gmail.com,jpdunham@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-531830
Unique identifier
UC11298243
Identifier
etd-DunhamJose-3173.pdf (filename),usctheses-c3-531830 (legacy record id)
Legacy Identifier
etd-DunhamJose-3173.pdf
Dmrecord
531830
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Dunham, Joseph Paul
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA