Developing Genetic Tools to Assist in the Domestication of Giant Kelp
By
Gary Molano
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
May 2023
Copyright 2022 Gary Molano
Acknowledgements
I would first like to thank my family for supporting me throughout my life. Thanks Mom
and Dad for being great parents and pushing me to do well in school. Thanks for staying up late
and helping me with my homework, spending time chaperoning school events, coaching me in
various sports, helping me be a better person, and encouraging me to pursue grad school after
spending a few years outside of academia. I would like to thank my wife, Connie, for being my
best friend and support shoulder during this process. I lean on you for advice and appreciate your
frank feedback. Having you as a sounding board for ideas and being able to laugh with
you made things much easier for me. I would also like to thank my brother, Nico, for
always being willing to help me with favors and for standing tall during some unfortunate health issues
on the home front. I would not be here today without all of you.
I would like to thank my many mentors and friends along the way, both in academics and
in life, in some form of chronological order. Thank you, Michael, for encouraging me to pursue
grad school when I considered leaving before it started. Thank you, XJ, for opening up your lab
to me when I was gaining experience in research before grad school. Thank you, Hanjing, for
spending time teaching me molecular biology research techniques from scratch. Thank you,
Aaron, for sharing your enthusiasm for science with me when I was starting in the lab while making
the research quite fun. Thank you, Fumiaki, for sharing your bench expertise with me over the
years. Thank you, Sergey, for being a great PI to me and encouraging me to explore a new field
of research in your lab. Thank you, Peter, for spending so much time with me working on
refining bioinformatics techniques. Thank you to my officemates, Anupam and Levi, for being
great friends over the years; there were a lot of fun times in 316B, both science and nonscience
related. Thank you, OG Nerds (Justin, Celja, Calista, Katie, and James), for being great friends
during the years at school; I loved our coffee walks, being silly at retreat, and watching you all
become great scientists. Thank you, José, for being a great colleague and co-author; we did some
cool research together. Thank you to the Kelp Team, specifically José, Melisa, and Kelly, for
being awesome scientists and people. It was fun working with you over the years. Thank you to
other members of the Nuzhdin lab for the times both talking shop and not. It has been a pleasure
working with you over the years. Thank you, Martin and Rachel, for being great mentors and
friends during and after my rotation with you. Thank you to Ian and Joe for letting me rotate in
your labs. Thank you to Sergey, Norm, Matt, and Dale for being on my committee. Thank you,
Matt, for recruiting me to USC to begin with.
I would also like to thank my friends and teammates over the years outside of USC.
Thank you, James, Derek, Austin, Madison, Shane, Richard, Adam, and Rob for all the fun times
over the years. Thank you, Malcom and Quentin, for being like little brothers to me. Thank you,
O’Hell crew, especially Fred and Bruce, for the life advice and fun times playing cards. Thank
you to my college teammates, especially Devon, Dillon, Ethan, Shawn, Ari, Alex, and Russell
for balling on and off the field. Thank you to my 7figures teammates, especially AJ, Joe, Chris,
Fish, Dan, Emma, and PVP for all the great memories on and off the field. Thank you to my
co-coaches, Joe and Tom, for spending all those hours with me late during the week and on
tournament weekends. Thank you to Lockdown Ultimate, especially Mikey, Burke, Sam,
Sealand, Wyatt, Phil, and Baby; I really enjoyed coaching you over the years. I would like to
thank Marshall and the bleed blue crew. I would also like to thank the many other important
people in my life not specified here for their support and friendship!
Table of Contents
Acknowledgements
List of tables
List of figures
Abstract
Introduction
Giant Kelp Introduction
Brown Macroalgae Introduction
Brown Macroalgae Evolution
Giant Kelp Evolution
Giant Kelp life cycle and anatomy
Brown Macroalgae Genomics
Giant Kelp Genomics
Brown Macroalgae Aquaculture
Giant Kelp as an Aquaculture Target
References
Chapter I: Sporophyte stage genes exhibit stronger selection than gametophyte stage genes in haplodiplontic giant kelp
Abstract
Introduction
Results
Gene Models and Differential Expression Analysis
Standing molecular variation, admixture, linkage disequilibrium, and inbreeding
dN/dS and McDonald-Kreitman Tests
Screen for biological process enrichment
Materials and Methods
Material preparation and sequencing
Assembly and annotation, and assessment of gene models
Identification of differentially expressed genes
SNP calling, Pi, linkage disequilibrium, population specific SNPs
Fixation index, dN/dS, McDonald-Kreitman and Tajima’s D calculations
Discussion
Conclusion
References
Chapter II: The First Scaffolded and Annotated Genome of Giant Kelp (Macrocystis pyrifera)
Abstract
Introduction
Results
Genome sequencing, assembly, and annotation
Genome Comparative analysis
Genome polymorphism
Materials and Methods
Data collection and sequencing
Genome assembly
Genome completeness
Annotation
Comparative analysis
SNP calling and population genetics
Discussion
Conclusion
References
Chapter III: Genotyping the Giant Kelp Seedbank to Assist in the Giant Kelp De Novo Domestication
Abstract
Introduction
Population Selection for the Giant Kelp Seedbank
Farm Design, Location, and Phenotyping Results
Genomic Data Collection
Results
Standing Genetic Variation in the Seed Bank
Organelle Assembly and Annotation
Diploid and Haploid Organelle Genetic Variation
Material and Methods
Genetic Variation Analysis
Organelle Assembly and Annotation
Discussion
Conclusion
References
Dissertation Conclusion
List of Tables
Chapter I
Table 1 – Polymorphism measurements for each category of expression given in nucleotide diversity (π) and Tajima’s D
Chapter II
Table 1 - Genome statistics comparison between the genomes of Macrocystis pyrifera (assembled in this study), Ectocarpus sp., Saccharina japonica, and Undaria pinnatifida
List of Figures:
Introduction
Figure 1: The giant kelp life cycle. Figure modified from North 1987
Chapter I
Figure 1 – A) Adult giant kelp diploid sporophyte (Picture courtesy of Maurice Roper) B) Microscopic giant kelp haploid gametophytes C) Giant kelp reproductive cycle: during sexual reproduction, adult sporophytes (2N) produce zoospores (N) through meiosis, which after settling differentiate into sexed gametophytes. Figure courtesy of Kelly DeWeese
Figure 2 – A) Average Transcripts Per Million (TPM) for each life stage among all RNA samples. B) TPM for each RNA-seq sample separated into each life stage
Figure 3 – Analysis of population structure between the three analyzed populations: A) Sampling locations of wild kelp populations in Southern California. B) Principal component analysis showing strong signs of population structure. C) Nei’s genetic distance between 48 individuals. D) Venn diagram of single nucleotide polymorphisms (SNPs) present in each population
Figure 4 – Measurements of sequence evolution and positive selection: A) Rate of sequence evolution measured by dN/dS between Macrocystis pyrifera and Ectocarpus siliculosus. B) Proportion of substitutions that were the result of positive selection measured as alpha in a McDonald–Kreitman test C) Ratio of 1st & 2nd FST to 3rd FST indicating purifying selection on sporophyte-specific genes
Figure 5 – Screen for biological process enrichment
Chapter II
Figure 1 – Circos plot showing pairwise genetic diversity, SNP density, Tajima’s D, and Fst in sliding windows across the genome
Figure 2. Comparison of BUSCO assessment of genome completeness of macroalgae genomes and genome data sets
Figure 3. Protein comparative analysis of orthologs between giant kelp and three other relevant macroalgal species (Ectocarpus sp., Saccharina japonica and Undaria pinnatifida) using Orthofinder
Figure 4. Synteny between the Macrocystis pyrifera genome and the genomes of (A) Ectocarpus siliculosus and (B) Undaria pinnatifida
Figure 5. Population structure analysis of three Southern California giant kelp populations. (A) Sampling location of the three analyzed wild giant kelp populations. (B) Principal component analysis of genomic polymorphism showing strong population structure. (C) LD decay curve for all 48 individuals combined. (D) LD decay curve for each sampling population separately. (E) Structure plots showing K=2, K=3, and K=4 for the 48 individuals
Chapter III
Figure 1: Principal component analysis of the genetic variation data of 457 individuals from the giant kelp seedbank that were also phenotyped on the giant kelp farm
Figure 2: Assembled and annotated draft organelle genome assemblies of giant kelp
Figure 3: Principal component analysis of the genetic variation data of both mitochondrial and chloroplast giant kelp genomes. Each plot contains data from both the giant kelp haploid seedbank and the diploid population genetics study
Abstract
The need to develop new crops to help sustain the growing human population impels the
current effort to domesticate new species such as Macrocystis pyrifera (giant kelp). Giant kelp is
an ecologically important brown macroalga that grows incredibly quickly, and as a keystone
species with global range, provides habitat for hundreds of animals and invertebrate species.
Lessons from earlier domestication, genetics, and farm practices in other brown macroalgae can
be applied to help expedite the domestication of giant kelp, but the lack of giant kelp genomic
resources hampers these efforts. Here, I present multiple giant kelp genomic advances: 1) I
compared selection on the different life stages in giant kelp and confirmed the population
structure of Southern California kelp using Illumina sequencing data and a giant kelp gene model
data set composed of genes conserved between giant kelp and the model brown macroalga,
Ectocarpus siliculosus. 2) I expanded the giant kelp genomic toolkit by assembling, scaffolding,
and annotating a high-quality giant kelp genome that can help identify genes and markers
essential for domestication, and calculated giant kelp linkage disequilibrium and pairwise genetic
diversity in Southern California kelp populations. 3) I assessed the genetic variation within 559
genotypes of Southern California giant kelp gametophytes that compose our seedbank.
Developing genetic resources for giant kelp is paramount for accelerating the domestication
effort, as an annotated reference genome combined with germplasm genotype data can be used
for identifying genes and markers essential for domestication success.
Introduction
This thesis contains three chapters and an introduction to brown macroalgae. Chapter I
has been published, Chapter II is in preparation, and Chapter III sets the foundation for future
genotype-phenotype modeling of giant kelp. The introduction provides background knowledge
on the evolution, biology, aquaculture, and genomics of giant kelp and brown macroalgae.
Giant Kelp Introduction
Macrocystis pyrifera, or giant kelp, is one of the largest and fastest growing primary
producers in the world, and can produce up to 3.5% of its biomass per day (Rassweiler et al.
2018). Giant kelp is usually found attached to a rocky substratum along the coast in waters less
than 30 meters deep, but can also be found in deeper waters if conditions such as light
availability permit (Schiel and Foster 2015). Giant kelp is sometimes considered a
keystone species, as it grows in dense forests that provide habitat for hundreds of marine
animals and invertebrates (Miller et al. 2018). Based on fossil records combined with molecular
clock analyses of nuclear, plastid, and mitochondrial sequences, giant kelp
originated in the Northern Pacific around six million years ago before migrating southward and
proliferating around the world (Starko et al. 2019, Schiel and Foster 2015).
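To put the reported growth rate in perspective, the following minimal sketch converts a constant daily relative growth rate into a biomass doubling time. It assumes, purely for illustration, that the upper-bound rate of 3.5% per day cited above is sustained and compounds daily; growth in the field varies with season, light, and nutrients. The function name is hypothetical.

```python
import math


def doubling_time_days(daily_growth_rate: float) -> float:
    """Days needed to double biomass at a constant, daily-compounded growth rate."""
    if daily_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    # Solve (1 + r)^t = 2 for t.
    return math.log(2) / math.log(1 + daily_growth_rate)


# Assumption: the maximum reported rate (3.5% of biomass per day) is sustained.
print(f"{doubling_time_days(0.035):.1f} days")
```

Under this simplifying assumption, biomass would double in roughly three weeks.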
Brown Macroalgae Introduction
Giant kelp is a brown macroalga (class Phaeophyceae), one of the six independent
eukaryotic lineages to achieve complex multicellularity (along with animals, land plants, red
algae, and two types of fungi). Complex multicellularity is marked by a three-dimensional
structure, cell to cell adhesion, cell to cell communication, and tissue differentiation (Knoll
2011). The brown macroalgae class contains approximately 2000 photosynthetic species and
has a distinctive color due to the pigment fucoxanthin (Bringloe et al. 2020, Schiel and
Foster 2015). Brown macroalgae were initially classified in Kingdom Plantae because their
morphology is analogous to that of plants, with a stipe like a stalk, a holdfast that attaches
the kelp to a rocky substrate like a root, and photosynthetic blades reaching towards the sun
like leaves (Schiel and Foster 2015). However, genetic studies combined with analysis of brown
macroalgae plastid and flagella structures showed major differences between brown macroalgae
and plants, and led to the reclassification of brown macroalgae into the Kingdom Chromista and
Phylum Ochrophyta (Schiel and Foster 2015). Further phylogenetic studies suggested an
endosymbiotic event occurred in Ochrophyta, in which plastids in Ochrophyta have a red
macroalgae origin, again complicating the evolutionary history of brown macroalgae (Ševčíková
et al. 2015).
Brown Macroalgae Evolution
Increased research attention on brown macroalgae has redefined phylogenetic
relationships based on genetics instead of morphology (Lane et al. 2006). Resolving phylogenetic
relationships and dating the origin of brown macroalgae species was previously
hampered by a limited fossil record (brown macroalgae soft tissue does not produce
abundant fossils) and by convergent evolution among brown, red, and green macroalgae, which made
identification of the correct lineage difficult (Bringloe et al. 2020). Therefore, molecular clock
analyses using different sets of marker genes have been used to date the origin and
diversification of brown macroalgae species. Initial studies used partial 18S rRNA sequences to
resolve brown macroalgae into one clade (Tan and Druehl 1993). More recent phylogenies have
used increasingly large marker gene sets, as well as chloroplast and mitochondrial markers, to help
further resolve the brown macroalgae phylogenetic tree (Phillips et al. 2008, Silberfeld et al.
2010, Starko et al. 2019, Bringloe et al. 2020). Using a combination of nuclear and organellar
genes in conjunction with several fossils, a relaxed molecular clock analysis dates the origin of
brown macroalgae (class Phaeophyceae) to ~180 million years ago (Silberfeld et al. 2010).
Another major question regarding the evolutionary history of brown macroalgae is when and why the
number of clades and species increased. This expansion of brown
macroalgae clades and species, referred to as the BACR (Brown Macroalga Crown Radiation),
occurred between 130 and 100 million years ago, and this speciation could be explained by the
volcanic activity and basalt floods in Paraná between 134 and 129 million years ago that
probably led to a mass extinction event (Silberfeld et al. 2010, Peate 1997).
Giant Kelp Evolution
The largest and most complex brown macroalgae are in the order Laminariales, or ‘kelps’,
which contains around 120 species. Kelps are distinguished from other brown
macroalgae due to their larger size, upright growth forms and branching, which provide the
three-dimensional structure that defines kelp habitats and provides niches along both vertical and
horizontal space (Starko et al. 2019). Kelps originated in the Northern Pacific ~31 million years
ago, and diversified quickly as the range of kelps increased southwards, corresponding to the
Eocene-Oligocene boundary (Starko et al. 2019). The Eocene-Oligocene boundary (EO)
occurred about 31 million years ago and coincided with a rapid cooling of the Pacific Ocean,
which led to mass marine extinctions and reduced taxonomic diversity (Ivaney et al. 2000). The
newly available ecological niches caused by the cooling of the Pacific Ocean may have provided
the right conditions for the range expansion and increased speciation of kelps (Starko et al. 2019,
Estes and Steinberg 1988). Giant kelp has the largest range of any kelp, extending from
Alaska to Baja California along the eastern Pacific coast in the Northern Hemisphere, and along
the coasts of South America, Africa, and Australia (Graham et al. 2007).
Giant Kelp life cycle and anatomy
Giant kelp is a dimorphic haplodiplontic organism that alternates generations between a
microscopic haploid gametophyte and a macroscopic diploid sporophyte (Fig. 1, North 1987).
Giant kelp is a fecund organism that releases billions of haploid zoospores from specialized
tissue into the water column. The zoospores then settle onto the rocky ocean floor, where they
begin to develop into gametophytes (Dayton 1985, Reed et al. 1996). After settling,
giant kelp spores must undergo gametogenesis before they can successfully reproduce and yield
the next generation of diploid sporophytes (North 1987). The settling density of giant kelp
spores, as well as abiotic factors such as light, nutrient availability, and water motion, are
essential for reproductive success (Reed 1990). Male and female spores need to be 1 mm or less
apart after they have settled in order for the female gametophyte to successfully signal its
location to a male gametophyte using the pheromone lamoxirene (Maier et al. 2001, Dayton
1985). After successful fertilization, the diploid zygote then develops into a juvenile sporophyte
that is 5-10 cm tall and contains haptera, a stipe, and a blade. This blade can also be referred to
as a lamina, and the juvenile sporophytes of giant kelp resemble those of other members of the
order Laminariales (kelps) (Schiel and Foster 2015). The juvenile sporophyte initially grows
from its basal meristem, which causes the initial blade to split along its vertical axis, starting at
the base, and forming the primary dichotomy and two new fronds. Fronds continue to grow and
split into more fronds, and some fronds will eventually grow and reach the surface (Schiel and
Foster 2015, North 1987). Blades, except for the bottom few blades, are attached to the stipe of a
frond with air bladders (pneumatocysts) that provide the buoyancy that allows giant kelp fronds
to reach and float on the ocean surface. The pneumatocysts in giant kelp are also responsible for
its name (Macro = large, and kystis = bladder) (North 1987). Giant kelp will continue to grow as
its apical, or scimitar, blade continues to divide into more blades (Schiel and Foster 2015). As
kelp grows towards the ocean surface, tissue age varies along each frond, with
senescent tissue towards the ocean floor and younger tissue near the surface.
As the apical meristem grows towards the ocean surface, kelp haptera grow from the
primary stipe towards the rocky substrate of the ocean floor to assist with the anchoring of the
kelp individual, and eventually develop into a kelp holdfast. The holdfast then “glues” itself to
the rocky substrate using a combination of secreted polysaccharides, while also growing around
any potential anchor points and other haptera/holdfasts (Schiel and Foster 2015). Initially,
different holdfast morphologies led to the identification of four unique Macrocystis species.
However, this phenotypic plasticity has been shown to be an environmental response, and all
four are now considered a single species, classified as Macrocystis pyrifera
(Coyer et al. 2001, Demes et al. 2009).
The blades on a giant kelp frond that are closest to the holdfast are referred to as
sporophylls. Sporophylls can be recognized as they typically lack the pneumatocysts that serve
as a junction point between blades and stipes. Sporophylls develop sporangia, which then group
into specialized tissue called sori (North 1987). Sori can be recognized by their darker color
compared with non-reproductive tissue. Meiosis occurs in the sori tissue, resulting in the release
of the haploid spores into the water column (Schiel and Foster 2015). As with other marine
broadcast spawning organisms, giant kelp is incredibly fecund, and can release billions of spores
into the water column, where they can settle onto the ocean floor and begin the next generation
of kelps (Neushul 1959, Neushul 1963, Plough 2016, Dayton 1985, Reed et al. 1996). The
haploid and diploid stages of giant kelp not only have different morphologies, but also have
different life span ranges. Haploid spores released into the water column can live up to several
months, while diploid sporophytes live for four years on average (North 1978, Carney 2011,
Carney et al. 2013). However, individual fronds usually senesce before nine months
(Rodriguez et al. 2013). Most kelp forests show high rates of turnover despite their potential
lifespan, mostly due to storm damage, especially in shallower waters (Dayton and Tegner 1984).
Figure 1: The giant kelp life cycle showing the alternation of generations between the haploid
gametophyte and diploid sporophyte. Figure modified from North 1987.
Brown Macroalgae Genomics
Brown macroalgae are great targets for research not only due to their vast economic and
ecological impacts, global range, and alternation of generations in their life cycle, but also their
genetic distance from other established model organisms such as Arabidopsis thaliana,
Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Mus
musculus (Peters et al. 2004). The genomic research focused on brown macroalgae lagged
behind that of established model organisms, despite conditions, such as the independent
evolution of multicellularity and marine environment of brown macroalgae, that could lead to the
discovery of novel genes, gene pathways, and metabolites (Peters et al. 2004). Genomic work
involving brown macroalgae was initially limited to using conserved regions for sequencing
targets such as 18S ribosomal genes, conserved plastid genes, and mitochondrial spacers, but not
all primers were universal among brown macroalgae, which led to unresolvable genetic
relationships among some species (Druehl 1993, Coyer et al. 2002, Phillips et al. 2006, Lane et
al. 2006, Engel et al. 2008). Therefore, the first brown macroalgae genome to be published,
Ectocarpus siliculosus, established a resource that future brown macroalgae genomics studies
could use as a benchmark for comparative analysis (Cock et al. 2010, Karathia et al. 2011).
Subsequent genomic research focusing on kelps led to the publication of the complete genomes
of the two economically important kelp species, Saccharina japonica and Undaria pinnatifida,
and further assisted selective breeding efforts for these species (Ye et al. 2015, Shan et al. 2020,
Hwang et al. 2019, Liu et al. 2014). With comparative genomic analysis now possible, recent
publications have also explored population genetic analyses of Undaria pinnatifida, establishing
genome-wide parameters such as pairwise genetic diversity, Fst, and Tajima’s D for the first time
in a brown macroalga (Graf et al. 2021).
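As a rough illustration of two of the parameters named above, the sketch below computes nucleotide diversity (π, the mean number of pairwise differences per site) and Tajima’s D from a small set of aligned haplotype sequences, using the standard formulas from Tajima (1989). The toy sequences and function names are invented for this example and are not data or code from the cited studies; genome-scale analyses typically work from variant call files in sliding windows rather than from raw alignments.

```python
from itertools import combinations
from math import nan, sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs)


def nucleotide_diversity(seqs):
    """Pi: mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D; returns NaN when there are no segregating sites."""
    n = len(seqs)
    # Count alignment columns that carry more than one allele.
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    if seg_sites == 0:
        return nan
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two theta estimators, scaled by its standard deviation.
    return (k - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Hypothetical toy alignment of four haplotypes (not real kelp data).
toy = ["ACGT", "ACGA", "ACTA", "ACGT"]
print(nucleotide_diversity(toy), tajimas_d(toy))
```

A negative D indicates an excess of rare variants relative to neutral expectations (for example after a population expansion or selective sweep), while a positive D indicates an excess of intermediate-frequency variants.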
Giant Kelp Genomics
Initial giant kelp genomic studies used the sequences of noncoding rDNA internal
transcribed spacer regions (ITS1 and ITS2) to collapse four Macrocystis pyrifera ecotypes into one
species instead of four separate species and established the North American origin of giant kelp
(Coyer et al. 2001, Demes et al. 2009). Ensuing microsatellite studies examined the population
structure of giant kelp along the coasts of northeastern and southeastern Pacific and determined
Southern California to be the region of highest genetic diversity based on microsatellite data
(Johansson et al. 2015, Macaya and Zuccarello 2010). Giant kelp transcriptomes sequenced
from samples collected across seasons and depths showed upregulation of light harvesting
near the surface and nitrogen acquisition at depth (Konotchick et al. 2013, Salavarría et al. 2018).
Another brown macroalgae transcriptome study identified core genes in the giant kelp
sex-determining regions after sequencing male and female giant kelp gametophytes and determined
that genes expressed in the gametophyte stage have higher nonsynonymous to synonymous
nucleotide substitution ratio compared with genes expressed at the sporophyte stage (Lipinska et
al. 2017, Lipinska et al. 2019). Despite the information gleaned in these studies, giant kelp
genomic resources will continue to lag behind those of fellow kelps Saccharina japonica and Undaria pinnatifida
as long as a high-quality giant kelp genome is lacking.
Brown Macroalgae Aquaculture
While brown macroalgae have been cultivated for hundreds of years, a phenotypic
selective breeding program of the brown macroalgae Saccharina japonica and Undaria
pinnatifida began in the 1950s and developed varieties of kelp with improved yield and stress
tolerances (Buchholz et al. 2012, Zhang et al. 2007). Brown macroalgae cultivation has since
greatly expanded, with ~12.5 million tons of Saccharina japonica and ~2.8 million tons of
Undaria pinnatifida harvested in 2020 alone, with most of the harvest purposed for human
consumption and abalone feed (Hwang et al. 2019, FAO 2022). Selective phenotypic breeding
and hybridization led to increased yield and a longer growing season in both Undaria pinnatifida
and Saccharina japonica, but cultivar success was variable across environments (Liu et al. 2014,
Hwang et al. 2019). Inhibitors to cultivar success also include trait degeneration due to
inbreeding, clonal propagation from a poor choice of founding population, and gene crossing
from cultivars into wild populations (Hwang et al. 2019, Loureiro et al. 2015). As the global
demand for brown macroalgae continues to increase, genomics-assisted breeding techniques
popularized in terrestrial plants, such as genome-wide association studies, linkage mapping,
marker identification, and genomic selection, can be applied to brown macroalgae to expedite the
domestication process and further select for specific phenotypes (Shan et al. 2021).
Giant Kelp as an Aquaculture Target
Giant kelp became a species of economic interest during World War I as millions of tons
of giant kelp were harvested in California as an alternative source of potash (Neushul 1987).
However, once the war ended, demand for potash dried up and the potash kelp industry in
California collapsed. Starting in 1927, kelp was harvested by the Kelco Company for alginate,
which has industrial uses as a sealant, thickener, and stabilizer (Neushul 1987). Kelp wild
harvest in California ranged from the tens to the hundreds of thousands of tons for most of the
20th century, and was primarily affected by catastrophic weather events like the 1982-1984 El Niño
that devastated kelp beds (https://wildlife.ca.gov/Conservation/Marine/Status/2001). Today,
giant kelp continues to be harvested for alginate, as well as for a wide range of other applications
such as animal feed and fertilizer supplements, cosmetics, and pharmaceuticals (Arioli et al.
2015, Makkar et al. 2016, Tanna and Mishra 2018, Hameury et al. 2019, Mallakour et al. 2021).
Despite the economic importance and wide use of giant kelp and its extracts, giant kelp
has not been cultivated at the same scale as the economic heavyweights Saccharina japonica and
Undaria pinnatifida (FAO 2022). Early cultivation experiments of giant kelp in California in the
1970s failed due to bacterial contamination and farm destruction due to wave activity (Neushul
et al. 1987). However, due to its high mannitol content, giant kelp also has potential as a
feedstock for biofuel and this has greatly increased the interest in giant kelp aquaculture and
domestication projects (Camus et al. 2016). One advantage of the recent increased interest in
giant kelp domestication is that lessons from the domestication of plants and other brown
macroalgae, coupled with improved genetic tools made possible by technologies like
next-generation sequencing, can be applied to giant kelp to expedite the process (Kantar et al. 2017).
Additionally, crossing experiments have shown not only that heterosis (hybrid vigor) is
achievable in giant kelp, but also that giant kelp crosses have a wide range of phenotypes that can be
shifted towards more desirable outcomes during the domestication process (Westermeier et al.
2010, Westermeier et al. 2011). Crossing experiments have also shown inbreeding depression
(decreased fitness) in selfing giant kelp compared with outcrossing kelp (Raimondi et al. 2004).
Therefore, as giant kelp domestication efforts increase, generating a giant kelp genetic toolkit
would accelerate the domestication process, and could also be used as a useful tool for
demographic and conservation studies.
References:
Arioli, T., Mattner, S. W., and Winberg, P. C. (2015). Applications of seaweed extracts in
Australian agriculture: past, present and future. J. Appl. Phycol. 27, 2007–2015. doi:
10.1007/s10811-015-0574-9
Buschmann AH, Graham M, Vasquez J. (2007). Global Ecology of the Giant Kelp Macrocystis.
10.1201/9781420050943.ch2.
Bringloe TT, Starko S, Wade RM, Vieira C, Kawai H, De Clerck O, Cock JM, Coelho SM,
Destombe C, Valero M, Neiva J, Pearson GA, Faugeron S, Serrão EA, Verbruggen H.
(2020) Phylogeny and Evolution of the Brown Algae, Critical Reviews in Plant Sciences, 39:4,
281-321, DOI: 10.1080/07352689.2020.1787679
Buchholz, C.M., Krause, G., Buck, B.H. (2012). Seaweed and Man. In: Wiencke, C., Bischof, K.
(eds) Seaweed Biology. Ecological Studies, vol 219. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-642-28451-9_22
Camus, C., Ballerino, P., Delgado, R., Olivera-Nappa, Á, Leyton, C., and Buschmann, A. H.
(2016). Scaling up bioethanol production from the farmed brown macroalga Macrocystis
pyrifera in Chile. Biofuels Bioprod. Bioref. 10, 673–685. doi: 10.1002/bbb.1708
Cock, J., Sterck, L., Rouzé, P. et al. (2010). The Ectocarpus genome and the independent
evolution of multicellularity in brown algae. Nature 465, 617–621.
https://doi.org/10.1038/nature09016
Coyer, J. A., Smith, G. J. & Andersen, R. A. (2001). Evolution of Macrocystis spp.
(Phaeophyceae) as determined by ITS1 and ITS2 sequences. J. Phycol. 37:574–85.
Coyer JA, Peters A, Hoarau G, Stam W, Olsen J. (2002). Inheritance patterns of ITS1,
chloroplasts and mitochondria in artificial hybrids of the seaweeds Fucus serratus and F.
evanescens (Phaeophyceae). European Journal of Phycology, 37(2), 173-178.
doi:10.1017/S0967026202003682
Dayton, P. K. (1985). Ecology of kelp communities. Annu. Rev. Ecol. Syst. 16, 215–245. doi:
10.1146/annurev.es.16.110185.001243
Dayton, P.K. and Tegner, M.J. (1984). Catastrophic storms, El Niño, and patch stability in a
southern California kelp community. Science 224:283–285.
Demes KW, Graham MH, Suskiewicz TS. (2009) Phenotypic plasticity reconciles incongruous
molecular and morphological taxonomies: The Giant Kelp, Macrocystis, (Laminariales,
Phaeophyceae), is a Monospecific Genus. J Phycol. 45(6):1266-9. doi:
10.1111/j.1529-8817.2009.00752.x. Epub 2009 Sep 29. PMID: 27032582.
Engel CR, Billard E, Voisin M, Viard F. (2008) Conservation and polymorphism of
mitochondrial intergenic sequences in brown algae (Phaeophyceae), European Journal of
Phycology, 43:2,195-205, DOI: 10.1080/09670260701823437
J.A. Estes, P.D. Steinberg. (1998) Predation, herbivory, and kelp evolution. Paleobiology, 14, pp.
19-36
FAO. 2022. The State of World Fisheries and Aquaculture 2022. Towards Blue Transformation.
Rome, FAO. https://doi.org/10.4060/cc0461en
Graf, L., Shin, Y., Yang, J.H. et al. (2021). A genome-wide investigation of the effect of farming
and human-mediated introduction on the ubiquitous seaweed Undaria pinnatifida. Nat Ecol Evol
5, 360–368. https://doi.org/10.1038/s41559-020-01378-9
Hameury, S., Borderie, L., Monneuse, J. M., Skorski, G., and Pradines, D. (2019). Prediction of
skin anti-aging clinical benefits of an association of ingredients from marine and maritime
origins: ex vivo evaluation using a label-free quantitative proteomic and customized data
processing approach. J. Cosmet. Dermatol. 18, 355–370. doi: 10.1111/jocd.12528
Hwang, E. K., Yotsukura, N., Pang, S. J., Su, L., and Shan, T. F. (2019). Seaweed breeding
programs and progress in eastern Asian countries. Phycologia 58, 484–495. doi:
10.1080/00318884.2019.1639436
Ivany, L.C., Patterson, W.P., Lohmann, K.C., (2000). Cooler winters as a possible cause of mass
extinctions at the Eocene-Oligocene boundary. Nature 407, 887–890.
https://doi.org/10.1038/35038044
Johansson, M. L., Alberto, F., Reed, D. C., Raimondi, P. T., Coelho, N. C., Young, M. A., et al.
(2015). Seascape drivers of Macrocystis pyrifera population genetic structure in the northeast
Pacific. Mol. Ecol. 24, 4866–4885. doi: 10.1111/mec.13371
Karathia H, Vilaprinyo E, Sorribas A, Alves R. (2011) Saccharomyces cerevisiae as a model
organism: a comparative study. PLoS One. 6(2):e16015. doi: 10.1371/journal.pone.0016015.
PMID: 21311596; PMCID: PMC3032731.
Kantar M, Nashoba RA, Anderson JE, Blackman BK, Rieseberg LH. (2017). The Genetics and
Genomics of Plant Domestication, BioScience, 67:11:971–982.
https://doi.org/10.1093/biosci/bix114
Konotchick T, Dupont CL, Valas RE, Badger JH, Allen AE. (2013) Transcriptomic analysis of
metabolic function in the giant kelp, Macrocystis pyrifera, across depth and season. New Phytol.
198(2):398-407. doi: 10.1111/nph.12160. Epub 2013 Mar 13. PMID: 23488966; PMCID:
PMC3644879.
Knoll, A. H. 2011. The multiple origins of complex multicellularity. Annu. Rev. Earth Planet.
Sci. 39: 217–239.
Lane, C.E., Mayes, C., Druehl, L.D. and Saunders, G.W. (2006), A Multi-gene Molecular
Investigation of the Kelp (Laminariales, Phaeophyceae) Supports Substantial Taxonomic
Re-organization. Journal of Phycology, 42: 493-512.
https://doi.org/10.1111/j.1529-8817.2006.00204.x
Lipinska AP, Toda NRT, Heesch S, Peters AF, Cock JM, Coelho SM. (2017) Multiple gene
movements into and out of haploid sex chromosomes. Genome Biol. 18(1):104. doi:
10.1186/s13059-017-1201-7. PMID: 28595587; PMCID: PMC5463336.
Lipinska, A.P., Serrano-Serrano, M.L., Cormier, A. et al. (2019) Rapid turnover of
life-cycle-related genes in the brown algae. Genome Biol 20, 35.
https://doi.org/10.1186/s13059-019-1630-6
Liu, F., Sun, X., Wang, F. et al. (2014). Breeding, economic traits evaluation, and commercial
cultivation of a new Saccharina variety “Huangguan No. 1”. Aquacult Int 22, 1665–1675.
https://doi.org/10.1007/s10499-014-9772-8
Loureiro, R., Gachon, C. M., and Rebours, C. (2015). Seaweed cultivation: potential and
challenges of crop domestication at an unprecedented pace. New Phytol. 206, 489–492. doi:
10.1111/nph.13278
Macaya, Erasmo & Zuccarello, Giuseppe. (2010). Genetic structure of the giant kelp Macrocystis
pyrifera along the southeastern Pacific. Marine Ecology-Progress Series. 420. 103-112.
10.3354/meps08893.
Maier I, Hertweck C, Boland W. (2001) Stereochemical specificity of lamoxirene, the
sperm-releasing pheromone in kelp (Laminariales, Phaeophyceae). Biol Bull. (2):121-5. doi:
10.2307/1543327. PMID: 11687384.
Makkar, H. P. S., Tran, G., Heuze, V., Giger-Reverdin, S., Lessire, M., Lebas, F., et al. (2016).
Seaweeds for livestock diets: a review. Anim. Feed Sci. Technol. 212, 1–17. doi:
10.1016/j.anifeedsci.2015.09.018
Mallakpour S, Azadi E, Hussain CM. (2021) Chitosan, alginate, hyaluronic acid, gums, and
β-glucan as potent adjuvants and vaccine delivery systems for viral threats including
SARS-CoV-2: A review. Int J Biol Macromol. 182:1931-1940. doi: 10.1016/j.ijbiomac.2021.05.155.
Miller RJ, Lafferty KD, Lamy T, Kui L, Rassweiler A, Reed DC. (2018) Giant kelp, Macrocystis
pyrifera, increases faunal diversity through physical engineering. Proc Biol Sci.
285(1874):20172571. doi: 10.1098/rspb.2017.2571. PMID: 29540514; PMCID: PMC5879622.
Neushul, M (1987). "Energy from marine biomass: The historical record". In Bird, Kimon T.;
Benson, Peter H. (eds.). Seaweed cultivation for renewable resources. Elsevier. pp. 1–37. ISBN
9780444428646.
North, W. J. (1987). “Biology of the Macrocystis resource in North America,” in Case Studies of
Seven Commercial Seaweed Resources, eds M. S. Doty, J. F. Caddy, and B. Santelices (San
Francisco, CA: FAO).
Peate, D.W., 1997. The Paraná-Etendeka Province. In: Mahoney, J.J., Coffin, M.F., (Eds.). Large
Igneous Provinces: Continental, Oceanic, and Planetary Flood Volcanism. Geophysical
Monographic Series, vol. 1. Washington, DC, pp. 217–245.
Peters, A.F., Marie, D., Scornet, D., Kloareg, B. and Mark Cock, J. (2004). Proposal of
Ectocarpus Siliculosus (Ectocarpales, Phaeophyceae) as a Model Organism for Brown Algal
Genetics and Genomics. Journal of Phycology, 40: 1079-1088.
https://doi.org/10.1111/j.1529-8817.2004.04058.x
Phillips N, Burrowes R, Rousseau F, De Reviers B, Saunders GW. (2008) Resolving
Evolutionary Relationships Among the Brown Algae Using Chloroplast and Nuclear Genes. J
Phycol. (2):394-405. doi: 10.1111/j.1529-8817.2008.00473.x. PMID: 27041195.
Rassweiler, A., Reed, D.C., Harrer, S.L. and Nelson, J.C. (2018), Improved estimates of net
primary production, growth, and standing crop of Macrocystis pyrifera in Southern California.
Ecology, 99: 2132-2132. https://doi.org/10.1002/ecy.2440
Raimondi, P. & Reed, D. & Gaylord, B & Washburn, Libe. (2004). Effects of self-fertilization in
the giant kelp, Macrocystis pyrifera. Ecology. 85. 3267-3276. 10.1890/03-0559.
Reed, D. C. (1990). The effects of variable settlement and early competition on patterns of kelp
recruitment. Ecology 71, 776–787.
Reed, D. C., Ebeling, A. W., and Anderson, T. W. (1996). Differential reproductive responses to
fluctuating resources in two seaweeds with different reproductive strategies. Ecology 77,
300–316. doi: 10.2307/2265679
Rodriguez, G.E., Rassweiler, A., Reed, D.C., and Holbrook, S.J. 2013. The importance of
progressive senescence in the biomass dynamics of giant kelp (Macrocystis pyrifera). Ecology
94:1848–1858.
Salavarría, E., Paul, S., Gil-Kodaka, P., and Villena, G. K. (2018). First global transcriptome
analysis of brown algae Macrocystis integrifolia (Phaeophyceae) under marine intertidal
conditions. 3 Biotech 8:185. doi: 10.1007/s13205-018-1204-4
Schiel DR, Foster MS. 2015. The biology and ecology of giant kelp forests. Oakland, CA:
University of California Press.
Ševčíková, T., Horák, A., Klimeš, V. et al. (2015). Updating algal evolutionary relationships
through plastid genome sequencing: did alveolate plastids emerge through endosymbiosis of an
ochrophyte?. Sci Rep 5, 10134. https://doi.org/10.1038/srep10134
Shan T, Yuan J, Su L, Li J, Leng X, Zhang Y, Gao H, Pang S. (2020). First Genome of the
Brown Alga Undaria pinnatifida: Chromosome-Level Assembly Using PacBio and Hi-C
Technologies. Front Genet. 11:140. doi: 10.3389/fgene.2020.00140. PMID: 32184805; PMCID:
PMC7058681.
Shan T, Pang S. (2021). Breeding in the Economically Important Brown Alga Undaria
pinnatifida: A Concise Review and Future Prospects. Front Genet. 12:801937. doi:
10.3389/fgene.2021.801937. PMID: 34925470; PMCID: PMC8671753.
Silberfeld T, Leigh JW, Verbruggen H, Cruaud C, de Reviers B, Rousseau F. (2010) A
multi-locus time-calibrated phylogeny of the brown algae (Heterokonta, Ochrophyta, Phaeophyceae):
Investigating the evolutionary nature of the "brown algal crown radiation". Mol Phylogenet Evol.
2010 Aug;56(2):659-74.
Starko S, Soto Gomez M, Darby H, Demes KW, Kawai H, Yotsukura N, Lindstrom SC, Keeling
PJ, Graham SW, Martone PT. (2019) A comprehensive kelp phylogeny sheds light on the
evolution of an ecosystem. Mol Phylogenet Evol.136:138-150. doi:
10.1016/j.ympev.2019.04.012.
Tan, I.H., Druehl, L.D. Phylogeny of the Northeast Pacific brown algal (Phaeophycean) orders as
inferred from 18S rRNA gene sequences. Hydrobiologia 260, 699–704 (1993).
https://doi.org/10.1007/BF00049090
Tanna, B., and Mishra, A. (2018). Metabolites unravel nutraceutical potential of edible
seaweeds: an emerging source of functional food. Compr. Reviews in Food Sci. Food Saf. 17,
1613–1624. doi: 10.1111/1541-4337.12396
Westermeier, R., Patiño, D.J., Müller, H. et al. Towards domestication of giant kelp (Macrocystis
pyrifera) in Chile: selection of haploid parent genotypes, outbreeding, and heterosis. J Appl
Phycol 22, 357–361 (2010). https://doi.org/10.1007/s10811-009-9466-1
Westermeier, R, Patiño, D, & Murúa, P, & Müller, D. (2011). Macrocystis mariculture in Chile:
Growth performance of heterosis genotype constructs under field conditions. Journal of Applied
Phycology. 23. 819-825. 10.1007/s10811-010-9581-z.
Xu, Z., Dapeng, L., Hanhua, H., and Tianwei, T. (2005). Growth promotion of vegetative
gametophytes of Undaria pinnatifida by blue light. Biotechnol. Lett. 27, 1467–1475. doi:
10.1007/s10529-005-1313-0
Ye, Naihao, et al. Saccharina genomes provide novel insight into kelp biology. (2015) Nature
Communications 6.1 1-11.
Zhang QS, Tang XX, Cong YZ, Qu SC, Luo SJ, Yang GP. 2007. Breeding of an elite Laminaria
variety 90-1 through inter-specific gametophyte crossing. Journal of Applied Phycology 19:
303–311.
Chapter I: Sporophyte stage genes exhibit stronger selection than gametophyte stage genes in
haplodiplontic giant kelp
Gary Molano*1, Jose Diesel*1, Gabriel J. Montecinos2, Filipe Alberto2 and Sergey V. Nuzhdin**1
* Co-first authors
1 Department of Molecular and Computational Biology, University of Southern California, Los
Angeles, California, United States
2 Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee,
Wisconsin, United States
** Correspondence:
Sergey V. Nuzhdin
snuzhdin@usc.edu
Keywords: brown algae, giant kelp, purifying selection, genomics, sustainable ocean
Molano G, Diesel J, Montecinos GJ, Alberto F and Nuzhdin SV (2022) Sporophyte Stage Genes
Exhibit Stronger Selection Than Gametophyte Stage Genes in Haplodiplontic Giant Kelp. Front.
Mar. Sci. 8:774076. doi: 10.3389/fmars.2021.774076
Abstract
Macrocystis pyrifera (giant kelp), a haplodiplontic brown macroalga that alternates
between a macroscopic diploid (sporophyte) and a microscopic haploid (gametophyte) phase,
provides an ideal system to investigate how ploidy background affects the evolutionary history
of a gene. In M. pyrifera, the same genome is subjected to different selective pressures and
environments as it alternates between haploid and diploid life stages. We assembled M. pyrifera
gene models using available expression data and validated 8,292 gene models using the model
alga Ectocarpus siliculosus. Differential expression analysis identified gene models expressed in
either or both of the haploid and diploid life stages. Genes expressed preferentially or exclusively
in the gametophyte stage were found to have more nucleotide diversity (π = 2.3 × 10^-3 and
2.8 × 10^-3) than those for sporophytes (π = 1.1 × 10^-3 and 1 × 10^-3). While gametophyte-biased genes show
faster sequence evolution, the sequence evolution exhibits fewer signatures of adaptation when
compared to sporophyte-biased genes. Our findings contrast with the masking hypothesis,
which predicts higher standing genetic variation at the sporophyte stage, and support the strength
of expression theory, which posits that genes expressed more strongly are expected to evolve
more slowly. We argue that the sporophyte stage undergoes more stringent selection compared with
the gametophyte stage, which carries a heavy but possibly irregular genetic load associated with
broadcast spawning. Furthermore, using whole-genome sequencing, we confirm the strong
population structure in wild M. pyrifera populations previously established using microsatellite
markers, and estimate population genetic parameters, such as pairwise genetic diversity and
Tajima’s D, important for conservation and domestication of M. pyrifera.
Introduction
The early evolutionary theory of “masking” posits that ploidy will affect the rates of
selection on genes (Crow and Kimura 1965). As most functionally important mutations are
deleterious but recessive, and the chance of having two deleterious alleles in the same position is
low, heterozygous diploids can “mask” deleterious alleles with wild type alleles (Crow and
Simmons 1983). Masking increases the genetic diversity of the diploid species, as deleterious
alleles persist far longer than in haploid species and carry the potential to be beneficial in varying
environmental conditions (Raper and Flexer 1970). Haploids are unmasked in comparison to
diploids: with only a single copy of each allele, haploids are more exposed to selection, meaning
that beneficial mutations reach fixation and deleterious alleles are purged faster (Kondrashov and
Crow 1991, Otto and Gerstein 2008). In haplodiplontic organisms, in which development and
vegetative growth occur in both haploid and diploid stages, genes expressed in different life
stages should undergo different evolutionary pressures (Jenkins and Kirkpatrick 1995, Bell
1994). Masking theory predicts that diploid-specific genes should contain more genetic variation
than haploid genes, as diploid-specific genes are not exposed to selection pressure in the haploid
stage and will accumulate mutations, while recessive mutations will be quickly purged from
haploid-specific genes (Szövényi et al. 2013).
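The masking argument can be illustrated with the classical mutation-selection balance approximations. These are textbook results, not calculations from this study, and the mutation rate and selection coefficient below are hypothetical values chosen only for illustration. Under these assumptions, a deleterious allele exposed in haploids equilibrates near mu/s, whereas a fully recessive allele in a randomly mating diploid equilibrates near sqrt(mu/s), which is much higher when mu is small.

```python
import math


def haploid_equilibrium(mu: float, s: float) -> float:
    """Approximate equilibrium frequency of a deleterious allele in haploids."""
    return mu / s


def diploid_recessive_equilibrium(mu: float, s: float) -> float:
    """Approximate equilibrium frequency of a fully recessive deleterious
    allele in a randomly mating diploid population."""
    return math.sqrt(mu / s)


# Hypothetical parameters for illustration only (not estimates for giant kelp).
mu = 1e-6  # deleterious mutation rate per locus per generation
s = 0.01   # selection coefficient against the deleterious allele

q_haploid = haploid_equilibrium(mu, s)
q_diploid = diploid_recessive_equilibrium(mu, s)

print(f"haploid equilibrium frequency: {q_haploid:.2e}")
print(f"diploid (recessive) equilibrium frequency: {q_diploid:.2e}")
print(f"fold difference attributable to masking: {q_diploid / q_haploid:.0f}")
```

With these illustrative values the masked (diploid, recessive) allele persists at roughly one hundred times the frequency of the unmasked (haploid) allele, which is the intuition behind the prediction that diploid-specific genes should carry more standing variation.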
However, masking may not be the primary determinant in gene evolutionary rates. A
previous study examined stage specific selection in land plants and found that masking alone
could not explain the stage specific differences in gene evolutionary rates (Szövényi et al. 2013).
Two alternative theories for selection on genes in different life stages are gene breadth of
expression and overall gene expression strength. Genes expressed across multiple tissue types
tend to evolve more slowly than tissue-specific genes (Park and Choi 2010). Gene expression
levels are also the predominant factor determining gene evolution in yeast (Drummond et al.
2006). Extending the gene breadth of expression theory to haplodiplontic organisms suggests
that genes that are expressed in both haploid and diploid life stages are expected to evolve more
slowly in comparison with genes limited to either the diploid or haploid stage (Szövényi et al.
2013). Extending the strength of expression theory to haplodiplontic organisms suggests that
genes in the life stage with higher expression levels will evolve more slowly.
The multiple ploidy stages found in haplodiplonts provide a system to test the influence
of masking and strength/breadth of expression on gene evolutionary rates in the haploid or
diploid life stages. Haplodiplonts can either be heteromorphic, in which the haploid and diploid
stages differ in morphology, or isomorphic, in which the haploid and diploid stages show no
morphological difference between stages (Hughes and Otto 1999). Some species of algae are
heteromorphic haplodiplonts, and have one life stage that is larger, favorable for growth and
vulnerable to predation, and another life stage that is smaller and resistant to environmental
stresses (Klinger 1993). In heteromorphic haplodiplonts, each life stage can have different
environmental conditions, niches, and stresses. Therefore, it is possible to test the factors behind
the different gene evolutionary rates in each life stage (Hughes and Otto 1999). In species with a
dominant diploid sporophyte, gene breadth of expression predicts that genes expressed in both the
haploid and diploid stages will evolve more slowly than genes expressed strictly in the haploid or
diploid stage. Likewise, gene strength of expression predicts that
genes expressed in the diploid life stage will have higher total expression and will therefore
evolve more slowly than genes expressed in the haploid life stage. This is incompatible with the
masking theory prediction, which posits that genes in the haploid stage will evolve more slowly, as
mutations will be quickly purged from the genome. Previous studies in diploid dominant land plants revealed that
gene breadth of expression was the most important underlying factor for gene evolutionary rates,
and that haploid-specific genes did not evolve faster than diploid-specific or ubiquitous genes (as
predicted by masking) (Szövényi et al. 2013).
While gene evolutionary rate dynamics have been compared between diploid and haploid
life stages in angiosperms, other haplodiplontic organisms with different evolutionary histories,
such as many macroalgae, may undergo different selective pressures and interplay between
masking, gene breadth of expression, and gene expression strength (Yoon et al. 2004). For
example, in angiosperms the haploid stage only exists while protected and supplied by diploid
maternal tissues; thus purifying selection on haploid-specific genes might be substantially
reduced. By contrast, many species of brown macroalgae (Ochrophyta) possess free-living
haploid and diploid stages that independently evolved multicellularity and occupy a very
different niche space than land plants, including a more dynamic marine environment. Therefore,
it is worth investigating whether the gene expression breadth theory for gene evolutionary rate is
consistent in land plants and brown macroalgae (Cock et al. 2010).
Giant kelp (Macrocystis pyrifera) has a haplodiplontic life cycle, in which large, diploid
adult sporophytes [Fig 1A] produce microscopic haploid zoospores that settle on a substrate and
develop into haploid male or female gametophytes [Fig 1B]. Male gametophytes release sperm,
which fuse with female gametophyte eggs, producing the next generation of diploid sporophytes
(North 1987) [Fig 1C]. The biphasic life cycle of giant kelp provides a platform for investigating
how genes in the same organism can evolve differently depending on the ploidy background in which
they are expressed, especially since both the haploid and diploid stages are multicellular and are
exposed to natural selection (Bell 2008). Previous work on differential expression in giant kelp
concluded that gametophyte-biased genes had a significantly higher nonsynonymous to
synonymous substitution rate (Lipinska et al. 2019). We build on this previous study
by conducting analyses with population variation data, which allow us to test whether the faster
sequence evolution in gametophyte-biased genes is adaptive (Lipinska et al. 2019). To
investigate the gene evolutionary history between different life stages in giant kelp, we built a set
of gene models to generate a transcriptome reference and coupled gene evolution with patterns of
intraspecific standing variation. We also distinguished genes differentially expressed between
stages and used polymorphism data to assess genetic diversity, sequence evolution rates, and
adaptive evolution.
Figure 1 – A) Adult giant kelp diploid sporophyte (picture courtesy of Maurice Roper). B)
Microscopic giant kelp haploid gametophytes. C) Giant kelp reproductive cycle: during sexual
reproduction, adult sporophytes (2N) produce zoospores (N) through meiosis, which after
settling differentiate into sexed gametophytes. Fertilization occurs
when sperm from a male gametophyte encounters a female gametophyte egg. Fusion results
in an early stage sporophyte, completing the cycle (figure courtesy of Kelly DeWeese).
Intricacies of the giant kelp biology and ecology have been extensively studied since the
advent of scuba in the 1950s (Schiel and Foster 2015). More recent work has begun to unravel
the level of genetic variation present in wild populations. Early work on this topic distinguished
putative wild populations of kelp among geographically separated sites in California by
comparing growth rates in low nitrogen environments (Kopczak et al. 1991). Other studies used
internal transcribed spacer regions of noncoding rDNA, as well as cytochrome c oxidase 1, to
resolve kelp phylogeny and collapse multiple potential kelp species in both the Northern and
Southern hemispheres into M. pyrifera (Coyer et al. 2001, Macaya and Zuccarello 2010). These
studies concluded that giant kelp originated in the Northern hemisphere and spread southward
into the Southern hemisphere (Macaya and Zuccarello 2010). Microsatellite marker analysis
along the coasts of North and South America revealed a center of genetic diversity near the
Southern California Bight, with populations becoming increasingly homogeneous moving away
from that location (Johansson et al. 2015). As microsatellites have a higher mutation rate when
compared with coding genes, demographic patterns may change when using whole-genome
polymorphism data (Li et al. 2002).
A deeper understanding of giant kelp population genetics would greatly assist
domestication efforts in the near future. The global seaweed aquaculture industry surpassed $6
billion in 2016 and continues to grow, as more seaweed is now cultivated in aquaculture
farms than is harvested from wild populations (FAO 2018). Most of the industry is focused
on several species of historical economic significance, including red macroalgae such as
Eucheuma, Gracilaria, Kappaphycus, and Porphyra (nori), as well as the brown macroalga
Undaria pinnatifida (wakame) (FAO 2018). However, as the seaweed aquaculture industry
continues to expand, new species of interest for aquaculture domestication have been identified.
The globally distributed giant kelp is an incredibly fast-growing brown macroalga that provides
the foundation for species-rich ecosystems (Graham et al. 2007). Due to its fast growth rate and
high polysaccharide content, including mannitol and alginate, giant kelp is a strong candidate
feedstock for biofuels (Camus et al. 2016). Giant kelp can also be used as an animal feed
supplement and has extractable bio-active compounds that can be used in a wide variety of
applications, from fertilizer supplements to cosmetics to antioxidants (Tanna and Mishra
2018, Makkar et al. 2016, Arioli et al. 2015, Hameury et al. 2019). Seaweed domestication
has been hampered by limited starting gene pools and clonal propagation techniques. Therefore,
increasing the level of genetic information available for domestication is paramount for
improving seed stocks and limiting inbreeding depression (Loureiro et al. 2015). Genetic
crossing studies in Chile between geographically distinct populations pointed to potential
heterosis effects in giant kelp (Westermeier et al. 2010). Thus, exploring the population genetic
parameters and genetic relatedness between giant kelp populations in North America will assist
in domestication efforts.
Results
Gene models and differential expression analysis
To investigate gene evolution rates in giant kelp without a publicly
available reference genome, we first assembled contigs for giant kelp from published cDNA
data using Trinity (Lipinska et al. 2019, Salavarría et al. 2018, Grabherr et al. 2011). To remove
assembly redundancies and isoforms, we then collapsed the initial contig assembly based on
minimum length and percent identity thresholds. We then filtered and validated the contigs using
a reciprocal BLAST search to select high-confidence gene models and gain annotation information, as
suggested in previous studies (Moreno-Hagelsieb and Latimer 2008, Tatusov et al. 1997, Bork et
al. 1998). We reciprocally blasted against the published annotated Ectocarpus siliculosus
genome, the current model brown macroalga that has been extensively studied, to identify and
annotate orthologs (Peters et al. 2004, Cock et al. 2010). The orders Laminariales (giant kelp)
and Ectocarpales (Ectocarpus siliculosus) diverged about 98 million years ago (Silberfeld et al.
2010). Reciprocal blasts are a conservative method to reveal orthologs, as only gene models that
are top matches in both blast directions are kept for analysis. Genes will be left out of this
analysis if they are absent from either species or have diverged strongly between M. pyrifera and E.
siliculosus. M. pyrifera is diploid dominant, with the macroscopic sporophyte stage having more
cell types than the microscopic gametophyte stage. E. siliculosus has less differentiation between
the gametophyte and sporophyte stages, and is physically smaller than M. pyrifera (Lipinska et
al. 2019). Additionally, M. pyrifera contains 28,931 genes (3,157 gametophyte-biased genes,
22,186 unbiased genes, 3,588 sporophyte-biased genes), while E. siliculosus contains 17,426
genes (4,105 gametophyte-biased genes, 11,224 unbiased genes, 2,097 sporophyte-biased genes)
(Lipinska et al. 2019). We expect to find more haploid and ubiquitously expressed genes in this
data set due to differences between the two species in morphology and overall gene count,
especially in regard to the sporophyte stage. We also further filtered our gene model data based
on coverage and removed contaminants after blasting to UniProt (UniProt Consortium 2021).
Our final gene model data set consisted of 8,292 gene models.
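The reciprocal best hit criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: it assumes each search direction has already been reduced to (query, subject, bit score) tuples, and the contig and gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the top-scoring subject for every query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep only pairs that are each other's top hit in both directions."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical hits: kelp contigs searched against Ectocarpus genes, and back.
kelp_vs_ecto = [
    ("Mp_contig1", "Ec_geneA", 250.0),
    ("Mp_contig1", "Ec_geneB", 90.0),
    ("Mp_contig2", "Ec_geneB", 180.0),
    ("Mp_contig3", "Ec_geneA", 120.0),
]
ecto_vs_kelp = [
    ("Ec_geneA", "Mp_contig1", 245.0),
    ("Ec_geneA", "Mp_contig3", 118.0),
    ("Ec_geneB", "Mp_contig2", 175.0),
]

orthologs = reciprocal_best_hits(kelp_vs_ecto, ecto_vs_kelp)
print(orthologs)
```

In this toy input, Mp_contig3 is dropped because its best Ectocarpus match (Ec_geneA) has a better match elsewhere, which shows why the approach is conservative and can exclude redundant or strongly diverged gene models.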
We then assessed each level of pruning of the assembled contigs and gene models for
transcriptome completeness with BUSCO, which reports the presence of single-copy orthologs to
evaluate genome completeness (Seppey et al. 2019). This step limited redundant gene models in
the analysis, while simultaneously capturing as many genes identified by BUSCO as possible and
benchmarking them against the published E. siliculosus genome (Supplementary Table Busco
Comparison). We found that the 8,292 gene model data set contained almost all of the BUSCO
single-copy orthologs in the eukaryota_odb10 database that were also found in the E. siliculosus
genome (173 genes versus 175 genes, respectively). We also confirmed that there was only one
duplicate BUSCO gene in the 8,292 gene models. Furthermore, we classified genes with biased
and specific expression in each life stage through differential expression analysis with DESeq2
and publicly available RNA-seq data from both the M. pyrifera gametophyte and sporophyte life
stages (Love et al. 2014, Lipinska et al. 2019, Salavarría et al. 2018). We found that 6,656 genes
were “ubiquitously” expressed, i.e., not differentially expressed between sporophyte and
gametophyte based on log2 fold change at a 0.05 false discovery rate. Most of the remaining genes
showed a higher expression towards the haploid gametophyte life stage, with 734 genes
exhibiting biased expression and 227 genes being completely gametophyte-specific. As for the
diploid sporophyte stage, 574 genes were biased while 101 were specific. We predicted the
increased number of ubiquitous, haploid-biased and haploid-specific genes, relative to
sporophyte biased and sporophyte-specific genes, based on differences in the morphologies and
gene counts between the two species. We then compared the average expression of genes that we
classified as either gametophyte, ubiquitous, or sporophyte; we found that sporophyte genes had
a higher level of expression compared with gametophyte genes. Ubiquitous gene expression was
intermediate compared to the sporophyte and gametophyte average expression levels [Fig 2A].
We then checked how the level of ubiquitous gene expression differed between the gametophyte
and sporophyte stages, and observed small differences between overall ubiquitous gene
expression in either the gametophyte or sporophyte stage [Fig 2B].
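The expression categories used above (ubiquitous, biased, specific) can be illustrated with a short classification sketch. Only the 0.05 false discovery rate comes from the text. The sign convention for the fold change, the TPM cutoff used to call a gene stage-specific, and all names and values below are assumptions made for illustration, so this should not be read as the exact rule applied in this study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str
    log2_fold_change: float  # assumed: positive means higher in sporophyte
    padj: float              # adjusted p value (false discovery rate)
    gametophyte_tpm: float   # mean expression in gametophyte samples
    sporophyte_tpm: float    # mean expression in sporophyte samples


def classify_gene(gene: GeneResult, fdr: float = 0.05, min_tpm: float = 1.0) -> str:
    """Assign a gene to an expression category.

    min_tpm is a hypothetical cutoff: a differentially expressed gene is
    called stage-specific when its expression in the other stage falls
    below this value, and stage-biased otherwise.
    """
    if gene.padj >= fdr:
        return "ubiquitous"
    stage = "sporophyte" if gene.log2_fold_change > 0 else "gametophyte"
    other_tpm = gene.gametophyte_tpm if stage == "sporophyte" else gene.sporophyte_tpm
    kind = "specific" if other_tpm < min_tpm else "biased"
    return f"{stage}-{kind}"


# Hypothetical genes for illustration only.
genes = [
    GeneResult("gene1", 0.2, 0.50, 12.0, 13.0),
    GeneResult("gene2", -3.0, 0.001, 50.0, 5.0),
    GeneResult("gene3", -8.0, 1e-6, 40.0, 0.0),
    GeneResult("gene4", 2.5, 0.01, 3.0, 20.0),
]

categories = {g.gene_id: classify_gene(g) for g in genes}
print(categories)
```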
Figure 2 – A) Average Transcripts Per Million (TPM) for each life stage among all RNA
samples. B) TPM for each RNA-seq sample separated into each life stage.
Standing molecular variation, admixture, linkage disequilibrium, and inbreeding
To compare the degree of polymorphism in genes with biased, specific, and ubiquitous
expression, we performed whole genome sequencing on 48 individual sporophytes sampled from
three Southern California populations of M. pyrifera [Fig 3A]. We identified single nucleotide
polymorphisms (SNPs) after aligning reads to our M. pyrifera gene model data set. Using filtered
and bi-allelic SNPs, we calculated nucleotide diversity (π) across each gene model. On average,
nucleotide diversity had a value of π = 1.74 × 10^-3 across all the gene models in this analysis.
We also observed a higher π in gametophyte-biased and gametophyte-specific genes, with
gametophyte-specific genes containing almost three times the diversity of sporophyte-specific
genes (2.8 × 10^-3 and 1.02 × 10^-3, respectively) [Table 1]. We then calculated Tajima’s D for each
gene as an indicator of neutral (random) or selective (non-random) gene evolution. We
found that Tajima’s D was negative for all gene groups, showing an excess of low-frequency
polymorphisms when compared to the expected variation under the neutral theory (Tajima
1989). A negative Tajima’s D is indicative of a recent population expansion after a bottleneck
and is also consistent with selective sweeps [Table 1].
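As a rough illustration of the statistic discussed above, the following Python sketch computes Tajima's D from summary counts using the constants from Tajima (1989). It is not the VCFtools implementation used in this study. The function name `tajimas_d` and its arguments are hypothetical, and the sample size `n` is assumed to be the number of sampled chromosomes (for example, 96 for 48 diploid sporophytes).

```python
import math


def tajimas_d(n, segregating_sites, pi):
    """Tajima's D for one locus.

    n                 -- number of sampled chromosomes (assumed, see lead-in)
    segregating_sites -- number of segregating sites S in the locus
    pi                -- mean number of pairwise differences for the locus
                         (a per-locus sum, not a per-site average)
    """
    S = segregating_sites
    # The variance term is zero for n < 4, and D is undefined without variation.
    if n < 4 or S == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare variants lowers π relative to S/a1 and therefore gives a negative D, which is the pattern reported for every expression category.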
Table 1 – Polymorphism measurements for each category of expression given in nucleotide
diversity (π) and Tajima’s D. Superscript letters designate groups defined by pairwise Wilcoxon
test with Holm correction. Different superscript letters between groups signify a p-value <
0.005 between said groups.
Expression category     Nucleotide diversity (π)    Tajima's D
Sporophyte Specific     0.00101ᵃ                    -0.865
Sporophyte Biased       0.00116ᵃ                    -0.976
Ubiquitous              0.00173ᵇ                    -1.06
Gametophyte Biased      0.00252ᶜ                    -1.03
Gametophyte Specific    0.0028ᶜ                     -0.868
Prior studies revealed population subdivision in Southern California giant kelp populations based
on microsatellite data, including distinct subpopulations at our sampling sites from Catalina
Island, Santa Barbara, and Camp Pendleton (Johansson et al. 2015). We found substantial
population structure between our sampling populations after we performed a principal
component analysis on the polymorphism data using the R package SNPRelate, with PC1 and PC2
explaining 6.7% and 5.4% of the variance, respectively (Zheng et al. 2012, Wickham et al. 2016)
[Fig 3B]. We further estimated population structure after calculating and visualizing Nei’s
genetic distance for each individual using the R packages vcfR and adegenet (Nei 1973, Jombart
2008, Knaus and Grünwald 2017) [Fig 3C]. Previous brown macroalgae population genetics
studies used molecular markers such as microsatellites to assay genetic variation within a species
and found a majority of the genetic variation within populations instead of between populations
(Breton et al. 2018, Mooney et al. 2018, Goecke et al. 2020). We compared the variation
between the Santa Barbara, Catalina Island, and Camp Pendleton populations using bi-allelic
SNPs across the gene models. We found that a majority of bi-allelic SNPs were population-specific
(24.6% of SNPs are unique to Catalina Island, 19.2% of SNPs are unique to Camp
Pendleton, 16.4% of SNPs are unique to Santa Barbara, and 19.5% of the SNPs are shared by all
three populations) [Fig 3D]. We calculated the components of the fixation index, H_T for all of the
populations together (H_T = 0.122) and H_S for each subpopulation (H_S, Santa Barbara = 0.103,
H_S, Camp Pendleton = 0.106, and H_S, Catalina = 0.110), using vcfR (Knaus and Grünwald 2017).
Total genetic variation (H_T) was slightly higher than the variation within each population (H_S),
with the Catalina population slightly more diverse than the coastal populations.
These findings matched the strong population
structure we had previously noted.
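The relationship between within-population and total gene diversity can be sketched in a few lines of Python. This is a simplified illustration, not the vcfR computation used here: it assumes bi-allelic sites, equal weighting of populations, and no sample-size correction, and the function names are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a bi-allelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def gene_diversity(freqs_by_pop):
    """Return (H_S per population, H_T, G_ST).

    freqs_by_pop maps a population name to a list of alternate-allele
    frequencies, one per site, with sites in the same order in every population.
    Populations are weighted equally (an assumption of this sketch).
    """
    pops = list(freqs_by_pop)
    n_sites = len(freqs_by_pop[pops[0]])
    # H_S: mean expected heterozygosity within each population.
    hs = {
        pop: sum(expected_heterozygosity(p) for p in freqs_by_pop[pop]) / n_sites
        for pop in pops
    }
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_p = [
        sum(freqs_by_pop[pop][i] for pop in pops) / len(pops) for i in range(n_sites)
    ]
    ht = sum(expected_heterozygosity(p) for p in mean_p) / n_sites
    mean_hs = sum(hs.values()) / len(pops)
    gst = (ht - mean_hs) / ht if ht > 0 else float("nan")
    return hs, ht, gst
```

As a back-of-envelope check only, plugging the reported averages into (H_T − mean H_S) / H_T gives (0.122 − 0.106) / 0.122, or roughly 0.13. The per-site values tabulated by vcfR may differ from this figure.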
Figure 3 – Analysis of population structure between the three analyzed populations: A) Sampling
locations of wild kelp populations in Southern California. B) Principal component analysis
showing strong signs of population structure. C) Nei’s genetic distance between 48 individuals
mapped onto the gene models again showing clear population structure (AQ prefix is the Santa
Barbara population, CB prefix is the Camp Pendleton population). D) Venn diagram of single
nucleotide polymorphisms (SNPs) present in each population showing that most SNPs are
population specific.
dN/dS and McDonald-Kreitman Tests
Our finding of higher nucleotide diversity in gametophyte-expressed genes contradicts
our initial hypothesis that masking would lead to lower genetic diversity in gametophyte-
expressed genes (as deleterious mutations in haploid-expressed genes should be purged quickly
from the genome). Next, we investigated if the excess polymorphisms in the gametophyte stage
were neutral or adaptive and compared them with ubiquitous and sporophyte-specific genes. To
investigate the rate of sequence evolution between giant kelp life stages, we first looked at the
non-synonymous to synonymous substitution rate of coding regions, using E. siliculosus as an
outgroup. Ubiquitous genes exhibited a slower evolutionary rate in coding regions (lower dN/dS)
[Fig 4A], which was expected due to the conserved nature of the genes tested. Using a pairwise
Wilcoxon test, corrected with the Holm procedure, we found significant differences between the
dN/dS of ubiquitous genes and gametophyte-specific and gametophyte-biased genes (p < 2e-16),
as well as between gametophyte-specific and gametophyte-biased genes (p = 1.3e-06). We also
found significant differences between the dN/dS of ubiquitous genes and sporophyte-biased
genes (p = 0.03), and no significant differences between ubiquitous genes and sporophyte-
specific genes (p = 0.94). Finally, we found significant differences between gametophyte and
sporophyte genes, both biased and specific (Supplementary Table). We did not see any
significant differences between sporophyte-specific and sporophyte-biased genes. Overall, for
genes with stage specific expression, the haploid gametophyte sequences had a higher sequence
evolution rate, while diploid sporophyte-specific genes showed rates more similar to
ubiquitously expressed genes (Supplementary Table).
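The Holm procedure applied to the pairwise tests above is a simple step-down adjustment. The sketch below shows one way to compute Holm-adjusted p-values in plain Python. The function name `holm_adjust` is hypothetical, and any inputs would be raw (uncorrected) p-values, which are not reported in this chapter.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A comparison is then called significant when its adjusted p-value falls below the chosen threshold.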
After calculating the sequence evolution rate in M. pyrifera gene models across different
life stages, we used the McDonald-Kreitman test (MKT) to investigate if the substitutions and
segregating variants were nearly neutral or adaptive. To implement MKT, we used M. pyrifera
population polymorphism data and substitution data between M. pyrifera and E. siliculosus to
estimate alpha, the proportion of substitutions driven by positive selection, for each gene model
(McDonald and Kreitman 1991). A higher alpha value signifies more adaptive evolution. When
compared with ubiquitously expressed genes, sporophyte-specific genes and sporophyte-biased
genes had significantly higher alpha (p = 0.014 and p = 7.2e-5, respectively). Gametophyte-
specific genes, despite a higher rate of sequence evolution (higher dN/dS), showed the lowest
alpha values [Fig 4B]. Alpha values for ubiquitous genes were significantly different from those of
gametophyte-biased genes (p = 0.014), but not from those of gametophyte-specific genes (p = 0.78). The
low alpha values for gametophyte genes suggest that a large proportion of mutations affecting
the gametophyte stage have experienced nearly neutral evolution compared to the sporophyte
stage.
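For readers unfamiliar with alpha, the basic estimator derived from the McDonald-Kreitman table can be written as a one-line function. The sketch below shows only this basic form. The iMKT package used in this study may apply additional corrections, and the function and argument names are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Basic McDonald-Kreitman estimate of alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds -- nonsynonymous and synonymous fixed differences (divergence)
    pn, ps -- nonsynonymous and synonymous polymorphisms
    Returns NaN when the estimator is undefined.
    """
    if dn == 0 or ps == 0:
        return float("nan")
    return 1.0 - (ds * pn) / (dn * ps)
```

With illustrative counts of Dn = 7, Ds = 17, Pn = 2, and Ps = 42, the estimate is 1 − 34/294, about 0.88. An excess of nonsynonymous polymorphism relative to divergence pushes alpha below zero.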
The negative Tajima’s D values across the different life stages of M. pyrifera point to
either population expansion or strong selection, because the number of rare alleles in the
population exceeds the neutral theory expectation. To help determine the genetic patterns
responsible for the negative Tajima’s D values, we calculated the fixation index (F_ST) for the first
two nucleotides (non-degenerate) of each codon separately from the third (degenerate) from
coding sequences of our gene models. Our expectation is that genes experiencing purifying
selection would have a lower F_ST for the first two nucleotides of a codon compared with the third
nucleotide. We calculated the ratio of F_ST of degenerate to non-degenerate nucleotides. We found
that while gametophyte and ubiquitous genes tend to have a similar ratio of degenerate to
non-degenerate nucleotides, sporophyte-specific genes tend to have a comparatively lower ratio [Fig
4C]. The higher ratio of degenerate to non-degenerate nucleotides for standing variation points to
less adaptive evolution among divergent giant kelp populations in gametophyte genes.
Figure 4 – Measurements of sequence evolution and positive selection: A) Rate of sequence
evolution measured by dN/dS between Macrocystis pyrifera and Ectocarpus siliculosus coding
regions for each category of gene expression. B) Proportion of substitutions that were the result
of positive selection measured as alpha in a McDonald–Kreitman test. Negative alpha values can
be the result of slightly deleterious mutations segregating. C) Ratio of 1st & 2nd F_ST to 3rd F_ST
indicating purifying selection on sporophyte-specific genes. GB – gametophyte-biased, GS –
gametophyte-specific, SB – sporophyte-biased, SS – sporophyte-specific. (pairwise Wilcoxon
test; ***p value < 0.001; **p value < 0.01; *p value < 0.05)
Screen for biological process enrichment
To identify trends in gene ontology between the different life stages in M. pyrifera, we
annotated the gene model data set using Trinotate (Bryant et al. 2017). To have a better overview
of the ontology content, we assigned gene ontology (GO) terms to generic gene ontology
subsets, or GO slims. We tested for enrichment of GO slims for biased and specific genes when
compared to the whole set of genes using GOseq and visualized the five most enriched terms of
each category [Fig 5] (Young et al. 2010).
Figure 5 – Screen for biological process enrichment: first five most enriched processes for both
biased and specific genes (GOseq p < 0.05, using the Wallenius approximation). The size
of each circle corresponds to the ratio of terms associated with a GO slim in that category of gene
expression relative to all terms in that slim.
Almost 70% of GO terms associated with cell adhesion were found to be gametophyte-
biased. The enrichment for cell adhesion genes in the gametophyte stage of giant kelp hints at the
complex and essential process of anchoring to substrates in the ocean. There was also an
enrichment of terms related to external encapsulating structures, with gametophyte-biased genes
containing more than 30% of the GO terms. The study of this subset of genes can provide
insights into the strategies used by microscopic gametophytes to survive in a harsh oceanic
environment. For sporophyte genes, we found transport to be enriched in both biased and
specific categories, and cell differentiation terms to be enriched in the sporophyte-biased
category. Both of these comprise a set of genes potentially used to achieve the large size and
complexity of the giant kelp sporophyte compared with its gametophyte stage.
Methods
Material preparation and sequencing
We collected kelp blades from 50 distinct individuals from three distinct locations in Southern
California (17 from Catalina Island, 16 from Santa Barbara, and 17 from Camp Pendleton) in the
summer of 2018. To limit epiphyte contamination in DNA extraction, we treated the blades
axenically (Singh et al. 2011, Weinberger 1999) before drying the tissue in silica beads. Due to
high levels of contamination by polysaccharides and polyphenols, we had to perform a
combination of DNA extraction methods. We first placed 10 mg of dried tissue in Qiagen
PowerBead Tubes with 2.38 mm metal beads and homogenized it using a Qiagen TissueLyser
for 60 s at 30 Hz. From the powdered tissue, we extracted DNA using the Macherey-Nagel
NucleoSpin Plant II kit along with the CTAB lysis buffer. Slight modifications were made to the
protocol based on other successful DNA sequencing projects involving brown macroalgae
(Guzinski et al. 2018). We further cleaned the extracted DNA using the Qiagen DNeasy Power
Clean Pro Cleanup Kit.
To ensure the quality of the extracted and cleaned DNA, we checked for contaminants
using the 260/230 nm and 260/280 nm wavelength ratios from a Nano-drop spectrophotometer.
We then checked the concentration of the samples using a Qubit fluorometer. To check for
fragment size and contamination, we ran the DNA on a 1% agarose gel, looking specifically for
samples with a distinct band and no smearing. We then confirmed that the DNA could be
amplified successfully with PCR, using primers for the giant kelp genes IF2A and 18S to ensure
that there was no PCR inhibition (Konotchick et al. 2013).
Once we had DNA of sufficient quality, we prepared libraries using the KAPA
HyperPlus Kit and standard protocols, except we increased the fragmentase reaction time to one
hour. Libraries were then size selected for an insert size ~300 bp using Kapa Pure Beads. We
again quantified our libraries with Qubit and checked fragment size with a bioanalyzer. Once the
libraries passed initial quality tests, they were sent to Novogene in Davis, California and
sequenced on an Illumina HiSeq 2500 platform with 150 bp paired-end reads. To remove adapter
sequences and trim low-quality tails of the raw sequencing data, we used Trimgalore (Krueger
et al. 2015). We removed individuals with poor sequencing coverage, leaving 48 individuals in
the final data set.
Assembly, annotation, and assessment of gene models
We assembled transcripts with TRINITY using publicly available RNA sequences from
both sporophyte and gametophyte samples (Grabherr et al. 2011, Lipinska et al. 2019, Salavarría
et al. 2018). We merged six gametophyte and two sporophyte samples during assembly to
generate a combined transcriptome. We removed adapters and trimmed sequences with Phred
score lower than 25 and length less than 50 nucleotides with Trimgalore (Krueger et al. 2015).
We then checked the sequences with FastQC (Andrews 2010). We removed contaminants using
Deconseq (Schmieder and Edwards 2011) and collapsed the final transcriptome using CAP3
(Huang and Madan 1999). We removed redundancies by clustering transcripts over 500
nucleotides long that blasted with 99% similarity, and selected the longest transcript in each
cluster to keep for analysis using a custom python script (Camacho et al. 2009). We performed a
best reciprocal BLAST against the genome of E. siliculosus, resulting in 9,934 gene models
(Cock et al. 2010). To remove contaminants from our gene models, we used Diamond to blast
our gene models against the Uniprot database and used BlobTools to visualize our results
(Buchfink et al. 2015, Uniprot Consortium 2021, Challis et al. 2020) [Supplemental Plot ]. We
kept gene models in our analysis if they were classified as Phaeophyceae or no-hit, and filtered the
resulting gene models based on GC content, keeping only the ones between 0.35 and 0.65.
Furthermore, we aligned our population reads to the resulting gene models and removed
gene models whose average coverage was lower than 2 or higher than 30. After all filtering
and quality checks were applied, we kept 8,292 gene models to be used in our analysis.
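The three filters described above (taxonomic assignment, GC content, and mean read coverage) can be combined into a single predicate. The following Python sketch uses the thresholds stated in the text. The function names, the taxon labels as strings, and the choice of inclusive bounds are assumptions made for illustration and do not reproduce the custom scripts used in this study.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def keep_gene_model(seq, taxon, mean_coverage,
                    gc_range=(0.35, 0.65), cov_range=(2.0, 30.0),
                    allowed_taxa=("Phaeophyceae", "no-hit")):
    """Return True when a gene model passes all three filters.

    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    if taxon not in allowed_taxa:
        return False
    if not gc_range[0] <= gc_content(seq) <= gc_range[1]:
        return False
    return cov_range[0] <= mean_coverage <= cov_range[1]
```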
We then annotated these gene models using Trinotate using standard settings (Bryant et
al. 2017). We used GOSlimViewer to summarize our data by binning the gene ontology results
from Trinotate into a generic GO Slim set then used GOseq to test for gene ontology enrichment
of both biased and specific gene categories using ubiquitous genes as base (McCarthy et al.
2006, Young et al. 2010). We considered terms with p-values smaller than 0.05 to be enriched, then
we calculated the proportion of terms associated with that gene category.
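To make the idea of an enrichment test concrete, the sketch below computes a one-sided hypergeometric p-value for a single GO slim. This is a deliberate simplification: GOseq uses a Wallenius approximation that also accounts for selection bias such as transcript length, which the plain hypergeometric test ignores. The function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when drawing n genes without replacement from N genes,
    K of which carry the GO slim of interest.

    k -- genes in the category of interest (e.g. gametophyte-biased) with the term
    n -- total genes in the category of interest
    K -- genes in the background set with the term
    N -- total genes in the background set
    """
    upper = min(n, K)
    # Exact integer arithmetic for the tail, converted to a float at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

A small p-value indicates that the term is over-represented in the category relative to the background set.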
We analyzed the initial transcriptome contig assembly, the collapsed contigs, the
reciprocally blasted gene models, and the reference E. siliculosus genome for genome
completeness using Busco v4.06 and the eukaryota_odb10 database with default parameters
(Seppey et al. 2019). The transcriptome assemblies and gene models were analyzed in
transcriptome mode, while the E. siliculosus genome was analyzed in genome mode.
Identification of differentially expressed genes
We used RSEM to estimate the expression levels of each gene model. RSEM implements
Bowtie2 to align RNA-seq data, and so we aligned the same publicly available RNA data used
for gene model assembly to our 8,292 gene models (Li and Dewey 2011, Langmead 2010). After
alignment, RSEM generates a transcript abundance quantifier table. We then imported the
transcript counts table directly into DESeq2 in order to calculate differential expression based on
life stage (Love et al. 2014). We set the log2 fold change threshold as one and false discovery
rate at 0.05 for DESeq2. In order to quantify our differential expression, we normalized counts
using transcripts per million (TPM), which allows for direct comparison of reads mapped to a
gene model between samples. Gene models were considered biased toward a particular life stage
when they had a twofold change or higher difference in expression and specific when the
average TPM was below the fifth percentile on the opposite life stage.
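The normalization and classification rules above can be summarized in a short Python sketch. Several details are assumptions made for illustration: a positive log2 fold change is taken to mean higher expression in gametophytes, genes that fail the DESeq2 thresholds are labeled ubiquitous, "specific" is treated as a subset of significantly biased genes, and the fifth-percentile TPM cutoffs are passed in rather than computed. RSEM also uses effective transcript lengths, whereas this sketch uses whatever lengths it is given.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def classify(log2fc, padj, tpm_gam, tpm_spo, low_gam, low_spo,
             lfc_min=1.0, fdr=0.05):
    """Assign one gene model to an expression category.

    log2fc           -- log2 fold change, positive = higher in gametophyte (assumed)
    padj             -- adjusted p-value from the differential expression test
    tpm_gam, tpm_spo -- average TPM in gametophyte and sporophyte samples
    low_gam, low_spo -- fifth-percentile TPM cutoffs for each life stage
    """
    if padj is None or padj >= fdr or abs(log2fc) < lfc_min:
        return "ubiquitous"
    if log2fc > 0:
        return "gametophyte-specific" if tpm_spo < low_spo else "gametophyte-biased"
    return "sporophyte-specific" if tpm_gam < low_gam else "sporophyte-biased"
```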
SNP calling, Pi, linkage disequilibrium, population specific SNPs
We aligned Illumina WGS sequences from 48 individuals from three wild populations to
the collapsed, reciprocal blasted gene model data set using Hisat2 version 2.1.0 using standard
parameters (Kim et al. 2019). Two individuals were removed from the analysis due to low
coverage. Using samtools v1.9 (Li et al. 2009), we converted the alignment to a binary file,
sorted it, and removed PCR duplicates. Using the GATK4 best practices pipeline (Van der
Auwera et al. 2013), we called variants such as single nucleotide polymorphisms (SNPs) and
insertion/deletions, producing a variant call file (VCF). We then combined the individual VCFs
and hard filtered the file to keep bi-allelic SNPs with singletons removed using VCFtools v0.1.16
with the following parameters: SNPs sequenced in 95% of individuals, and sites had to have a
minimum mean read depth of 5 reads and a maximum read depth of 30 reads (Danecek et al.
2011).
We then used VCFtools to calculate the pairwise genetic diversity for each gene model
using the --window-pi option. We set the window for the pairwise genetic diversity calculation
to be the length of the gene model to ensure that the diversity for each gene model was calculated
separately.
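Setting the window to the gene length amounts to summing the per-site diversity over all SNPs in a gene model and dividing by its length. A minimal Python sketch of that calculation follows. It is not the VCFtools code, the function name is hypothetical, and sites without SNPs are assumed to be invariant and fully callable.

```python
def gene_pi(site_counts, gene_length):
    """Per-site nucleotide diversity for one gene model.

    site_counts -- list of (alt_allele_count, chromosomes_called) tuples,
                   one per bi-allelic SNP in the gene model
    gene_length -- length of the gene model in base pairs
    """
    total = 0.0
    for k, n in site_counts:
        if n < 2:
            continue  # pairwise differences are undefined for fewer than two chromosomes
        # Probability that two chromosomes drawn without replacement differ at this site.
        total += 2.0 * k * (n - k) / (n * (n - 1))
    return total / gene_length
```

For example, a single SNP with one alternate allele among four called chromosomes in a 100 bp gene model gives π = 0.5 / 100 = 0.005.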
In order to identify population specific SNPs, we split the filtered VCF file by population
and filtered it as described above. We also set the minor allele count threshold to 1 to ensure
that all sites kept represented an actual SNP in that population, and increased the minor
allele frequency threshold to 0.05 to remove singletons from this analysis. We then extracted the SNP IDs
from each population-specific VCF file, and plotted the resulting Venn diagram using the R
package VennDiagram (Chen and Boutros 2011).
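The counts behind the Venn diagram are simple set operations on the SNP IDs from each population. The Python sketch below shows one way to tabulate every region of the diagram. The function name and the toy SNP IDs in the usage note are hypothetical.

```python
def venn_regions(snps_by_pop):
    """Count SNP IDs in each exclusive region of a Venn diagram.

    snps_by_pop maps a population name to a set of SNP IDs.
    Returns a dict mapping a frozenset of population names to the number
    of SNPs found in exactly those populations.
    """
    membership = {}
    for pop, ids in snps_by_pop.items():
        for snp in ids:
            membership.setdefault(snp, set()).add(pop)
    regions = {}
    for pops in membership.values():
        key = frozenset(pops)
        regions[key] = regions.get(key, 0) + 1
    return regions
```

Calling `venn_regions({"Catalina": {"s1", "s2"}, "Santa Barbara": {"s2"}})` counts `s1` as unique to Catalina and `s2` as shared by both populations. Dividing each count by the total number of distinct SNPs gives percentages of the kind reported in the Results.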
We calculated H_S for each population and H_T for all populations using the R packages
adegenet and vcfR. The filtered VCF file was loaded into R using vcfR (Knaus and Grünwald
2017). The vcf file was then converted into a genlight object, and the population information for
each sample was added to the genlight object using adegenet (Jombart 2008). We then calculated
the Nei’s genetic distance using the command genetic_diff (method = ‘nei’) in vcfR for each
SNP in the VCF file, and tabulated values for each subpopulation and the total population.
Fixation index, dN/dS, McDonald-Kreitman and Tajima’s D calculations
To look for genetic structure and measure population differentiation we calculated the fixation
index using VCFtools. We differentiated between identified SNPs in coding regions and
separated them into two VCF files - one containing SNPs for both first and second bases of
codons and another containing only the third – using a custom script. This information was used
to calculate the ratio of degenerate to non-degenerate SNPs. Negative F_ST values were
treated as zero.
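The codon-position split and the F_ST ratio can be sketched as follows. This is not the custom script used in this study. It assumes SNP positions are given as 0-based offsets within an in-frame coding sequence, averages per-site F_ST values after setting negatives to zero, and follows the Results wording in reporting third-position (degenerate) over first- and second-position (non-degenerate) F_ST. All function names are hypothetical.

```python
def split_by_codon_position(cds_offsets):
    """Split 0-based in-frame CDS offsets into (1st/2nd position, 3rd position) lists."""
    first_second = [o for o in cds_offsets if o % 3 in (0, 1)]
    third = [o for o in cds_offsets if o % 3 == 2]
    return first_second, third


def mean_clamped_fst(fst_values):
    """Mean F_ST with negative estimates treated as zero."""
    clamped = [max(0.0, v) for v in fst_values]
    return sum(clamped) / len(clamped) if clamped else float("nan")


def degenerate_ratio(fst_third, fst_first_second):
    """Ratio of mean third-position F_ST to mean first/second-position F_ST."""
    denom = mean_clamped_fst(fst_first_second)
    # NaN (no sites) or zero denominators leave the ratio undefined.
    if denom != denom or denom == 0.0:
        return float("nan")
    return mean_clamped_fst(fst_third) / denom
```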
To quantify the ratio of substitution rates at nonsynonymous and synonymous sites, we
aligned the protein sequences from our gene models to their best Ectocarpus siliculosus BLAST
pair using Clustal (Madeira et al. 2019). We used PAL2NAL to generate the corresponding codon
alignments, and dN/dS was calculated using codeml from PAML (Suyama et al.
2006, Yang 2007). We filtered out SNPs when the dN or dS value was above two. We used
VCFtools to calculate Tajima’s D separately for each entire gene model.
For the MKT, with Ectocarpus siliculosus as the outgroup, we calculated derived
allele frequency and divergence using our polymorphism data. Those measurements were made
for each gene model, then processed through the iMKT R package (Murga-Moreno et al. 2019)
to produce the respective alpha values for each gene, alpha being the proportion of substitutions
driven by positive selection.
Discussion
Our study combined whole-genome sequencing of three giant kelp populations with
previously established brown macroalgal genetic resources to understand selection on different
life stages of giant kelp (Lipinska et al. 2019, Salavarría et al. 2018). Our findings concur with
previous studies of life stage-specific selection in giant kelp in which haploid expressed genes
had faster sequence evolution when compared with ubiquitously expressed genes or sporophyte-
specific genes (Lipinska et al. 2019). We established that the higher levels of variation in the
gametophyte stage were under weaker selection constraints compared with the sporophyte stage. Our
findings directly refute the masking hypothesis expectation that the higher sequence evolution
(dN/dS) in the haploid gametophyte stage would lead to less nucleotide diversity in the haploid
stage. Our findings suggest that the strength of gene expression in M. pyrifera was a major factor for
sequence evolution, as genes expressed in the sporophyte stage had higher levels of average
expression and showed more adaptive evolution than genes expressed ubiquitously or in the
gametophyte stage. Other haplodiplontic organisms have shown a similar pattern of gene
evolutionary rates across different life stages (Szövényi et al. 2013).
The surplus of rare alleles found in M. pyrifera may signify population expansion or
selective sweeps. The decreased genetic diversity and higher proportion of mutations under
adaptive selection (alpha values) of sporophyte and ubiquitous genes suggest recent selective
sweeps, while the increased genetic diversity and lower alpha values in the gametophyte stage
suggest relaxed selection. While some of these observations might appear discordant, a closer
examination of the life history of giant kelp provides some possible explanations. Patterns of
high levels of genetic diversity and genetic load are common in marine broadcast spawners
characterized by high juvenile mortality (Plough 2016, Alberto et al. 2010, Johansson et al.
2015). Giant kelp is a fecund macroalga that releases billions of haploid spores into the water
column before they settle onto the ocean floor (Dayton 1985, Reed et al. 1996). However,
the successful recruitment of giant kelp depends on spore settlement density (spores must be less
than 1 mm apart) and abiotic factors, such as light, substrate, nutrients, and water motion (Reed
1990, Dayton 1985). Additionally, giant kelp spores have a relatively short dispersal distance
that increases with water speed. Gaylord et al. (2004) estimated 50% of spores dispersed within
100 meters with little current (2 cm/s) while 50% of spores dispersed more than 1 km with fast
currents (10-50 cm/s). The maximum lifespan for gametophytes in the ocean is approximately
one to seven months, while the average lifespan for a sporophyte is four years (Carney 2011,
North 1978). Previous work identified environmental stochasticity as the main driver for
extinctions and recolonizations in Southern California (Reed et al. 2006, Castorani et al. 2015).
While genes across all life stages had negative Tajima’s D values in M. pyrifera, we
reason that genetic diversity and adaptive mutation differences between life stages occur because
natural selection is operating differently and separately on the gametophyte stage and the
sporophyte stage. r/K selection is an ecological theory that emphasizes either maximizing
reproduction rate or exploiting carrying capacity (MacArthur and Wilson 1967, Pianka
1970). K selection occurs when a stable ecosystem leads to more investment in the quality over
quantity of progeny, while r selection occurs in an unstable ecosystem with investments in
quantity of progeny over quality (Cassill 2019). The distinct life stages of the haplodiplontic M.
pyrifera can partition selection along the lines of r- and K-selection separately for each life stage
(r refers to the maximum population growth rate [r-max] and favors productivity, while K refers
to the carrying capacity and favors efficiency) (Pianka 1970). The gametophyte stage favors high
fecundity and rapid development (“r-selection”), while the sporophyte stage has slower
development, lower fecundity, and more investment into competitive ability (“K-selection”).
Gametophytes show other signatures of “r-selection”, such as rapid development, small body
size, short lifespan, high mortality, and single reproduction. Sporophytes show signatures of “K-
selection”, such as slower development, larger body size, multiple reproductions, longer lifespan,
and lower mortality.
After performing Tajima’s D and MKT tests for each life stage and population, we found
both purifying and positive selection acting on sporophyte-specific genes. We also found that
minor alleles at the gametophyte stage were nearly neutral and not adaptive. We reason that giant
kelp is undergoing greater selection pressure during its sporophyte stage, where it spends a much
greater fraction of its lifetime, and follows the r- vs K-selection theory. r-selection favors
productivity, which in this case is the massive production of cheap haploid spores, over the
selection of gametophyte traits adapted to the environment. We hypothesize that spores are
released in large numbers and might potentially have a sizable number of new mutations, but
once the spores have settled and reproduced, the resulting sporophytes undergo heavy selection.
Genes that are expressed ubiquitously show intermediate values for π and MKT, as well as for
average total expression, which suggests that these ubiquitously expressed genes are being
exposed to both the relaxed selection (r-selection) of the gametophyte stage and the increased
selection (K-selection) of the sporophyte stage.
Conclusion
This initial study has set the stage for future work with giant kelp that will greatly benefit
both conservation and domestication efforts by establishing important population genetic
parameters such as pairwise genetic diversity, Tajima’s D, and dN/dS using gene models. We
confirmed that genetic variation of giant kelp in Southern California is characterized by strong
regional population structure. Our current analysis is conservative as it focuses on orthologs
between two brown macroalgae species. The gene models are biased in terms of quantity per life
stage as E. siliculosus is a simpler, isomorphic brown macroalgae compared with the larger,
more complex diploid dominant giant kelp. These analyses will be improved in the future when
using a reference genome for giant kelp instead of a set of highly conserved gene models. These
analyses would be further improved by increasing the individual sample size and number of
populations included for both RNA-seq and whole genome sequencing projects. These analyses
can also be extended to other organisms with multiple life stages that experience intermediate r/K
selection, such as the sea turtle.
References
Alberto F, Raimondi PT, Reed DC, Coelho NC, Leblois R, Whitmer A, Serrão EA. 2010.
Habitat continuity and geographic distance predict population genetic differentiation in giant
kelp. Ecology. 91(1):49-56.
Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. Available
online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Arioli T, Mattner SW, Winberg PC. 2015. Applications of seaweed extracts in Australian
agriculture: past, present and future. J Appl Phycol. 27(5):2007-2015.
Bell G. 1994. The comparative biology of the alternation of generations. Lectures on
Mathematics in the Life Sciences 251–26.
Bell G. 2008. Selection: The Mechanism of Evolution. Oxford University Press.
Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y. 1998. Predicting
function: from genes to genomes and back. J Mol Biol. 283(4):707-25.
Breton TS, Nettleton JC, O'Connell B, Bertocci M. 2018. Fine-scale population genetic structure
of sugar kelp, Saccharina latissima (Laminariales, Phaeophyceae), in eastern Maine, USA.
Phycologia. 57(1):32-40.
Bryant DM, Johnson K, DiTommaso T, Tickle T, Couger MB, Payzin-Dogru D, Lee TJ, Leigh
ND, Kuo TH, Davis FG, et al. 2017. A Tissue-Mapped Axolotl De Novo Transcriptome Enables
Identification of Limb Regeneration Factors. Cell Rep. 18(3):762-776.
Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using
DIAMOND. Nature Methods 12, 59-60.
Camus C, Ballerino P, Delgado R, Olivera-Nappa Á, Leyton C, Buschmann AH. 2016. Scaling
up bioethanol production from the farmed brown macroalga Macrocystis pyrifera in
Chile. Biofuels, Bioprod. Bioref., 10:673-685.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009.
BLAST+: architecture and applications. BMC Bioinformatics. 10:421.
Carney L. 2011. A Multispecies Laboratory Assessment of Rapid Sporophyte Recruitment From
Delayed Kelp Gametophytes. Journal of Phycology. 47:244-251.
Cassill DL. 2019. Extending r/K selection with a maternal risk-management model that classifies
animal species into divergent natural selection categories. Sci Rep. 9:6111.
Castorani MC, Reed DC, Alberto F, Bell TW, Simons RD, Cavanaugh KC, Siegel DA,
Raimondi PT. 2015. Connectivity structures local population dynamics: a long-term empirical
test in a large metapopulation system. Ecology. 96(12):3141-3152.
Challis R, Richards E, Rajan J, Cochrane G, Blaxter M. 2020. BlobToolKit – Interactive quality
assessment of genome assemblies. G3: GENES, GENOMES, GENETICS. 10(4):1361-1374.
Charlesworth D, Charlesworth B. 1987. Inbreeding Depression and its Evolutionary
Consequences. Annual Review of Ecology and Systematics, 18(1):237-268.
Chen H, Boutros PC. 2011. VennDiagram: a package for the generation of highly-customizable
Venn and Euler diagrams in R. BMC Bioinformatics. 12:35.
Cock JM, Sterck L, Rouzé P, Scornet D, Allen AE, Amoutzias G, Anthouard V, Artiguenave F,
Aury JM, Badger JH, et al. 2010. The Ectocarpus genome and the independent evolution of
multicellularity in brown algae. Nature. 465(7298):617-21.
Coyer JA, Smith GJ, Andersen RA. 2001. Evolution of Macrocystis spp. (Phaeophyceae) as
determined by ITS1 and ITS2 sequences. J Phycol. 37(4):574-585.
Crow JF, Kimura M. 1965. Evolution in Sexual and Asexual Populations. The American
Naturalist, 99(909):439-450.
Crow JF, Simmons MJ. 1983. The mutation load in Drosophila. pp 1-35 in The genetics and
biology of Drosophila. Academic Press, London.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G,
Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics.
27(15):2156-8.
Dayton PK. 1985. Ecology of kelp communities. Annual Review of Ecology and Systematics
16(1):215-245.
Drummond DA, Raval A, Wilke CO. 2006. A single determinant dominates the rate of yeast
protein evolution. Mol Biol Evol. 23(2):327-37.
FAO. 2018. The global status of seaweed production, trade, and utilization. Globefish Research
Programme Volume 124. Rome. 120 pp. Licence: CC BY-NC-SA 3.0 IGO.
Frazer N. 1986. Survival from Egg to Adulthood in a Declining Population of Loggerhead Turtles,
Caretta caretta. Herpetologica. 42(1):47-55.
Gaylord B, Reed DC, Washburn L, Raimondi PT. 2004. Physical–biological coupling in spore
dispersal of kelp forest macroalgae. Journal of Marine Systems. 49:19-39.
Goecke F, Klemetsdal G and Ergon Å. 2020. Cultivar Development of Kelps for Commercial
Cultivation- Past Lessons and Future Prospects. Front. Mar. Sci. 8:110.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L,
Raychowdhury R, Zeng Q, et al. 2011. Full-length transcriptome assembly from RNA-Seq data
without a reference genome. Nat Biotechnol. 29(7):644-52.
Graham MH, Vásquez J, Buschmann AH. 2007. Global ecology of the giant kelp Macrocystis:
from ecotypes to ecosystems. Oceanogr. Mar. Biol. Annu. Rev. 45, 39–88.
Guzinski J, Ballenghien M, Daguin-Thiébaut C, Lévêque L, Viard F. 2018. Population genomics
of the introduced and cultivated Pacific kelp Undaria pinnatifida: Marinas-not farms-drive
regional connectivity and establishment in natural rocky reefs. Evol Appl. 11(9):1582-1597.
Hameury S, Borderie L, Monneuse JM, Skorski G, Pradines D. 2019. Prediction of skin anti-
aging clinical benefits of an association of ingredients from marine and maritime origins: Ex
vivo evaluation using a label-free quantitative proteomic and customized data processing
approach. J Cosmet Dermatol. 18(1):355-370.
Huang X, Madan A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9(9):868-
77.
Hughes JS, Otto SP. 1999. Ecology and the Evolution of Biphasic Life Cycles. Am Nat.
154(3):306-320.
Husband BC, Schemske DW. 1996. Evolution of the magnitude and timing of inbreeding
depression in plants. Evolution. 50(1):54-70.
Jombart T. 2008. adegenet: A R package for the multivariate analysis of genetic markers.
Bioinformatics. 24(11):1403-5.
Jenkins CD, Kirkpatrick M. 1995. Deleterious mutations and the evolution of genetic life cycles.
Evolution. 49(3):512-520.
Johansson ML, Alberto F, Reed DC, Raimondi PT, Coelho NC, Young MA, Drake PT, Edwards
CA, Cavanaugh K, Assis J, et al. 2015. Seascape drivers of Macrocystis pyrifera population
genetic structure in the northeast Pacific. Mol Ecol. 24(19):4866-85.
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and
genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37(8):907-915.
Klinger T. 1993. The persistence of haplodiploidy in algae. Trends in Ecology & Evolution
8:256–258.
Kondrashov AS, Crow JF. 1991. Haploidy or diploidy: which is better? Nature. 351(6324):314-
5.
Konotchick T, Dupont CL, Valas RE, Badger JH, Allen AE. 2013. Transcriptomic analysis of
metabolic function in the giant kelp, Macrocystis pyrifera, across depth and season. New Phytol.
198(2):398-407.
Kopczak CD, Zimmerman RC, Kremer JN. 1991. Variation in nitrogen physiology and growth
among geographically isolated populations of the giant kelp, Macrocystis pyrifera (Phaeophyta).
J Phycol. 27(2):149-158.
Knaus BJ, Grünwald NJ. 2017. Vcfr: a package to manipulate and visualize variant call format
data in R. Mol Ecol Resour. 17(1):44-53.
Krueger F. 2015. Trim galore. A wrapper tool around Cutadapt and FastQC to consistently
apply quality and adapter trimming to FastQ files. Available online at:
https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
Langmead B. 2010. Aligning short sequencing reads with Bowtie. Curr Protoc Bioinformatics.
Chapter 11:Unit 11.7.
Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or
without a reference genome. BMC Bioinformatics. 12:323.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R;
1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format
and SAMtools. Bioinformatics. 25(16):2078-9.
Li YC, Korol AB, Fahima T, Beiles A, Nevo E. 2002. Microsatellites: genomic distribution,
putative functions and mutational mechanisms: a review. Mol Ecol. 11(12):2453-65.
Lipinska AP, Serrano-Serrano ML, Cormier A, Peters AF, Kogame K, Cock JM, Coelho SM.
2019. Rapid turnover of life-cycle-related genes in the brown algae. Genome Biol. 20(1):35.
Loureiro R, Gachon CM, Rebours C. 2015. Seaweed cultivation: potential and challenges of crop
domestication at an unprecedented pace. New Phytol. 206(2):489-92.
Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for
RNA-seq data with DESeq2. Genome Biol. 15(12):550.
MacArthur R, Wilson EO. 1967. The Theory of Island Biogeography (2001 reprint ed.). Princeton
University Press.
Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter
SC, Finn RD, Lopez R. 2019. The EMBL-EBI search and sequence analysis tools APIs in 2019.
Nucleic Acids Res. 47(W1):W636-W641.
Makkar HPS, Tran G, Heuze V, Giger-Reverdin S, Lessire M, Lebas F, Ankers P. 2016.
Seaweeds for livestock diets: a review. Anim Feed Sci Technol. 212:1–17.
McCarthy FM, Wang N, Magee GB, Nanduri B, Lawrence ML, Camon EB, Barrell DG, Hill
DP, Dolan ME, Williams WP, et al. 2006. AgBase: a functional genomics resource for
agriculture. BMC Genomics. 7:229.
McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila.
Nature. 351(6328):652-4.
Mooney KM, Beatty GE, Elsäßer B, Follis ES, Kregting L, O'Connor NE, Riddell GE, Provan J.
2018. Hierarchical structuring of genetic variation at differing geographic scales in the cultivated
sugar kelp Saccharina latissima. Mar Environ Res. 142:108-115.
Moreno-Hagelsieb G, Latimer K. 2008. Choosing BLAST options for better detection of
orthologs as reciprocal best hits. Bioinformatics. 24(3):319-24.
Murga-Moreno J, Coronado-Zamora M, Hervas S, Casillas S, Barbadilla A. 2019. iMKT: the
integrative McDonald and Kreitman test. Nucleic Acids Res. 47(W1):W283-W288.
Nei M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National
Academy of Sciences 70:3321–3323.
North WJ. November 1978. Evaluation, Management, and Cultivation of Macrocystis Kelp
Forests. Paper presented at the Symposium on Chilean Algae. Santiago, Chile.
North WJ. 1987. Biology of the Macrocystis resource in North America. In: Doty MS, et al.
editors. Case Studies of Seven Commercial Seaweed Resources. San Francisco, California: FAO.
Otto SP, Gerstein AC. 2008. The evolution of haploidy and diploidy. Curr Biol. 18(24):R1121-4.
Park SG, Choi SS. 2010. Expression breadth and expression abundance behave differently in
correlations with evolutionary rates. BMC Evol Bio. 10:241.
Peters AF, Marie D, Scornet D, Kloareg B, Cock JM. 2004. Proposal of Ectocarpus
siliculosus (Ectocarpales, Phaeophyceae) as a model organism for brown algal genetics and
genomics. J Phycol. 40:1079-1088.
Pianka ER. 1970. On r- and K-selection. Am Nat. 104:592–597.
Plough LV. 2016. Genetic load in marine animals: a review. Curr Zool. 62(6):567-579.
Raimondi PT, Reed DC, Gaylord B, Washburn L. 2004. Effects of Self-Fertilization in the Giant
Kelp, Macrocystis Pyrifera. Ecology. 85(12):3267–3276.
Raper JR, Flexer AS. 1970. The road to diploidy with emphasis on a detour. Symp. Soc. Gen.
Microbiol. 20:401–432.
Reed DC. 1990. The effects of variable settlement and early competition on patterns of kelp
recruitment. Ecology. 71: 776–787.
Reed DC, Ebeling AW, Anderson TW. 1996. Differential reproductive responses to fluctuating
resources in two seaweeds with different reproductive strategies. Ecology. 77:300–316.
Reed DC, Kinlan BP, Raimondi PT, Washburn L, Gaylord B, Drake PT. 2006. A
Metapopulation Perspective on Patch Dynamics of Southern California. In: Kritzer JP, Sale PF,
editors. Marine Metapopulations. Academic Press. p. 353-386.
Salavarría E, Paul S, Gil-Kodaka P, Villena GK. 2018. First global transcriptome analysis of
brown algae Macrocystis integrifolia (Phaeophyceae) under marine intertidal conditions. 3
Biotech. 8(4):185.
Seppey M, Manni M, Zdobnov EM. 2019. BUSCO: Assessing Genome Assembly and
Annotation Completeness. In: Kollmar M, editor. Gene Prediction. Methods in Molecular
Biology, vol 1962. Humana, New York, NY.
Schiel DR, Foster MS. 2015. The biology and ecology of giant kelp forests. Univ of California
Press.
Schmieder R, Edwards R. 2011. Fast identification and removal of sequence contamination from
genomic and metagenomic datasets. PLoS One. 6(3):e17288.
Singh RP, Bijo AJ, Baghel RS, Reddy CR, Jha B. 2011. Role of bacterial isolates in enhancing
the bud induction in the industrially important red alga Gracilaria dura. FEMS Microbiol Ecol.
76(2):381-92.
Silberfeld T, Leigh JW, Verbruggen H, Cruaud C, de Reviers B, Rousseau F. 2010. A multi-
locus time-calibrated phylogeny of the brown algae (Heterokonta, Ochrophyta, Phaeophyceae):
Investigating the evolutionary nature of the "brown algal crown radiation". Mol Phylogenet Evol.
56(2):659-74.
Szövényi P, Ricca M, Hock Z, Shaw JA, Shimizu KK, Wagner A. 2013. Selection is no more
efficient in haploid than in diploid life stages of an angiosperm and a moss. Mol Biol Evol.
30(8):1929-39.
Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence
alignments into the corresponding codon alignments. Nucleic Acids Res. 34(Web Server
issue):W609-12.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA
polymorphism. Genetics. 123(3):585-95.
Tanna B, Mishra A. 2018. Metabolites Unravel Nutraceutical Potential of Edible Seaweeds: An
Emerging Source of Functional Food. Comprehensive Reviews in Food Sciences and Food
Safety. 17(6):1613-1624.
Tatusov RL, Koonin EV, Lipman DJ. 1997. A genomic perspective on protein families. Science.
278(5338):631-7.
The UniProt Consortium. 2021. UniProt: the universal protein knowledgebase in 2021. Nucleic
Acids Res. 49(D1):D480–D489. https://doi.org/10.1093/nar/gkaa1100
Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan
T, Shakir K, Roazen D, Thibault J, et al. 2013. From FastQ data to high confidence variant calls:
the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics.
43(1110):11.10.1-11.10.33.
Weinberger F. 1999. Epiphyte-Host Interactions: Gracilaria conferta and Associated Bacteria.
Dissertation, University of Kiel, Germany, 138 pp.
Westermeier R, Patino DJ, Müller H, Müller D. 2010. Towards domestication of giant kelp
(Macrocystis pyrifera) in Chile: selection of haploid parent genotypes, outbreeding, and
heterosis. J Appl Phycol. 22:357-361.
Williams GC. 1957. Pleiotropy, Natural Selection, and the Evolution of Senescence. Evolution.
11:398-411.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol.
24(8):1586-91.
Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. 2004. A molecular timeline for the
origin of photosynthetic eukaryotes. Mol Biol Evol. 21(5):809-18.
Young MD, Wakefield MJ, Smyth GK, Oshlack A. 2010. Gene ontology analysis for RNA-seq:
accounting for selection bias. Genome Biol. 11(2):R14.
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. 2012. A high-performance
computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics.
28(24):3326-8.
Zuccarello G, Macaya E. 2010. DNA barcoding and genetic divergence in the giant kelp
Macrocystis (Laminariales). J Phycol. 46:736-742.
Chapter II: The First Scaffolded and Annotated Genome of Giant Kelp (Macrocystis pyrifera)
Gary Molano*¹, Jose Diesel*¹, Gabriel J. Montecinos², Kelly DeWeese¹, Sara Calhoun³, Igor
Grigoriev³, Alan Kuo³, Anna Lipzen³, Asaf Salamov³, Filipe Alberto² and Sergey V. Nuzhdin**¹
* Co-first authors
¹ Department of Molecular and Computational Biology, University of Southern California, Los
Angeles, California, United States
² Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee,
Wisconsin, United States
³ Joint Genome Institute, Department of Energy, Berkeley, California, United States
In Preparation
Abstract:
Macrocystis pyrifera (giant kelp) is a haplodiplontic brown macroalga that is of great
ecological importance as it is a massive primary producer and foundational species that provides
habitat for hundreds of species. Giant kelp has great economic potential as a feedstock for
biofuel, and can be used as food, a source of alginate, and in fertilizers, cosmetics, and
pharmaceuticals. One of the limitations to exploiting giant kelp’s economic potential as well as
assisting in giant kelp conservation efforts is a lack of genomic tools like a high quality,
contiguous reference genome with accurate gene annotations. Reference genomes attempt to
capture the complete genomic sequence of an individual (or individuals) from a specific species,
and importantly provide a universal structure for comparison across a multitude of genetic
experiments, both within species and between species. We assembled the giant kelp genome de
novo using PacBio reads, and then ordered contigs into chromosome level scaffolds using Hi-C.
We found the giant kelp genome to be 537 MB in length, with a total of 34 scaffolds and 188
contigs. The assembly N50 is 13,669,674 bp and the GC content of the genome is 50.37%. We
assessed the genome completeness using BUSCO, and found giant kelp contained 94% complete
BUSCO genes from the stramenopile clade. Annotation of the giant kelp genome revealed
25,919 genes. Additionally, we present genetic variation data based on 48 diploid giant kelp
sporophytes from three different Southern California populations that confirms the strong
population structure found in other studies of these populations. We present a high-quality giant
kelp genome that greatly increases the genetic knowledge of this ecologically and economically
vital species.
Introduction
Macrocystis pyrifera (giant kelp) is the world’s largest macroalga and one of the fastest
growing multicellular autotrophs on Earth, increasing in mass by an average of 3.5% per day
(Rassweiler et al. 2018). It forms extensive subtidal forests on shallow reefs in temperate seas
that are among the most productive ecosystems on Earth (Graham et al. 2007, Reed and
Brzezinski 2009). Importantly, the high primary production and three-dimensional structure of
giant kelp forests provide habitat for hundreds of species, ranging from microscopic
invertebrates to different types of fish and mammals (Schiel and Foster 2015). It is not surprising
that Darwin considered giant kelp forests analogous to terrestrial rainforests, owing to the
impressive species diversity sustained in each habitat (Darwin 1860). However, giant kelp and
other brown macroalgae (Phaeophyceae) are photosynthetic eukaryotes that are more closely
related to diatoms than land plants (Charrier et al. 2008). Investigating the genomes of brown
macroalgae can reveal novel genes and biological processes, as brown macroalgae evolved
multicellularity independently from other eukaryotes (Cock et al. 2011).
As a haplodiplontic organism, giant kelp alternates between a microscopic haploid
gametophyte stage and a macroscopic diploid sporophyte stage. The diploid sporophyte releases
haploid zoospores into the water column that eventually settle on the ocean floor and develop
into sexed gametophytes. Female gametophytes release a pheromone and oogonia that trigger the
release of sperm from the male gametophyte and attract the sperm towards the oogonia (Boland
et al. 1983). After fertilization, young diploid sporophytes begin to develop into the macroscopic
adult seen in kelp forests (Schiel and Foster 2015). The dispersal distance of the zoospores
depends on many factors, including water motion (Gaylord et al. 2002, 2004, 2006), and
successful colonization depends on zoospores of the opposite sex settling within millimeters of
each other on suitable habitat (Reed et al. 1990, 1991), which is dependent on the size of nearby
kelp forests and the number and synchronous timing of zoospores released by them (Reed et al.
1997, 2004, Castorani et al. 2015, 2017).
Interest in giant kelp as an aquaculture crop has increased as the global algal aquaculture
market in 2020 produced 35 million tons of algae worth $16.5 billion (FAO 2022). Giant kelp is
one of the main sources of alginate, a long chain polysaccharide found in brown macroalgae cell
walls, with uses in food as a thickener and in medicine as a hydrogel (Reyes-Tisnado et al. 2005,
Mollah et al. 2021). Giant kelp also has a range of other applications including human food,
animal feed, cosmetics, pharmaceuticals, and fertilizers (Chopin and Tacon 2021). Due to its fast
growth rate and limited composition of lignin and cellulose, giant kelp has been identified as a
potential feedstock for biofuel (Camus et al. 2016). However, the large scale cultivation of giant
kelp lags behind brown macroalgae species such as Saccharina japonica and Undaria
pinnatifida, which are grown in China, Korea, and Japan as human food sources and have
undergone selective breeding programs since the 1950s (Sohn 1988, Hwang et al. 2019, Liu et al.
2014).
Genomics can greatly benefit aquaculture production by assisting in breeding efforts to
increase crop productivity, increase the quality of specific compounds in the crop, and increase
resistance to stress, disease, and bacterial infection (Briggs 1988). The first brown macroalga to
have its genome completely sequenced was Ectocarpus siliculosus, which has served as a model
species (Cock et al. 2010). Both Saccharina japonica and Undaria pinnatifida have since had
their genomes sequenced, an important step in breeding programs because a reference genome
provides an individual’s complete genetic information, which serves as a common standard for
comparison across experiments (Ye et al. 2015, Shan et al. 2020, Kaye and Wasserman 2021).
Previous research has identified the need for improved cultivars of brown macroalgae
while avoiding low cultivar quality and inbreeding depression, and this improvement can be
expedited by increasing the availability of genomic tools for species such as giant kelp (Loureiro
et al. 2015, Robinson et al. 2013). However, the genetic tools for giant kelp are limited in
comparison with those for Saccharina japonica and Undaria pinnatifida: a giant kelp
transcriptome published in 2013, a heavily fragmented genome with an estimated completeness of
50% based on single copy orthologs, and a set of gene models derived from reciprocal blasts
against Ectocarpus siliculosus (Konotchick et al. 2013, Paul et al. 2022, Molano et al. 2022).
Prior research identified a northern hemisphere origin for giant kelp based on phylogenetic
analysis of the ribosomal internal transcribed spacer regions (Coyer et al. 2001). Further
molecular dating in conjunction with fossil records estimates that giant kelp emerged as a
species ~13 million years ago and was initially found in the colder waters of the Pacific Ocean
off the Alaskan coast (Starko et al. 2019). Due to Pleistocene glacial and interglacial cycles,
the global hotspot of microsatellite genetic diversity for giant kelp is presently in the
Southern California Bight in the northeast Pacific (Johansson et al. 2015). Therefore, in our
efforts to support giant kelp
domestication, we focused on Southern California to investigate the genetic diversity in giant
kelp populations using whole genome sequencing. To this end, we improved giant kelp’s
genomic resources by assembling a chromosomal level nuclear genome and investigated three
Southern California populations for markers that can be used in selective breeding for
aquaculture.
Results
Genome sequencing, assembly, and annotation
We extracted DNA from a single female haploid gametophyte and sequenced the DNA
using PacBio Sequel II technology (see Materials and Methods). We obtained 57GB of long
reads, representing approximately 100x coverage of the giant kelp genome, which we estimated at
513-542MB. Our de novo assembly with Canu v1.9 (Koren et al. 2017) generated 1,033 contigs
with an N50 of 1.7MB.
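As a rough check on the coverage figure above, sequencing depth can be approximated as the
total number of sequenced bases divided by the genome size. The sketch below is illustrative
only: it assumes decimal units (1 GB = 10^9 bases, 1 MB = 10^6 bases), and the function name is
hypothetical rather than part of the assembly pipeline.

```python
def estimated_coverage(total_read_bases: float, genome_size_bases: float) -> float:
    """Approximate sequencing depth as sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bases


READ_BASES = 57e9  # 57 GB of long reads, as reported in the text

# Evaluate depth at both ends of the kmer-based genome size range.
for genome_mb in (513, 542):
    depth = estimated_coverage(READ_BASES, genome_mb * 1e6)
    print(f"{genome_mb} MB genome -> ~{depth:.0f}x coverage")
```

Under these assumptions the depth comes out at roughly 105x to 111x, in line with the
approximate 100x stated above.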
Contamination from sample collection and library preparation has been found in many
different reference genomes, such as the cow genome, and can be the cause for erroneous results
in downstream analysis (Merchant et al. 2014). Contamination has been identified as a concern
in other brown macroalgae, as bacterial contamination was found in the Saccharina japonica
genome (Dittami and Corre 2017). We checked the giant kelp de novo assembly for
contamination before we scaffolded the genome. Some amount of bacterial contamination in the
sequencing of the giant kelp gametophytes is expected due to the natural presence of epiphytes
found in giant kelp. However, our strategy of using published brown macroalgal genomes to
filter initial PacBio reads relies on those genomes being completely free of contamination
(Florez et al. 2019). Therefore, we used the BlobTools and BlobToolKit pipeline to remove candidate
contaminated contigs from our genome assembly before we scaffolded the genome (Laetsch and
Blaxter 2017, Challis et al. 2020).
After decontamination, scaffolding using Hi-C technology was performed by Phase
Genomics and clustered 96.82% of the contigs into 34 clusters, resulting in a final assembly of
223 scaffolds with a total genome size of 537MB and an N50 of 13.6MB [Table 1, Fig 1].
Figure 1. Circos plot of first 34 contigs of the giant kelp genome. Different aspects of the
genome are represented in each concentric circle. (A) Scaffold size in MB. (B) Gene density
heatmap. (C) Percentage of GC ranging from 44% to 55%. (D) Nucleotide diversity ranging
from 0 to 0.007. (E) SNP density heatmap. (F) Tajima’s D values ranging from -2.2 to 0.5. (G)
Fst values ranging from 0 to 0.34. All values are plotted on the same 200kb sliding window with
40kb intervals.
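The windowing scheme named in the Figure 1 caption can be sketched as follows. This is a
minimal illustration, assuming that "40kb intervals" means a 40 kb step between window starts
and that the final window is truncated at the scaffold end; the function name and the
truncation rule are assumptions, not details taken from the analysis itself.

```python
def sliding_windows(scaffold_length, window=200_000, step=40_000):
    """Yield half-open (start, end) windows that tile a scaffold.

    The last window is shortened so it never runs past the scaffold end.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < scaffold_length:
        end = min(start + window, scaffold_length)
        yield (start, end)
        if end == scaffold_length:
            break  # the scaffold end has been reached
        start += step


# Toy 300 kb scaffold: four overlapping windows.
for start, end in sliding_windows(300_000):
    print(start, end)
```

Each per-window statistic (gene density, GC content, nucleotide diversity, and so on) would
then be computed over the sites falling inside each interval.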
Table 1: Genome statistics comparison between the genomes of Macrocystis pyrifera (assembled
in this study), Ectocarpus sp, Saccharina japonica, and Undaria pinnatifida.
We assessed genome completeness using BUSCO v5.2.1 (Manni et al. 2021) with the stramenopile
dataset. BUSCO (Benchmarking Universal Single Copy Orthologs) searches a genome for single
copy orthologs that are expected to be present in a specific lineage. BUSCO scores from the
giant kelp genome can then be compared against those of other brown macroalgal genomes: a
higher number of complete BUSCO genes indicates a more complete genome, and a lower number of
duplicated genes indicates fewer duplication artifacts from assembly. For this analysis, we
compared our giant kelp genome assembly against three other published brown macroalgal
genomes: Ectocarpus siliculosus, Saccharina japonica, and Undaria pinnatifida (Cock et al.
2010, Ye et al. 2015, Shan et al. 2020). The stramenopile BUSCO analysis showed that the giant
kelp genome (Macrocystis pyrifera) compared favorably to the other published brown macroalgal
genomes, with 94 complete BUSCO genes, 1 fragmented gene, and 5 missing genes [Fig 2]. We
also compared our giant kelp genome to a recently published giant kelp genome and to a set of
giant kelp gene models that were filtered through reciprocal blasts against Ectocarpus
siliculosus, which contained 11 and 78 complete BUSCO genes, respectively (Paul et al. 2022,
Molano et al. 2022).
Species                      Giant Kelp   Ectocarpus siliculosus   Saccharina japonica   Undaria pinnatifida
Total Length (bp)            537452659    196804589                535382851             511280173
Number of Contigs            223          30                       4598                  114
Largest Contig (bp)          26510325     18946431                 2388791               32303302
GC Content (%)               50.37        53.59                    49.35                 50.14
N50 (bp)                     13669674     6528661                  342726                16510065
Number of N’s per 100 kbp    12.99        2601.51                  1541.53               49.29
Figure 2. Comparison of BUSCO assessment of genome completeness based on the
stramenopiles_odb10 dataset between the macroalgal genomes of Macrocystis pyrifera
(assembled in this study), Ectocarpus sp, Saccharina japonica, and Undaria pinnatifida. Also
shown are the Macrocystis pyrifera gene models from Molano et al. 2022 and the genome from
Paul et al. 2022.
Other methods to check on genome completeness include genome contiguity, usually
measured using the N50 statistic, and comparing genome size to estimated genome size
(Hanschen et al. 2020). N50 is the length of the shortest contig or scaffold for which contigs or
scaffolds with greater or equal length cover at least 50 % of the assembly (Alhakami et al. 2017).
We calculated the genome assembly statistics of our giant kelp genome and the three major brown
macroalgal genomes, Ectocarpus siliculosus, Saccharina japonica, and Undaria pinnatifida,
using QUAST version 5.0.2 (Gurevich et al. 2013) [Table 1]. Our N50 for giant kelp is
13,669,674 bp, which is greater than the N50 for Ectocarpus siliculosus and Saccharina
japonica, but less than that of Undaria pinnatifida. Our giant kelp genome also contained fewer
missing bases (N’s) per 100 kbp than any of the other brown macroalgal genomes.
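The N50 definition given above can be made concrete with a short sketch. This is not how
QUAST is implemented; it is a minimal illustration of the definition, and the function name
and toy contig lengths are made up for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches at least half
    of the total assembly length.
    """
    total = sum(lengths)
    if total <= 0:
        raise ValueError("assembly must contain at least one base")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly of 15 bases: 5 + 4 = 9 covers at least half, so N50 is 4.
print(n50([5, 4, 3, 2, 1]))
```

Because the statistic depends only on the sorted lengths, it rewards contiguity but says
nothing about correctness, which is why it is reported alongside BUSCO completeness and the
number of N’s.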
Several metrics can be used to estimate total genome size, including physical methods
like flow cytometry and microspectrophotometry, or computational methods like kmer size
estimation. Giant kelp genome size has previously been assessed using microspectrophotometry; the potential
genome sizes of giant kelp gametophytes ranged from 882 MB to 1,176 MB (Phillips et al.
2011). However, when using flow cytometry, the giant kelp genome was estimated to be 686 MB
(Salvador Soler et al. 2019). This discrepancy may be explained by heterogeneous amounts of
nuclear material in giant kelp gametophytes. Giant kelp female gametophytes have been shown
to sometimes have double the genetic material compared to most male gametophytes (Müller et
al. 2016). Other brown macroalgae, such as Saccharina latissima, also have variable amounts of
DNA content in their haploid tissue (Goecke et al. 2022). Therefore, using physical parameters
to accurately calculate the genome size of brown macroalgae may require homogeneous mixes of
cells with the same levels of DNA content.
Computational methods can estimate the genome size of an organism by approximating the repeat
structure of sequenced shotgun reads from a genome (Li and Waterman 2003). Genome size
estimates based on kmers, or subsequences of DNA of length k, sometimes differ from physical
measurements such as those from flow cytometry (Pflug et al. 2020). Since the physical
estimates of the giant kelp genome size are not consistent, we compared them against a
kmer-based estimate. Long reads from PacBio sequencing have been shown to accurately estimate
genome size using kmers as long as the reads have been corrected (Hengchao et al. 2020). We
used our Canu-corrected reads to estimate the genome size based on kmer frequency using KMC
(Kokot et al. 2017). The estimated giant kelp genome size based on kmer frequency from the
corrected PacBio reads is 513 MB for k=25 and 542 MB for k=31. Our assembled genome is 537 MB,
which falls within this predicted range. The assembly is smaller than the physical estimates of
686 MB from flow cytometry and 882 MB to 1,176 MB from microspectrophotometry, and is roughly
half of the upper estimate; the physical estimates may have been inflated by the polytenic
state of giant kelp gametophytes.
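One common way to turn a kmer frequency histogram into a genome size estimate is to divide the
total number of kmer occurrences by the depth at the main histogram peak. The sketch below
illustrates that idea only. The text does not state how the KMC counts were converted into the
513 MB and 542 MB estimates, so the peak-based formula, the error cutoff (min_depth), and both
function names are assumptions; the sketch also ignores reverse complements.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map kmer multiplicity -> number of distinct kmers with that multiplicity."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return dict(Counter(counts.values()))


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as total kmer occurrences / peak depth.

    Kmers seen fewer than min_depth times are treated as likely
    sequencing errors and excluded (an assumed, adjustable cutoff).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no kmers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth with the most distinct kmers
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


# Toy histogram: 500 error kmers seen once, and a coverage peak at depth 10.
toy = {1: 500, 9: 100, 10: 300, 11: 100}
print(estimate_genome_size(toy))
```

Repetitive or polyploid sequence shifts and broadens the peak, which is one reason estimates
differ between values of k.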
Protein coding genes were predicted using both de novo and homology-based
methodologies, which after filtering resulted in the prediction of 25,919 protein coding genes,
more than the 18,733 and 16,256 predicted for Saccharina japonica and Ectocarpus siliculosus,
respectively [Table 1, Fig 1B (gene density track)] (Cock et al. 2010, Ye et al. 2015).
Genome Comparative analysis
To identify orthologous genes shared with other relevant macroalgae, we performed a
comparative analysis using OrthoFinder between Macrocystis pyrifera, Ectocarpus sp.,
Saccharina japonica, and Undaria pinnatifida. A total of 70,317 genes were analyzed, of which
61,267 (87.1%) were assigned to 14,001 orthogroups. Of those orthogroups, 7,660 were present
in all four species [Fig 3].
Figure 3. Protein comparative analysis of orthologs between giant kelp and three other relevant
macroalgal species (Ectocarpus sp., Saccharina japonica and Undaria pinnatifida) using
Orthofinder. Numbers represent shared orthogroups between species.
Differences in genome size between Ectocarpales and Laminariales have been explained by
the expansion of repetitive elements in the larger genomes of Laminariales (Graf et al. 2021).
Overall synteny of single copy orthologs is conserved between E. siliculosus and M. pyrifera,
but there are signs of chromosomal rearrangement, including the splitting of four
chromosomes and the fusion of two, which explains the difference in chromosome number between
the two species [Fig. 4A]. As expected, synteny between M. pyrifera and U. pinnatifida, both in
the order Laminariales, shows less chromosomal rearrangement while still showing signals of
chromosomal splitting, with similar gene density [Fig. 4B].
Figure 4. Synteny between the Macrocystis pyrifera genome and the genomes of (A) Ectocarpus
siliculosus and (B) Undaria pinnatifida. Bands represent clusters of at least 10 single copy
orthologs no more than 3MB apart. Purple bands are potential chromosome splitting or fusion.
Grey bands represent scaffold pairs with the greatest synteny, while red bands are orthologs
outside of their respective pair. Bands with a lower number of links are superimposed on bands
with a higher number of links. The histogram represents the density of single copy orthologs in
1MB windows.
Genome polymorphism
Previous work in giant kelp population genetics sequenced 48 diploid individuals from
three Southern California giant kelp populations (Catalina Island, Santa Barbara, and Camp
Pendleton [Fig 5A]) and calculated parameters such as pairwise genetic diversity and Tajima’s D
using a set of reciprocally blasted gene models (8292 gene models) (Molano et al. 2022). We
assessed the genetic variation across the entire giant kelp genome by re-analyzing the diploid
sequence data from these 48 individuals using our newly assembled genome and the GATK4
best practices pipeline (Van der Auwera et al. 2013). After hard filtering the VCF file to remove
low quality SNPs and potential sequencing artifacts, we found 16,019,851 biallelic SNPs in the
48 individuals [Fig 1E].
We assessed the genetic variation between the different populations using principal
component analysis. Prior studies revealed population subdivision in Southern California giant
kelp populations based on microsatellite data, including distinct subpopulations at our sampling
sites from Catalina Island, Santa Barbara, and Camp Pendleton (Johansson et al. 2015). Results
from the principal component analysis of our polymorphism data confirmed substantial
population structure between our sampling populations along PC1 (6.7% variance explained) and
PC2 (5.4% variance explained), consistent with the calculations of Molano et al. 2022 using
the gene model data set [Fig 5B].
We also calculated linkage disequilibrium (LD) for the 48 samples combined and for each
population separately. For the LD calculations, we further filtered the VCF file using vcftools
to keep only sites with no missing data and a minor allele frequency of at least 0.10,
retaining 1,021,896 biallelic SNPs for analysis. We phased the filtered VCF file using Beagle
(Browning et al. 2021) and calculated and plotted LD for all samples combined and for each
population using PopLDdecay with default settings (Zhang et al. 2019). We used r^2 = 0.1 as our
LD threshold in this
analysis, and estimated that the Southern California giant kelp populations have an LD block
size of ~4.25 kb when we combined all 48 individuals into a single population (Vos et al.
2017). We estimated the LD block size to be ~5.5 kb for the Catalina Island population and
~6 kb for the Camp Pendleton and Santa Barbara populations [Fig 5C, 5D]. We further confirmed
the population structure using fastStructure, which predicted the number of subpopulations in
our data set to be equal to the number of sampling locations (K = 3) (Raj et al. 2014)
[Fig 5E]. The structure plot shows complete population distinction based on the SNP data set
used for LD calculations.
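The r^2 statistic and the threshold-based block size used above can be illustrated with a
minimal sketch. It is not the PopLDdecay implementation: it assumes phased haplotypes coded as
0/1 at two biallelic sites, and it assumes the block size is read off as the first distance
at which binned mean r^2 falls to the threshold. Both function names and the toy values are
hypothetical.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic sites from phased haplotypes coded 0/1."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal in length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


def decay_distance(binned, threshold=0.1):
    """First distance whose mean r^2 is at or below the threshold, else None."""
    for distance, mean_r2 in sorted(binned):
        if mean_r2 <= threshold:
            return distance
    return None


print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly linked sites
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # independent sites
print(decay_distance([(1000, 0.4), (4000, 0.12), (5000, 0.09)]))
```

Phasing matters here because r^2 is defined on haplotype frequencies, which is why the VCF
file was phased before LD was calculated.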
We calculated pairwise genetic diversity (π) across the whole giant kelp genome and found
π = 0.0035 [Fig 1D]. Tajima’s D was negative overall throughout the genome, with an average of
-1.17522 across 200 kilobase windows [Fig 1F]. Variant effect annotation showed that 1.73% of
the variants were found inside exons, with 56% of these predicted to be missense, while
intergenic variants accounted for 38.5% of all variants.
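For readers unfamiliar with these statistics, the sketch below computes the mean number of
pairwise differences and Tajima’s D from a small set of aligned haploid sequences, following
the standard formulas of Tajima (1989). It is an illustration only: the text does not state
which tool produced the reported values, and the function names and toy sequences are made up.
Per-site π is the mean number of pairwise differences divided by the number of sites.

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(seqs):
    """Mean number of pairwise differences among aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D (Tajima 1989) for aligned haploid sequences."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pairwise_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment in which every variant is a singleton.
toy = ["AAA", "TAA", "ATA", "AAT"]
print(pairwise_diversity(toy) / len(toy[0]))  # per-site pi
print(tajimas_d(toy))
```

An excess of rare variants, as in the toy alignment, pulls D below zero, which is the
direction of the genome-wide average reported above.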
Figure 5. Population structure analysis of three Southern California giant kelp populations. (A)
Sampling location of the three analyzed wild giant kelp populations. (B) Principal component
analysis of genomic polymorphism showing strong population structure. (C) LD decay curve for
all 48 individuals combined. (D) LD decay curve for each sampling population separately. (E)
Structure plots showing K=2, K=3, and K=4 for the 48 individuals. fastStructure found that
K=3 best explained the polymorphism data.
Materials and methods
Data collection and sequencing
For the genome construction, a single haploid female gametophytic cell was isolated and
grown in culture for biomass. To minimize contamination during PacBio
sequencing, the culture was repeatedly treated with antibiotics until no bacterial colonies would
form when plated (Singh et al. 2011, Weinberger 1999). High molecular weight DNA was
extracted using the protocol of Doyle and Doyle (1987) with minor modifications. Essentially,
young blades that had been flash frozen and kept at -80 °C were ground to a fine powder
in a frozen mortar with liquid N2, followed by very gentle extraction in CTAB buffer (that
included proteinase K, PVP-40 and beta-mercaptoethanol) for 20 min at 37 °C and 20 min at
50 °C. After centrifugation, the supernatant was gently extracted twice with 24:1 chloroform:iso-amyl
alcohol. The upper phase was adjusted to 1/10th volume with 3 M sodium acetate (pH 5.2),
gently mixed, and DNA was precipitated with iso-propanol. DNA was collected by centrifugation,
washed with 70% EtOH, air dried for a few minutes, and dissolved thoroughly in 1x TE at room
temperature. Size was validated by pulsed-field electrophoresis.
Sequencing of sheared DNA >30 kb was performed at the Arizona Genome Institute on a
PacBio Sequel II platform. A SMRTbell Express Template Prep Kit 2.0 was used for library
preparation and a Sequel II Binding Kit was used for the sequencing, which generated 56 GB of
long read data.
Genome assembly
To assemble our genome from sequences free of most contamination, we
loosely aligned our PacBio reads to three different brown macroalgae genomes (Ectocarpus
siliculosus, Saccharina japonica and Cladosiphon okamuranus) using the Minimap2 map-pb option
(Li 2018) and excluded from our assembly sequences that did not map to any of the
brown macroalgae genomes using Samtools v1.15.1 (Li et al. 2009). We assembled the
remaining reads using Canu 1.9 (Koren et al. 2017) on standard settings, resulting in a
preliminary assembly of 1,039 contigs containing 539 Mbp, which was then polished using
Racon v1.5.0 (Vaser et al. 2017).
We initially analyzed the potential contamination of each contig separately. We split the
assembly into individual contigs using faSplit (Kent Github 2022). We then used diamond blast
v2.10 to blast each contig against the Uniprot reference proteome database with an e-value of 1e-15
(Buchfink et al. 2015, UniProt Consortium 2021). We then added the results to the blobtools
pipeline using the add --hits command and filtered contigs based on length (contig > 10,000 base
pairs), GC content (between 0.35 and 0.65), coverage (5x-300x), and blast classification, keeping
contigs identified as Phaeophyceae and no-hit. The first round of filtering removed 161 contigs
and ~42 MB of sequence. The 870 contigs were then sent to Phase Genomics for scaffolding.
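
The four filtering criteria described above can be expressed as a single predicate. The sketch below is illustrative only and is not the blobtools pipeline itself: the Contig record and the keep_contig function are hypothetical names, and treating the GC and coverage bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int      # base pairs
    gc: float        # GC fraction, 0-1
    coverage: float  # mean read depth
    taxon: str       # best-hit classification, or "no-hit"


def keep_contig(c: Contig) -> bool:
    """Apply the length, GC, coverage, and classification criteria from the text."""
    return (
        c.length > 10_000
        and 0.35 <= c.gc <= 0.65          # inclusive bounds are an assumption
        and 5 <= c.coverage <= 300        # inclusive bounds are an assumption
        and c.taxon.lower() in {"phaeophyceae", "no-hit"}
    )


# Hypothetical contigs for illustration.
examples = [
    Contig("kelp_like", 250_000, 0.50, 60.0, "Phaeophyceae"),
    Contig("bacterial", 80_000, 0.68, 400.0, "Proteobacteria"),
    Contig("short_nohit", 4_000, 0.48, 55.0, "no-hit"),
]
print([c.name for c in examples if keep_contig(c)])
```

A contig must pass all four tests to be retained, which is why a misclassified blast hit alone is enough to discard an otherwise plausible contig.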
Chromatin conformation capture data was generated using a Phase Genomics (Seattle,
WA) Proximo Hi-C 2.0 Kit, which is a commercially available version of the Hi-C protocol
(Lieberman-Aiden et al. 2009). Following the manufacturer's instructions for the kit, intact cells
were crosslinked using a formaldehyde solution, digested using the DpnII restriction enzyme,
end repaired with biotinylated nucleotides, and proximity ligated to create chimeric molecules
composed of fragments from different regions of the genome that were physically proximal in
vivo, but not necessarily genomically proximal. Continuing with the manufacturer's protocol,
molecules were pulled down with streptavidin beads and processed into an Illumina-compatible
sequencing library. Sequencing was performed on an Illumina NovaSeq.
Reads were aligned to the draft giant kelp genome assembly, also following the
manufacturer's recommendations (https://phasegenomics.github.io/2019/09/19/hic-alignment-and-qc.html).
Briefly, reads were aligned using BWA-MEM with the -5SP and -t 8 options
specified, and all other options default (Li and Durbin 2010). SAMBLASTER was used to flag
PCR duplicates, which were later excluded from analysis (Faust and Hall 2014). Alignments
were then filtered with samtools using the -F 2304 filtering flag to remove secondary
(non-primary) and supplementary alignments (Li et al. 2009). Putative misjoined contigs were broken using Juicebox
based on the Hi-C alignments (Durand et al. 2016, Rao et al. 2014).
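
The meaning of the 2304 filtering value can be checked by decomposing it into SAM flag bits. The short sketch below uses the bit values defined in the SAM specification; the helper name is_excluded is hypothetical, and the function only mimics the exclusion logic rather than calling samtools.

```python
SECONDARY = 0x100      # 256: secondary (not primary) alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment

EXCLUDE_MASK = 2304    # value passed to the -F option in the text


def is_excluded(flag: int, mask: int = EXCLUDE_MASK) -> bool:
    """Mimic an -F style filter: drop a record if any masked bit is set."""
    return (flag & mask) != 0


# 2304 is exactly the union of the two bits above.
print(SECONDARY | SUPPLEMENTARY == EXCLUDE_MASK)
# A primary, properly paired read (flag 99) is kept; its secondary and
# supplementary counterparts (99 + 256, 99 + 2048) are dropped.
print([is_excluded(f) for f in (99, 355, 2147)])
```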
Phase Genomics' Proximo Hi-C genome scaffolding platform was used to create
chromosome-scale scaffolds from the corrected assembly as described in Bickhart et al.
(2017). As in the LACHESIS method, this process computes a contact frequency
matrix from the aligned Hi-C read pairs, normalized by the number of restriction sites on each
contig, and constructs scaffolds in such a way as to optimize expected contact frequency and
other statistical patterns in Hi-C data (Burton et al. 2013). Approximately 20,000 separate
Proximo runs were performed to optimize the number of scaffolds and scaffold construction in
order to make the scaffolds as concordant with the observed Hi-C data as possible. Finally,
Juicebox was again used to correct scaffolding errors.
After scaffolding and preliminary annotation, we found that several contigs that had been
discarded from the assembly actually contained stramenopile_odb10 BUSCO genes. We
determined that our initial filtering with the blobtools pipeline had been too strict, mostly due to
misclassification of the contigs during the blast analysis. The misclassification of sequences has
been shown to be an increasing problem in sequence databases (Bagheri et al. 2020, Cobbin et al.
2021). After manually checking the 161 discarded contigs for potential candidate giant kelp
contigs, we added 29 contigs back into the giant kelp assembly, increasing the size of the
genome by ~38 MB to 537 MB (See Supplementary Table). The additional contigs also raised the
BUSCO score using stramenopile_odb10. Unfortunately, the initial 870 contigs had already been
scaffolded, and completely re-scaffolding the genome was cost prohibitive.
Genome completeness
To assess genome completeness using single copy orthologs, we used BUSCO v5.2.1 in
genome mode and in conjunction with the stramenopile_odb10 dataset to compare our giant kelp
genome to the publicly available genomes of Ectocarpus siliculosus, Saccharina japonica, and
Undaria pinnatifida (Manni et al. 2021). We compared the same genomes using QUAST v5.0.2
with standard settings in order to generate genome assembly statistics (Gurevich et al. 2013).
We converted the raw PacBio reads from bam format into fasta format. We then corrected
the raw PacBio reads using Canu with specific settings for Sequel II reads:
correctedErrorRate=0.035 utgOvlErrorRate=0.065 trimReadsCoverage=2
trimReadsOverlap=500 (Koren et al. 2017). We then counted the number of k-mers found in the
corrected reads using the k-mer counting program kmc with two different k-mer sizes, k=25 and
k=31 (Kokot et al. 2017). We plotted the k-mer distributions for both k=25 and k=31, and
estimated the genome size by summing the total number of k-mers and dividing by the mean
k-mer coverage of the genome.
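
The genome size calculation described above (total k-mers divided by k-mer coverage) can be sketched from a k-mer histogram. This is a minimal illustration under stated assumptions: the function name estimate_genome_size is hypothetical, the histogram peak is used as a stand-in for the mean k-mer coverage, the low-coverage cutoff for discarding likely error k-mers is arbitrary, and the toy histogram is not data from this study.

```python
from typing import Dict


def estimate_genome_size(hist: Dict[int, int], min_cov: int = 5) -> float:
    """Estimate genome size from a k-mer histogram.

    `hist` maps k-mer coverage -> number of distinct k-mers seen at that coverage.
    K-mers below `min_cov` are treated as sequencing errors and ignored
    (the cutoff is an assumption for this sketch).
    """
    solid = {cov: n for cov, n in hist.items() if cov >= min_cov}
    if not solid:
        raise ValueError("no k-mers at or above min_cov")
    # Total number of k-mer occurrences among the retained k-mers.
    total_kmers = sum(cov * n for cov, n in solid.items())
    # Use the histogram peak as the estimate of k-mer coverage (assumption).
    peak_cov = max(solid, key=solid.get)
    return total_kmers / peak_cov


# Toy histogram (hypothetical values).
toy_hist = {1: 5000, 2: 800, 19: 100, 20: 300, 21: 100}
print(estimate_genome_size(toy_hist))
```

Running the same estimate at two k-mer sizes, as done in the text with k=25 and k=31, gives a simple check on how sensitive the estimate is to the choice of k.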
Annotation
The giant kelp nuclear genome assembly was annotated using the JGI Annotation
pipeline (Grigoriev et al., 2014; Kuo et al., 2014). The following steps describe the pipeline in
brief. The genome assembly was masked for repeats using RepeatMasker (Smit et al., 1996-2010)
with the RepBase library (Jurka et al., 2005) and the most frequent repeats (more than 150
copies) identified by RepeatScout (Price et al., 2005). Protein-coding gene models were
predicted using the following gene modelers: ab initio modelers Fgenesh (Salamov and
Solovyev, 2000) and GeneMark (Ter-Hovhannisyan et al., 2008), homology-based Fgenesh+
and GeneWise (Birney et al., 2004) seeded by BLASTx alignments against the NCBI NR
database, and transcriptome-based modelers Fgenesh, combest (Zhou, 2015), and Braker (Hoff
et al., 2016). For use in gene prediction, transcriptome assemblies were generated from Illumina
RNAseq reads using Trinity (v2.11.0) (Grabherr et al., 2011), and as input to Braker, RNA reads
were mapped to the genome using HISAT2 (Kim et al., 2019). To select the best representative
gene model at each locus, automated filtering was performed based on homology and
transcriptome support. In addition, genes with similarity to transposable elements (TE),
containing known TE-related Pfam domains, or lying within repeat-masked regions were excluded
from the annotated gene set. Finally, the protein sequences of the predicted gene models were
functionally annotated using SignalP v3 for signal sequences (Nielsen et al., 1997), TMHMM for
transmembrane domains (Melén et al., 2003), InterproScan for protein domains (Quevillon et al.,
2005), and homologs based on Blastp alignments against the NCBI NR, SwissProt, and KEGG
(Kanehisa et al., 2006) databases.
Comparative analysis
The protein datasets from Macrocystis pyrifera, Ectocarpus sp., Saccharina japonica,
and Undaria pinnatifida were used for ortholog analysis with Orthofinder v2.5.4 (Emms and
Kelly 2019). The positions of single copy orthologs in M. pyrifera and Ectocarpus
sp. were used to determine synteny between the two genomes. Circos v2.30.1 (Krzywinski et
al. 2009) was used to graph links between orthologs in each genome. Clusters in which three or
more orthologs had no more than 1 MB of distance between them were graphed as bands linking
their respective positions in each genome.
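
One way to group orthologs into the clusters described above is sketched below. This is an interpretation rather than the procedure actually used: it assumes that consecutive orthologs must lie on the same chromosome pair and within 1 MB of each other in both genomes, and the function name synteny_clusters and the coordinates are hypothetical.

```python
from typing import List, Tuple

# (chrom_a, pos_a, chrom_b, pos_b): one single copy ortholog located in both genomes.
Ortholog = Tuple[str, int, str, int]


def synteny_clusters(orthologs: List[Ortholog], max_gap: int = 1_000_000,
                     min_size: int = 3) -> List[List[Ortholog]]:
    """Chain neighbouring orthologs into clusters of at least `min_size` members."""
    clusters: List[List[Ortholog]] = []
    current: List[Ortholog] = []
    for orth in sorted(orthologs):  # sorted by chromosome and position in genome A
        if current:
            prev = current[-1]
            same_chroms = orth[0] == prev[0] and orth[2] == prev[2]
            close = (abs(orth[1] - prev[1]) <= max_gap
                     and abs(orth[3] - prev[3]) <= max_gap)
            if not (same_chroms and close):
                # Close the running cluster and keep it only if large enough.
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
        current.append(orth)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical ortholog positions for illustration.
toy_orthologs = [
    ("A1", 100_000, "B1", 50_000),
    ("A1", 600_000, "B1", 400_000),
    ("A1", 1_200_000, "B1", 900_000),
    ("A1", 5_000_000, "B1", 7_000_000),
    ("A2", 10, "B3", 20),
]
print(len(synteny_clusters(toy_orthologs)))
```

Each returned cluster could then be drawn as one band spanning the first and last ortholog positions in each genome.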
SNP calling and population genetics
Raw Illumina reads from 49 diploid giant kelp individuals from three Southern California
populations were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/bioproject/661280). Reads were
trimmed of adapter sequences and low quality tails using Trimgalore (Krueger 2015). The
reads were then aligned to our giant kelp reference genome using HISAT2 v2.1.0 with standard
parameters, and the resulting alignment file was converted to binary format and sorted using Samtools
v1.9 (Kim et al. 2019, Li et al. 2009). Mean depth per individual across the genome was ~8x,
calculated using VCFtools v0.1.16, and one individual was removed from the data set due to
poor coverage (Danecek et al. 2011). After removing PCR duplicates, we called variants such as
single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), producing a variant
call format (VCF) file using the GATK4 best practices pipeline (Van der Auwera et al. 2013). Initial
filtering followed the hard filtering suggestions from GATK: "QD < 2.0 || MQ < 40.0 || FS >
60.0 || HaplotypeScore > 13.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0". We then
filtered the VCF further for population genetics analysis using the following criteria: insertions
and deletions removed, biallelic SNPs only, a minimum quality score of 30, site called in 90% or
more individuals, and a minimum mean depth of 3 reads per site. Initially, there were 25,374,044
SNPs and indels in the raw VCF file before filtering. The filters reduced the number of SNPs to
16,019,851 for downstream analysis.
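
The logic of the hard filtering expression quoted above can be illustrated as a simple predicate: a variant is flagged if any one of the listed conditions is true. The sketch below is not GATK code; the names HARD_FILTERS and fails_hard_filter are hypothetical, and treating a missing annotation as not triggering the filter is an assumption.

```python
from typing import Mapping

# (annotation, comparison, cutoff) triples transcribed from the filter expression in the text.
HARD_FILTERS = [
    ("QD", "<", 2.0),
    ("MQ", "<", 40.0),
    ("FS", ">", 60.0),
    ("HaplotypeScore", ">", 13.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]


def fails_hard_filter(info: Mapping[str, float]) -> bool:
    """Return True if any annotation present in `info` violates its cutoff."""
    for key, op, cutoff in HARD_FILTERS:
        if key not in info:
            continue  # assumption: a missing annotation does not trigger the filter
        value = info[key]
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            return True
    return False


# Hypothetical annotation values for two variants.
print(fails_hard_filter({"QD": 1.5, "MQ": 60.0}))
print(fails_hard_filter({"QD": 10.0, "MQ": 60.0, "FS": 3.0}))
```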
We performed a principal component analysis (PCA) on the genetic variation using the
hard filtered VCF as input to SNPRelate v1.22.0. We used the SNPRelate standard
pipeline and plotted the PCA using ggplot2 v3.3.2 (Zheng et al. 2012, Wickham et al. 2016). We
then calculated pairwise genetic diversity and Tajima’s D using VCFtools and the hard filtered
VCF file across genomic windows of 200 kb with a step interval of 40 kb, and for each population
separately (Danecek et al. 2011). We also calculated FST between the three populations using
VCFtools and the hard filtered VCF file across genomic windows of 200kb with a step interval
of 40kb.
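
For readers unfamiliar with the two window statistics, the sketch below computes pairwise diversity and Tajima's D for a single window from allele counts, using the standard formulas of Tajima (1989). It is an illustration and not the VCFtools implementation: the function name pi_and_tajimas_d is hypothetical, π is returned as a sum over sites (dividing by the number of sites in the window would give a per-site value), and the example counts are invented.

```python
import math
from typing import Sequence, Tuple


def pi_and_tajimas_d(allele_counts: Sequence[int], n: int) -> Tuple[float, float]:
    """Compute summed pairwise diversity and Tajima's D for one window.

    allele_counts: for each segregating biallelic site, the number of sampled
                   chromosomes (out of n) carrying one of the two alleles.
    n:             number of sampled chromosomes (2 x diploid individuals).
    """
    S = len(allele_counts)
    if n < 4 or S == 0:
        raise ValueError("need n >= 4 chromosomes and at least one segregating site")

    # Pairwise diversity: expected number of differences between two chromosomes.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = S / a1  # Watterson's estimator
    d = (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return pi, d


# Five singleton sites in 10 chromosomes: an excess of rare alleles.
print(pi_and_tajimas_d([1, 1, 1, 1, 1], n=10))
```

An excess of rare alleles lowers π relative to Watterson's estimator and pushes D below zero, which is the direction of the genome-wide average reported in this chapter.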
We further filtered the VCF file used in the PCA for the linkage disequilibrium and
population structure analyses using vcftools and the commands --maf 0.10 and max-missing 100
in order to keep alleles that are present in at least 3 individuals and to include only sites with base calls from all
individuals. We then phased this VCF using py-popgen (Webb et al. 2021), which implemented
beagle to phase the VCF (Browning et al. 2021). We calculated and plotted the linkage
disequilibrium using PopLDdecay with standard settings. We used the program fastStructure to
calculate K values from 1-10 based on the LD VCF file, and used the built-in chooseK.py script
from fastStructure to predict the number of subpopulations. We used the built-in distruct.py script
to plot the various structure plots based on the K values.
Discussion
Our study presents the first annotated scaffold-level giant kelp reference genome, which
will support a genomics approach for the ongoing domestication and conservation efforts for this
species. This giant kelp reference genome compares favorably to the three published major
brown macroalgae genomes, Ectocarpus siliculosus, Saccharina japonica, and Undaria
pinnatifida, with similar N50 values (genome contiguity) and BUSCO scores (genome
completeness) (Cock et al. 2010, Ye et al. 2015, Shan et al. 2020, Gurevich et al. 2013).
Compared to currently available giant kelp genomes, the giant kelp assembly presented here vastly
improves on genome contiguity and completeness, in particular when comparing BUSCO scores
(94% compared to 11%) (Paul et al. 2022). Therefore, we predict that the scaffolded giant kelp
genome presented here will be the universal reference for future giant kelp genomic projects.
We also confirmed the strong population structure seen in previous giant kelp genomic
studies based on various resolutions of genomic data. The population subdivision in the three
Southern California giant kelp populations was initially based on microsatellite data, including
distinct subpopulations at our sampling sites from Catalina Island, Santa Barbara, and Camp
Pendleton (Johansson et al. 2015). This population structure was confirmed using polymorphism
data from the three targeted Southern California populations and the gene model data set, and then
reconfirmed using our scaffolded giant kelp genome as the reference. The variance explained by
the principal component analyses using the giant kelp gene models (Chapter I) and the giant kelp
genome was quite similar.
Additionally, we compare how population genetic parameters such as pairwise genetic
diversity and Tajima’s D of three Southern California kelp populations shift when using our
reference giant kelp genome versus publicly available giant kelp gene models (Molano et al.
2022). The gene model data was finalized after reciprocal blast analysis against Ectocarpus
siliculosus, and the set includes 8,292 gene models and 15 MB of sequence. As giant kelp and
Ectocarpus siliculosus diverged about 98 million years ago, this giant kelp gene model set
should include genes that are well conserved between the two species (Silberfeld et al. 2010).
We confirmed that the average pairwise nucleotide diversity (π) of the three Southern California
kelp populations was higher when using the whole genome (π=3.53x10^-3) compared to the
gene models (π=1.74x10^-3), as the conserved nature of the gene models reduces the pairwise
genetic diversity. This comparison of the limited gene model data set versus the whole genome
for population genetic analysis can be used to help other brown macroalgae researchers plan
experiments based on what population genetic parameter they are interested in investigating.
Conclusion
The giant kelp genome presented in this study will assist in ongoing giant kelp
domestication efforts by providing a reference genome that can be used as a comparative benchmark
between giant kelp individuals sequenced in other kelp genetic experiments. The functional
annotation of the genome can help pinpoint the genomic locations of genes of
interest for domestication for further genetic variation analysis. The use of a conserved gene
model data set for phylogenetic studies may be sufficient in giant kelp as the strong population
structure seen in the diploid data concurred with the results from the gene models.
References
Alhakami H, Mirebrahim H, Lonardi S. 2017. A comparative evaluation of genome assembly
reconciliation tools. Genome Biol. 18(1):93. doi: 10.1186/s13059-017-1213-3. PMID: 28521789;
PMCID: PMC5436433.
Bagheri H, Severin AJ, Rajan H. 2020. Detecting and correcting misclassified sequences in the
large-scale public databases. Bioinformatics. 36(18):4699-4705. doi:
10.1093/bioinformatics/btaa586. PMID: 32579213; PMCID: PMC7821992.
Bickhart DM, Rosen BD, Koren S, Sayre BL, Hastie AR, Chan S, Lee J, Lam ET, Liachko I,
Sullivan ST, Burton JN, Huson HJ, Nystrom JC, Kelley CM, Hutchison JL, Zhou Y, Sun J, Crisà
A, Ponce de León FA, Schwartz JC, Hammond JA, Waldbieser GC, Schroeder SG, Liu GE,
Dunham MJ, Shendure J, Sonstegard TS, Phillippy AM, Van Tassell CP, Smith TP. 2017. Single-
molecule sequencing and chromatin conformation capture enable de novo reference assembly of
the domestic goat genome. Nat Genet. 49(4):643-650.
Birney E, Clamp M, Durbin R. 2004. GeneWise and Genomewise. Genome Res. 14(5):988-95.
doi: 10.1101/gr.1865504.
Boland, W., F. J. Marner, L. Jaenicke, D. G. Muller, and E. Folster. 1983. Comparative receptor
study in gamete chemotaxis of the seaweeds Ectocarpus siliculosus and Cutleria multifida: an
approach to interspecific communication of algal gametes. European Journal of Biochemistry
134: 97–103.
Briggs SP. 1998. Plant genomics: more than food for thought. Proc Natl Acad Sci U S A.
95(5):1986-8. doi: 10.1073/pnas.95.5.1986. PMID: 9482820; PMCID: PMC33828.
Browning BL, Tian X, Zhou Y, Browning SR. 2021. Fast two-stage phasing of large-scale
sequence data. Am J Hum Genet 108(10):1880-1890. doi: 10.1016/j.ajhg.2021.08.005
Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND.
Nat Methods. 12(1):59-60. doi: 10.1038/nmeth.3176. Epub 2014 Nov 17. PMID: 25402007.
Burton JN, Adey A, Patwardhan RP, Qiu R, Kitzman JO, Shendure J. 2013. Chromosome-scale
scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotech. 31,
1119.
Buschmann, Alejandro & Graham, Michael & Vasquez, Julio. 2007. Global Ecology of the Giant
Kelp Macrocystis. 10.1201/9781420050943.ch2.
Camus, C., Ballerino, P., Delgado, R., Olivera-Nappa, Á, Leyton, C., and Buschmann, A. H.
2016. Scaling up bioethanol production from the farmed brown macroalga Macrocystis pyrifera
in Chile. Biofuels Bioprod. Bioref. 10, 673–685. doi: 10.1002/bbb.1708
Castorani MCN, Reed DC, Alberto F, Bell TW, Simons RD, Cavanaugh KC, Siegel DA,
Raimondi PT. 2015. Connectivity structures local population dynamics: a long-term empirical
test in a large metapopulation system. Ecology 96, 3141–3152.
Castorani, M. C. N., Reed, D. C., Raimondi, P. T., Alberto, F., Bell, T. W., Cavanaugh, K. C., et
al. 2017. Fluctuations in population fecundity drive variation in demographic connectivity and
metapopulation dynamics. Proc. R. Soc. B Biol. Sci. 284:20162086.
Challis R., Richards E., Rajan J., Cochrane G., Blaxter M. 2020. BlobToolKit – Interactive
Quality Assessment of Genome Assemblies. G3 10 1361–1374. 10.1534/g3.119.400908
Charrier, B., Coelho, S.M., Le Bail, A., Tonon, T., Michel, G., Potin, P., Kloareg, B., Boyen, C.,
Peters, A.F. and Cock, J.M. 2008. Development and physiology of the brown alga Ectocarpus
siliculosus: two centuries of research. New Phytologist, 177: 319-332.
https://doi.org/10.1111/j.1469-8137.2007.02304.x
Chopin, T., and Tacon, A.G.T. 2021. Importance of seaweeds and extractive species in global
aquaculture production. Reviews in Fisheries Science & Aquaculture 29.2:139-148.
Cingolani, Pablo, et al. 2012. A program for annotating and predicting the effects of single
nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain
w1118; iso-2; iso-3. Fly 6.2: 80-92.
Cock, J.M., Sterck, L., Rouzé, P., Scornet, D., Allen, A.E., Amoutzias, G., Anthouard, V.,
Artiguenave, F., Aury, J.M., Badger, J.H., et al. 2010. The Ectocarpus genome and the
independent evolution of multicellularity in brown algae. Nature. 465(7298):617-21.
Cobbin JC, Charon J, Harvey E, Holmes EC, Mahar JE. 2021. Current challenges to virus
discovery by meta-transcriptomics. Curr Opin Virol. 51:48-55. doi:
10.1016/j.coviro.2021.09.007.
Coyer, J. A., Smith, G. J. & Andersen, R. A. 2001. Evolution of Macrocystis spp.
(Phaeophyceae) as determined by ITS1 and ITS2 sequences. J. Phycol. 37:574–85.
Darwin, C.R. 1860. The voyage of the Beagle. J. M. Dent & Sons, Ltd., London, 1906.
Danecek, Petr, et al. 2011. The variant call format and VCFtools. Bioinformatics 27.15 2156-
2158.
Dittami SM, Corre E. 2017. Detection of bacterial contaminants and hybrid sequences in the
genome of the kelp Saccharina japonica using Taxoblast. PeerJ. 5:e4073. doi:
10.7717/peerj.4073. PMID: 29158994; PMCID: PMC5695246.
Doyle, J.J. and J.L. Doyle. 1987. A rapid DNA isolation procedure from small quantities of fresh
leaf tissues. Phytochem Bull. 19:11–15.
Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, and Lieberman-Aiden
E. 2016. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom.
Cell Systems, July 2016.
Emms DM and Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative
genomics. Genome biology 20.1: 1-14.
FAO. 2022. The State of World Fisheries and Aquaculture 2022. Towards Blue Transformation.
Rome, FAO. https://doi.org/10.4060/cc0461en
Florez, J. Z., Camus, C., Hengst, M. B., Marchant, F. & Buschmann, A. H. 2019. Structure of the
epiphytic bacterial communities of Macrocystis pyrifera in localities with contrasting nitrogen
concentrations and temperature. Algal Research 44,101706.
Faust GG and Hall IM. 2014. SAMBLASTER: fast duplicate marking and structural variant read
extraction, Bioinformatics, 30: 17:1. https://doi.org/10.1093/bioinformatics/btu314
Gaylord, B., D. C. Reed, P. T. Raimondi, L. Washburn, and S. R. McLean. 2002. A physically
based model of macroalgal spore dispersal in the wave and current-dominated nearshore.
Ecology 83:1239–1250.
Gaylord, B., D. C. Reed, L. Washburn, and P. T. Raimondi. 2004. Physical–biological coupling
in spore dispersal of kelp forest macroalgae. Journal of Marine Systems 49:19–39.
Gaylord B, Reed DC, Raimondi PT, Washburn L. 2006. Macroalgal spore dispersal in coastal
environments: mechanistic insights revealed by theory and experiment. Ecological
Monographs,76, 481–502.
Goecke F, Gómez Garreta A, Martín-Martín R, Rull Lluch J, Skjermo J, Ergon Å. 2022. Nuclear
DNA Content Variation in Different Life Cycle Stages of Sugar Kelp, Saccharina latissima. Mar
Biotechnol (NY).
Graf, Louis, et al. 2021. A genome-wide investigation of the effect of farming and human-
mediated introduction on the ubiquitous seaweed Undaria pinnatifida. Nature ecology &
evolution 5.3: 360-368.
Graham, M. H., Vásquez, J. A., and Buschmann, A. H. 2007. Global ecology of the giant kelp
Macrocystis: from ecotypes to ecosystems. Oceanogr. Mar. Biol. Annu. Rev. 45, 39–88.
Grabherr MG, et al. 2011. Full-length transcriptome assembly from RNA-seq data without a
reference genome. Nat Biotechnol. 29(7):644-52. doi: 10.1038/nbt.1883.
Grigoriev IV, et al. 2014. MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids
Research. 42:D699-D704. doi: 10.1093/nar/gkt1183
Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome
assemblies. Bioinformatics. 29(8):1072-5. doi: 10.1093/bioinformatics/btt086.
Hanschen, Erik & Hovde, Blake & Starkenburg, Shawn. 2020. An evaluation of methodology to
determine algal genome completeness. Algal Research. 51. 102019.
10.1016/j.algal.2020.102019.
Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M., & Stanke, M. 2016. BRAKER1:
Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.
Bioinformatics (Oxford, England), 32(5), 767–769. doi:10.1093/bioinformatics/btv661
Hunt, Martin, et al. 2015. Circlator: automated circularization of genome assemblies using long
sequencing reads. Genome biology 16.1: 1-10.
Hwang, E. K., Yotsukura, N., Pang, S. J., Su, L., and Shan, T. F. 2019. Seaweed Breeding
Programs and Progress in Eastern Asian Countries. Phycologia 58, 484–495.
doi:10.1080/00318884.2019.1639436
Johansson, M.L., Alberto, F., Reed, D.C., Raimondi, P.T., Coelho, N.C., Young, M.A., Drake,
P.T., Edwards, C.A., Cavanaugh, K., Assis, J., et al. 2015. Seascape drivers of Macrocystis
pyrifera population genetic structure in the northeast Pacific. Mol Ecol. 24(19):4866-85.
Jurka J, et al. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet.
Genome Res. 110:462–467. doi: 10.1159/000084979.
Kanehisa, M, et al. 2006. From genomics to chemical genomics: new developments in KEGG.
Nucleic Acids Research, 34:D354–D357. doi: 10.1093/nar/gkj102.
Kaye AM and Wasserman WW. 2021. The genome atlas: navigating a new era of reference
genomes. Trends Genet. 37(9):807-818. doi: 10.1016/j.tig.2020.12.002.
Kent, J. ucscGenomeBrowser/kent (2022). Github repository,
https://github.com/ucscGenomeBrowser/kent
Kim, D., Paggi, J.M., Park, C., Bennett, C., Salzberg, S.L. 2019. Graph-based genome alignment
and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37(8):907-915.
Kokot M, Dlugosz M, Deorowicz S. 2017. KMC 3: counting and manipulating k-mer statistics.
Bioinformatics. 33(17):2759-2761. doi: 10.1093/bioinformatics/btx304. PMID: 28472236.
Konotchick T, Dupont CL, Valas RE, Badger JH, Allen AE. 2013. Transcriptomic analysis of
metabolic function in the giant kelp, Macrocystis pyrifera, across depth and season. New Phytol.
198(2):398-407.
Koren, Sergey, et al. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer
weighting and repeat separation. Genome research 27.5:722-736.
Kuo A., Bushnell B., Grigoriev IV., Martin F. 2014. Fungal genomics: sequencing and
annotation. Fungi. Advances in botanical research. Cambridge: Elsevier Academic Press, 1–52.
Krueger, F. 2015. Trim galore. A wrapper tool around Cutadapt and FastQC to consistently
apply quality and adapter trimming to FastQ files. Available online at:
https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
Krzywinski, M. et al. 2009. Circos: an Information Aesthetic for Comparative Genomics.
Genome Res 19:1639-1645
Laetsch, D., & Blaxter, M. 2017. BlobTools: Interrogation of genome assemblies. F1000
research, 6, 1287. doi: 10.12688/f1000research.12232.1
Lieberman-Aiden, E., Van Berkum, N.L., Williams L., Imakaev M., Ragoczy T., Telling A.,
Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., Sandstrom, R., Bernstein, B., Bender, M.A.,
Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L.A., Lander, E.S., Dekker, J. 2009.
Comprehensive mapping of long-range interactions reveals folding principles of the human
genome. Science, 326, 289-293.
Li H. and Durbin R. 2010 Fast and accurate long-read alignment with Burrows-Wheeler
transform. Bioinformatics, 26, 589-595. [PMID: 20080505]
Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin
R. and 1000 Genome Project Data Processing Subgroup. 2009 The Sequence alignment/map
(SAM) format and SAMtools. Bioinformatics, 25, 2078-9.
Li, H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-
3100. doi:10.1093/bioinformatics/bty191
Li X, Waterman MS. 2003. Estimating the repeat structure and length of DNA sequences using
L-tuples. Genome Res. 13(8):1916-22.
Liu, F., Sun, X., Wang, F. et al. 2014. Breeding, economic traits evaluation, and commercial
cultivation of a new Saccharina variety “Huangguan No. 1”. Aquacult Int 22, 1665–1675.
https://doi.org/10.1007/s10499-014-9772-8
Loureiro, R., Gachon, C.M., Rebours, C. 2015. Seaweed cultivation: potential and challenges of
crop domestication at an unprecedented pace. New Phytol. 206(2):489-92.
Manni, M., Berkeley, M. R., Seppey, M., & Zdobnov, E. M. 2021. BUSCO: Assessing genomic
data quality and beyond. Current Protocols, 1, e323. doi: 10.1002/cpz1.323
Melén, K, Krogh, A, and von Heijne, G. 2003. Reliability measures for membrane protein
topology prediction algorithms. Journal of Molecular Biology, 327(3), 735–744. doi:
10.1016/s0022-2836(03)00182-7.
Merchant S, Wood DE, Salzberg SL. 2014. Unexpected cross-species contamination in genome
sequencing projects. PeerJ 2:e675.
Molano G, Diesel J, Montecinos GJ, Alberto F and Nuzhdin SV. 2022. Sporophyte Stage Genes
Exhibit Stronger Selection Than Gametophyte Stage Genes in Haplodiplontic Giant Kelp. Front.
Mar. Sci. 8:774076. doi: 10.3389/fmars.2021.774076
Mollah M. Z. I., Zahid H. M., Mahal Z., Faruque Mohammad Rashed Iqbal, Khandaker M. U.
2021. The Usages and Potential Uses of Alginate for Healthcare Applications. Frontiers in
Molecular Biosciences. 8.
Müller DG, Maier I, Marie D, Westermeier R. 2016. Nuclear DNA level and life cycle of kelps:
Evidence for sex-specific polyteny in Macrocystis (Laminariales, Phaeophyceae). J Phycol.
52(2):157-60.
Nielsen H, et al. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction
of their cleavage sites. Protein Engineering 10:1-6. doi: 10.1093/protein/10.1.1.
Paul S, et al. 2022. Insight into the genome data of commercially important giant kelp
Macrocystis pyrifera, Data in Brief, Volume 42,108068, ISSN 2352-3409.
Phillips N, Kapraun DF, Gómez Garreta A, Ribera Siguan MA, Rull Lluch J, Salvador Soler N et
al. 2011. Estimates of nuclear DNA content in 98 species of brown algae (Phaeophyta). AoB
Plants 11:1–8.
Pflug JM, Holmes VR, Burrus C, Johnston JS, Maddison DR. 2020. Measuring Genome Sizes
Using Read-Depth, k-mers, and Flow Cytometry: Methodological Comparisons in Beetles
(Coleoptera). G3 (Bethesda). 10(9):3047-3060.
Price AL, Jones NC, Pevzner PA. 2005. De novo identification of repeat families in large
genomes. Bioinformatics 21 Suppl 1:i351-8.
Quevillon, E, et al. 2005. InterProScan: protein domains identifier. Nucleic Acids Research,
33:W116–W120. doi: 10.1093/nar/gki442.
Rao, Suhas S P et al. 2014. A 3D map of the human genome at kilobase resolution reveals
principles of chromatin looping. Cell vol. 159,7: 1665-80.
Raj, Anil et al. 2014. fastSTRUCTURE: variational inference of population structure in large
SNP data sets. Genetics vol. 197,2: 573-89.
Rassweiler, A., Reed, D.C., Harrer, S.L. and Nelson, J.C. 2018. Improved estimates of net
primary production, growth, and standing crop of Macrocystis pyrifera in Southern California.
Ecology, 99: 2132-2132.
Reed, D. C. 1990. The effects of variable settlement and early competition on patterns of kelp
recruitment. Ecology 71:776–787.
Reed, D. C., M. Neushul, and A. W. Ebeling. 1991. Role of settlement density on gametophyte
growth and reproduction in the kelps Pterygophora californica and Macrocystis pyrifera
(Phaeophyceae). Journal of Phycology 27:361–366.
Reed, D. C., T. W. Anderson, A. W. Ebeling, and M. Anghera. 1997. The role of reproductive
synchrony in the colonization potential of kelp. Ecology 77:300–316.
Reed, D.C., Schroeter, S.C. & Raimondi, P.T. 2004. Spore supply and habitat availability as
sources of recruitment limitation in the giant kelp Macrocystis pyrifera. Journal of Phycology
40, 275–284.
Reed, D.C., Kinlan, B.P., Raimondi, P.T., Washburn, L., Gaylord, B. & Drake, P.T. 2006. A
metapopulation perspective on patch dynamics and connectivity of giant kelp. In Marine
Metapopulations, J.P. Kritzer & P.F. Sale (eds). San Diego, California: Academic Press,
Reed, D. C. and M. A. Brzezinski. 2009. Kelp forests. Pages 30-37 In: Laffoley, D.d’A. &
Grimsditch, G. (eds). The management of natural coastal carbon sinks. IUCN, Gland,
Switzerland.
Tisnado RR, Carmona GH, Montesinos ER, Higuera DL, Gutiérrez FD. 2005. Food grade
alginates extracted from the giant kelp Macrocystis pyrifera at pilot-plant scale. Revista de
Investigaciones Marinas, 26, 185-192.
Robinson, N., Winberg, P. & Kirkendale, L. 2013. Genetic improvement of macroalgae: status to
date and needs for the future. J Appl Phycol 25, 703–716
Salamov, A. A., & Solovyev, V. V. 2000. Ab initio gene finding in Drosophila genomic DNA.
Genome research, 10(4), 516–522. doi: 10.1101/gr.10.4.516
Salvador Soler N, Rull Lluch J, Gómez Garreta A. 2019. Intra-individual variation in nuclear
DNA content in Durvillaea antarctica (Chamisso) Hariot, Macrocystis pyrifera (Linnaeus) C.
Agardh and Lessonia spicata (Suhr) Santelices (Phaeophyceae). Cryptogam Algol 40(2):5–12
Schiel, D. R. & Foster, M. S. 2015. The Biology and Ecology of Giant Kelp Forests. University
of California Press, Oakland California, USA, 395 pp
Shan T, Yuan J, Su L, Li J, Leng X, Zhang Y, Gao H, Pang S. 2020. First Genome of the Brown
Alga Undaria pinnatifida: Chromosome-Level Assembly Using PacBio and Hi-C Technologies.
Front Genet. 11:140.
Silberfeld T, Leigh JW, Verbruggen H, Cruaud C, de Reviers B, Rousseau F. 2010. A multi-
locus time-calibrated phylogeny of the brown algae (Heterokonta, Ochrophyta, Phaeophyceae):
Investigating the evolutionary nature of the "brown algal crown radiation". Mol Phylogenet Evol.
56(2):659-74.
Singh RP, Bijo AJ, Baghel RS, Reddy CR, Jha B. 2011. Role of bacterial isolates in enhancing
the bud induction in the industrially important red alga Gracilaria. FEMS Microbiol Ecol.
76(2):381-92
Smit, AFA, Hubley, R, Green, P. RepeatMasker Open-3.0. 1996-2010.
Sohn, CH. 1998. The seaweed resources of Korea. In : Critchley AT, Ohno M, editors Seaweed
Resources of the World. Japan International Cooperation Agency, Yokosuka, 15–33.
Starko, S., Soto Gomez, M., Darby, H., Demes, K.W., Kawai, H., Yotsukura, N., Lindstrom,
S.C., Keeling, P.J., Graham, S.W., Martone, P.T., 2019. A comprehensive kelp phylogeny
sheds light on the evolution of an ecosystem. Mol. Phylogenet. Evol. 136,
138–150. https://doi.org/10.1016/j.ympev.2019.04.012.
Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. 2008. Gene prediction in
novel fungal genomes using an ab initio algorithm with unsupervised training. Genome research,
18(12), 1979–1990. doi: 10.1101/gr.081612.108
Wang H, et al. 2020. Estimation of genome size using k-mer frequencies from corrected long
reads. (Submitted, arXiv:2003.11817v1)
UniProt Consortium. 2021. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids
Res. 49(D1):D480-D489. doi: 10.1093/nar/gkaa1100. PMID: 33237286; PMCID: PMC7778908.
Van der Auwera, G.A., Carneiro, M.O., Hartl, C., Poplin, R., Del Angel, G., Levy-Moonshine,
A., Jordan, T., Shakir, K., Roazen, D., Thibault, J., et al. 2013. From FastQ data to high
confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc
Bioinformatics. 43(1110):11.10.1-11.10.33.
Vaser, Robert, et al. 2017. Fast and accurate de novo genome assembly from long uncorrected
reads. Genome research 27.5 (2017): 737-746.
Vos PG, Paulo MJ, Voorrips RE, Visser RG, van Eck HJ, van Eeuwijk FA. 2017. Evaluation of
LD decay and various LD-decay estimators in simulated and SNP-array data of tetraploid potato.
Theor Appl Genet. 130(1):123-135.
Webb A, Knoblauch J, Sabankar N, Kallur AS, Hey J, Sethuraman A. 2021. The Pop-Gen
Pipeline Platform: A Software Platform for Population Genomic Analyses. Mol Biol Evol.
38(8):3478-3485.
Weinberger F. 1999. Epiphyte-Host Interactions: Gracilaria conferta and Associated Bacteria.
Dissertation, University of Kiel, Germany, 138 pp.
Wickham H. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.
Ye, Naihao, et al. 2015. Saccharina genomes provide novel insight into kelp biology. Nature
communications 6.1: 1-11.
Zhang, Chi et al. 2019. PopLDdecay: a fast and effective tool for linkage disequilibrium decay
analysis based on variant call format files. Bioinformatics (Oxford, England) vol. 35,10: 1786-
1788.
Zheng, X., Levine, D., Shen, J., Gogarten, S.M., Laurie, C., Weir, B.S. 2012. A high-
performance computing toolset for relatedness and principal component analysis of SNP data.
Bioinformatics. 28(24):3326-8.
Zhou K, Salamov A, Kuo A, Aerts AL, Kong X, Grigoriev IV. 2015. Alternative splicing acting
as a bridge in evolution. Stem Cell Investig. 2:19. doi: 10.3978/j.issn.2306-9759.2015.10.01.
Chapter III: Genotyping the Giant Kelp Seedbank to Assist in the Giant Kelp De Novo
Domestication
Gary Molano¹, Gabriel J. Montecinos², Filipe Alberto² and Sergey V. Nuzhdin¹
¹ Department of Molecular and Computational Biology, University of Southern California, Los
Angeles, California, United States
² Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee,
Wisconsin, United States
In Prep
This work is based on the collaboration between the Reed and Miller Labs of the University of
California, Santa Barbara, the Alberto Lab of the University of Wisconsin-Milwaukee, and the
Nuzhdin Lab at USC, and was funded by ARPA-E MARINER.
Here, I discuss some preliminary findings from the genetic variation analysis of the giant kelp
seedbank. This work is meant to follow the publication of the phenotypic data from the kelp test
farm established in 2019 (Wade et al. In Preparation). In addition, this work also follows a
publication in review that looks at the relationship between microbiome and success on the giant
kelp farm (Osborne et al. In Review).
Abstract
Sequencing and genomics today allow for a retrospective view of the process of
domestication as the genomes and genetic variation of domesticated crops are compared against
their wild progenitors (Doebley et al. 2006). While these lessons on domestication can help
improve previously domesticated crops, they can also be excitingly applied to new domestication
targets to expedite the domestication process. One species of great interest for domestication is
Macrocystis pyrifera, or giant kelp. Giant kelp is a brown macroalga (Class Phaeophyceae) with
a global range and great ecological importance as a keystone species that supports hundreds of
species (Buschmann et al. 2016). Giant kelp is currently harvested mostly for alginate extraction
and abalone feed, but has potential in other markets such as animal feed and fertilizer
supplements, cosmetics, and pharmaceuticals (Arioli et al. 2015, Makkar et al. 2016, Tanna and
Mishra 2018, Hameury et al. 2019, Mallakour et al. 2021). Giant kelp can also be used as a
feedstock for biofuel production due to its high sugar content and rapid growth (Camus et al.
2016). While other economically important brown macroalgae such as Saccharina japonica and
Undaria pinnatifida have undergone selective breeding programs since the 1950s, giant kelp
production has been limited to wild harvest, and thus giant kelp domestication efforts would
begin de novo (Neushul 1987, Liu et al. 2014, Hwang et al. 2019). The following is a short
review of the human history of domestication, and how lessons from domestication can be
applied towards the de novo domestication of giant kelp.
Introduction
Humans began to transition from hunting, gathering, and foraging to an agrarian society
about eleven thousand years ago as they began to domesticate plants and animals (Bar-Yosef and
Belfer-Cohen 1992). During this Neolithic Revolution, initial agriculture developed in the Near
East Fertile Crescent and targeted the founder crops einkorn wheat, emmer wheat, barley, lentil,
pea, chickpea, and bitter vetch for cultivation (Abbo et al. 2010). The morphological signatures
of early domestication were phenotypic changes such as loss of seed dispersal due to a decrease
in pod shattering (pod indehiscence), increasing the yield for human consumption while
decreasing the ability of the plant to propagate, and increased seed size (Weeden 2018). By 2000
BC, all major crops that are currently used by humans, such as rice, wheat, and maize, had been
domesticated (Zeder 2015). Genotypic signatures of domestication include an overall loss in
genetic diversity, especially among genes important for domestication (Doebley et al. 2006). Loss
of genetic diversity in a species lowers its resilience against environmental and disease stress due
to the loss of alleles that are potentially adaptive (Hughes et al. 2008, Evans 2017).
As the human population continues to increase and is predicted to reach over eight billion
in 2022, the amount of food required to sustain this population also increases (United Nations
Department of Economic and Social Affairs, Population Division 2022). In order to continue to
feed the world, humans have had to increase their carrying capacity, or the amount of population
that an environment can sustain, through periods of technological and scientific breakthroughs,
such as the Industrial and Green Revolutions (Marten 2001). During the Green Revolution in the
mid-20th century, global crop yield tripled while land use for farming increased 30% (Pingali
2012). This incredible increase of production was achieved through global distribution of
improved crop germplasms that increased yield and shortened growing seasons, coupled with an
increased use of fertilizers and upgraded irrigation efficiency (Pingali 2012). Another strategy to
further develop superior crop cultivars was seedbanking, in which seeds are collected and stored
from both cultivars and wild populations, and then distributed to breeders and scientists.
Seedbanks increased the immediate genetic diversity available to breeders, while also
contributing to conservation efforts by maintaining temporal genetic diversity for future use
(Vavilov 1987, Walters and Pence 2020).
The molecular process and history of domestication can also be investigated with the
recent advances in computational biology. Humans greatly reduced the genetic diversity of
domesticated crops compared with their wild relatives after thousands of years of selective
breeding, and this reduction was further compounded by founder effects (Doebley et al. 2006).
Specific mutations essential for the domestication process, such as a mutation in the NAC gene
that increases the thickness of seed pods in the domesticated soybean, can now be pinpointed
(Dong et al. 2014). Not only can these advances in genomics be used to track signatures of
selection in domesticated crops when compared to their wild progenitors, they can also be used
to further improve crops through techniques such as allele mining, genetic modification, or de
novo domestication (Chen et al. 2019). Allele mining involves identifying new haplotypes and
markers associated with trait and gene targets for domestication by re-analyzing the continually
growing publicly available sequencing data (Kumar et al. 2010). One example among many
potential targets of genetic modification in crop species was increasing the provitamin A content of
rice through transgenic introduction of the provitamin A biosynthesis pathway (Ye et al. 2000). De novo
domestication aims to leverage -omics, such as genomics, transcriptomics, metabolomics, etc., to
reduce the time it takes to domesticate a wild crop by identifying genetic variation in genes
known to be important in domestication and individuals to cross for specific phenotypes (Fernie
and Yan 2019, Jian et al. 2022). De novo domestication has already been successfully applied to
wild species such as the orphan crop Physalis pruinosa and wild relatives of crops such as the
potato (Lemmon et al. 2018, Ye et al. 2018).
Giant kelp, currently a “wild” species, is an exceptional target for de novo domestication,
in particular because its haplodiplontic life cycle is conducive to seedbanking. Giant kelp
alternates generations between a microscopic, haploid gametophyte and a macroscopic, diploid
sporophyte (Bell 1997, North 1987). While the giant kelp sporophyte is the
cultivation target, as it grows rapidly and to a massive size, the gametophyte is the life stage
that is ideal for seedbanking, as it can be maintained in culture conditions (Wade et al. 2020).
Sexual tissue containing haploid spores can be collected from diploid sporophytes and brought
back to the laboratory, where the spores can be released and settled onto plates. After the spores
have developed into sexed gametophytes, they can be isolated into their own media (Redmond et
al. 2014). If gametophytes are grown under red light and low nutrient conditions, these
gametophytes will grow vegetatively and will not enter sexual reproduction (Lüning and Dring
1975, Xu et al. 2005). This vegetative growth under the right culture conditions supercharges the
idea of seedbanking: not only can target genotypes be specifically grown to increase the
available biomass for setting up crosses or for biological experiments, but these gametophytes can
also be maintained indefinitely for future use.
Being able to culture and grow giant kelp both in the gametophyte and sporophyte stages
is the first step towards de novo domestication of giant kelp (Jian et al. 2022). The
gametophyte culturing, line seeding, and kelp farming aspects of kelp cultivation have all been
successfully established (Correa et al. 2016, Barrento et al. 2015). In fact, giant kelp seedbanking
has already been established in Chile for maintaining giant kelp genetic diversity, purposefully
targeting thermal tolerance alleles (Barrento et al. 2015). The next step in de novo domestication
is to gather both phenotypic and genetic knowledge of the target species (Jian et al. 2022).
Again, giant kelp is an optimal organism to quickly investigate both phenotypic and genotypic
relationships, as haploid gametophytes can be vegetatively propagated, or “bulked up”, in
culture, with the same exact genotypes then being used for both sequencing and phenotyping
experiments. Statistical modeling techniques like genome wide association studies or genomic
selection can combine sufficient phenotype and genotype data to pinpoint regions in the giant
kelp genome associated with specific phenotypes like higher biomass and survival rates, and can
help predict outcomes of hypothetical crosses of genotypes maintained in the seedbank (Chen et
al. 2014, Budhlakoti et al. 2022). Therefore, establishing a giant kelp seedbank is paramount for
the de novo domestication process, and involves three steps: 1) selecting the founder populations
for the seedbank, 2) phenotyping the individuals in the seedbank on a giant kelp test farm, and 3)
genotyping the individuals used in the crosses grown on the farm.
Population Selection for the Giant Kelp Seedbank
Selecting specific founder populations during de novo domestication can also help to
mitigate pitfalls of selection based domestication such as reduced genetic diversity in the
domesticated cultivar (Doebley et al. 2006). Microsatellite analysis in Ecklonia radiata, another
species of kelp, showed kelp forests with higher genetic diversity were more resilient to thermal
stress like heatwaves (Wernberg et al. 2018). In order to maximize the genetic diversity in the
giant kelp seedbank, the evolutionary and demographic history of kelp needs to be examined.
Kelp originated in the northern Pacific about six million years ago, and then spread southwards,
eventually crossing the equator (Starko et al. 2019, Schiel and Foster 2015). Early analysis of
the ITS1 and ITS2 rRNA regions showed kelp had more genetic diversity in the Northern
hemisphere compared to the Southern hemisphere (Coyer et al. 2001). Microsatellite sequencing of
giant kelp populations in the Northern hemisphere revealed a biodiversity hotspot in Southern
California, compared to both Southern hemisphere populations and Northern Pacific populations
(Johansson et al. 2015). The higher levels of genetic diversity in giant kelp populations in
Southern California guided our selection of four Southern California founding populations for
the giant kelp seedbank: Leo Carrillo, Camp Pendleton, Arroyo Quemado, and
Catalina Island. The seedbank consists of 600 distinct gametophyte cultures (500 female, 100
male), with 66 gametophytes from Arroyo Quemado, 75 gametophytes from Camp Pendleton, 59
gametophytes from Catalina Island, and 400 gametophytes from Leo Carrillo.
Farm Design, Location, and Phenotyping Results
While detailed descriptions and results from the giant kelp farm experiment can be found
in Wade et al. (In Preparation), the crossing design and phenotypic results will be briefly
summarized here. The giant kelp test farm was located ~1 km off the coast near Santa Barbara,
CA and 2500 total individual sporophyte seedlines were planted in May 2019 (Wade et al. In
Preparation). Gametophytes in the seed bank were vegetatively propagated to produce enough
biomass for crosses for the farm. Female gametophytes from each of the four founding
populations (between 45 and 345 per population) were crossed with a single male gametophyte from
the Leo Carrillo population for a total of 500 unique crosses. Each cross was replicated five
times and seeded onto lines, for a total of 2500 individual seed lines with juvenile sporophytes
that were then attached to ropes on the farm in May 2019 (Wade et al. In Preparation). The giant
kelp farm was harvested in September 2019.
Phenotypic data such as survival, total biomass (wet), and blade number were recorded
for all individuals on the giant kelp farm within one day of harvest, and the chemical
composition of a subset of individuals was analyzed (Wade et al. In Preparation). The resulting analysis showed
some heritability in traits such as biomass and sugar content, which is encouraging for breeding
efforts (Wade et al. In Preparation). In particular, the phenotypic variance and population-specific
differences in sugar and ash content are encouraging for domestication efforts, as metabolite
accumulation through interruption of a biosynthesis pathway is an early domestication target
(Wade et al. In Preparation, Doebley et al. 2006).
Genomic Data Collection
A collaborator at University of Wisconsin-Milwaukee, Gabriel Montecinos, maintained
the seedbank generated during this project. Another collaborator, Melisa Osborne, summarized
the extraction and sequencing protocol in her work investigating how gametophyte microbial
load affects farm outcomes (Osborne et al. In Review). The descriptions of gametophyte
culturing methods, gametophyte DNA extraction, and DNA sequencing protocol for this
experiment can be found in the Osborne publication (Osborne et al. In Review). Briefly,
gametophytes in the seed bank were vegetatively propagated to increase biomass, DNA was
extracted from the gametophytes, and gametophyte DNA was sequenced by BGI on an Illumina
NovaSeq S4, producing 11.2 GB of 150 base pair reads (Osborne et al. In Review).
The Osborne paper and the work described below diverge completely once the
gametophyte DNA from the giant kelp seedbank was sequenced. The work described below is
preliminary analysis of the genetic variation data from the giant kelp seedbank, and a pivot to an
organelle based comparison of data sets initially performed for quality control purposes.
Results
Standing Genetic Variation in the Seed Bank
The immediate goal of the study described in this chapter is to produce genetic variation
data, specifically single nucleotide polymorphisms (SNPs), of individual haploid genotypes that
were successfully phenotyped on the giant kelp farm in Santa Barbara (Wade et al. In
Preparation). In order to accomplish this, I aligned Illumina reads generated from 559 individual
genotypes of giant kelp from four Southern California populations to the draft giant kelp genome
discussed in Chapter II (~2x coverage across the genome). After variants were called and
filtered, this seedbank genetic variation data set was ready for downstream
genotype-phenotype modeling applications, such as GWAS. The filtered variant call format (VCF) file
contained 1,794,037 bi-allelic SNPs and 457 individuals, as individuals were only kept in the
analysis if they had been phenotyped on the giant kelp farm (see Genetic Variation Analysis).
Two separate principal component analyses (PCAs) from Chapter I and Chapter II, using
the giant kelp gene models and the giant kelp nuclear genome as the respective reference,
revealed strong population structure between the three diploid populations from Arroyo
Quemado, Camp Pendleton, and Catalina Island. This strong population structure between these
populations concurred with the results of an earlier microsatellite study of giant kelp population
genetic diversity, which also predicted that Leo Carrillo giant kelp population was a hybrid zone
(Johansson et al. 2015). The PCA of the haploid seedbank filtered VCF (with PC1 explaining
4.83% of variance and PC2 explaining 3.04% of variance) using genome-wide SNPs again
showed strong population structure but, contrary to the hypothesis of the microsatellite study, Leo
Carrillo appears to be its own population [Fig 1].
Figure 1: A PCA of the genetic variation data of 457 individuals from the giant kelp seedbank
that were also phenotyped on the giant kelp farm. Previous analysis had shown Camp Pendleton
(CB) as its own population.
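The "variance explained" percentages quoted for each principal component can be illustrated with a minimal sketch. The analysis in this chapter used SNPRelate; the Python below is not that implementation but a plain PCA on a small, hypothetical 0/1 haploid genotype matrix. The function name `pca_variance_explained` and the toy data are assumptions made purely for illustration.

```python
import numpy as np

def pca_variance_explained(genotypes, n_components=2):
    """Plain PCA on a samples x SNPs matrix (illustrative, not SNPRelate)."""
    g = np.asarray(genotypes, dtype=float)
    # Center each SNP column so PCs describe variation around the mean genotype.
    centered = g - g.mean(axis=0)
    # Squared singular values are proportional to the variance along each PC.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    percent = 100.0 * variance / variance.sum()
    # Sample coordinates (scores) on the leading components.
    scores = u[:, :n_components] * s[:n_components]
    return scores, percent[:n_components]

# Hypothetical toy data: 6 haploid individuals (rows) x 5 SNPs (columns),
# coded 0 = reference allele, 1 = alternate allele. Rows 0-2 and rows 3-5
# mimic two differentiated populations.
toy = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])

scores, percent = pca_variance_explained(toy)
print("PC1 and PC2 percent variance explained:", np.round(percent, 2))
print("PC1 scores:", np.round(scores[:, 0], 3))
```

In this toy case the two groups fall on opposite sides of PC1, which is the kind of separation read as population structure in the figures of this chapter.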
Organelle assembly and annotation
The ploidy difference between the gametophyte and sporophyte sequencing projects
complicates investigating the potential discrepancy between the diploid and haploid PCAs
(Chapter I, Chapter II). Organelle genomes can help address this problem as they have also been
used as references for resolving phylogenetic relationships and population structure (Wambugu
et al. 2015). Currently, a mitochondrial genome from a giant kelp ecomorph,
Macrocystis integrifolia, is publicly available, but a giant kelp chloroplast genome is not (Chen
et al. 2019). A chloroplast genome is publicly available from a kelp closely related to giant kelp,
Saccharina japonica (Wang et al. 2013). However, the PacBio sequencing used in the nuclear
genome assembly project also generated organelle reads that can be used to assemble giant kelp
organelle genomes from the same individual as the nuclear genome. Therefore, I de novo
assembled and annotated draft chloroplast and mitochondrial genomes from the same individual
that was sequenced for the construction of the giant kelp genome (Chapter II) (see Organelle
Assembly and Annotation section) [Fig 2].
Figure 2: Assembled and annotated draft organelle genome assemblies of giant kelp. These
assemblies are based on organelles from the same individual used to construct the nuclear giant
kelp genome.
The giant kelp mitochondrial genome is 39,499 base pairs long and contains 38 genes and 27
tRNAs (compared to 37,366 base pairs, 37 genes, and 24 tRNAs for the Macrocystis integrifolia
mitochondrial genome) (Chen et al. 2019). The giant kelp chloroplast draft genome is 130,214
base pairs long and contains 141 genes and 27 tRNAs (compared to 130,584 base pairs, 139
genes, and 29 tRNAs in the Saccharina japonica chloroplast) (Wang et al. 2013).
Diploid and Haploid Organelle Genetic Variation
These draft giant kelp organelle assemblies can then be used as reference sequences and
as a sequencing quality check to compare genetic variation across both the haploid seedbank and
the diploid population genetics sequencing projects. Genetic variants were called for all
individuals for both the giant kelp chloroplast and mitochondrial genomes, and then results were
visualized once again using PCA. Surprisingly, PCAs visualizing the organelle polymorphism
data did not show the strong population structure seen in previous chapters (Chapter I, Chapter
II) [Fig 3]. The population structures for both organelle genomes show that some haplotypes are
shared between populations, which is not the case with the nuclear genome population analyses.
The mitochondrial genetic variation PCA (PC1 explains 38.25% of the variance, PC2 explains
28.75% of the variance) shows three distinct mitochondrial haplotypes, including a distinct
haplotype in the Leo Carrillo population. Three main haplotypes also exist in the chloroplast genetic
variation PCA (PC1 explains 30.25% of the variance and PC2 explains 26.27% of the variance),
but the chloroplast shows more genetic variation than the mitochondrial data. The
congruency between haploid and diploid individuals from the same population did offer
support for the seedbank sequencing data quality.
Figure 3: PCAs of the genetic variation data of both the mitochondrial and chloroplast
giant kelp genomes. Each plot contains data from both the giant kelp haploid seedbank and the
diploid population genetics study. Nuclear genetic data showed much stronger population structure
compared to the organelle data.
[Figure 3, chloroplast panel: 581 individuals, 627 filtered SNPs. Axes: PC1 (30.25% variance explained) and PC2 (26.27% variance explained). Legend: ploidy (diploid, haploid); populations (AQ, CB, CI, LC).]
Materials and Methods
Genetic Variation Analysis
559 gametophytes from the seedbank were successfully sequenced by BGI. Raw 150 base pair
Illumina reads were trimmed of adapters, low quality reads, and tails using fastp set with standard
parameters (Chen et al. 2018). Trimmed reads were aligned to the nuclear genome of
giant kelp presented in Chapter II of this thesis using hisat2 v2.1 with standard parameters (Kim
et al. 2019). Separately, trimmed reads from both the seedbank data set and the diploid data set presented in
Chapter I of this thesis were aligned to the draft organelle genomes presented above, again using
hisat2 v2.1 with standard parameters (Kim et al. 2019). Subsequent steps were repeated on both
nuclear and organelle data sets. BAM files had their duplicates marked using the GATK4 v4.1.2
command “MarkDuplicates”, and then multiple BAM files for a single individual genotype were
collapsed into a single BAM file using samtools (Van der Auwera et al. 2013, Li et al. 2009).
Genetic variants were called using GATK4 v4.1.2 with ploidy set to 1 (Van der Auwera et al.
2013). Individual GVCF files were then merged and converted, using GATK4 v4.1.2, into a raw VCF file
containing variant information for downstream applications (Van
der Auwera et al. 2013).
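The steps above can be sketched as a sequence of command lines. The thesis does not list the exact invocations, so everything below is an assumption: paired-end input, the file names, the use of `samtools sort`, and the choice of `CombineGVCFs` followed by `GenotypeGVCFs` for merging GVCFs. The Python only builds the argument lists and does not run any tool; the per-individual BAM merge with samtools is omitted for brevity.

```python
def per_sample_commands(sample, fq1, fq2, ref_fasta, hisat2_index):
    """Build (not run) illustrative per-sample commands as argument lists."""
    trim1, trim2 = f"{sample}.trim_R1.fq.gz", f"{sample}.trim_R2.fq.gz"
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    dedup, gvcf = f"{sample}.dedup.bam", f"{sample}.g.vcf.gz"
    return [
        # Adapter and quality trimming with fastp defaults (paired-end assumed).
        ["fastp", "-i", fq1, "-I", fq2, "-o", trim1, "-O", trim2],
        # Alignment with hisat2 default parameters.
        ["hisat2", "-x", hisat2_index, "-1", trim1, "-2", trim2, "-S", sam],
        # Coordinate-sort the alignments (assumed step, not named in the text).
        ["samtools", "sort", "-o", bam, sam],
        # Mark duplicate reads.
        ["gatk", "MarkDuplicates", "-I", bam, "-O", dedup,
         "-M", f"{sample}.dup_metrics.txt"],
        # Call variants per individual as haploid, emitting a GVCF.
        # Note: HaplotypeCaller expects read-group tags in the BAM; adding
        # them is not shown in this sketch.
        ["gatk", "HaplotypeCaller", "-R", ref_fasta, "-I", dedup,
         "-O", gvcf, "-ERC", "GVCF", "--sample-ploidy", "1"],
    ]

def joint_genotyping_commands(gvcfs, ref_fasta, out_vcf="seedbank.raw.vcf.gz"):
    """Merge per-individual GVCFs and genotype them into one raw VCF."""
    combine = ["gatk", "CombineGVCFs", "-R", ref_fasta]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", "combined.g.vcf.gz"]
    genotype = ["gatk", "GenotypeGVCFs", "-R", ref_fasta,
                "-V", "combined.g.vcf.gz", "-O", out_vcf]
    return [combine, genotype]

# Hypothetical sample and file names, for illustration only.
cmds = per_sample_commands("gam001", "gam001_R1.fq.gz", "gam001_R2.fq.gz",
                           "kelp_nuclear.fa", "kelp_nuclear_index")
joint = joint_genotyping_commands(["gam001.g.vcf.gz", "gam002.g.vcf.gz"],
                                  "kelp_nuclear.fa")
for cmd in cmds + joint:
    print(" ".join(cmd))
```

Keeping the commands as argument lists makes it easy to inspect or log the pipeline before handing each list to a job scheduler or `subprocess.run`.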
The raw nuclear VCF file containing 559 individuals was then filtered for downstream
genotype-phenotype modeling applications based on the following criteria: 1) only high quality
bi-allelic SNPs were kept, using vcftools and GATK4 best practices, and 2) the individual had to be
successfully phenotyped on the farm. This reduced the data set from 26,696,822 polymorphisms and
559 individuals to 1,794,037 bi-allelic SNPs and 457 individuals (Danecek et al. 2013, Van der Auwera et al.
2013). The chloroplast VCF file combined both haploid seedbank data and diploid sporophyte
data, and was filtered for population genetic analyses (no minor allele frequency threshold,
minimum coverage of 100x for each individual) using vcftools and GATK4 best practices,
reducing the chloroplast data set to 630 bi-allelic SNPs and 581 individuals. The mitochondrial
VCF file also combined the haploid and diploid data sets, and followed the same filtering
strategy as the chloroplast VCF (except the minimum coverage for each individual was 75x),
reducing the mitochondrial data set from 631 polymorphisms and 608 individuals to 580
bi-allelic SNPs and 539 individuals (Danecek et al. 2013, Van der Auwera et al. 2013). Principal
component analyses on all filtered data sets were performed using SNPRelate and visualized using
R (Zheng et al. 2012, Wickham and Grolemund 2016).
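The two kinds of filter described above (keeping bi-allelic SNPs, and keeping individuals above a minimum coverage) can be sketched on in-memory records. This is not the vcftools or GATK implementation; the record layout, the function names, and the reading of "minimum coverage" as mean depth per individual are assumptions for illustration. The last lines compute the retained fractions from the nuclear counts reported in the text.

```python
def is_biallelic_snp(ref, alts):
    """True when a site has exactly one alternate allele and both alleles are single bases."""
    return len(alts) == 1 and len(ref) == 1 and len(alts[0]) == 1

def keep_biallelic_snps(records):
    """Keep only records that are bi-allelic SNPs."""
    return [r for r in records if is_biallelic_snp(r["ref"], r["alts"])]

def individuals_passing_depth(mean_depth_by_individual, min_depth):
    """Names of individuals whose mean depth meets the threshold."""
    return sorted(name for name, depth in mean_depth_by_individual.items()
                  if depth >= min_depth)

# Hypothetical toy records: one SNP, one multi-allelic site, one indel.
toy_records = [
    {"pos": 101, "ref": "A", "alts": ["G"]},
    {"pos": 250, "ref": "C", "alts": ["T", "G"]},
    {"pos": 377, "ref": "AT", "alts": ["A"]},
]
# Hypothetical mean depths per individual on an organelle genome.
toy_depths = {"ind_A": 240.0, "ind_B": 80.0, "ind_C": 100.0}

kept_sites = keep_biallelic_snps(toy_records)
kept_individuals = individuals_passing_depth(toy_depths, min_depth=100)

# Retained fractions for the nuclear data set, using the counts in the text.
snp_fraction = 1_794_037 / 26_696_822
individual_fraction = 457 / 559
print(f"sites kept in toy data: {[r['pos'] for r in kept_sites]}")
print(f"individuals kept in toy data: {kept_individuals}")
print(f"nuclear SNPs retained: {snp_fraction:.1%}")
print(f"nuclear individuals retained: {individual_fraction:.1%}")
```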
Organelle Assembly and Annotation
The PacBio reads used to assemble the giant kelp genome (Chapter II) also contained
organelle reads. To extract organelle reads, the PacBio reads were aligned to the Macrocystis
integrifolia mitochondrial genome and the Saccharina japonica chloroplast genome using minimap2 with
standard parameters (Chen et al. 2019, Wang et al. 2013, Li 2018). PacBio reads that mapped to
the organelle genomes were then extracted from the fastq file using seqtk, and these reads
were assembled using canu with standard parameters (Shen et al. 2016, Koren et al. 2017).
Assembly contiguity was verified using mummer to plot the assembled giant kelp mitochondrial and
chloroplast genomes against the Macrocystis integrifolia mitochondrial genome and the Saccharina
japonica chloroplast genome (Marçais et al. 2018). Assemblies were annotated using GeSeq and visualized using
OGDRAW (Tillich et al. 2017, Greiner et al. 2019).
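The read-extraction step can be illustrated with a small sketch that collects the names of reads that mapped to each organelle reference. It assumes the alignments are in minimap2's PAF format (tab-separated; column 1 is the read name, column 6 the target name, column 12 the mapping quality). The function name, the mapping-quality cutoff, and the read and reference names are hypothetical and not taken from the thesis.

```python
from collections import defaultdict

def reads_by_target(paf_lines, min_mapq=0):
    """Group names of mapped reads by the reference sequence they hit."""
    hits = defaultdict(set)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        read_name, target_name, mapq = fields[0], fields[5], int(fields[11])
        if mapq >= min_mapq:
            hits[target_name].add(read_name)
    return dict(hits)

def paf_line(read, read_len, target, target_len, mapq):
    """Build a toy PAF line with placeholder coordinates."""
    cols = [read, read_len, 0, read_len, "+", target, target_len,
            0, read_len, read_len, read_len, mapq]
    return "\t".join(str(c) for c in cols)

# Hypothetical alignments against the two reference organelle genomes.
toy_paf = [
    paf_line("read_001", 9000, "mito_ref", 37366, 60),
    paf_line("read_002", 12000, "chloro_ref", 130584, 60),
    paf_line("read_002", 12000, "chloro_ref", 130584, 0),
    paf_line("read_003", 8000, "chloro_ref", 130584, 3),
]

hits = reads_by_target(toy_paf, min_mapq=10)
for target, names in sorted(hits.items()):
    print(target, sorted(names))
```

The resulting name lists could be written one per line to a file and used to subset the original fastq, for example with `seqtk subseq`.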
Discussion
The genetic variation of the giant kelp seedbank, together with the phenotypic data from
the giant kelp farm, forms a powerful tool for the de novo domestication of giant kelp. Future
statistical modeling can identify regions of interest in the genome or candidate individuals for
crossing schemes. These regions of interest can either be under natural selection or regions that
are associated with specific phenotypes, like growth or survival (Olksyk et al. 2010). Mutations
in genes, such as transcriptional regulators and biosynthesis genes, seen as early domestication
targets in terrestrial crops, can quickly be investigated as the giant kelp genome used in this
study has already been annotated (Doebley et al. 2006). Additionally, organelle genetic variation
data can be used by itself or in conjunction with nuclear genetic variation data for future giant
kelp GWAS studies (Meng et al. 2019).
While sequencing the seedbank to generate giant kelp genotype data for future
modeling was the purpose of this work, bacteria accounted for a majority of reads sequenced
(Osborne et al. In Review). Giant kelp can be considered a holobiont due to its close association
and symbiotic relationship with core bacteria (Florez et al. 2019). Unintentionally, this project
has started a new line of research examining the relationships between specific kelp genotypes
and their associated microbiome in the giant kelp seedbank (Osborne et al. In Review). While
more giant kelp coverage data could have been generated, the somewhat low coverage of the
reads (~2x across the giant kelp genome) is buffered by the haploid state of gametophytes in the
seedbank, as each site can only have a single allele called. Therefore, the de novo domestication
of giant kelp can continue despite the bacterial load in the sequencing data.
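A back-of-the-envelope calculation illustrates why haploidy buffers low coverage. The sketch below assumes read depth at a site follows a Poisson distribution with mean equal to the average coverage, an idealization that is not a model used in the thesis. Under that assumption, a haploid site is callable once a single read covers it, whereas a heterozygous diploid site needs at least one read from each of its two alleles.

```python
import math

def p_site_covered(mean_depth):
    """Probability that a site receives at least one read (Poisson depth)."""
    return 1.0 - math.exp(-mean_depth)

def p_both_alleles_seen(mean_depth):
    """Probability that both alleles at a heterozygous diploid site are sampled.

    Each allele is assumed to receive reads independently with half the
    mean depth (Poisson thinning).
    """
    per_allele = 1.0 - math.exp(-mean_depth / 2.0)
    return per_allele ** 2

depth = 2.0  # approximate nuclear coverage reported for the seedbank
haploid = p_site_covered(depth)        # about 0.86
diploid = p_both_alleles_seen(depth)   # about 0.40
print(f"haploid site has at least one read: {haploid:.3f}")
print(f"diploid het site has both alleles sampled: {diploid:.3f}")
```

Under these assumptions, most haploid sites can still be genotyped at ~2x, while fewer than half of heterozygous sites in a diploid sample would have both alleles observed.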
The ability to vegetatively propagate the gametophytes in the giant kelp seedbank, and
thus maintain specific genotypes indefinitely, allows for exploring the possibility of both one-
step and traditional gradual domestication. One-step domestication involves clonal propagation
from a founder individual and has been successful in species such as the pineapple (2019). If
one-step domestication is applied to giant kelp, crosses of individuals maintained in the seedbank
can be repeated many times, producing sporophytes with identical genotypes. One-step
domestication is an intriguing option for specific giant kelp phenotypes such as biomass, as
genetic distance, which can be established amongst the individuals in the seedbank, has been
positively correlated to genetic distance in the related genus Laminaria (Li et al. 2008). To
implement gradual domestication in giant kelp, spores from individuals with mutations in target
genes of interest can be crossed. The spores produced by the F1 sporophyte would be collected,
added to the seedbank, and genotyped to check for alleles of interest. While gradual
domestication is more time consuming than one-step domestication, superior cultivars can be
produced with combined multiple genetic variants of interest rather than one (Jian et al. 2022).
The giant kelp seedbank is also a valuable resource for conservation purposes and
demographic studies going forward. A giant kelp seedbank was established and maintained
explicitly for conserving genetic diversity in giant kelp populations in Chile to address increasing
global temperatures, although it did not include genotyping or phenotyping (Barrento
et al. 2015). Giant kelp has a global range, so adding populations from locations other than Southern
California will increase the genetic diversity in the seedbank and may reveal novel local
adaptation strategies (Buschmann et al. 2007). Additionally, further analysis of
population genetic parameters such as pairwise genetic diversity, Fst, and Tajima’s D can now
be performed. The discrepancy in the population structure of Southern California giant
kelp between the nuclear and organelle genomes, seen in the analysis above, should be
investigated further.
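Two of the parameters named above can be sketched directly from haploid genotype data. The example below computes pairwise genetic diversity (summed over sites, not per base pair) and Tajima's D using the standard constants from Tajima (1989). It assumes biallelic sites coded 0/1 with no missing calls, which is a simplification; the function names and toy data are assumptions of this example, and in practice a tool such as vcftools would be applied to the filtered VCF.

```python
import math


def pairwise_diversity(sites):
    """Mean number of pairwise differences, summed over sites.

    `sites` is a list of sites; each site is a list of 0/1 alleles,
    one per haploid sample, with no missing data.
    """
    n = len(sites[0])
    n_pairs = n * (n - 1) / 2
    return sum(sum(s) * (n - sum(s)) / n_pairs for s in sites)


def tajimas_d(sites):
    """Tajima's D for haploid 0/1 genotypes; None if no site is segregating."""
    n = len(sites[0])
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sampled haplotypes")
    segregating = [s for s in sites if 0 < sum(s) < n]
    s_count = len(segregating)
    if s_count == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = pairwise_diversity(segregating)
    theta_w = s_count / a1  # Watterson's estimator
    variance = e1 * s_count + e2 * s_count * (s_count - 1)
    return (pi - theta_w) / math.sqrt(variance)


# Toy data: three sites, each carrying one singleton among four haplotypes.
singletons = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(pairwise_diversity(singletons), tajimas_d(singletons))
```

An excess of rare variants, as in the toy data, pulls D below zero, while an excess of intermediate-frequency variants pushes it above zero.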
Conclusion
Genetic variation data from the giant kelp seedbank aid the de novo domestication process
by providing the inputs that statistical models need to design new crosses of interest
among individuals in the seedbank. With these data, phenotypic outputs like biomass can even be
predicted for hypothetical crosses, which will further the domestication efforts. Determining
relationships between gametophyte genotype and other phenotypes, such as microbial load in the
seedbank or thermal tolerance of the gametophytes, is now possible with this seedbank genetic
variation data.
References
Abbo S, Lev-Yadun S, Gopher A. 2010. Agricultural origins: centers and noncenters; a Near
Eastern reappraisal. Critical Reviews in Plant Science 29, 317–328.
Arioli T, Mattner SW, Winberg PC. 2015. Applications of seaweed extracts in Australian
agriculture: past, present and future. J. Appl. Phycol. 27, 2007–2015. doi: 10.1007/s10811-015-
0574-9
Bar-Yosef O and Belfer-Cohen A. 1992. From foraging to farming in the Mediterranean Levant.
Transitions to agriculture in prehistory. 21-48.
Barrento S, Camus C, Sousa Pinto I, Buschmann AH. 2015. Germplasm banking of the giant
kelp: Our biological insurance in a changing environment. Algal Research. 13. 13-140.
10.1016/j.algal.2015.11.024.
Becheler R, Haverbeck D, Clerc C, Montecinos G, Valero M, Mansilla A, Faugeron S. 2022.
Variation in Thermal Tolerance of the Giant Kelp’s Gametophytes: Suitability of Habitat,
Population Quality or Local Adaptation? Front. Mar. Sci. 9:802535. doi:
10.3389/fmars.2022.802535
Budhlakoti N, Kushwaha AK, Rai A, Chaturvedi KK, Kumar A, Pradhan AK, Kumar U, Kumar
RR, Juliana P, Mishra DC, Kumar S. 2022. Genomic Selection: A Tool for Accelerating the
Efficiency of Molecular Breeding for Development of Climate-Resilient Crops. Front. Genet.
13:832153. doi: 10.3389/fgene.2022.832153
Buschmann A, Graham M, Vasquez J. 2007. Global Ecology of the
Giant Kelp Macrocystis. doi: 10.1201/9781420050943.ch2.
Camus C, Ballerino P, Delgado R, Olivera-Nappa Á, Leyton C, Buschmann AH. 2016. Scaling
up bioethanol production from the farmed brown macroalga Macrocystis pyrifera in Chile.
Biofuels Bioprod. Bioref. 10, 673–685. doi: 10.1002/bbb.1708
Chen E, Huang X, Tian Z, Wing RA, Han B. 2019. The Genomics of Oryza Species Provides
Insights into Rice Domestication and Heterosis. Annu Rev Plant Biol. 70:639-665.
Chen J, Zang Y, Shang S, Tang X. 2019. The complete mitochondrial
genome of the brown alga Macrocystis integrifolia (Laminariales, Phaeophyceae). Mitochondrial
DNA Part B. 4. 635-636. 10.1080/23802359.2018.1495114.
Chen, LY., VanBuren, R., Paris, M. et al. 2019. The bracteatus pineapple genome and
domestication of clonally propagated crops. Nat Genet 51, 1549–1558.
Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics. 34(17):i884-i890.
Chen W, Gao Y, Xie W, Gong L, Lu K, Wang W, Li Y, Liu X, Zhang H, Dong H, Zhang W,
Zhang L, Yu S, Wang G, Lian X, Luo J. 2014. Genome-wide association analyses provide
genetic and biochemical insights into natural variation in rice metabolism. Nat Genet. 46(7):714-
21.
Correa T, Gutiérrez A, Flores R, Buschmann AH, Cornejo P, Bucarey C. 2016. Production and
economic assessment of giant kelp Macrocystis pyrifera cultivation for abalone feed in the south
of Chile. Aquac Res, 47: 698-707. https://doi.org/10.1111/are.12529
Coyer JA, Smith GJ, Andersen RA. 2001. Evolution of Macrocystis spp. (Phaeophyceae) as
determined by ITS1 and ITS2 sequences. J. Phycol. 37:574–85.
Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., et al. 2011. The
variant call format and VCFtools. Bioinformatics 27, 2156–2158. doi:
10.1093/bioinformatics/btr330
Doebley JF, Gaut BS, Smith BD. 2006. The molecular genetics of crop domestication. Cell.
127(7):1309-21. doi: 10.1016/j.cell.2006.12.006. PMID: 17190597.
Dong, Y., Yang, X., Liu, J. et al. 2014. Pod shattering resistance associated with domestication is
mediated by a NAC gene in soybean. Nat Commun 5, 3352.
Evans SM, Vergés A, Poore AGB. 2017. Genotypic diversity and short-term response to shading
stress in a threatened seagrass: Does low diversity mean low resilience? Front Plant Sci 8:1417.
Florez JZ, Camus C, Hengst MB, Marchant F, Buschmann AH. 2019. Structure of the epiphytic
bacterial communities of Macrocystis pyrifera in localities with contrasting nitrogen
concentrations and temperature. Algal research, 44, 101706. doi: 10.1016/j.algal.2019.101706
Fernie A.R. and Yan J. 2019. De novo domestication: an alternative route toward new crops for
the future. Mol. Plant 12: 615–631.
Goecke F, Klemetsdal G, Ergon Å. 2020. Cultivar Development of Kelps for Commercial
Cultivation—Past Lessons and Future Prospects. Front. Mar. Sci. 8:110.
Bell G. 1997. The evolution of the life cycle of brown seaweeds. Biological Journal of the
Linnean Society 60(1):21–38.
Greiner S, Lehwark P and Bock R (2019) OrganellarGenomeDRAW (OGDRAW) version 1.3.1:
expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Research
47: W59-W64
Hameury S, Borderie L, Monneuse JM, Skorski G, Pradines D. 2019. Prediction of skin anti-
aging clinical benefits of an association of ingredients from marine and maritime origins: ex
vivo evaluation using a label-free quantitative proteomic and customized data processing
approach. J. Cosmet. Dermatol. 18, 355–370. doi: 10.1111/jocd.12528
Hughes AR, Inouye BD, Johnson MTJ, Underwood N, Vellend M. 2008. Ecological
consequences of genetic diversity. Ecol. Lett. 11, 609–623.
Hwang EK, Yotsukura N, Pang SJ, Su L, Shan, TF. 2019. Seaweed breeding programs and
progress in eastern Asian countries. Phycologia 58, 484–495.
Jian L, Yan J, Liu J. 2022. De Novo Domestication in the Multi-Omics Era. Plant Cell Physiol.
63(11):1592-1606. doi: 10.1093/pcp/pcac077. PMID: 35762778.
Johansson ML, et al. 2015. Seascape drivers of Macrocystis pyrifera population genetic structure
in the northeast Pacific. Mol. Ecol. 24, 4866–4885. doi: 10.1111/mec.13371
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and
genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915. doi:
10.1038/s41587-019-0201-4
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable
and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome
Res. 27(5):722-736. doi: 10.1101/gr.215087.116.
Kumar GR, Sakthivel K, Sundaram RM, Neeraja CN, Balachandran SM, Rani NS, Viraktamath
BC, Madhav MS. 2010. Allele mining in crops: prospects and potentials. Biotechnol Adv.
28(4):451-61.
Lemmon Z.H., Reem N.T., Dalrymple J., Soyk S., Swartwood K.E., Rodriguez-Leal D., et
al. 2018. Rapid improvement of domestication traits in an orphan crop by genome editing. Nat.
Plants 4: 766–770.
Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences, Bioinformatics.
34:18,3094–3100.
Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin
R. and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence alignment/map
(SAM) format and SAMtools. Bioinformatics, 25, 2078-9.
Li X, Yang G, Shi Y, Cong Y, Che S, Qu S, Li Z.
2008. Prediction of the heterosis of Laminaria hybrids with the genetic distance
between their parental gametophyte clones. Journal of Applied Phycology. 20. 1097-1102.
10.1007/s10811-008-9321-9.
Liu, F., Sun, X., Wang, F. et al. 2014. Breeding, economic traits evaluation, and commercial
cultivation of a new Saccharina variety “Huangguan No. 1”. Aquacult Int 22, 1665–1675.
https://doi.org/10.1007/s10499-014-9772-8
Lüning K, and Dring MJ. 1975. Reproduction, growth and photosynthesis of gametophytes of
Laminaria saccharina grown in blue and red light. Mar. Biol. 29, 195–200. doi:
10.1007/bf00391846
Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. 2018. MUMmer4: A
fast and versatile genome alignment system. PLoS computational biology. 14(1):e1005944.
Makkar, H. P. S., Tran, G., Heuze, V., Giger-Reverdin, S., Lessire, M., Lebas, F., et al. 2016.
Seaweeds for livestock diets: a review. Anim. Feed Sci. Technol. 212, 1–17. doi:
10.1016/j.anifeedsci.2015.09.018
Mallakpour S, Azadi E, Hussain CM. 2021. Chitosan, alginate, hyaluronic acid, gums, and β-
glucan as potent adjuvants and vaccine delivery systems for viral threats including SARS-CoV-
2: A review. Int J Biol Macromol. 182:1931-1940. doi: 10.1016/j.ijbiomac.2021.05.155.
Marten GG. 2001. Human Ecology: Basic Concepts for Sustainable Development (1st ed.).
Routledge.
Meng X, Li L, Pascual J, Rahikainen M, Yi C, Jost R, He C, Fournier-Level A, Borevitz J,
Kangasjärvi S, Whelan J, Berkowitz O. 2022. GWAS on multiple traits identifies mitochondrial
ACONITASE3 as important for acclimation to submergence stress. Plant Physiol. 188(4):2039-2058.
Neushul, M. 1987. “Energy from marine biomass: The historical record”. In Bird, Kimon T.;
Benson, Peter H. (eds.). Seaweed cultivation for renewable resources. Elsevier. pp. 1–37.
North, W. J. 1987. “Biology of the Macrocystis resource in North America,” in Case Studies of
Seven Commercial Seaweed Resources, eds M. S. Doty, J. F. Caddy, and B. Santelices (San
Francisco, CA: FAO).
Oleksyk TK, Smith MW, O'Brien SJ. 2010. Genome-wide scans for footprints of natural
selection. Philos Trans R Soc Lond B Biol Sci. 365(1537):185-205.
Osborne, MG, Molano, G, Simons, A, Dao, V, Ong B, Vong B, Singh A, Montecinos GJ,
Alberto, F, Nuzhdin SV. (In Review) Natural variation of Macrocystis pyrifera gametophyte
germplasm culture microbiomes and applications for improving yield in offshore farms. Journal
of Phycology.
Pingali PL. 2012. Green revolution: impacts, limits, and the path ahead. Proc Natl Acad Sci U S
A. 109(31):12302-8. doi: 10.1073/pnas.0912953109.
Redmond, Sarah; Green, Lindsay; Yarish, Charles; Kim, Jang; and Neefus, Christopher, "New
England Seaweed Culture Handbook" (2014). Seaweed Cultivation. 1.
Schiel DR, Foster MS. 2015. The biology and ecology of giant kelp forests. Oakland, CA:
University of California Press.
Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q
File Manipulation. PLoS One. 11(10):e0163962. doi: 10.1371/journal.pone.0163962. PMID:
27706213; PMCID: PMC5051824.
Starko S, Soto Gomez M, Darby H, Demes KW, Kawai H, Yotsukura N, Lindstrom SC, Keeling
PJ, Graham SW, Martone PT. 2019. A comprehensive kelp phylogeny sheds light on the
evolution of an ecosystem. Mol Phylogenet Evol. 136:138-150. doi:
10.1016/j.ympev.2019.04.012.
Tanna B, and Mishra A. 2018. Metabolites unravel nutraceutical potential of edible seaweeds: an
emerging source of functional food. Compr. Reviews in Food Sci. Food Saf. 17, 1613–1624. doi:
10.1111/1541-4337.12396
Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R and Greiner S. 2017.
GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Research 45:
W6-W11
United Nations Department of Economic and Social Affairs, Population Division. 2022. World
Population Prospects 2022: Summary of Results. UN DESA/POP/2022/TR/NO. 3.
Van der Auwera, G. A., Carneiro, M. O., Hartl, C., Poplin, R., Del Angel, G., Levy-Moonshine,
A., et al. 2013. From FastQ data to high confidence variant calls: the genome analysis toolkit
best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33.
Vavilov NI. 1987. Origin and geography of cultivated plants. Translated to English by D. Love
1992. Cambridge, UK: Cambridge University Press.
Wade R, Augyte S, Harden M, Nuzhdin S, Yarish C, Alberto F. 2020. Macroalgal germplasm
banking for conservation, food security, and industry. PLoS Biol 18(2): e3000641.
https://doi.org/10.1371/journal.pbio.3000641
Wade R., Yorke C.E., Montecinos G.J., Camus, C., Nuzhdin S.V., Miller R.J., Reed, D.
C., Alberto, F. Heritability of biomass & chemical composition in the giant kelp Macrocystis
pyrifera supports the development of its genomic breeding program. In preparation.
Walters, C, Pence, VC. 2020. The unique role of seed banking and cryobiotechnologies in plant
conservation. Plants, People, Planet. 3: 83– 91.
Wambugu PW, Brozynska M, Furtado A, Waters DL, Henry RJ. 2015. Relationships of wild and
domesticated rices (Oryza AA genome species) based upon whole chloroplast genome
sequences. Sci Rep. 5:13957.
Wang X, Shao Z, Fu W, Yao J, Hu Q, Duan D. 2013. Chloroplast genome of one brown
seaweed, Saccharina japonica (Laminariales, Phaeophyta): its structural features and
phylogenetic analyses with other photosynthetic plastids. Mar Genomics. 10:1-9
Weeden NF. 2018. Domestication of Pea (Pisum sativum L.): The Case of the Abyssinian Pea.
Front Plant Sci. 9:515. doi: 10.3389/fpls.2018.00515. PMID: 29720994; PMCID: PMC5915832.
Wernberg, T., Coleman, M.A., Bennett, S. et al. 2018. Genetic diversity and kelp forest
vulnerability to climatic stress. Sci Rep 8, 1851.
Wickham, H., and Grolemund, G. 2016. R for Data Science: Import, Tidy, Transform, Visualize,
and Model Data. Sebastopol, CA: O’Reilly Media, Inc.
Xu Z, Dapeng L, Hanhua H, Tianwei T. 2005. Growth promotion of vegetative gametophytes of
Undaria pinnatifida by blue light. Biotechnol Lett. 27(19):1467-75.
Ye M., Peng Z., Tang D., Yang Z., Li D., Xu Y., et al. 2018. Generation of self-compatible
diploid potato by knockout of S-RNase. Nat. Plants 4: 651–654.
Ye X, Al-Babili S, Klöti A, Zhang J, Lucca P, Beyer P, Potrykus I. 2000. Engineering the
provitamin A (beta-carotene) biosynthetic pathway into (carotenoid-free) rice endosperm.
Science. 287(5451):303-5. doi: 10.1126/science.287.5451.303. PMID: 10634784.
Zeder MA. 2015. Core questions in domestication research. Proc Natl Acad Sci U S A.
112(11):3191-8.
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. 2012. A high-performance
computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics.
28(24):3326-8.
Dissertation Conclusion
This dissertation has contributed to the genetic toolbox of giant kelp by generating
multiple different reference sequences that can be used for genetic variation studies.
Additionally, this dissertation presents polymorphism data for both diploid and haploid giant
kelp. The polymorphism data of the giant kelp seedbank combined with phenotype data on the
farm will aid in the crossing designs of subsequent test farms.
This dissertation presents the current progress with the de novo domestication of giant
kelp and has identified the next steps to be taken. Retrospective analysis of domesticated
terrestrial crops will continue to help guide the giant kelp team at the Nuzhdin lab in its mission
to domesticate giant kelp.
My Contributions:
My role in this broad project, which focuses on domesticating giant kelp, began as the first graduate
student working on brown macroalgae genomics in the Nuzhdin Lab. I recruited three students
directly to the lab to work on the giant kelp domestication project and, in addition to the work
described below, successfully wrote a grant to work on sterilizing kelp using naturally occurring
mutations found in the giant kelp seedbank.
My Role in Chapter 1:
I sampled the sporophyte tissue from 30 individuals from the bay in Two Harbors,
Catalina Island. I overnighted the tissue to the University of Wisconsin-Milwaukee for axenic
and drying treatments. I received the dried tissue and used it to optimize DNA
extraction. The optimization process took months, as polysaccharides and polyphenols
found in the giant kelp tissue would bind to DNA and block downstream enzymatic activities. I
extracted DNA and used a combination of Qubit, Nanodrop, and PCR results to check the quality
of the extracted DNA. I constructed DNA libraries for Illumina sequencing using the KAPA
HyperPlus kit, and then sent them to Novogene for sequencing.
I assembled a transcriptome using the program ABySS (we did not end up using it going
forward, as José’s Trinity assembly was better based on N50 and BUSCO). I wrote the collapse
script that decreased the number of duplicated gene models in the data set. I used
BUSCO and QUAST to compare our assemblies at different stages against each other and the
Ectocarpus siliculosus genome. I used blobtools to eliminate contigs that were deemed
contaminants. I trimmed the reads from Novogene and aligned them to the gene model data set. I
called SNPs using GATK4 and filtered the SNP data set using GATK4 and vcftools. I calculated
pairwise genetic diversity for each life stage and overall using vcftools. I performed the PCA
using SNPrelate and the filtered VCF file. I also generated the heatmap showing the Nei’s
genetic distance between each individual in the data set and calculated Hs for each population
and Ht for all populations using adegenet. I then compared total SNP count between and within
each population using vcftools. I wrote the majority of the published manuscript, including the
abstract, introduction, and discussion.
My Role in Chapter 2:
After José had assembled the genome from the Pacbio reads in the canu pipeline, I
removed contigs that were probable contaminants based on GC content, size, and blast results
using blobtools. I coordinated with Phase Genomics regarding the Hi-C scaffolding of our
genome, including multiple iterations of the giant kelp genome depending on contig filtering. I
used BUSCO and Quast to compare our genome with other available brown macroalgae
genomes. I used k-mer frequencies to estimate the genome size using the corrected Pacbio reads.
I led the coordination with JGI to finalize the contigs used in the genome. After
reviewing contigs eliminated as contaminants using blast and blobtools, we added ~40 Mb more
of giant kelp genome, which contained ~4,000 giant kelp genes. After finalizing the genome, I
aligned the diploid Illumina reads used in Chapter 1 to the giant kelp genome (instead of the
gene models). I called SNPs using GATK4 and then filtered the VCF file using vcftools and
GATK4. I then phased the VCF file and calculated the linkage disequilibrium for the three
Southern California kelp populations together and separately. I calculated pairwise genetic
diversity for each population of giant kelp and all populations together. I performed PCA using
SNPrelate and the filtered VCF file. I then checked the population structure by using the program
FASTSTRUCTURE and various values of k. I used blast to identify the region in scaffold two
that contains the giant kelp sex determining region. I wrote a majority of the manuscript in
preparation, including the introduction, abstract, and discussion.
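For readers unfamiliar with the linkage disequilibrium calculation mentioned above, the sketch below computes the r² statistic between two biallelic sites from phased haplotypes. It is a minimal illustration, not the software used for the analysis; it assumes haplotypes are coded 0/1 with no missing data, and `r_squared` is a name chosen for this example.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites across phased haplotypes.

    Each argument lists the 0/1 allele carried by every haplotype, in the
    same haplotype order. Returns None if either site is monomorphic,
    because r^2 is undefined in that case.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must list the same, non-zero number of haplotypes")
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


# Perfectly linked sites versus sites whose alleles are combined independently.
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

Averaging r² over pairs of sites binned by physical distance gives the LD decay curve that is typically compared between populations.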
My Role in Chapter 3:
I assisted in the farm phenotyping in Fall 2019, including phenotyping the kelp on land
for traits such as total biomass, number of blades and stipes, blade and stipe weight, etc. I also
swabbed individual kelp blades three times each with Melisa to generate biofilm samples for all
crosses on the farm.
I coordinated with the University of Wisconsin-Milwaukee and BGI and received the raw
seedbank Illumina data. I aligned the sequences to the scaffolded giant kelp genome and called
SNPs using the GATK4 pipeline, and filtered the corresponding VCF using GATK4 and
vcftools. I noticed that a majority of the reads were bacterial from the giant kelp seedbank
sequencing. I started investigating the microbiome component in the reads, and helped start the
project described in Osborne et al. (In Review). I performed PCA of the haploid VCF file using
SNPrelate.
I assembled the chloroplast and mitochondrial genomes using canu. I annotated the
genomes using Chlorobox and checked their contiguity by aligning the genomes to publicly
available brown macroalgal organelle sequences using mummer. I aligned both the haploid and
diploid data sets to both the mitochondrial and chloroplast genomes. I then called SNPs on both
organelle reference genomes using GATK4. I then performed a PCA on both filtered organelle
VCFs, revealing different population structure between the nuclear, chloroplast, and
mitochondrial genomes. I wrote the complete manuscript in Chapter 3, which is in the early
stages before publication.
Abstract
The need to develop new crops to help sustain the growing human population impels the current effort to domesticate new species such as Macrocystis pyrifera (giant kelp). Giant kelp is an ecologically important brown macroalga that grows incredibly quickly and, as a keystone species with a global range, provides habitat for hundreds of animal species, including invertebrates. Lessons from earlier domestication, genetics, and farm practices in other brown macroalgae can be applied to help expedite the domestication of giant kelp, but the lack of giant kelp genomic resources hampers these efforts. Here, I present multiple giant kelp genomic advances: 1) I compared selection on the different life stages in giant kelp and confirmed the population structure of Southern California kelp using Illumina sequencing data and a giant kelp gene model data set composed of genes conserved between giant kelp and the model brown macroalga, Ectocarpus siliculosus. 2) I expanded the giant kelp genomic toolkit by assembling, scaffolding, and annotating a high-quality giant kelp genome that can help identify genes and markers essential for domestication, and calculated giant kelp linkage disequilibrium and pairwise genetic diversity in Southern California kelp populations. 3) I assessed the genetic variation within 559 genotypes of Southern California giant kelp gametophytes that compose our seedbank. Developing genetic resources for giant kelp is paramount for accelerating the domestication effort, as an annotated reference genome combined with germplasm genotype data can be used for identifying genes and markers essential for domestication success.
Asset Metadata
Creator
Molano, Gary (author)
Core Title
Developing genetic tools to assist in the domestication of giant kelp
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Degree Conferral Date
2023-05
Publication Date
01/13/2023
Defense Date
12/09/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
domestication,genomics,giant kelp,OAI-PMH Harvest
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Nuzhdin, Sergey (committee chair), Arnheim, Norman (committee member), Dean, Matt (committee member), Kiefer, Dale (committee member)
Creator Email
gmolano@usc.edu,gpemolano@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC112716015
Unique identifier
UC112716015
Identifier
etd-MolanoGary-11413.pdf (filename)
Legacy Identifier
etd-MolanoGary-11413
Document Type
Dissertation
Rights
Molano, Gary
Internet Media Type
application/pdf
Type
texts
Source
20230118-usctheses-batch-1001 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu