Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Ecological patterns of free-living and particle-associated prokaryotes, protists, and viruses at the San Pedro Ocean Time-series between 2005 and 2018
(USC Thesis Other)
Ecological patterns of free-living and particle-associated prokaryotes, protists, and viruses at the San Pedro Ocean Time-series between 2005 and 2018
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Ecological patterns of free-living and particle-associated
prokaryotes, protists, and viruses at the San Pedro Ocean
Time-series between 2005 and 2018
Yi-Chun Yeh
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirement for the Degree
DOCTOR OF PHILOSOPHY
(BIOLOGY)
May 2022
ii
Acknowledgments
My dissertation is a product of many years of hard work that was supported by so many
people in my life. I would first thank my advisor Jed Fuhrman for allowing me to learn how to
do research in his lab. His wisdom, curiosity, and enthusiasm always encouraged me to be a
better scientist. He supported me even when I was in the darkest time of my life. I remember
when my father passed away, he kindly asked me if I needed any help. Even though I did not
ask for anything, I still deeply appreciate his nice gesture. It means such much to me even till
now. Because of his kindness and support, I was able to comfortably do all kinds of research in
his lab. I sincerely appreciate the time and effort he put into my research work.
I thank both my qualifying and dissertation committee members: Naomi Levine, David
Hutchins, Carly Kenkel, John Heidelberg, Jacob Bien, and Fengzhu Sun. Thank you all for sharing
me your insights and perspective which greatly improved my research and vision. I also extend
my gratitude to staff in the MEB program, especially Don Bingham, Adolfo Dela Rosa, Douglas
Burleson, and Linda Bazilian. Thank you all for taking care of all the tiny details.
I would like to acknowledge the wonderful Fuhrman lab members: Erin Fichot, David
Needham, Ella Sieradzki, Lyria Berdjeb, Alma Parada, Nathan Ahlgren, Julio Cesar Ignacio, Vicky
Zou, Andrew Long, Shengwei Hou, Jesse McNichol, Colette Fletcher-Hoppe, Melody Aleman,
Alexandra Santora, Delaney Nolin, Sarah Laperriere, and JL Weissman. I feel so grateful that I
join this lab and meet with those wonderful people. They helped me blend into this new
environment since the first day I step into this lab. Discussing research and having SPOT day
with them is one of the best memories of my Ph.D.
iii
I would also like to thank my family and friends. I thank my parents for always supporting
my choice and my love. My dad told me that he always supports me as long as I’m excited
about what I’m going to do every morning when I wake up. I’ll keep that in mind. I also thank
my old and new friends. Thank you all for being in my life, chatting about our dreams, and going
hiking with me.
This dissertation work would not have been possible without the fieldwork support from
the Captain and Crew of the R/V Yellow Fin, Troy Gunderson, and the former and current
members of the SPOT teams. Finally, I would like to acknowledge my funding sources:
Taiwanese Government Scholarship, USC Dornsife, National Science Foundation, the Gordon
and Betty Moore Foundation, and the Simons Foundation.
iv
Table of Contents
Acknowledgments ................................................................................................................................................................... ii
Abstract ................................................................................................................................................................................... v
Introduction ............................................................................................................................................................................. 1
Chapter 1: Comprehensive single-PCR 16S and 18S rRNA community analysis validated with mock communities, and
estimation of sequencing bias against 18S ............................................................................................................................... 7
Abstract ........................................................................................................................................................................................ 8
Introduction ................................................................................................................................................................................. 8
Results and Discussion ............................................................................................................................................................... 12
Materials and Methods ............................................................................................................................................................. 22
References .................................................................................................................................................................................. 26
Figures ........................................................................................................................................................................................ 32
Supplementary information ...................................................................................................................................................... 40
Chapter 2: Temporal and depth dynamics of microbial communities across three domains over 14 years at the San-Pedro
Ocean Time-series ................................................................................................................................................................. 58
Abstract ...................................................................................................................................................................................... 59
Introduction ............................................................................................................................................................................... 59
Results ........................................................................................................................................................................................ 61
Discussion .................................................................................................................................................................................. 68
Conclusions ................................................................................................................................................................................ 76
Materials and Methods ............................................................................................................................................................. 77
References .................................................................................................................................................................................. 80
Figures and Tables ..................................................................................................................................................................... 91
Supplementary information .................................................................................................................................................... 101
Chapter 3: Effects of phytoplankton, viral communities and El Niño warming on free-living and particle-associated marine
prokaryotic community structure ......................................................................................................................................... 112
Abstract .................................................................................................................................................................................... 113
Introduction ............................................................................................................................................................................. 114
Results ...................................................................................................................................................................................... 116
Discussion ................................................................................................................................................................................ 121
Materials and Methods ........................................................................................................................................................... 127
References ................................................................................................................................................................................ 132
Figures and Tables ................................................................................................................................................................... 143
Supplementary information .................................................................................................................................................... 150
Summary ............................................................................................................................................................................. 153
v
Abstract
Marine microbes perform a wide range of biogeochemical process, and thus
understanding microbial community dynamics is critically important. My dissertation
investigates microbial communities at the San Pedro Ocean Time-series (SPOT) location over
the period 2005-2018 from the surface to near bottom. Community composition was analyzed
by sequencing 16S and 18S rRNA genes from two size fractions (0.2-1 and 1-80 μm), with 3-
domain universal primers, which allowed me to analyze eukaryotes as well as free-living and
particle-associated prokaryotes simultaneously. My research first validated this methodology
with 16S and 18S mock communities, and found that this primer set quantitatively recovered
both 16S and 18S mock community composition. However, due to sequencing bias against the
longer 18S amplicons, a two-fold correction factor was required to estimate the true proportion
of each category. I thus applied this to SPOT samples and found that >92% of sequences in the
small size fraction were prokaryotic 16S, and the larger size fraction contained 6-34% 18S, 19-
23% chloroplast 16S, and 46-93% prokaryotic 16S rRNA genes. Throughout the time-series,
prokaryotes were relatively stable temporally compared to protists. Particle-associated and/or
large-celled prokaryotes were more diverse than the smaller free-living ones, especially at the
deeper depths, which I ascribed to endemic taxonomic groups and niche differentiation of
SAR11 and Flavobacteriales. Further, by also including viral metagenomics into the overall
analyses, I concluded that changes in free-living and particle-associated prokaryotic community
structure at 5 m depth were to a large extent driven by the bottom-up control of phytoplankton
communities, while changes in free dsDNA viruses could have been to some extent following
the changes in prokaryotic communities. In addition, a warming event due to El Niño in 2014-
vi
2015 resulted in significant changes in prokaryotic populations, especially for cyanobacteria,
shifting from cold-water ecotypes to warm-water ecotypes. This dissertation work shows how
long-term observations help us better understand microbial community responses to climate
change.
1
Introduction
Marine ecosystems are increasingly affected by anthropogenic disturbances such as
climate change (Halpern et al 2015). Predicting ecosystem responses relies on a better
understanding of community dynamics. However, the majority of studies have focused on
higher trophic levels (Doney et al 2012, Hoegh-Guldberg and Bruno 2010, Nabe-Nielsen et al
2018), and the effects on prokaryotic communities remain unclear. As we know that marine
microbial food webs are responsible for a significant fraction of production, respiration and
trophic transfer (Fuhrman and Caron 2016, Gasol and Kirchman 2018), understanding the
response of marine microbial community to climate change is critically important. Long-term
observations are particularly useful for investigating the complex interactions in the real world.
The best-known microbial ocean time series programs include the Hawaii Ocean Time-
series (HOT), the Bermuda Atlantic Time Series (BATS), the Blanes Bay Microbial Observatory
(BBMO), and the San Pedro Ocean Time-series (SPOT). Of these, SPOT is relatively close to the
coast, with a clear seasonality. SPOT has conducted monthly sampling of the water column
since 2000, so it provides a unique opportunity to examine the spatiotemporal variation of
prokaryotic communities in a subtropical ocean margin system. Moreover, the SPOT location
experienced strong El Niño during 2014-2015, so it is a valuable system to investigate effects of
rising temperatures and their accompanying oceanographic changes, including reduced
amplitude of spring phytoplankton blooms, on the prokaryotic dynamics in a natural setting.
Previous work at SPOT focused on protists and free-living prokaryotes and has found an
annually reoccurring seasonal pattern (Chow et al 2014, Countway et al 2010, Cram et al 2015,
Fuhrman et al 2006, Fuhrman et al 2015, Kim et al 2014). With network analysis, which detects
2
co-occurrence patterns through correlations, Cram et al (2015) found that there are two
modules of free-living prokaryotes in SPOT surface waters. Here, a module means a group of
prokaryotes that tend to co-occur throughout the time series. They found that these two
modules of free-living prokaryotes are negatively correlated with each other and may represent
oligotrophic- vs. opportunistic organisms, which oscillate over time in a “see-saw” fashion. In
addition, potential interactions between free-living prokaryotes and protists at SPOT were
investigated with empirical observations and experiments (Chow et al 2014, Cram et al 2016,
Steele et al 2011). For example, Steele et al (2011) found negative correlations between ciliates
and SAR11, actinobacteria, and flavobacteria, suggesting possible predation. However, these
findings were largely based on automated ribosomal intergenic spacer analysis (ARISA) and
terminal restriction fragment length polymorphism (T-RFLP) fingerprinting, which have reduced
resolution compared to current sequencing approaches, and the particle-associated
prokaryotes were not analyzed in these studies.
The free-living and particle-associated prokaryotes have physiological, genomic, and
phylogenetic differences. Salazar et al (2015) have found that particle attachment preference is
a phylogenetically conserved trait. For example, the Planctomycetaceae clade is particle-
associated, and the SAR202 clade is generally free-living. In addition, free-living prokaryotes,
dominated by groups such as SAR11, often have streamlined genomes that enable efficient
growth under oligotrophic conditions. Particle-associated prokaryotes (e.g., Bacteroidetes), on
the other hand, generally show more metabolic diversity to utilize different kinds of particulate
organic matter (Buchan et al 2014) and are obviously of particular significance during situations
like spring blooms (Francis et al 2021, Needham and Fuhrman 2016, Teeling et al 2012, Teeling
3
et al 2016). Due to these differences, both free-living and particle-associated prokaryotes
should be included in analyses that try to determine the overall roles of prokaryotes in marine
systems.
To more completely examine the microbial roles in the dynamics of this system, this
dissertation research investigated microbial communities from two size fractions (0.2-1 and 1-
80 μm) at SPOT over the period 2005-2018 using universal SSU rRNA sequencing with 3-domain
universal primers (515Y/926R). This primer pair amplifies both 16S and 18S rRNA genes
(Needham and Fuhrman 2016, Parada et al 2016), which means eukaryotes (including
phytoplankton, heterotrophs, and mixotrophs) as well as particle-associated or larger-celled
prokaryotes in the large size fraction are analyzed simultaneously. With this comprehensive
approach, the major components of the marine microbial food web and their potential
interactions were investigated.
For this dissertation research, I first validated how well the methodology (i.e., universal
SSU rRNA sequencing) recovers the true abundances of 16S and 18S rRNA genes, using mixed
16S and 18S mock communities (Chapter One). Then I characterized the spatiotemporal
dynamics of free-living (0.2-1.2 μm), particle-associated or large celled (1.2-80 μm) prokaryotes
and protistan communities throughout the water column at SPOT, showing the overall patterns
at all depths (Chapter Two). Finally, I focused more on the underlying driving factors of surface
microbial communities, using my own data plus a previously published viral metagenomic
dataset from the same seawater samples (Chapter Three). This time-series dataset allowed me
to examine how environmental conditions and biotic factors (i.e., viruses, phytoplankton, and
4
protists) influenced free-living and particle-associated prokaryotes and how warming event due
to El Niño during 2014-2015 affected prokaryotic populations (Chapter Three).
Introduction References
Buchan A, LeCleir GR, Gulvik CA, González JM (2014). Master recyclers: features and functions
of bacteria associated with phytoplankton blooms. Nature Reviews Microbiology 12: 686-698.
Chow C-ET, Kim DY, Sachdeva R, Caron DA, Fuhrman JA (2014). Top-down controls on bacterial
community structure: microbial network analysis of bacteria, T4-like viruses and protists. The
ISME journal 8: 816-829.
Countway PD, Vigil PD, Schnetzer A, Moorthi SD, Caron DA (2010). Seasonal analysis of
protistan community structure and diversity at the USC Microbial Observatory (San Pedro
Channel, North Pacific Ocean). Limnology and Oceanography 55: 2381-2396.
Cram JA, Xia LC, Needham DM, Sachdeva R, Sun F, Fuhrman JA (2015). Cross-depth analysis of
marine bacterial networks suggests downward propagation of temporal changes. The ISME
journal 9: 2573-2586.
Cram JA, Parada AE, Fuhrman JA (2016). Dilution reveals how viral lysis and grazing shape
microbial communities. Limnology and Oceanography 61: 889-905.
Doney SC, Ruckelshaus M, Emmett Duffy J, Barry JP, Chan F, English CA et al (2012). Climate
change impacts on marine ecosystems. Annual review of marine science 4: 11-37.
Francis TB, Bartosik D, Sura T, Sichert A, Hehemann J-H, Markert S et al (2021). Changing
expression patterns of TonB-dependent transporters suggest shifts in polysaccharide
consumption over the course of a spring phytoplankton bloom. The ISME Journal: 1-15.
5
Fuhrman JA, Hewson I, Schwalbach MS, Steele JA, Brown MV, Naeem S (2006). Annually
reoccurring bacterial communities are predictable from ocean conditions. Proceedings of the
National Academy of Sciences 103: 13104-13109.
Fuhrman JA, Cram JA, Needham DM (2015). Marine microbial community dynamics and their
ecological interpretation. Nature Reviews Microbiology 13: 133-146.
Fuhrman JA, Caron DA (2016). Heterotrophic planktonic microbes: virus, bacteria, archaea, and
protozoa. Manual of environmental microbiology: 4.2. 2-1-4.2. 2-34.
Gasol JM, Kirchman DL (2018). Microbial ecology of the oceans. John Wiley & Sons.
Halpern BS, Frazier M, Potapenko J, Casey KS, Koenig K, Longo C et al (2015). Spatial and
temporal changes in cumulative human impacts on the world’s ocean. Nature communications
6: 1-7.
Hoegh-Guldberg O, Bruno JF (2010). The impact of climate change on the world’s marine
ecosystems. Science 328: 1523-1528.
Kim DY, Countway PD, Jones AC, Schnetzer A, Yamashita W, Tung C et al (2014). Monthly to
interannual variability of microbial eukaryote assemblages at four depths in the eastern North
Pacific. The ISME journal 8: 515-530.
Nabe-Nielsen J, van Beest FM, Grimm V, Sibly RM, Teilmann J, Thompson PM (2018). Predicting
the impacts of anthropogenic disturbances on marine populations. Conservation Letters 11:
e12563.
Needham DM, Fuhrman JA (2016). Pronounced daily succession of phytoplankton, archaea and
bacteria following a spring bloom. Nature microbiology 1: 16005.
6
Parada AE, Needham DM, Fuhrman JA (2016). Every base matters: assessing small subunit rRNA
primers for marine microbiomes with mock communities, time series and global field samples.
Environmental microbiology 18: 1403-1414.
Salazar G, Cornejo-Castillo FM, Borrull E, Díez-Vives C, Lara E, Vaqué D et al (2015). Particle-
association lifestyle is a phylogenetically conserved trait in bathypelagic prokaryotes. Molecular
ecology 24: 5692-5706.
Steele JA, Countway PD, Xia L, Vigil PD, Beman JM, Kim DY et al (2011). Marine bacterial,
archaeal and protistan association networks reveal ecological linkages. The ISME journal 5:
1414-1425.
Teeling H, Fuchs BM, Becher D, Klockow C, Gardebrecht A, Bennke CM et al (2012). Substrate-
controlled succession of marine bacterioplankton populations induced by a phytoplankton
bloom. Science 336: 608-611.
Teeling H, Fuchs BM, Bennke CM, Krueger K, Chafee M, Kappelmann L et al (2016). Recurring
patterns in bacterioplankton dynamics during coastal spring algae blooms. elife 5: e11888.
7
Chapter 1: Comprehensive single-PCR 16S and 18S rRNA community analysis
validated with mock communities, and estimation of sequencing bias against
18S
Yi-Chun Yeh
1
, Jesse C. McNichol
1
, David M. Needham
1,2
, Erin B. Fichot
1
, Lyria Berdjeb
1
, and Jed
A. Fuhrman
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
2
Current address: GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany, 24148
Correspondence:
Yi-Chun Yeh
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Email: yichuny@usc.edu
Chapter one was previously published in Environmental Microbiology.
8
Abstract
Universal primers for SSU rRNA genes allow profiling of natural communities by
simultaneously amplifying templates from Bacteria, Archaea, and Eukaryota in a single PCR
reaction. Despite the potential to show relative abundance for all rRNA genes, universal primers
are rarely used, due to various concerns including amplicon length variation and its effect on
bioinformatic pipelines. We thus developed 16S and 18S rRNA mock communities and a
bioinformatic pipeline to validate this approach. Using these mocks, we show that universal
primers (515Y/926R) outperformed eukaryote-specific V4 primers in observed vs. expected
abundance correlations (slope=0.88 vs. 0.67-0.79), and mock community members with single
mismatches to the primer were strongly underestimated (3-8 fold). Using field samples, both
primers yielded similar 18S beta-diversity patterns (Mantel test, p<0.001) but differences in
relative proportions of many rarer taxa. To test for length biases, we mixed mock communities
(16S+18S) before PCR and found a two-fold underestimation of 18S sequences due to
sequencing bias. Correcting for the two-fold underestimation, we estimate that, in Southern
California field samples (1.2-80 m), there were averages of 35% 18S, 28% chloroplast 16S, and
37% prokaryote 16S rRNA genes. These data demonstrate the potential for universal primers to
generate comprehensive microbiome profiles.
Introduction
Bacteria, archaea, and eukaryotes make up dynamic, diverse communities that interact
with one another and their environment. Studying all these components simultaneously is
essential for understanding how the ecosystem functions as a whole (Fuhrman et al., 2015,
9
Needham et al., 2018, Chénard et al., 2019), though the individual components are mostly
studied separately, due in part to the perception that separate assays are required for each.
Since high-throughput DNA sequencing was introduced, SSU rRNA sequencing has been widely
used for analyzing microbial community structure — especially for prokaryotes by targeting the
16S rRNA gene (Sogin et al., 2006). Analyses focusing on eukaryotic communities with 18S rRNA
sequencing, however, are not as common partly because early sequencing lengths could not
fully capture diversity in longer hypervariable regions (Amaral-Zettler et al., 2009). With
advances in sequencing capacities, two regions (V4 and V9) have become commonly used for
planktonic eukaryotic community profiles (Amaral-Zettler et al., 2009, Stoeck et al., 2010,
Balzano et al., 2015, De Vargas et al., 2015).
Despite these methodological developments, the question of how well the entire
sequencing and analysis pipeline recovers the true abundance of rRNA genes found in the
natural community has received less attention. In pelagic marine environments, studies have
underscored the importance of careful primer design for accurately resolving natural
communities, e.g., correcting the severe underestimate of the SAR11 clade that occurred with
one of the most popular primers (Caporaso et al., 2012, Apprill et al., 2015, Parada et al., 2016).
In addition, validation and inter-comparison of primer performance has also been facilitated by
the development and application of microbial internal standards or “mock communities” to PCR
amplicon analysis (Wear et al., 2018) (hereafter “mocks”). The application of mocks to the PCR
amplification, sequencing, and analysis protocol has demonstrated that even well-designed
primers (515Y/926R vs. 515Y/806R) differ in terms of their ability to recover natural abundance
patterns (Parada et al., 2016). Including mocks in a sequencing run can also verify instrument
10
performance, thus avoiding improper ecological conclusions. For example, inclusion of mocks in
a previous study revealed that an unknown technical issue affecting a single sequencing run
inexplicably caused an entire major taxon to be missing in output data and altered abundances
of other taxa (Yeh et al., 2018). More recently, it has been shown that amplicon methods can
be made even more quantitative by the addition of internal DNA standards (i.e. added to
samples before extraction and purification of DNA; Lin et al. (2019)). This allows normalization
of amplicon data closer to true abundances found in seawater (except for lysis efficiency
variations) and was found to be consistent with other, extensively-validated methods (Lin et al.,
2019).
Bioinformatic methods used for amplicon sequence analysis have also evolved
considerably, with initial efforts focusing on how well algorithms resolve true biological
sequences by clustering sequences into operational taxonomic units (OTUs) at a certain
similarity threshold. This effort has culminated in the development of “denoising” algorithms
that are designed to recover true underlying biological sequences to the individual base (i.e.
amplicon sequence variants; ASVs) by endeavoring to eliminate sequencing and PCR errors
(Eren et al., 2015, Callahan et al., 2016, Amir et al., 2017). Moreover, unlike OTU clustering that
analyze sequences into often vaguely-defined or “fuzzy” units that change study-by-study,
denoising methods aim to better account for batch effects across multiple sequencing runs, and
are able to analyze sequences either sample-by-sample (Deblur) or run-by-run (DADA2), which
greatly reduces computational demand compared to OTU clustering algorithms that analyze
sequences all together (Callahan et al., 2016).
11
Collectively, these studies show how PCR amplicons can generate quantitative data that
allows microbial community composition to be measured alongside other oceanographic
variables. However, choosing an appropriate sequencing strategy remains a significant
challenge given the diverse primers and sequencing technologies currently available. In order to
maximize overall utility, it is highly desirable to keep costs low while generating data with high
phylogenetic resolution. Parada et al. (2016) have previously described a universal primer set
(515Y/926R, modified from Quince et al. (2011)) that simultaneously amplifies 16S and 18S
rRNA in a single PCR reaction. Because of their universal nature, these primers measure both
eukaryotes and prokaryotes and can provide insights into processes such as predation,
parasitism, and mutualism (Needham & Fuhrman, 2016, Needham et al., 2018).
However, analyzing data generated from the universal 515Y/926R primer set has several
potential challenges. First, mixed 16S and 18S amplicon sequences present bioinformatic
challenges since the two types of amplicons must be analyzed differently. This is because
current Illumina read lengths are not long enough to allow the forward and reverse reads to
overlap for the longer 18S amplicon (575-595 bp). If they do not overlap by at least 12 bases
(according to standard methods), they cannot be merged, and if they cannot be merged, the
entire amplicon cannot be generated and analyzed as is typical for 16S amplicon analysis.
Second, PCR and sequencing both discriminate against longer amplicons (Kittelmann et al.,
2013), yet the degree of PCR and sequencing biases against longer 18S amplicons is unknown.
These biases can potentially be detected via mock community analysis, specifically collections
of known 16S or 18S rRNA gene fragments (Bradley et al., 2016, Parada et al., 2016, Needham
12
et al., 2017, Wear et al., 2018, Catlett et al., 2020). Yet to our knowledge, there have not been
tests with mixed mock communities consisting of both 16S and 18S rRNA genes.
In this study, we present results from mock communities designed to validate the
515Y/926R primer set with particular emphasis on its performance with 18S sequences in
comparison to commonly-used 18S-specific primer sets, V4F/V4R and V4F/V4RB (Stoeck et al.,
2010, Balzano et al., 2015). We also present a bioinformatics workflow designed for mixed 16S
and 18S amplicons that generates ASVs differing by as little as a single base, and reproducibly
recovers the exact known sequences from the mock communities. This workflow, which uses
common tools such as cutadapt (Martin, 2011), bbtools
(http://sourceforge.net/projects/bbmap/), DADA2 (Callahan et al., 2016), deblur (Amir et al.,
2017) and QIIME 2 (Bolyen et al., 2018), simplifies sequence analysis for mixed 16S and 18S
amplicons. We also rigorously examined biases between 16S and 18S amplicons at the PCR and
sequencing steps. Lastly, we analyzed natural marine samples collected from the San Pedro
Ocean Time-series (SPOT) using the validated workflow to compare the performance of
different primer sets.
Results and Discussion
Comparison of universal primers (515Y/926R) and eukaryote-specific primers (V4F/V4R and
V4F/V4RB) with 18S mock communities
Our 18S mock communities are mixtures of a number of nearly full-length 18S rRNA
genes designed to represent the major eukaryotic groups found in marine environments.
Among them, a clone in the Prymnesiales (haptophyta) has a single mismatch to the reverse
13
primer V4R (at the 3’ end), and three Dinophyta species (Lingulodinium, Dino-Group-II_b, and
Gymnodinium) have a single mismatch to the reverse primer 926R. Separate mock communities
were developed with members in equal or staggered concentrations to allow for deeper
assessment of PCR, sequencing, or bioinformatic pipelines. As the abundances of taxa in mock
communities are known a priori, they can be used to test which primer set and denoising
algorithm recovers the community composition most accurately.
For 18S even mock communities, V4F/V4R (Stoeck et al., 2010) underestimated
Prymnesiales (haptophyta) by ~4-fold, presumably because of a single base mismatch on the 3’
end of the reverse primers (Fig. 2a). On the other hand, the V4F/V4RB (Balzano et al., 2015)
primers that do not have any mismatches overestimated Prymnesiales (haptophyta) by ~4-fold
(Fig. 2b) while the 515Y/926R primers produced a community composition similar to that
expected (Fig. 2c).
For 18S staggered mock communities, similar results were found. V4F/V4R
underestimated Prymnesiales (haptophyta) by ~5-fold (Fig. 3a), and V4F/V4RB overestimated
Prymnesiales (haptophyta) by ~3-fold (Fig. 3b). 515Y/926R underestimated three Dinophyta
species (with single primer mismatches) to varying degrees (Lingulodinium, ~8-fold; Dino-
Group-II_b, ~3-fold; Gymnodinium, ~4-fold) (Fig. 3c). However, there was no relationship
between degree of underestimation and locations of primer mismatch (Lingulodinium, -11
bases from the 3’ end; Dino-Group-II_b, -12 bases from the 3’ end; Gymnodinium, -2 bases from
the 3’ end).
Overall, the observed 18S mock community composition was more similar to the
expected with 515Y/926R (slope=0.88, r
2
=0.76), especially after removing three mismatched
14
Dinophyta species (slope=0.97, r
2
=0.97), followed by V4F/V4RB (slope=0.79, r
2
=0.87) and
V4F/V4R (slope=0.67, r
2
=0.65) (Fig. 4). With mixed mock communities, 16S and 18S mock
communities were also recovered accurately (Fig. S1). These findings, together with the results
of Parada et al. (2016), indicate that 515Y/926R primers recover both 16S and 18S mock
communities quantitatively regardless of whether examined as separate or in combination.
In addition, our results indicated a 3-8 fold underestimation when there was a primer
mismatch. The same issue was previously found with the original EMP primers (515C/806R, V4)
that underestimated SAR11 by 8-fold (Apprill et al., 2015). Bru et al. (2008) found that
underestimation generally increased as mismatches were closer to the 3’ end of the primer, yet
there was no predictable relationship between the position of mismatch and the degree of
underestimation, which is consistent with our findings. The worst mismatches are at the 3’ end
of the primers, as occurs with the V4R primer (Stoeck et al., 2010) for many common
haptophytes. This observation was the rationale for the creation of the V4RB primer with a 3’
degeneracy (Balzano et al., 2015) that greatly improves recovery of haptophytes that are often
dominant in seawater (Berdjeb et al., 2018). However, we found that, instead of recovering 18S
mock communities as expected, V4F/V4RB overestimated haptophytes by 3-4 fold. Since there
was no primer mismatch bias and all the amplicons were analyzed in the same sequencing run,
a possible source of such bias might be differences in PCR protocols (1-step PCR with
515Y/926R vs. 2-step PCR with V4F/V4RB), but it is not understood why such a strong positive
bias would occur with haptophytes and not the other taxa we examined.
Estimation of PCR and sequencing bias against 18S amplicons using mixed mock communities
15
To test for length-based PCR bias against longer 18S reads, 18S mock communities were
mixed with 16S mock communities in equimolar amounts prior to PCR amplification. The mixed
mock communities were then PCR amplified, products analyzed on an Agilent 2100 Bioanalyzer,
and then sequenced (Fig. 1). Based on bioanalyzer traces that separately quantify the
abundance of 16S and 18S amplicons, there was little systematic PCR bias (about 0.7-1.3-fold)
against 18S PCR products when using the 18S even mock communities that have no primer
mismatches to 515Y/926R (Fig. 5, circle and asterisk dots, x axis only). When the 18S staggered
mocks were included (with three Dinophyta species that have one mismatch to the reverse
primer, 926R), there was considerably more PCR bias, up to 3-fold (Fig. 5, triangle and square
dots, x axis only). The mixed amplicons were then sequenced and split into 16S and 18S reads
pools by an in silico sorting step. By comparing ratios in the bioanalyzer outputs and the raw
read counts after in silico sorting, we observed that there was typically a 2-fold sequencing
discrimination against 18S reads (Fig. 5), which is consistent regardless of community types
(even, staggered) and sequencing runs. That suggests sequencing bias due to length differences
is a consistent property of the Illumina sequencing platform, yet PCR bias due to primer
mismatches is much less predictable. Thus, an evaluation of primer coverage across three
domains, in actual field samples, may help better account for the PCR bias. Parada et al. (2016)
found that 515Y/926R perfectly matches 86% of eukaryotes, 87.9% of bacteria, and 83.9% of
archaea in the SILVA database, but we note that in actual practice the extent of mismatches in
field samples depends on the particular taxa present and their proportions (McNichol et al.,
2020). We should also note that our 18S mock communities are very rich in alveolates such as
dinoflagellates (3 of 10 in even, 7 of 16 in staggered) that tend to have mismatches to the
16
515Y/926R primers; hence they probably overestimate the biases expected in most field
samples.
Comparison of universal primers (515C/926R) and eukaryote-specific primers (V4F/V4RB)
with field samples
A previously published daily time series was used to compare outcomes with different
primers amplifying either 16S+18S or 18S alone from the same DNA extracts (Needham &
Fuhrman, 2016, Berdjeb et al., 2018). This time series covered a spring bloom through summer
at the San Pedro Ocean Time-Series off of Southern California, thus the comparison was
evaluated under a wide range of environmental conditions and biological diversity.
We note that the universal primer in that paper (515C) was slightly different from that
tested here with 18S mock communities (the 4th base on the 5’ end of the forward primer was
C instead of Y, where Y is a mixture of C and T). We thus used a recently reported dataset
(McNichol et al., 2020) to compare the primer coverage. McNichol et al. (2020) have compared
515Y with SSU rRNA sequences retrieved from several marine metagenomic datasets
(BioGEOTRACES, Malaspina, MBARI, and TARA). Their results showed that 88% of eukaryotic
18S rRNA sequences and 99% of cyanobacteria and chloroplast 16S rRNA sequences perfectly
matched 515Y. We further examined where these sequences perfectly matched 515C or 515T.
The results showed that >97% of the sequences perfectly matched 515C, and the incremental
improvement of also considering a T at that position only yielded 0 to 0.1% additional perfect
matches (Table S3), suggesting the results from both primers should be comparable. Note that
17
the 515Y primer simply adds a single degeneracy (primer versions with a C and T at that
position are equally present), so will perfectly match better than 515C alone.
To examine how alpha diversity differed with primer sets, we first rarefied sequences to
the sample with fewest 18S sequences (1155 reads) and repeated this 100 times for each
primer set. The mean rarefied richness (i.e., number of observed ASVs) of the samples amplified
with V4F/V4RB was significantly higher than those amplified with 515C/926R (Welch's t-test,
p<0.001; 30-217 for 515C/926R vs. 104-318 for V4F/V4RB; Fig. 6). The mean rarefied Shannon
index values, however, were similar between these primer sets (Welch’s t-test, p>0.05; 0.97-
4.79 for 515C/926R vs. 1.70-5.05 for V4F/V4RB; Fig. 6). We next evaluated the primer effects on
beta diversity; cluster analysis showed that 515C/926R and V4F/V4RB detected a significantly
similar temporal variation (Mantel test, r=0.95, p<0.001), i.e., similar overall clustering, in
community composition. starting from spring bloom in early March, followed by a post-bloom
period in late March, a transition during April and May to summer in July (Fig. 7).
To test the extent that these two primer sets (515C/926R and V4F/V4RB) amplify similar
communities at the ASV level, the 220 bp overlapping region of the forward reads was used for
detailed examination (noting the forward primers are offset by 4 bases, with V4F going 4 bases
further towards the 3’ end vs. 515C). After rarefying samples amplified with 515C/926R and
V4F/V4RB to the sample with fewest 18S reads (1681 reads) 100 times, on average 2741 ASVs
were detected across the time series, and 1131 ASVs were shared between the primer sets (Fig.
S2). These shared ASVs contributed to 80-100% of the total sequences in the communities
amplified with 515C/926R and 87-97% of sequences in communities amplified with V4F/V4RB.
A total of 254 ASVs were unique to the samples amplified with 515C/926R, and 1412 ASVs were
18
only found in the samples amplified with V4F/V4RB (Fig. S2). A direct comparison showed
515C/926R and V4F/V4RB detected the same abundant ASVs with similar relative abundances,
while there were more differences in rare ASVs, with the V4F/V4RB typically detecting more of
these taxa (Fig. 8 and Fig. S3). The relative abundances of ASVs missed by 515C/926R were all
found to be rare (less than 1.5%) by V4F/V4RB, whereas the relative abundances of some ASVs
missed by V4F/V4RB were more abundant (more than 5%) by 515C/926R (Fig. 8). A taxonomic
comparison shows that under the same sampling effort, there were differences at the order
level between primer sets (Fig. S2). Three orders (Cryptomonas, Discicristata, MAST-6) were
uniquely found in the samples amplified with 515C/926R, and 2 orders (Coccolithales,
Hemiselmis) were unique to the samples amplified with V4F/V4RB. Thus, while the V4 primers
tended to yield more of the rarer ASVs, neither primer set was much more comprehensive
taxonomically than the other, and the two together yielded a broader overall diversity than
either alone.
Clustering of all these overlapping sequences analyzed together showed a similar spring-
summer transition (Fig. S4), with samples usually clustering by date rather than by primer pair.
However, individual dates typically had a 30-50% Bray-Curtis distance between data analyzed
by the two primers, indicating significant differences in quantitative composition. While lacking
independent knowledge of the actual taxa distribution, we note observed distributions within
mock communities more closely matched the expected outcome when using the universal
primers, especially for sequences with no mismatches (Fig. 4).
Our results indicated that beta diversity may be less sensitive to primer bias than alpha
diversity, which has been previously reported (Caporaso et al., 2012, He et al., 2013, Tremblay
19
et al., 2015). Comparing the 220 bp overlapping region amplified by both primer sets
demonstrated that the variation in community composition due to primer sets comes mostly
from the rare taxa, perhaps in part from PCR and/or sequencing errors (He et al., 2013).
Notably, the V4F/V4RB amplification requires a two-step PCR amplification, with more
opportunities for errors and/or biases (Yu et al., 2015).
The application of universal primers (515Y/926R) to three-domain amplicon analysis
Quantitative 16S/18S biases determined by mixed mock community analyses was
“corrected” in field samples to make current best-guess estimates of the true relative
proportions of 18S, chloroplast, and prokaryotic gene abundances. With the mock
communities, we found an overall 2-fold bias against 18S at the sequencing step, and we can
use that as a starting point for making corrections. Applying this 2-fold bias, data from the
protist-enriched 1.2-80 µm fraction of the spring-summer SPOT time series samples would yield
an average of 35% 18S, 28% chloroplast 16S and 37% prokaryote 16S rRNA gene amplicons (Fig.
S5) — in other words, an almost even split in these three categories. Future work will help
determine to what extent the 2-fold bias applies in general, but because some 18S sequences
are much longer than others (Obiol et al., 2020); it is quite possible the biases are worse in
some samples and for some taxa, compared to others. A direct measure of average biases from
each sample should be possible by quantifying DNA in the 16S and 18S amplicons before
sequencing and then comparing the actual 16S and 18S sequences in the final outputs. The read
composition (Fig. S5) and rarefaction curves (Fig. S6) constructed for each field sample together
indicated that for the 1.2-80 m size fraction collected from the SPOT location, about 15,000-
20
70,000 total sequences are required to effectively sample the true richness of marine
planktonic prokaryotes, phytoplankton, and heterotrophic eukaryotes in a single PCR reaction
(Fig. S6, detailed calculation is described in figure legend). For studies collecting whole seawater
(>0.2 µm) that include a larger fraction of prokaryotes, we also estimated the required
sequencing depth using a previously reported dataset from the BioGEOTRACES GA03 trans-
Atlantic expedition (Biller et al., 2018, McNichol et al., 2020). The McNichol et al. study
amplified DNA collected from 100ml of whole seawater using the universal primers
(515Y/926R), analyzed using DADA2. With the read composition and the rarefaction curves of
the Atlantic euphotic seawater samples, we calculated that about 28,000-110,000 total
sequences are required to capture the diversity of prokaryotes and eukaryotes (Fig. S7, detailed
calculation is described in figure legend). However, the number would vary with different
locations, size fractions, sampling volumes, extraction methods, and analysis pipelines.
Conclusions and Future Prospects
This study shows that the three-domain universal primer (515Y/926R) can resolve
community composition for 16S and 18S rRNA in a single PCR reaction, with biases we could
quantify and manage. We were able to investigate the biases relevant to the use of these
primers in a natural setting through the use of 18S mock communities, separately and in
concert with 16S mocks. With field samples, the universal primers (515C/926R) detected similar
community composition and beta-diversity patterns as commonly-used eukaryote-specific
primers (V4F/V4RB). However, the abundance of several taxa varied with primer set (notably
21
with the V4F/V4RB primers yielding more rare eukaryotic taxa), though without independent
data, we cannot assume that reporting more taxa is necessarily more accurate.
Comprehensive simultaneous three domain analysis has three potential advantages
over single domain analyses for determining microbial community composition. First, there is
the obvious advantage of directly comparing 16S and 18S sequence abundances, which can
now be corrected (to some extent) for biases as we have described. Even without absolute
corrected gene counts, results allow for consistent comparisons of ratios between all taxa,
across samples and even sample types; i.e. even without bias corrections, the relative ratios
are robust (McLaren et al., 2019). Second, it provides an independent analysis of phototrophic
protists. As chloroplast 16S rRNA gene databases are constantly growing, the chloroplast 16S
genes amplified with 515Y/926R can help identify (or verify the identities of) phototrophic
eukaryotes, providing a way to characterize phytoplankton communities independent of 18S
and known wide variability in 18S per-cell copy number variations, which range over 10,000
fold (see also Needham & Fuhrman (2016)). Chloroplast 16S data may more closely reflect
phytoplankton biomass distributions than do 18S data, and are being increasingly used in
biological oceanographic studies, sometimes with higher phylogenetic resolution than 18S
(Needham & Fuhrman, 2016, Bennke et al., 2018, Bolaños et al., 2020, Choi et al., 2020). Lastly,
a single universal amplification reduces some major costs associated with amplicon analysis. As
sequencing continues to drop in price per base, the major expense per sample comes from PCR
enzymes, clean-up beads, and labor required for quantification, dilution, gel imaging, etc.
Compared with single-PCR 16S and 18S rRNA community analysis, using separate primers for
16S and 18S assays (noting V4 18S needs 2-step PCR) increases amplicon library preparation
22
costs 2-3 fold, which can exceed the costs of increased coverage in a single universal assay to
yield the desired number of 18S sequences. Overall, this method provides a feasible path for
making quantitative rRNA gene-based assessments of microbial communities across three
domains using amplicon data, when proper validation such as from mock communities is
employed.
Materials and Methods
Mock community preparation
For 16S mock communities, nearly full-length marine 16S rRNA genes were prepared as
previously described (Parada et al., 2016, Yeh et al., 2018). For 18S mock communities, nearly
full-length 18S rRNA clone libraries were prepared from the large size fraction (1.2-80 µm) of
seawater sample collected from the SPOT location. The detailed preparation is described in the
supplementary information. To mimic natural marine communities consisting of both
eukaryotes and prokaryotes, 16S and 18S mock communities were mixed in four combinations
(Fig. 1). Each mixed mock community was pooled at equal molarity after taking lengths into
account; the average length of 16S mocks is 1425 bp, and the average length of 18S mocks is
1770 bp, so resulting amplicons are internal to these lengths and therefore shorter.
Field samples
A daily-to-weekly time series used samples collected from the 5m depth at the San
Pedro Ocean Time-series (SPOT) location from March 12 to July 26 in 2011. The methods and
sequencing data have been previously published under accession numbers PRJEB12215
23
(universal primers) and PRJEB10834 (18S primers) (Needham & Fuhrman, 2016, Berdjeb et al.,
2018).
PCR and sequencing
To pool multiple samples in a single Illumina paired-end sequencing platform, a dual-
index sequencing strategy was used with forward primer (A-I-NNNN-barcode-loci specific
forward primer) and reverse primer (A-index-I-loci specific reverse primer), where A is the
Illumina sequencing adapter, I is the Illumina primer, and barcode and index are sample specific
tags (5 bp barcode and 6 bp index). A detailed protocol is available at
doi.org/10.17504/protocols.io.vb7e2rn. To compare 16S/18S universal primers with eukaryote-
specific primers, mock communities were amplified with 515Y (5’-GTGYCAGCMGCCGCGGTAA-
3’) and 926R (5’-CCGYCAATTYMTTTRAGTTT-3’), V4F (5’-CCAGCASCYGCGGTAATTCC-3’) and V4R
(5’-ACTTTCGTTCTTGATYRA-3’), and V4F and V4RB (5’-ACTTTCGTTCTTGATYRR-3’) (Stoeck et al.,
2010, Balzano et al., 2015, Parada et al., 2016). The only difference between V4F/V4R and
V4F/V4RB is the last base on the 3’ end of the reverse primer (A to R), which corrects a
mismatch, allowing V4F/V4RB amplify haptophytes and some other taxa better (Balzano et al.,
2015). The amplification conditions for each primer pair are described in the supplementary
information. Purified PCR products were quantified with PicoGreen and sequenced on Illumina
HiSeq 2500 in PE250 mode and MiSeq PE300.
In silico processing of amplicon sequences
24
Sequences were demultiplexed by forward barcodes and reverse indices allowing no
mismatches using QIIME 1.9.1 split_libraries_fastq.py. The fully demultiplexed forward and
reverse sequences were then split into per-sample fastq files using QIIME 1.9.1
split_sequence_file_on_sample_ids.py and submitted to the EMBL database under accession
number PRJEB35673.
Scripts necessary to reproduce the following analysis are available at
github.com/jcmcnch/eASV-pipeline-for-515Y-926R. Demultiplexed amplicon sequences were
trimmed with cutadapt, discarding any sequence pairs not containing the forward or reverse
primer. We allowed an error rate of up to 20% to retain amplicons with mismatches to the
primer. Similar to the workflow proposed Mike Lee
(https://astrobiomike.github.io/amplicon/16S_and_18S_mixed), mixed amplicon sequences
were split into 16S and 18S pools using bbsplit.sh from the bbtools package
(http://sourceforge.net/projects/bbmap/) against curated 16S/18S databases derived from
SILVA 132 (Quast et al., 2013) and PR2 (Guillou et al., 2013). The splitting databases used are
available at https://osf.io/e65rs/. The two amplicon categories were then analyzed in parallel
using qiime2 (Bolyen et al., 2019) or DADA2 implemented as the standalone R package
(Callahan et al., 2016) as described in the supplementary information. The results were all
based on DADA2 (QIIME 2) that worked best for each type of sequences after comparing
several denoising algorithms (a detailed comparison is described in the supplementary
information).
PCR and sequencing bias estimation
25
16S and 18S mixed mock communities amplified with 515Y/926R were run on an Agilent
2100 Bioanalyzer to quantify concentrations of 16S and 18S PCR products in each mixed mock
community. Amplicons were analyzed with the High-sensitivity DNA assay kit according to the
manufacturer’s instructions. Due to the length differences between 16S and 18S amplicons, the
concentrations of amplicons were measured by quantifying peak areas on an Agilent
2100 Bioanalyzer using automatic peak detection without altering the instrument-determined
baseline. The 16S:18S ratio of molarity was used to determine PCR bias. Sequence pre-
processing (i.e. bbsplit.sh) split reads into 16S and 18S pools. The 16S:18S ratio based on the
number of reads was used to determine sequencing and PCR bias. The slope of the line derived
from plotting the 16S:18S ratio from Bioanalyzer traces against 16S:18S ratio based on the
number of reads after the bbsplit step was used to define sequencing bias.
Acknowledgments
This work was supported by NSF OCE 1737409, Gordon and Betty Moore Foundation
Marine Microbiology Initiative grant 3779, and Simons Foundation Collaboration on
Computational Biogeochemical Modeling of Marine Ecosystems (CBIOMES) grant 549943. We
thank Julio C. Ignacio-Espinoza, Alexandra Santora, Colette Fletcher-Hoppe, Delaney Nolin, Jake
Weissman, Melody Aleman, Shengwei Hou, and Sarah Laperriere for help with sampling,
laboratory work, and feedback on the manuscript.
26
References
Amaral-Zettler LA, McCliment EA, Ducklow HW & Huse SM (2009) A method for studying
protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-
subunit ribosomal RNA genes. PloS one 4: e6372.
Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Xu ZZ, Kightley EP, Thompson LR,
Hyde ER & Gonzalez A (2017) Deblur rapidly resolves single-nucleotide community sequence
patterns. MSystems 2.
Apprill A, McNally S, Parsons R & Weber L (2015) Minor revision to V4 region SSU rRNA 806R
gene primer greatly increases detection of SAR11 bacterioplankton. Aquatic Microbial Ecology
75: 129-137.
Balzano S, Abs E & Leterme SC (2015) Protist diversity along a salinity gradient in a coastal
lagoon. Aquatic Microbial Ecology 74: 263-277.
Bennke C, Pollehne F, Müller A, Hansen R, Kreikemeyer B & Labrenz M (2018) The distribution
of phytoplankton in the Baltic Sea assessed by a prokaryotic 16S rRNA gene primer system.
Journal of Plankton Research 40: 244-254.
Berdjeb L, Parada A, Needham DM & Fuhrman JA (2018) Short-term dynamics and interactions
of marine protist communities during the spring–summer transition. The ISME journal 12: 1907.
Biller SJ, Berube PM, Dooley K, Williams M, Satinsky BM, Hackl T, Hogle SL, Coe A, Bergauer K &
Bouman HA (2018) Marine microbial metagenomes sampled across space and time. Scientific
data 5: 1-7.
27
Bolaños LM, Karp-Boss L, Choi CJ, Worden AZ, Graff JR, Haëntjens N, Chase AP, Della Penna A,
Gaube P & Morison F (2020) Small phytoplankton dominate western North Atlantic biomass.
The ISME Journal 1-12.
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet C, Al-Ghalith GA, Alexander H, Alm EJ,
Arumugam M & Asnicar F (2018) QIIME 2: Reproducible, interactive, scalable, and extensible
microbiome data science. p.^pp. PeerJ Preprints.
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ,
Arumugam M & Asnicar F (2019) Reproducible, interactive, scalable and extensible microbiome
data science using QIIME 2. Nature biotechnology 1.
Bradley IM, Pinto AJ & Guest JS (2016) Design and evaluation of Illumina MiSeq-compatible, 18S
rRNA gene-specific primers for improved characterization of mixed phototrophic communities.
Appl Environ Microbiol 82: 5878-5891.
Bru D, Martin-Laurent F & Philippot L (2008) Quantification of the detrimental effect of a single
primer-template mismatch by real-time PCR using the 16S rRNA gene as an example. Appl
Environ Microbiol 74: 1660-1663.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA & Holmes SP (2016) DADA2: high-
resolution sample inference from Illumina amplicon data. Nature methods 13: 581-583.
Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, Owens SM, Betley J,
Fraser L & Bauer M (2012) Ultra-high-throughput microbial community analysis on the Illumina
HiSeq and MiSeq platforms. The ISME journal 6: 1621-1624.
28
Catlett D, Matson PG, Carlson CA, Wilbanks EG, Siegel DA & Iglesias-Rodriguez MD (2020)
Evaluation of accuracy and precision in an amplicon sequencing workflow for marine protist
communities. Limnology and Oceanography: Methods 18: 20-40.
Chénard C, Wijaya W, Vaulot D, dos Santos AL, Martin P, Kaur A & Lauro FM (2019) Temporal
and spatial dynamics of Bacteria, Archaea and protists in equatorial coastal waters. Scientific
Reports 9: 1-13.
Choi CJ, Jimenez V, Needham D, Poirier C, Bachy C, Alexander H, Wilken S, Chavez F, Sudek S &
Giovannoni SJ (2020) Seasonal and geographical transitions in eukaryotic phytoplankton
community structure in the Atlantic and Pacific Oceans. Frontiers in microbiology 11: 2187.
De Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, Lara E, Berney C, Le Bescot N &
Probert I (2015) Eukaryotic plankton diversity in the sunlit ocean. Science 348: 1261605.
Eren AM, Morrison HG, Lescault PJ, Reveillaud J, Vineis JH & Sogin ML (2015) Minimum entropy
decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker
gene sequences. The ISME journal 9: 968-979.
Fuhrman JA, Cram JA & Needham DM (2015) Marine microbial community dynamics and their
ecological interpretation. Nature Reviews Microbiology 13: 133-146.
Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L, Boutte C, Burgaud G, De Vargas C &
Decelle J (2013) The Protist Ribosomal Reference database (PR2): a catalog of unicellular
eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic acids research 41:
D579-D604.
29
He Y, Zhou B-J, Deng G-H, Jiang X-T, Zhang H & Zhou H-W (2013) Comparison of microbial
diversity determined with the same variable tag sequence extracted from two different PCR
amplicons. BMC microbiology 13: 1-8.
Kittelmann S, Seedorf H, Walters WA, Clemente JC, Knight R, Gordon JI & Janssen PH (2013)
Simultaneous amplicon sequencing to explore co-occurrence patterns of bacterial, archaeal and
eukaryotic microorganisms in rumen microbial communities. PloS one 8: e47879.
Lin Y, Gifford S, Ducklow H, Schofield O & Cassar N (2019) Towards quantitative microbiome
community profiling using internal standards. Appl Environ Microbiol 85: e02634-02618.
Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing
reads. EMBnet journal 17: 10-12.
McLaren MR, Willis AD & Callahan BJ (2019) Consistent and correctable bias in metagenomic
sequencing experiments. Elife 8: e46923.
McNichol JC, Berube PM, Biller SJ & Fuhrman JA (2020) Evaluating and Improving SSU rRNA PCR
Primer Coverage via Metagenomes from Global Ocean Surveys. bioRxiv
https://doi.org/10.1101/2020.1111.1109.375543.
McNichol JC, Berube PM, Biller SJ & Fuhrman JA (2020) Evaluating and Improving SSU rRNA PCR
Primer Coverage via Metagenomes from Global Ocean Surveys. bioRxiv.
Needham DM & Fuhrman JA (2016) Pronounced daily succession of phytoplankton, archaea
and bacteria following a spring bloom. Nature microbiology 1: 16005.
Needham DM, Sachdeva R & Fuhrman JA (2017) Ecological dynamics and co-occurrence among
marine phytoplankton, bacteria and myoviruses shows microdiversity matters. The ISME
Journal 11: 1614-1629.
30
Needham DM, Fichot EB, Wang E, Berdjeb L, Cram JA, Fichot CG & Fuhrman JA (2018) Dynamics
and interactions of highly resolved marine plankton via automated high-frequency sampling.
The ISME journal 12: 2417.
Obiol A, Giner CR, Sánchez P, Duarte CM, Acinas SG & Massana R (2020) A metagenomic
assessment of microbial eukaryotic diversity in the global ocean. Molecular Ecology Resources
20.
Parada AE, Needham DM & Fuhrman JA (2016) Every base matters: assessing small subunit
rRNA primers for marine microbiomes with mock communities, time series and global field
samples. Environmental microbiology 18: 1403-1414.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J & Glöckner FO (2013) The
SILVA ribosomal RNA gene database project: improved data processing and web-based tools.
Nucleic acids research 41: D590-D596.
Quince C, Lanzen A, Davenport RJ & Turnbaugh PJ (2011) Removing noise from pyrosequenced
amplicons. BMC bioinformatics 12: 38.
Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, Arrieta JM & Herndl GJ (2006)
Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the
National Academy of Sciences 103: 12115-12120.
Stoeck T, Bass D, Nebel M, Christen R, Jones MD, Breiner HW & Richards TA (2010) Multiple
marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic
community in marine anoxic water. Molecular ecology 19: 21-31.
Tremblay J, Singh K, Fern A, Kirton ES, He S, Woyke T, Lee J, Chen F, Dangl JL & Tringe SG (2015)
Primer and platform effects on 16S rRNA tag sequencing. Frontiers in microbiology 6: 771.
31
Wear EK, Wilbanks EG, Nelson CE & Carlson CA (2018) Primer selection impacts specific
population abundances but not community dynamics in a monthly time-series 16S rRNA gene
amplicon analysis of coastal marine bacterioplankton. Environmental microbiology 20: 2709-
2726.
Yeh Y-C, Needham DM, Sieradzki ET & Fuhrman JA (2018) Taxon Disappearance from
Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every
Sequencing Run. MSystems 3: e00023-00018.
Yu G, Fadrosh D, Goedert JJ, Ravel J & Goldstein AM (2015) Nested PCR biases in interpreting
microbial community structure in 16S rRNA gene sequence datasets. PloS one 10: e0132253.
32
Figures
Figure 1. Experimental design. 8 mock communities were amplified using the 515Y/926R
primers. The quantity of amplicon DNA was measured with a Bioanalyzer, allowing
quantification of PCR bias against longer 18S amplicons. After sequencing, 16S and 18S reads
were then separated through an in-silico sorting step, and the number of 16S and 18S reads
were counted to quantify the sequencing bias against 18S. The 16S and 18S reads were then
denoised separately using DADA2.
33
Figure 2. Comparison of the proportions added (expected) vs. proportions observed in the
sequence outputs of even 18S mock communities amplified with V4F/V4R (a), V4F/V4RB (b),
and 515Y/926R (c).
34
Figure 3. Comparison of the proportions added (expected) vs. proportions observed in the
sequence outputs of staggered 18S mock communities amplified with V4F/V4R (a), V4F/V4RB
(b), and 515Y/926R (c). The inset in (c) shows re-normalized staggered 18S mock community
amplified with 515Y/926R after removing three mismatched Dinophyta species.
35
Figure 4. Expected staggered 18S mock community abundance of each component plotted
against observed staggered 18S mock community abundance from samples amplified with
different primers pairs (a), and excluding clone members that have mismatches on the given
primer pairs (b). Note the near-perfect slope (0.97) and r
2
(0.97) of the 515Y/926R primer pair
for clones with no mismatches.
36
Figure 5. Comparison of PCR and sequencing biases among four mixed 16S and 18S mock
communities, each combined in 1:1 molar ratios. The X axis shows ratios in the PCR products,
and the Y axis shows the ratios in the final sequences, including biases from PCR plus
sequencing. Hypothetically, if there is no bias, all the mixed mocks would be located at a single
point (1,1). If there is only PCR bias, all the data points would be at the one-to-one line. If there
are PCR and sequencing bias, all the data points would be located above the one-to-one line
(gray area). The slope indicates the sequencing bias. The data points all occur above the dashed
1:1 line, indicating most biases are from sequencing. Note for 18S even mocks (circle and star),
none of which have primer mismatches, the PCR products have a bioanalyzer output ratio near
1, indicating little PCR bias. The staggered 18S mocks (triangle and square) include 3 members
with primer-template mismatches and correspondingly more PCR bias visible on the x axis. In all
cases the final reads show about 2-fold more bias than the PCR biases alone, suggesting a 2-fold
sequencing bias against 18S.
37
Figure 6. The mean rarefied richness (Number of ASVs, left,) and Shannon Index (Hʹ, right) of
eukaryotic communities from a March-July daily time series off of Southern California.
Sequences from each sample were rarefied to the sample with fewest 18S sequences (1681
reads) and repeated 100 times to obtain mean rarefied richness.
38
Figure 7. Bootstrapped average-linkage clustering of eukaryotic communities from a March-July
time series off of Southern California. Eukaryotic communities were amplified by universal
primers (left) and eukaryote-specific primers (right), denoised using DADA2, and then clustered
based on the Bray-Curtis dissimilarity. The crossing lines indicate samples that shifted in
clustering order; note that shifts are generally in deep branches and do not greatly change the
overall community clustering patterns. Different colors represent different clusters. Sampling
dates shown as month/day.
39
Figure 8. The mean relative abundances of each 18S ASV amplified with V4F/V4RB plotted
against the relative abundances of the same ASV amplified with 515C/926R, based only on the
220 bp overlapping region from universal (515C/926R) and eukaryote-specific primers
(V4F/V4RB). Sequences from each sample were rarefied to the sample with fewest 18S reads
(1681 reads) repeated 100 times.
40
Supplementary information
Comprehensive single-PCR 16S and 18S rRNA community analysis validated with mock
communities, and estimation of sequencing bias against 18S
Yi-Chun Yeh
1
, Jesse C. McNichol
1
, David M. Needham
1,2
, Erin B. Fichot
1
, Lyria Berdjeb
1
, and Jed
A. Fuhrman
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
2
Current address: GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany, 24148
Correspondence:
Yi-Chun Yeh
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Email: yichuny@usc.edu
41
Supplementary Materials and Methods
18S mock community preparation
To create even and staggered 18S mock communities, nearly full-length 18S rRNA clone
libraries were prepared from the large size fraction (1.2-80 µm) of seawater samples collected
from the San Pedro Ocean Time-Series location. DNA was amplified using universal eukaryote
primers Euk-A (5’-AACCTGGTTGATCCTGCCAGT-3’) and Euk-B (5’-GATCCTTCTGCAGGTTCACCTAC-
3’) (Countway et al., 2005). 25-µl PCR mixtures contained 1.25X 5Prime Hot master mix (0.5 U
Taq, 45mM KCl, 2.5 mM Mg2+, 200 µm dNTPs; Quanta Bio), 0.5 µM primers, and 1 ng of DNA.
PCR conditions were performed as follows: an initial hot start step at 95°C for 2 min followed by
25 cycles of 95°C for 30 sec, 50°C for 30 sec and 68°C for 4 min. PCR products were then cloned
using TOPO TA cloning kit with pCR4-TOPO Vector and One Shot Top 10 competent cells
according to the manufacturer’s protocols. The cloned PCR products were sequenced using
Sanger sequencing. Species identification was confirmed using BLASTn against the nt database.
16 clones were chosen to represent the major marine eukaryotic groups, including
haptophytes, dinoflagellates, diatoms, ciliates, cercozoa, radiolaria, and copepods. Plasmids
were purified using Qiagen plasmid plus 96 miniprep kit and then amplified with Euk-A and Euk-
B using the same conditions described above. In the even 18S mock community, 10 clones were
mixed with equal molarity. In the staggered 18S mock community, 16 clones were mixed with
different concentrations.
42
PCR and sequencing
To compare 16S/18S universal primers with eukaryote-specific primers, 18S mock communities
were amplified with V4F (5’-CCAGCASCYGCGGTAATTCC-3’) and V4R (5’-ACTTTCGTTCTTGATYRA-
3’), and V4F and V4RB (5’-ACTTTCGTTCTTGATYRR-3’) (Stoeck et al., 2010; Balzano et al., 2015).
The only difference between these two primer pairs is the last nucleotide on the 3’ end of the
reverse primer (A to R), which makes V4F/V4RB amplify haptophytes better (Balzano et al.,
2015). Due to the considerably lower annealing temperature of the reverse primer, the full
primers with indices and Illumina adaptors did not result in any PCR bands. Thus, 2-step PCR
was required to obtain efficient amplification as is standard practice for this primer set (Stoeck
et al., 2010; Balzano et al., 2015; Mahé et al., 2015; Pasulka et al., 2016). The first PCR mixtures
contained 1X Phusion HF buffer (1.5 mM MgCl2), 300 µM dNTPs, 0.5 µM primers, 3% DMSO,
0.5 U Phusion High-Fidelity DNA polymerase (New England BioLabs Inc.), and 1 pg pure mock
community. PCR cycles were as follows: 98°C for 1 min, 10 cycles of 98°C for 30 sec, 53°C for 30
sec, 72°C for 30 sec; and then 15 cycles of 98°C for 30 sec, 48°C for 30 sec, 72°C for 30 sec, and
a final extension step of 72°C for 10 min. The second PCR reaction was performed with full
primers that had barcoded indices and Illumina adaptors. PCR mixtures contained 1X Phusion
HF buffer (1.5 mM MgCl2), 300 µM dNTPs, 0.5 µM primers, 3% DMSO, 0.5 U Phusion High-
Fidelity DNA polymerase (New England BioLabs Inc.), and 2 µl of the PCR products from the first
step. PCR cycles were as follows: 98°C for 1 min, 10 cycles of 98°C for 30 sec, 48°C for 30 sec,
72°C for 30 sec, and a final extension step of 72°C for 10 min.
With 16S/18S universal primers (515Y, 5’-GTGYCAGCMGCCGCGGTAA-3’; 926R, 5’-
CCGYCAATTYMTTTRAGTTT-3’), as shown in Fig. 1, 8 different types of mock communities were
43
amplified using single step PCR with full-length primers that had barcoded indices and Illumina
adaptors. 25-µl PCR mixtures contained 1.25X 5Prime Hot master mix (0.5 U Taq, 45mM KCl, 2.5
mM Mg2+, 200 µm dNTPs; Quanta Bio), 0.3 µM primers, and 1 pg of pure mock community.
PCR conditions were as follows: 95°C for 2 min, 30 cycles of 95°C for 45 sec, 50°C for 45 sec and
68°C for 90 sec, and a final extension step of 68°C for 5 min. Environmental samples were
amplified using the same condition described above but with 0.5 ng of DNA. PCR products were
cleaned using 0.8X Ampure XP magnetic beads (Beckman Coulter). Purified PCR products were
quantified with PicoGreen and sequenced on Illumina HiSeq 2500 in PE250 mode and MiSeq
PE300. For each sequencing run, multiple blanks (i.e. PCR negative controls) were included as
internal controls, meaning PCR water was amplified, cleaned and sequenced as environmental
samples. After sequence processing, blanks were used to check for contamination that comes
from sample bleed-through due to “index hopping”.
Denoising using DADA2 and deblur pipelines
Sequences were analyzed using the DADA2 R package (Callahan et al., 2016), the QIIME
2 q2-dada2 plugin, and the QIIME 2 q2-deblur plugin (Amir et al., 2017) to compare different
denoising outputs. We ran DADA2 in both R and qiime2 platforms to compare version
differences (standalone R DADA2 = v1.10.1; qiime2-2018.8). With DADA2, forward and reverse
reads were trimmed and filtered after inspecting their quality profiles. Filtered reads were used
to make parametric error models for forward and reverse reads independently. Then, filtered
reads were denoised based on run-specific error models. Denoised reads were then merged
and remove chimeric reads. With deblur, reads were merged with qiime2 VSEARCH and filtered
44
using qiime2 q-score-joined plugin. Filtered reads were then processed through the qiime2 q2-
deblur plugin. 16S ASVs were classified with qiime2 classify-sklearn plugin against the SILVA 132
database subsetted to the amplicon region. 16S ASVs classified as Chloroplast were extracted
based on the SILVA classifications and subsequently reclassified against the PhytoRef database.
18S ASVs were classified with qiime2 classify-sklearn plugin against both SILVA 132 and PR2
databases.
However, 18S reads amplified with 515Y/926R are 575-595 bp, which is too long for
forward and reverse reads to overlap, so we chose to trim reads to fixed lengths before the
denoising step. After testing with different trim lengths, the results in the supplementary
information showed that denoiser performance was relatively consistent among runs at a trim
length for the forward reads of 220 bp and the reverse read of 200 bp. With the DADA2 R
package, denoised reads were concatenated and chimeric reads were removed. With the QIIME
2 q2-dada2 and q2-deblur plugins (which did not have an option for independent denoising and
subsequent merging at the time of writing), forward and reverse reads were trimmed using
bbduk.sh and concatenated using fuse.sh from bbtools package
(http://sourceforge.net/projects/bbmap/). Concatenated reads were then processed as
artificial single-end reads and chimeras.
Validation of ASV algorithms by analysis of mock communities
ASVs were compared with BLASTn against in silico mock sequences to determine ASVs
that perfectly matched in silico sequences, artifacts (1-mismatch to the in silico sequences), or
contamination (more than 1-mismatch to the in silico sequences). To compare the
45
performance of each pipeline, we determined 1) the percent of reads that perfectly matched in
silico sequences, 2) the percent of reads removed after denoising, and 3) the R-squared values
of linear regression between observed and expected abundances on a log (x+0.001) scale. The
processing scripts in this study are available on Figshare at
https://doi.org/10.6084/m9.figshare.11320388
Effects of trim length on 18S denoising
Since 18S amplicons amplified with 515Y/926R are too long (~575-595 bp) for forward
and reverse reads to overlap (even with 2 x 300bp sequencing), we decided to trim reads to
fixed lengths before denoising. Trimmed reads are concatenated either before (q2-dada2 and
q2-deblur QIIME 2 plugins) or after denoising (standalone DADA2 R package). As quality profiles
of reverse reads vary widely among runs, a trim length which worked equally well across runs
needed to be determined. We therefore systematically decreased the trim length of reverse
reads from 220 to 100 bp while fixing trim length of forward reads at 220 bp. Three criteria
were then used to compare denoiser performance; 1) percent reads that perfectly matched in
silico sequences, 2) R-squared values obtained by plotting the expected abundance of
staggered mock community against the sequenced staggered mock community on a log
(x+0.001) scale, and 3) percent reads removed after denoising (Table S1 and Table S2). Deblur
successfully recovered staggered mock communities in the proportions expected regardless
which trim length was chosen, and we did not observe any sequences without an exact match
to the known reference sequences. DADA2, however, produced up to 0.5 % of reads that did
not perfectly match the mock communities, though it performed slightly better when
46
concatenating reads after denoising (0.3 % of reads did not perfectly match in silico sequences).
By BLASTing these reads against in silico sequences, we found that these reads could be
accounted for by sample bleed-through or contamination as they had more than 1-mismatch to
the in silico sequences, which were less likely produced by PCR/sequencing errors. Although
deblur never produced such putative erroneous ASVs, it removed a large fraction of reads
(~75%) compared with DADA2 (~25%), yielding fewer observations in the final ASV table (Table
S1 and Table S2). According to the three criteria defined above, denoiser performance was
relatively consistent among sequencing platform at a trim length for the reverse read of 200 bp.
47
Table S1. Effects of trim length on 18S staggered mock community using Miseq sequencer
Trim
length
of
reverse
reads
(bp)
Deblur DADA2 R package DADA2 QIIME 2 version
Percent
reads
perfectly
match in
silico
r
2
expected
vs.
observed
Percent
reads
removed
after
denoising
Percent
reads
perfectly
match in
silico
r
2
expected
vs.
observed
Percent
reads
removed
after
denoising
Percent
reads
perfectly
match in
silico
r
2
expected
vs.
observed
Percent
reads
removed
after
denoising
220 100 0.72 94.2 99.8 0.75 50.1 99.8 0.74 59.9
210 100 0.75 93.0 100.0 0.75 42.3 99.9 0.75 53.6
200 100 0.73 92.1 100.0 0.74 36.6 99.9 0.75 48.8
190 100 0.72 91.4 100.0 0.74 31.8 99.9 0.75 43.8
180 100 0.73 90.4 100.0 0.74 29.0 99.9 0.75 40.3
170 100 0.73 89.5 100.0 0.74 27.5 99.9 0.75 38.2
160 100 0.73 88.8 100.0 0.74 26.5 99.9 0.75 36.7
150 100 0.72 88.3 100.0 0.74 25.4 99.9 0.75 35.0
140 100 0.73 87.2 100.0 0.74 24.3 99.9 0.74 32.9
130 100 0.72 86.5 100.0 0.74 23.8 100.0 0.74 31.8
120 100 0.72 85.9 100.0 0.74 23.2 99.9 0.74 30.7
110 100 0.72 85.5 100.0 0.74 22.8 99.9 0.74 29.9
100 100 0.72 85.0 100.0 0.74 22.5 100.0 0.74 29.1
48
Table S2. Effects of trim length on 18S staggered mock community using Hiseq sequencer
Trim
length
of
reverse
reads
(bp)
Deblur DADA2 R package DADA2 QIIME 2 version
Percent
reads
perfectly
match in
silico
r
2
expected
vs.
observed
Percent
reads
removed
after
denoising
Percent
reads
perfectly
match in
silico
r
2
expected
vs.
observed
Percent
reads
removed
after
denoising
Percent
reads
perfectly
match in
silico
r
2
expected
vs.
observed
Percent
reads
removed
after
denoising
220 100 0.68 72.2 98.8 0.69 26.1 98.9 0.69 11.2
210 100 0.68 71.2 98.9 0.69 23.6 98.8 0.69 11.9
200 100 0.68 70.2 99.3 0.69 22.7 99.0 0.69 12.0
190 100 0.68 69.1 99.4 0.69 21.9 99.2 0.69 11.8
180 100 0.68 68.0 99.4 0.69 21.2 99.3 0.69 11.4
170 100 0.68 67.0 99.5 0.69 20.6 99.3 0.69 11.2
160 100 0.68 65.8 99.6 0.69 20.0 99.2 0.69 10.8
150 100 0.68 64.7 99.6 0.69 19.5 99.3 0.69 10.5
140 100 0.68 63.8 99.7 0.69 18.9 99.6 0.69 10.1
130 100 0.68 62.8 99.9 0.69 18.3 99.8 0.69 9.7
120 100 0.68 61.8 99.9 0.69 17.8 99.7 0.69 9.2
110 100 0.68 60.7 100.0 0.69 17.2 99.7 0.69 8.8
100 100 0.68 59.6 100.0 0.70 16.7 99.8 0.69 8.3
49
Table S3. The composition of metagenomic SSU rRNA matching 515Y (i.e., the 4
th
base of 5’ end
of 515F is C or T). The data was derived from a report (McNichol et al., 2020), which compared
primers with SSU rRNA sequences retrieved from four marine metagenomic datasets. This study
showed that 88% of eukaryotic 18S rRNA and 99% of cyanobacteria and chloroplast 16S rRNA
perfectly match 515Y. We further examined whether those sequences perfectly match 515C or
515T. The results showed that >97% of the sequences perfectly matched 515C and only a small
number perfectly matched 515T.
Percent SSU rRNA matching
515Y with the C at the 4
th
position of 5’ end
Percent SSU rRNA matching
515Y with the T at the 4
th
position of 5’ end
18S rRNA
BioGEOTRACES 96.7 0.1
Malaspina 99.2 0
MBARI 98.8 0
TARA 98.5 0.1
Cyanobacteria + chloroplast 16S rRNA
BioGEOTRACES 98.5 0
Malaspina 100 0
MBARI 99 0.1
TARA 99.2 0.1
50
Figure S1. Expected staggered mock communities plotted against observed staggered mock
communities in mixed mock communities.
51
Figure S2. Venn diagram of shared and unique eukaryotes between two primer sets at different
taxonomic levels. In all cases, the left circles are universal primers, and the right circles are the
eukaryotic specific primers. The numbers in parentheses are the total number of taxa at the
given taxonomic level. At the order level, the 3 orders unique for the universal primers are
listed on the left, and the 2 orders unique for the eukaryotic specific primers are listed on the
right.
52
Figure S3. The relative abundances of ASVs amplified with V4F/V4RB plotted against the
relative abundances of ASVs amplified with 515F/926R using the completely overlapping ASVs
from universal (515F/926R) and eukaryote-specific primers (V4F/V4RB) on a log (x+0.001) scale.
53
Figure S4. Average-linkage clustering of eukaryotic communities using the completely
overlapping ASVs from universal and eukaryote-specific primers. Eukaryotic communities were
amplified using universal (515Y/926R) and eukaryote-specific primers (V4F/V4RB). The
completely overlapping region (i.e. 220 bp of forward reads alone) were then denoised using
DADA2 to the ASV level. Bray-Curtis dissimilarity clustering analysis shows broad similarities in
overall patterns, as in Fig 6, but significant differences of typically 30-50% when the same
sample is analyzed by the different primer sets.
54
Figure S5. The composition of reads found in 1.2-80 μm size fraction of seawater samples
collected from a March-July time series off Southern California.
55
Figure S6. Rarefaction curves for samples collected from a March-July time series off Southern
California. Samples were amplified using 3-domain universal primer (515C/926R) and analyzed
using DADA2. It appears that the numbers of 18S ASVs is saturated at 1,000-6,000 sequences,
and number of chloroplast 16S ASVs is saturated at 2,000-10,000 sequences. For prokaryotic
16S ASVs, the number is saturated at 4,000-15,000 sequences. According to Fig. S5, in this size
fraction (1.2-80 m), there were an average of 15% 18S, 20% chloroplast 16S, and 65%
prokaryotic 16S. Given this information, we can calculate the minimum required sequencing
depth to obtain enough sequences for each domain. For the least diverse sample, which
saturates the earliest, it requires 15,000 total sequences to capture 1687.5 18S sequences
(15000*0.15*0.75), 2250 chloroplast 16S sequences (15000*0.2*0.75), and 7312.5 prokaryotic
16S sequences (15000*0.65*0.75) considering 25% of sequences are removed after the
denoising step (Table S1 and S2). For the most diverse sample, it requires 70,000 total
sequences to obtain 7,875 18S sequences (70000*0.15*0.75), 10,500 chloroplast 16S, and
34,125 prokaryotic 16S sequences considering 25% of sequences are removed after the
denoising step.
56
Figure S7. The read composition and rarefaction curves for whole seawater samples collected
as part of the BioGEOTRACES project. The data was derived from the a report (McNichol et al.,
2020), which amplified DNA collected from 100ml of whole (> 0.2 m) seawater using 3-domain
universal primer (515Y/926R) and analyzed using DADA2. We used the Atlantic surface (<65m
depth) seawater samples (sample IDs correspond with the original study) to validate the
required sequencing depth. It appears that the numbers of 18S ASVs is saturated at 200-1,000
sequences, and number of chloroplast 16S ASVs is saturated at 500-2,500 sequences. For
prokaryotic 16S ASVs, the number is saturated at 20,000-40,000 sequences. In this size fraction
(>0.2 m), there were an average of 1.5% 18S, and 3.1% chloroplast 16S, and 95.4% prokaryotic
16S. Given this information, we can calculate the minimum required sequencing depth to
obtain enough sequences for each domain. For the least diverse sample, which saturates the
earliest, it requires 28,000 total sequences to capture 315 18S sequences (28000*0.015*0.75),
651 chloroplast 16S sequences (28000*0.031*0.75), and 20,034 prokaryotic 16S sequences
(28000*0.954*0.75) considering 25% of sequences are removed after the denoising step (Table
S1 and S2). For the most diverse sample, it requires 110,000 total sequences to obtain 1237 18S
sequences (110000*0.015*0.75), 2557 chloroplast 16S (110000*0.031*0.75), and 78705
prokaryotic 16S sequences (110000*0.954*0.75) considering 25% of sequences are removed
after the denoising step.
57
References
McNichol JC, Berube PM, Biller SJ & Fuhrman JA (2020) Evaluating and Improving SSU rRNA PCR
Primer Coverage via Metagenomes from Global Ocean Surveys. bioRxiv
https://doi.org/10.1101/2020.1111.1109.375543.
58
Chapter 2: Temporal and depth dynamics of microbial communities across three
domains over 14 years at the San-Pedro Ocean Time-series
Yi-Chun Yeh
1
and Jed Fuhrman
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Correspondence:
Jed Fuhrman
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Email: fuhrman@usc.edu
59
Abstract
Marine protists and prokaryotes are crucial globally, yet there are no comprehensive
long-term depth profile studies of whole microbial communities. We characterized microbes at
the San-Pedro Ocean Time-series between 2005 and 2018 from the surface to near-bottom
(890 m), by sequencing 16S and 18S rRNA genes from 0.2-1 μm and 1-80 μm size fractions, with
three-domain universal primers, allowing direct comparisons not otherwise possible. After
sequencing bias correction, >92% of the 0.2-1 μm sequences were prokaryotic 16S, and the 1-
80 μm fraction contained 6-34% 18S, 19-23% chloroplast 16S, and 46-93% prokaryotic 16S rRNA
genes - even the larger size fraction was prokaryote-dominated. Prokaryotes were strongly
seasonal in the euphotic zone, but eukaryotes less so; sub-euphotic prokaryotes were generally
stable and predictable, protists were not. Particle-associated and/or large-celled prokaryotes
(1-80 μm) were more diverse than smaller free-living ones, especially at the deeper depths,
ascribed to endemic taxonomic groups and niche differentiation of SAR11 and Flavobacteriales.
Diversity of phototrophic and heterotrophic protists peaked at the deep chlorophyll maximum.
High Syndiniales abundance and diversity suggests significant parasitism throughout the water
column. The surprising stability and predictability of prokaryotes compared to protists indicates
fundamental differences in their ecology.
Introduction
Marine microbial communities consist of all three domains of life: Bacteria, Archaea,
and Eukaryota. Together, these organisms perform a wide range of marine biogeochemical
process as they represent key trophic roles in microbial food webs, including primary
60
producers, consumers, and decomposers. Although the interactions among Bacteria, Archaea,
and Eukaryota play an important role in determining community structure and ecosystem
functioning, such as carbon sequestration and decomposition, only a few studies have
thoroughly surveyed all these components together (Brown et al 2009, Chénard et al 2019). The
lack of comprehensive investigations is due, in part, to the difficulties of accessing the diversity
across three domains. The emergence of high-throughput sequencing techniques allows us to
identify the community composition using appropriate marker genes (e.g., SSU rRNA genes).
Moreover, studies have shown that a 16S+18S universal primer set can quantitatively survey
the diversity from Bacteria, Archaea, and Eukaryota in a single PCR reaction (Needham and
Fuhrman 2016, Needham et al 2018, Parada et al 2016, Yeh et al 2021).
Understanding the natural variability of community dynamics relies on long-term
observations. The best-known microbial ocean time series programs include the Hawaii Ocean
Time-series (HOT), the Bermuda Atlantic Time Series (BATS), both far offshore open ocean
systems, the Blanes Bay Microbial Observatory (BBMO) nearshore in the NW Mediterranean,
and the San Pedro Ocean Time-series (SPOT). SPOT is about 20 km off the coast of Southern
California, in a coastal basin with about 900 m water depth and restricted horizontal advection
below about 500 m (Berelson 1991), resulting in a persistent hypoxic environment at depth.
SPOT has conducted monthly sampling of the water column since 2000, and it provides a good
opportunity to examine the spatiotemporal variation of microbial communities in a subtropical
marine system.
Previous research at SPOT has found temporal variability of microbial communities and
potential ecological interactions among taxa using network analysis. (Chow et al 2013, Chow et
61
al 2014, Countway et al 2010, Cram et al 2015a, Cram et al 2015b, Hu et al 2016, Kim et al 2014,
Parada and Fuhrman 2017, Steele et al 2011). However, many of these findings are based on
fingerprint technology, such as automated ribosomal intergenic spacer analysis (ARISA) and
terminal restriction fragment length polymorphism (T-RFLP), and the prokaryotic communities
in those studies were only the smaller free-living ones. Here, we present a 14-year study that
uses SSU rRNA sequencing with 16S+18S universal primers to investigate the spatiotemporal
variation of the marine microbial community from two size fractions at SPOT. The 16S and 18S
sequences are denoised into amplicon sequence variants (ASVs) using DADA2 (Callahan et al
2016), which allows resolutions down to single-nucleotide differences. Our goal is a broad
characterization of the vertical and seasonal changes of free-living (0.2-1 μm), particle-
associated or large-celled (1-80 μm) prokaryotes and protistan communities throughout the
water column, comparing depths and size fractions to each other and with environmental
properties to assess what factors have the most significant influence on the overall community
patterns.
Results
Environmental conditions
Satellite data shows that SPOT has a seasonal cycle of chlorophyll and primary
production (Fig. 1). The highest monthly chlorophyll-a concentration and primary productivity
generally occurred in March-May, while the minima occurred in September-November. Thus,
the seasons were defined here with March-May as spring, June-August as summer, September-
November as autumn, and December-February as winter.
62
The whole water column has strong vertical environmental gradients, characterized
here by principal component analysis (PCA) on environmental variables, including temperature,
dissolved oxygen, fluorescence, NO 2+NO 3, and PO 4. In the PCA, the first two principal
components (PCs) explained 94% of the variability (Fig. 2). The first principal component (PC1)
was positively correlated with nutrients, and negatively correlated with temperature and
dissolved oxygen. In the PCA biplot, PC1 was an indicator for sampling depth, generally
ordinating the cold and nutrient-rich deeper samples in the positive range and warm and
nutrient-depleted surface samples in the negative range. The second principal component (PC2)
was negatively correlated with fluorescence and thus separated DCM samples from 5 m
samples.
Microbial community composition
A total of 908 samples collected at the SPOT location during 2005-2018 were analyzed
by SSU rRNA sequencing with the 16S/18S universal primers (515Y/926R) that amplify
prokaryotic 16S, chloroplast 16S, and eukaryotic 18S rRNA genes all at the same time. After
quality control filtering, merging, and chimera removal, the 0.2-1 μm size fraction overall
sequence data (all depths) was partitioned into an average of 96-100% prokaryotic 16S, 3%
chloroplast 16S (only at 5 m and DCM), and <1% eukaryotic 18S. The 1-80 μm size fraction
processed sequences were partitioned into 63-78% prokaryotic 16S, 10-20% chloroplast 16S
(only at 5 m, DCM, and 150 m), and 12-16% eukaryotic 18S (Fig. 3). This methodology has been
tested with prokaryotic and eukaryotic mixed mock communities, and we found that there is a
two-fold underestimation of 18S sequences due to sequencing length bias (Yeh et al 2021). We
63
thus applied a two-fold correction factor to estimate the true proportions of these three
categories. In the 0.2-1 μm size fraction, the correction factor had little effect and proportions
remained the same because there were nearly no eukaryotic 18S sequences found. However,
after applying the two-fold correction to the 1-80 μm size fraction, we estimate that
prokaryotic 16S and chloroplast 16S decreased to 55-67% and 7-18% respectively, and the
relative proportion of eukaryotic 18S increased to 21-28% (Fig. 3 and supplementary
information). So as expected, the 0.2-1 μm size fraction mainly captured prokaryotes, with
picoeukaryotic chloroplast and 18S sequences averaging < 5%, though in a few samples they
exceeded 20% (Fig 3). The 1-80 μm size fraction collected all three categories, yet the larger
fraction was interestingly still dominated by prokaryotes, averaging >50% at all depths even
after the length bias correction. Also, because the euphotic zone at SPOT is generally
considerably shallower than 100 m, with the DCM typically averaging 5 m to 66 m and varying
seasonally, we expected significant contributions from chloroplast 16S (phototrophic
eukaryotes) only in the 5 m and DCM samples. However, there was still a substantial
contribution at 150 m, which are presumably primarily sinking diatoms (Fig. S5).
Prokaryotic 16S community composition varied by depth, season, and size fraction
Here, we examined free-living (0.2-1 μm) and particle-associated or large-celled (1-80
μm) prokaryotic communities using prokaryotic 16S rRNA sequences. Overall, the Illumina
sequencing generated 59,486,285 quality-filtered sequences, ranging from 3452 to 810,688
sequences per sample, which were denoised into 59,156 amplicon sequence variants (ASVs)
64
using DADA2. Of these, only 6,419 ASVs were more than 0.1% in at least one sample, which
accounted for 78-100% of community composition.
We further investigated the alpha diversity among the sampling depths and size
fractions. The asymptotic estimates of ASV richness and Shannon index values of particle-
associated or large-celled prokaryotic communities (1-80 μm) were both significantly higher
than free-living prokaryotic communities (0.2-1 μm) (Fig. 4, ANOVA, p<0.001). The greatest
richness and Shannon index values were observed in the large size fraction (1-80 μm) at 150 m
and 500 m. Pairwise Bray-Curtis similarity was used to assess temporal stability within each
sample category, and there was a trend indicating that temporal stability increased with depth
in both size fractions. Nonmetric multidimensional scaling (NMDS) analysis showed that
changes in prokaryotic community structure correlated most with sampling depths (Fig. 4). A
PERMANOVA test (Table 1) also showed that sampling depth was a strong predictor of
variability explaining 48% of ASV composition (p<0.001), against 5% attributed to the size
fraction (p<0.001) and 2% attributed to season (p<0.001). The effects of size fraction and
season were more evident when the dataset was partitioned by depth (Table 2). Seasonal
effects significantly explained 14% and 11% of variability at 5 m and DCM respectively, whereas
seasonal changes were not as prominent below the euphotic zone, explaining only <6% of the
variability. The strong seasonality at 5 m and DCM were also observed when temporal
difference between samples was plotted against Bray-Curtis similarity (Fig. 5). The effects of
size fraction, on the other hand, were more important below the euphotic zone, explaining 29-
36% of the variability (p<0.001).
65
The taxonomic composition (Fig. 6) showed that the most common orders in this study
were SAR11 (accounting for 19% of the total community in the free-living (FL) size fraction vs.
16.2% in particle-associated or large-celled (PA-L) size fraction), Flavobacteriales (8.3% in FL vs.
16.2% in PA-L), Nitrosopumilales Archaea (11.7% in FL vs. 5.9% in PA-L), Thiomicrospirales
(10.6% in FL vs. 3.3% in PA-L), Marinimicrobia (5.8% in FL vs. 2.8% in PA-L), Rhodobacterales
(4.6% in FL vs. 3.1% in PA-L), Marine Group II Archaea (4.5% in FL vs. 3.5% in PA-L), SAR324
(4.7% in FL vs. 1.3% in PA-L), Synechococcales (1.7% in FL vs. 3.5% in PA-L), and SAR86 (3.7% in
FL vs. 1% in PA-L). In general, SAR11 and Flavobacteriales were abundant in almost all samples.
The highest abundance of SAR11 and Flavobacteriales were found at 5 m and DCM, followed by
150 m, 500 m, and 890 m. At 5 m and DCM, SAR11 dominated in summer/autumn, mainly in
the small size fraction (0.2-1 μm), whereas Flavobacteriales dominated in spring/winter, mainly
in the larger size fraction (1-80 μm). Rhodobacterales, SAR86, and Puniceispirillales (SAR116)
predominated in the euphotic zone, mainly in the small size fraction (0.2-1 μm), whereas
Synechococcales, Actinomarinales, Cellvibrionales were abundant in the large size fraction (1-
80 μm) in the euphotic zone. Additionally, Nitrosopumilales Archaea, Thiomicrospirales,
Nitrospinales, Marinimicrobia, and SAR234 predominated at the deeper depths, mainly in the
small size fraction (0.2-1 μm). Although other major groups (such as Artic97B-4 marine groups,
UBA10353 marine groups, and Sphingomonadales) did not show a clear pattern of distribution,
they were generally present in particle-associated or large-celled communities (1-80 μm) in
deeper depths.
To examine if these endemic groups contributed to the high richness in the large size
fraction (1-80 μm) at 150 m and 500 m, the number of ASVs observed within each group were
66
calculated (Fig. 8a). The heatmap showed that the richness at 150 m and 500 m was not only
contributed by the orders that were endemic to deeper waters, but also by SAR11 and
Flavobacteriales. As SAR11 and Flavobacteriales have been found to have niche differentiation
between major subgroups (Brown et al 2012, Fernández-Gomez et al 2013, Giovannoni 2017,
Kraemer et al 2020, Thrash et al 2014, Tsementzi et al 2016), the ASV richness within each
subgroup was further compared (Fig. 8b and 8c). The results showed that SAR11 clade I, II, III
and IV were all found at SPOT, all with different vertical patterns. SAR11 clade I was present in
high richness regardless the sampling depth and size fraction. In contrast, SAR11 clade II were
only found in high richness at 150 m and 500 m. For Flavobacteriales, the Flavobacteriaceae,
NS7 marine group, NS9 marine group, Crocinitomicaceae, and Crymorphaceae were all found in
high richness in the larger size fraction (1-80 μm). Among these subgroups, the NS9 marine
group showed high richness at 150 m and 500 m.
Chloroplast 16S community composition varied by season
The chloroplast 16S rRNA gene was used to identify phototrophic eukaryotic
communities from the 1-80 μm size fraction in the euphotic zone (5 m and DCM) (excluding
most dinoflagellates; Needham et al 2016). The Illumina sequencing generated 10,876,142
quality-filtered sequences, ranging from 1897 to 710,700 sequences per sample, which were
denoised into 4041 ASVs using DADA2. Of these, only 1,772 ASVs were more than 0.1% in at
least one sample. These ASVs accounted for 96-100% of community composition. When
comparing the alpha- and beta-diversity between sampling depths, no significant differences in
the asymptotic estimates of ASV richness and pairwise Bray-Curtis similarity were found
67
between 5 m and DCM (Fig. 4). The Shannon index values of chloroplast 16S communities at 5
m were slightly higher than that at DCM (ANOVA, p<0.001). Comparing the chloroplast 16S
communities using NMDS showed that samples were slightly clustered by sampling depth. A
PERMANOVA test (Table 1) further confirmed that sampling depth and season only explained
7% and 6% of ASV composition (p<0.001). After partitioning by depth, the seasonal effect at 5m
and DCM both explained 10% of variability (Table 2) and was visualized by plotting temporal
difference between samples against Bray-Curtis similarity (Fig. S2). The taxonomic composition
showed that the major classes were Prymnesiophyceae (41%), Bacillariophyta (19%),
Mamiellophyceae (15%), Pelagophyceae (8%), Dictyochophyceae (5%) and Cryptophyceae (5%).
The heatmap showed that the classes Prymnesiophyceae and Bacillariophyceae predominated
in all samples with Prymnesiophyceae as the main representative during summer/autumn, and
Bacillariophyceae dominated during spring/winter (Fig. 7a).
Eukaryotic 18S community composition varied by depth
18S rRNA gene was used to identified eukaryotic communities from the 1-80 μm size
fraction. The Illumina sequencing generated 12,125,346 quality-filtered sequences, ranging
from 112 to 578,005 sequences per sample, which were denoised into 26,645 ASVs. Of these,
only 7059 ASVs were more than 0.1% in at least one sample, and these ASVs accounted for 83-
100% of community composition. The taxonomic identity showed that eukaryotic 18S
communities were sporadically dominated by Metazoa, including Arthropoda (usually
copepods), Cnidaria, Urochordata (usually larvaceans), Annelida, Ctenophora (Fig. S3). The
Metazoa sequences likely represented juveniles, eggs, or organismal fragments that we expect
68
were sporadically captured in this size fraction, so we excluded these sequences in further
analyses (also considering we are focusing on microbial communities here). The asymptotic
estimates of ASV richness and Shannon index values of 18S communities were both significantly
different among sampling depths (ANOVA, p<0.001). The richness and Shannon index were
lowest at 890 m and peaked at DCM. Comparing the beta-diversity using NMDS and
PERMANOVA test showed that samples were clustered by sampling depth, explaining 18% of
ASV composition (p<0.001), against 2% attributed to season (Fig.4 and Table 1). When
considering the seasonal effect by depth, the PERMANOVA test showed that season explained
14% of total variation at 500m and 6-9% at 5m, DCM, and 890m (Table 2).
The taxonomic composition showed that the major classes were Syndiniales (28%),
Dinophyceae (17%), Polycystinea (17%), Spirotrichea (7%), Acantharea (4%), RAD-B (3%), and
Bacillariophyta (3%). Dinophyceae, Spirotrichea, Bacillariophyta, Prymnesiophyceae,
Mamiellophyceae, and MAST were important contributors above euphotic zone, while
Radiolaria (Polycystinea, Acantharea, and RAD-B) and Syndiniales increased in relative
abundances below the euphotic zone.
Discussion
This is the first long-term study of the microbial community from two size fractions (0.2-
1 μm and 1-80 μm) using three-domain universal primers that allow direct quantitative
comparisons between domains. When comparing the proportions of prokaryotic 16S,
chloroplast 16S and 18S rRNA genes obtained from the two size fractions (Fig. 3), we found that
>96% of sequences in the 0.2-1 μm size fraction were prokaryotic 16S, whereas there were 21-
69
28% 18S, 7-18% chloroplast 16S, and 55-67% prokaryotic 16S rRNA genes in the 1-80 μm size
fraction (Fig. 3). Our results indicated that even though the smaller size fraction is often
considered the “prokaryotic” one, when examined on a gene-by-gene basis, prokaryotic rRNA
genes dominated in both size fractions regardless of the sampling depth.
We identified clear depth-differentiation within prokaryotic assemblages, explaining
48% of variability (Table 1), in line with previous reports (Chow et al 2013, Cram et al 2015a,
Fuhrman et al 2006, Mestre et al 2017, Mestre et al 2018, Parada and Fuhrman 2017, Treusch
et al 2009, Wilson et al 2017, Zinger et al 2011). When considering seasonal changes by depth,
the seasonal effect dramatically decreased below the euphotic zone (Table 2). Moreover,
temporal variability of pairwise Bray-Curtis similarity (a way to examine repeating seasonality)
also increased with depth (Fig. 4 and 5), suggesting a relatively steady prokaryotic communities
in the deeper layer. The protistan communities, on the other hand, were slightly influenced by
depth, explaining 18% of variability. When considering seasonal changes by depth, the
strongest seasonal effect was found unexpectedly at 500 m, even though we recognize that
season should affect protists more above the euphotic zone where phytoplankton respond to
variations in light and nutrients. This finding suggests seasonal variations in transient sinking
particles from surface waters may be large enough to affect overall diversity patterns in
protistan communities (resident plus transient ones) at certain mid-depths. In addition,
temporal variability of pairwise Bray-Curtis similarity showed that protistan communities
change considerably more over time compared with prokaryotic communities (mean Bray-
Curtis similarity of each depth is 0.38-0.64 for prokaryotes vs. 0.12-0.19 for protistan). The
more predictable seasonality in prokaryotes suggests that interannual variability in properties
70
we did not measure (and/or stochastic factors like founder’s effects) may significantly influence
the protists more than they do prokaryotes.
Spatiotemporal dynamics within prokaryotic communities
Throughout the water column, there is a strong vertical gradient in physicochemical
variables (Fig. 2) as well as prokaryotic community composition (Fig. 4 and Table 1), in
accordance with previously reported trends (Agogué et al 2011, Brown et al 2009, DeLong et al
2006, Reji et al 2020, Treusch et al 2009, Walsh et al 2016). However, there are relatively few
previous studies that include multiple size fractions, such as those enriched in free-living vs
particle-associated microbes. By sampling two size fractions (0.2-1 μm and 1-80 μm) in the
water column, we found that free-living (0.2-1 μm) and particle-associated or large-celled (1-80
μm) prokaryotic communities exhibited significant differences in alpha- and beta-diversity,
especially bellow the euphotic zone (Fig. 4 and Table 2). Particle-associated or large-celled (1-80
μm) prokaryotic communities exhibited a higher diversity in terms of number of species
(richness) and evenness (Shannon index) than the free-living (0.2-1 μm) ones (Fig. 4), which is
consistent with previously studies (Crespo et al 2013, Ganesh et al 2014, Milici et al 2017). We
also found that dissimilarity between the two size fractions increased with depth (Table 2),
suggesting that the progressive breakdown of sinking particles at depth may provide more
ecological niches to be partitioned for particle-associated prokaryotes, compared to free-living
prokaryotes subsisting on dissolved material.
Overall, the richness and Shannon index were lowest at 5 m (Fig. 4), consistent with
previous findings that observed lowest alpha-diversity in the surface ocean (Ghiglione et al
71
2012, Reji et al 2020, Walsh et al 2016). In addition, richness and Shannon index peaked at 150
m and stayed relatively invariant at 500 m. This appears attributable to the endemic taxonomic
groups that were only present at depth (Fig. 8). These endemic groups were often more
metabolically versatile groups that might be involved in nitrogen and sulfur cycles as previously
studies have found these taxonomic groups would mediate diverse reactions, such as
Thiomicrospirales (sulfur oxidation), Marinimicrobia (nitrous oxide reduction), Nitrosopumilales
(ammonia oxidation), Nitrospinales (nitrite oxidation), and SAR324 (sulfur oxidation) (Aldunate
et al 2018, Hawley et al 2017, Murillo et al 2014, Santoro et al 2011). These profiles likely
indicated a combined effect of different depth-varying physicochemical components including
light intensity, organic matter composition, oxygen levels, nutrients, as well as biological
interactions between microbes.
In addition, vertical niche differentiation within SAR11 and Flavobacteriales was also
found in the water column (Fig. 8b and 8c). SAR11 has been classified into clades and subclades,
which are reported to have restricted vertical distributions. For example, SAR11 clade Ia and Ib
are associated with surface ocean, while SAR11 clade IIa is associated with the oxygen
minimum zone (Brown et al 2012, Giovannoni 2017, Kraemer et al 2020, Thrash et al 2014,
Tsementzi et al 2016). Consistent with previously findings, we found SAR11 clade II present in
high ASV diversity below the euphotic zone, especially at 150 m and 500 m. Flavobacteriales
also displayed distinct vertical patterns, with diversity of Flavobacteriaceae peaked at surface,
whereas diversity of NS9 marine group increased with depth. Flavobacteriales are highly
specialized in the degradation of complex organic compounds (Duret et al 2019); however, such
72
a depth-differentiation within Flavobacteriales has not been discussed before. Our results may
reflect the niche-partitioning of Flavobacteriales along the water column.
Seasonal changes were particularly dominant in the euphotic zone (Fig. 5 and Table 2),
where light and temperature have substantial seasonal variation (Fig. 1 and Fig. S1). Many of
these abundance patterns likely resulted from heterotrophic response to variations in primary
production, such as the annual peaks in spring (Fig 1), often in blooms (Buchan et al 2014, Lindh
et al 2015, Needham and Fuhrman 2016, Teeling et al 2016). Flavobacteriales, Rhodobacterales,
and Cellvibrionales that showed abundance maxima during the spring blooms are known to be
bloom-associated groups that are capable of degrading phytoplankton-derived polysaccharides
(Buchan et al 2014), whereas oligotrophic groups with streamlined genomes such as SAR11
appear to dominate the euphotic zone the rest of time when nutrient conditions are low
(Giovannoni et al 2005, Needham and Fuhrman 2016). In addition, seasonal effects greatly
decreased below the euphotic zone (Table 2), suggesting that the seasonal downward transport
of large, fast-sinking particles from the surface did not lead to strong seasonality of the deep
community as a whole, even though several particular taxa have been shown to respond to
sinking material at this site (Cram et al 2015b). This may suggest the overall community
variation over time in the deep water is significantly affected by other factors besides seasonal
sinking material.
The effects of size fraction on community composition, on the other hand, were more
prominent below the euphotic zone (Table 2). The difference between the two size fractions
resulted from many taxonomic groups that were enriched in either free-living or particle-
associated size fractions (Fig. S6). We observed that members of Alphaproteobacteria (SAR11,
73
Puniceispirillales, and Rhodobacterales) and SAR86 were enriched in the small size fraction in
the euphotic zone, whereas Nitrospinales, Thiomicrospirales, SAR324, Marinimicrobia,
Nitrosopumilales Archaea, and Marine Group II Archaea were enriched in the small size fraction
below the euphotic zone. In addition, Actinomarinales and members of the Bacteroidetes
(including Flavobacteriales), Planctomycetes, Verrucomicrobia, Betaproteobacteria,
Gammaproteobacteria, and Deltaproteobacteria were enriched in the large size fraction. These
patterns were mostly consistent with previously findings (Crespo et al 2013, Ganesh et al 2014,
Milici et al 2017, Mohit et al 2014, Rieck et al 2015, Salazar et al 2015), except for
Actinomarinales, which has been previously found via single amplified genomes (SAGs) to be
one of the smallest planktonic prokaryotes (Pachiadaki et al 2019); this apparent discrepancy
might be explained if some abundant Actinomarinales are particle-associated, in contrast with
SAGs that come from individual sorted cells
Spatiotemporal dynamics within protistan communities
We found depth changes along the water column in protistan assemblages (Fig. 4 and
7), which is consistent with the results obtained from previous studies (Countway et al 2007,
Giner et al 2020, Ollison et al 2021, Schnetzer et al 2011). As expected, photosynthetic groups
(e.g., Prymnesiophyceae, Mamiellophyceae, and Bacillariophyta) dominated the euphotic zone,
whereas heterotrophic groups, notably including Radiolaria (Polycystinea, Acantharea, and
RAD-B), increased in relative abundance with depth (Lampitt et al 2009, Martin et al 2010). In
addition, Syndiniales (previously called MALV-Marine Alveolates) were prevalent throughout
the water column. Syndiniales have been characterized as parasites on a wide range of hosts,
74
such as dinoflagellates, ciliates, and Radiolaria (Bachvaroff et al 2012, Guillou et al 2008,
Skovgaard et al 2005). As reported by other studies of protists at SPOT and elsewhere (Berdjeb
et al 2018, Clarke et al 2019, De Vargas et al 2015, Hu et al 2018, Kim et al 2014, Ollison et al
2021, Pernice et al 2016), the widespread distribution of Syndiniales suggests the importance of
parasitism throughout the water column.
Seasonal effects on protistan communities increased with depth and peaked at 500 m
(Fig. 5 and Table 2). We found that sampling season explained 6-9% of total variation at most
depths, except for 500 m, where variation explained by season is two times stronger (14%).
These findings are different from a previous study of protists at SPOT. Kim et al (2014)
documented the seasonality of protistan communities between 2000 and 2003 at SPOT using
terminal restriction fragment length polymorphism (T-RFLP), and they found seasonality at 150
m but not at 500 m by plotting temporal difference between samples against Bray-Curtis
similarity. Due to the difference of methodology (T-RFLP vs. tag sequencing) and sampling
period (2000-2003 vs. 2005-2018), the seasonality we observed at 500 m may not have been
present or detectable in their study. However, both results showed how seasonal inputs of
sinking materials from surface water can affect patterns of protistan assemblages at
intermediate depths.
The diversity analyses showed that protistan diversity was lowest at 890 m and peaked
at the DCM. This overall diversity trend may be driven by biogeochemical gradients, especially
dissolved oxygen concentration. There is a persistent hypoxic environment below 500 m due to
restricted circulation within the basin at SPOT (Berelson 1991, Countway et al 2010, Cram et al
2015a, Kim et al 2014). Unlike prokaryotes, which are capable of anaerobic and microaerobic
75
metabolism, only a subset of eukaryotic taxa is able to adapt to oxygen-depleted layers,
resulting in a lower diversity. Previous studies from marine oxygen minimum zones (OMZ) have
showed that protistan diversity decreases from surface waters to hypoxic waters (Orsi et al
2012, Parris et al 2014), which is consistent with our findings, supporting the importance of
dissolved oxygen in shaping protistan communities.
Phytoplankton communities using chloroplast 16S vs. eukaryotic 18S rRNA genes
As phototrophic eukaryotes in this study were examined by two marker genes,
chloroplast 16S and eukaryotic 18S rRNA genes, the phytoplankton community composition
(excluding dinoflagellates) as analyzed with two marker genes were compared (Fig. S4). The
results showed that seasonal dynamics at 5 m and DCM were similar and consistent with
previous observations such that Bacillariophyta (diatoms) dominated during spring blooms,
whereas the abundance of Prymnesiophyceae (haptophytes, mostly small flagellates) peaked in
summer (Endo et al 2018). Moreover, the relative abundances of the major groups were quite
comparable when using chloroplast 16S and 18S rRNA genes (Fig. S7). For example, relative
abundances of Mamiellophyceae and Cryptophyceae from chloroplast 16S and 18S fall close to
the one-to-one lines. Prymnesiophyceae, however, were more represented when using
chloroplast 16S. Bacillariophyta were slightly more represented when using eukaryotic 18S. This
difference could be linked to variations in number of gene copies per genome (Needham and
Fuhrman 2016, Prokopowich et al 2003, Zhu et al 2005) and number of chloroplasts per cell (Lin
et al 2019). Note that 18S rRNA gene copy number varies from one to thousands per genome,
which precludes cell abundance estimation from 18S rRNA genes, though some analyses
76
suggest that copy number variation relates generally (but not particularly accurately) to
biomass (De Vargas et al 2015, Needham et al 2018). On the other hand, the copy number of
chloroplast 16S rRNA is only 1 or 2 per chloroplast, while the number of chloroplast per cell can
vary from one to hundreds, which is a function of cell size, taxonomy, and to some extent
environmental factors (Needham and Fuhrman 2016). Also, dinoflagellates are not represented
by chloroplast 16S rRNA because the plastid DNA is highly fragmented (mini circles), preventing
amplification (Koumandou et al 2004).
Conclusions
In this 14-year long term study, we quantitatively measured the proportion of
prokaryotic 16S, chloroplast 16S, and eukaryotic 18S rRNA genes all at the same time using 3-
domain universal primers, and found different diversity patterns among protists, free-living,
and particle-associated or large-celled prokaryotes. Based on these patterns, we found
significant differences in long-term stability between prokaryotes and eukaryotes: free-living
and particle-associated prokaryotic communities largely persisted over time, whereas
eukaryotic communities changed much more dramatically at the ASV level. All these different
patterns may be due to underlying causes that include differences in population sizes, abilities
to adapt to oxygen-depleted environments, predation/parasitism/virus infection, and trophic
status. All of these need further investigation. This study can serve as a baseline for monitoring
spatiotemporal dynamics of Bacteria, Archaea, and Eukaryote all together, facilitated by their
analysis via a single PCR reaction, resulting in relative abundances with a shared denominator.
Putting all organisms into the same quantitative context can show us shifts in the relative
77
importance of prokaryotes and eukaryotes and helps us better follow and understand their
long-term trajectories and responses to environmental changes.
Materials and Methods
Sample collection and DNA extraction
Samples were collected monthly the San Pedro Ocean Time-series (SPOT) location
between January 2005 and July 2018. Seawater samples were collected from 5 depths,
including 5 m, chlorophyll maximum depth (DCM, sampling depths are listed in supplementary
information), 150 m, 500 m, and 890 m. 10-15L of seawater was sequentially filtered through
an 80-μm mesh, a 1-μm A/E filter (Pall, Port Washington, NY), and a 0.2-μm Durapore filter (ED
Millipore, Billerica, MA). Filters were stored at -80°C until DNA extraction. Durapore filters
(collecting material 0.2-1 μm) were used for free-living prokaryotic community analysis, and A/E
filters (collecting material between 1-80 μm) were used to analyze phytoplankton,
microzooplankton and particle-associated or large celled prokaryotic communities. DNA was
extracted from the Durapore filters using a hot SDS, phenol/chloroform/isoamyl alcohol,
ethanol precipitation extraction protocol as described by Fuhrman et al (1988). DNA on the A/E
filters was extracted using a NaCl/CTAB bead-beating extraction protocol as described by Lie et
al (2013) with slight modification by adding an ethanol precipitation step after lysis to reduce
the volume of crude extract, which helps minimize DNA loss during the subsequent purification.
16S/18S PCR and sequencing
78
To pool multiple samples in a single Illumina paired-end sequencing platform, a dual-
index sequencing strategy was used with the forward primer A-I-NNNN-barcode-515Y (A-I-
NNNN-barcode-GTGYCAGCMGCCGCGGTAA) and reverse primer A-index-I-926R (A-index-I-
CCGYCAATTYMTTTRAGTTT), where A is the Illumina sequencing adapter, I is the Illumina
primer, and barcode and index are sample specific tag (5-bp barcode and 6-bp index).
515Y/925R primer pairs target the V4-V5 hyper-variable region of the 16S/18S rRNA genes. All
DNA samples were amplified using the same conditions described at
doi.org/10.17504/protocols.io.vb7e2rn (Yeh et al 2021). PCR products were cleaned using 0.8X
Ampure XP magnetic beads (Beckman Coulter). Purified PCR products from samples were
pooled in equal amount and then sequenced on Illumina HiSeq 2500 in PE250 mode and MiSeq
PE300. For each sequencing run, multiple blanks (i.e., PCR water) and two versions of mock
communities (even and staggered) were included as controls, meaning they were amplified,
cleaned, and sequenced as environmental samples with the same conditions. This way, results
from different sequencing runs were comparable without significant bias and contamination
(Parada et al 2016, Yeh et al 2018).
Sequence Analysis
Sequences were demultiplexed by forward barcodes and reverse indices allowing no
mismatches using QIIME 1.9.1 split_libraries_fastq.py (Caporaso et al 2010). The fully
demultiplexed forward and reverse sequences were then split into per-sample fastq files using
QIIME 1.9.1 split_sequence_file_on_sample_ids.py and submitted to the EMBL database under
accession number PRJEB48162 and PRJEB35673.
79
Scripts necessary to reproduce the following analysis are available at
github.com/jcmcnch/eASV-pipeline-for-515Y-926R. Briefly, demultiplexed amplicon sequences
were trimmed with cutadapt, discarding any sequence pairs not containing the forward or
reverse primer (error rate set to 0.2). Amplicon sequences were then split into 16S and 18S
pools using bbsplit.sh from the bbtools package (http://sourceforge.net/projects/bbmap/)
against curated 16S/18S databases derived from SILVA 132 (Quast et al 2013) and PR2 (Guillou
et al 2013). The 16S and 18S amplicons were then analyzed in parallel to amplicon sequence
variants (ASVs) using DADA2 (Callahan et al 2016) via qiime2 q2-dada2 plugin (Bolyen et al
2019). 16S ASVs were classified with qiime2 classify-sklearn plugin against the SILVA 132
database (Quast et al 2013). 16S ASVs identified as Mitochondria were removed. Then, the ASV
table was subdivided into prokaryotic 16S ASV table and Chloroplast 16S ASV table (including all
16S ASV identified as Chloroplast). Chloroplast 16S ASVs were further classified against
PhytoRef database (Decelle et al 2015). 18S ASVs were assigned against the PR2 databases
(Guillou et al 2013).
Environmental data
Variables including temperature, oxygen, and fluorescence at sampling depths were
recorded by a CTD. Nutrient variables including nitrite, nitrate, and phosphate were analyzed by
MSI Analytical Lab at UCSB. Satellite sea surface temperature, chlorophyll-a concentration and
surface productivity estimates are download from the Coastwatch browser website. All the
environmental variables are in supplementary table.
80
Statistical analysis
All statistical analyses and visualization were conducted with the R software (v4.1.0)
using VEGAN (Oksanen et al 2020), ggplot2 (Wickham 2016), pheatmap (Kolde 2012). The
asymptotic estimates of ASV richness is calculated using iNEXT (Hsieh et al 2016).
Acknowledgments
This long-term time series is the collective effort made by the present and previous
members of SPOT team, who help with sampling and laboratory work. We especially thank
David Caron, David Hutchins, Naomi Levine, Jesse McNichol, and Yubin Raut for feedback on
manuscript. This work was supported by NSF OCE 1737409, Gordon and Betty Moore
Foundation Marine Microbiology Initiative grant 3779, and Simons Foundation Collaboration on
Computational Biogeochemical Modeling of Marine Ecosystems (CBIOMES) grant 549943.
References
Agogué H, Lamy D, Neal PR, Sogin ML, Herndl GJ (2011). Water mass-specificity of bacterial
communities in the North Atlantic revealed by massively parallel sequencing. Molecular ecology
20: 258-274.
Aldunate M, De la Iglesia R, Bertagnolli AD, Ulloa O (2018). Oxygen modulates bacterial
community composition in the coastal upwelling waters off central Chile. Deep Sea Research
Part II: Topical Studies in Oceanography 156: 68-79.
81
Bachvaroff TR, Kim S, Guillou L, Delwiche CF, Coats DW (2012). Molecular diversity of the
syndinean genus Euduboscquella based on single-cell PCR analysis. Applied and environmental
microbiology 78: 334-345.
Berdjeb L, Parada A, Needham DM, Fuhrman JA (2018). Short-term dynamics and interactions
of marine protist communities during the spring–summer transition. The ISME journal 12: 1907.
Berelson WM (1991). The flushing of two deep-sea basins, southern California borderland.
Limnology and oceanography 36: 1150-1166.
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA et al (2019).
Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2.
Nature biotechnology 37: 852-857.
Brown MV, Philip GK, Bunge JA, Smith MC, Bissett A, Lauro FM et al (2009). Microbial
community structure in the North Pacific ocean. The ISME journal 3: 1374-1386.
Brown MV, Lauro FM, DeMaere MZ, Muir L, Wilkins D, Thomas T et al (2012). Global
biogeography of SAR11 marine bacteria. Molecular systems biology 8: 595.
Buchan A, LeCleir GR, Gulvik CA, González JM (2014). Master recyclers: features and functions
of bacteria associated with phytoplankton blooms. Nature Reviews Microbiology 12: 686-698.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016). DADA2: high-
resolution sample inference from Illumina amplicon data. Nature methods 13: 581-583.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK et al (2010). QIIME
allows analysis of high-throughput community sequencing data. Nature methods 7: 335-336.
82
Chénard C, Wijaya W, Vaulot D, dos Santos AL, Martin P, Kaur A et al (2019). Temporal and
spatial dynamics of Bacteria, Archaea and protists in equatorial coastal waters. Scientific
Reports 9: 1-13.
Chow C-ET, Sachdeva R, Cram JA, Steele JA, Needham DM, Patel A et al (2013). Temporal
variability and coherence of euphotic zone bacterial communities over a decade in the
Southern California Bight. The ISME journal 7: 2259-2273.
Chow C-ET, Kim DY, Sachdeva R, Caron DA, Fuhrman JA (2014). Top-down controls on bacterial
community structure: microbial network analysis of bacteria, T4-like viruses and protists. The
ISME journal 8: 816-829.
Clarke LJ, Bestley S, Bissett A, Deagle BE (2019). A globally distributed Syndiniales parasite
dominates the Southern Ocean micro-eukaryote community near the sea-ice edge. The ISME
journal 13: 734-737.
Countway PD, Gast RJ, Dennett MR, Savai P, Rose JM, Caron DA (2007). Distinct protistan
assemblages characterize the euphotic zone and deep sea (2500 m) of the western North
Atlantic (Sargasso Sea and Gulf Stream). Environmental microbiology 9: 1219-1232.
Countway PD, Vigil PD, Schnetzer A, Moorthi SD, Caron DA (2010). Seasonal analysis of
protistan community structure and diversity at the USC Microbial Observatory (San Pedro
Channel, North Pacific Ocean). Limnology and Oceanography 55: 2381-2396.
Cram JA, Chow C-ET, Sachdeva R, Needham DM, Parada AE, Steele JA et al (2015a). Seasonal
and interannual variability of the marine bacterioplankton community throughout the water
column over ten years. The ISME journal 9: 563-580.
83
Cram JA, Xia LC, Needham DM, Sachdeva R, Sun F, Fuhrman JA (2015b). Cross-depth analysis of
marine bacterial networks suggests downward propagation of temporal changes. The ISME
journal 9: 2573-2586.
Crespo BG, Pommier T, Fernández-Gómez B, Pedrós-Alió C (2013). Taxonomic composition of
the particle-attached and free-living bacterial assemblages in the Northwest Mediterranean Sea
analyzed by pyrosequencing of the 16S rRNA. Microbiologyopen 2: 541-552.
De Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R et al (2015). Eukaryotic plankton
diversity in the sunlit ocean. Science 348.
Decelle J, Romac S, Stern RF, Bendif EM, Zingone A, Audic S et al (2015). Phyto REF: a reference
database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy.
Molecular ecology resources 15: 1435-1445.
DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Frigaard N-U et al (2006). Community
genomics among stratified microbial assemblages in the ocean's interior. Science 311: 496-503.
Duret MT, Lampitt RS, Lam P (2019). Prokaryotic niche partitioning between suspended and
sinking marine particles. Environmental microbiology reports 11: 386-400.
Endo H, Ogata H, Suzuki K (2018). Contrasting biogeography and diversity patterns between
diatoms and haptophytes in the central Pacific Ocean. Scientific reports 8: 1-13.
Fernández-Gomez B, Richter M, Schüler M, Pinhassi J, Acinas SG, González JM et al (2013).
Ecology of marine Bacteroidetes: a comparative genomics approach. The ISME journal 7: 1026-
1037.
84
Fuhrman JA, Comeau DE, Hagström Å, Chan AM (1988). Extraction from natural planktonic
microorganisms of DNA suitable for molecular biological studies. Applied and environmental
microbiology 54: 1426-1429.
Fuhrman JA, Hewson I, Schwalbach MS, Steele JA, Brown MV, Naeem S (2006). Annually
reoccurring bacterial communities are predictable from ocean conditions. Proceedings of the
National Academy of Sciences 103: 13104-13109.
Ganesh S, Parris DJ, DeLong EF, Stewart FJ (2014). Metagenomic analysis of size-fractionated
picoplankton in a marine oxygen minimum zone. The ISME journal 8: 187-211.
Ghiglione J-F, Galand PE, Pommier T, Pedrós-Alió C, Maas EW, Bakker K et al (2012). Pole-to-
pole biogeography of surface and deep marine bacterial communities. Proceedings of the
National Academy of Sciences 109: 17633-17638.
Giner CR, Pernice MC, Balagué V, Duarte CM, Gasol JM, Logares R et al (2020). Marked changes
in diversity and relative activity of picoeukaryotes with depth in the world ocean. The ISME
journal 14: 437-449.
Giovannoni SJ, Tripp HJ, Givan S, Podar M, Vergin KL, Baptista D et al (2005). Genome
streamlining in a cosmopolitan oceanic bacterium. science 309: 1242-1245.
Giovannoni SJ (2017). SAR11 bacteria: the most abundant plankton in the oceans. Annual
review of marine science 9: 231-255.
Guillou L, Viprey M, Chambouvet A, Welsh R, Kirkham A, Massana R et al (2008). Widespread
occurrence and genetic diversity of marine parasitoids belonging to Syndiniales (Alveolata).
Environmental microbiology 10: 3349-3365.
85
Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L et al (2013). The Protist Ribosomal
Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences
with curated taxonomy. Nucleic acids research 41: D579-D604.
Hawley AK, Nobu MK, Wright JJ, Durno WE, Morgan-Lang C, Sage B et al (2017). Diverse
Marinimicrobia bacteria may mediate coupled biogeochemical cycles along eco-thermodynamic
gradients. Nature communications 8: 1-10.
Hsieh T, Ma K, Chao A (2016). iNEXT: an R package for rarefaction and extrapolation of species
diversity (H ill numbers). Methods in Ecology and Evolution 7: 1451-1456.
Hu SK, Campbell V, Connell P, Gellene AG, Liu Z, Terrado R et al (2016). Protistan diversity and
activity inferred from RNA and DNA at a coastal ocean site in the eastern North Pacific. FEMS
microbiology ecology 92.
Hu SK, Connell PE, Mesrop LY, Caron DA (2018). A hard day's night: diel shifts in microbial
eukaryotic activity in the north pacific subtropical gyre. Frontiers in Marine Science 5: 351.
Kim DY, Countway PD, Jones AC, Schnetzer A, Yamashita W, Tung C et al (2014). Monthly to
interannual variability of microbial eukaryote assemblages at four depths in the eastern North
Pacific. The ISME journal 8: 515-530.
Kolde R (2012). Pheatmap: pretty heatmaps. R package version 1: 726.
Koumandou VL, Nisbet RER, Barbrook AC, Howe CJ (2004). Dinoflagellate chloroplasts–where
have all the genes gone? Trends in Genetics 20: 261-267.
Kraemer S, Ramachandran A, Colatriano D, Lovejoy C, Walsh DA (2020). Diversity and
biogeography of SAR11 bacteria from the Arctic Ocean. The ISME journal 14: 79-90.
86
Lampitt R, Salter I, Johns D (2009). Radiolaria: Major exporters of organic carbon to the deep
ocean. Global Biogeochemical Cycles 23.
Lie AA, Kim DY, Schnetzer A, Caron DA (2013). Small-scale temporal and spatial variations in
protistan community composition at the San Pedro Ocean Time-series station off the coast of
southern California. Aquatic Microbial Ecology 70: 93-110.
Lin Y, Gifford S, Ducklow H, Schofield O, Cassar N (2019). Towards quantitative microbiome
community profiling using internal standards. Applied and environmental microbiology 85:
e02634-02618.
Lindh MV, Sjöstedt J, Andersson AF, Baltar F, Hugerth LW, Lundin D et al (2015). Disentangling
seasonal bacterioplankton population dynamics by high-frequency sampling. Environmental
microbiology 17: 2459-2476.
Martin P, Allen JT, Cooper MJ, Johns DG, Lampitt RS, Sanders R et al (2010). Sedimentation of
acantharian cysts in the Iceland Basin: Strontium as a ballast for deep ocean particle flux, and
implications for acantharian reproductive strategies. Limnology and oceanography 55: 604-614.
Mestre M, Ferrera I, Borrull E, Ortega-Retuerta E, Mbedi S, Grossart HP et al (2017). Spatial
variability of marine bacterial and archaeal communities along the particulate matter
continuum. Molecular ecology 26: 6827-6840.
Mestre M, Ruiz-González C, Logares R, Duarte CM, Gasol JM, Sala MM (2018). Sinking particles
promote vertical connectivity in the ocean microbiome. Proceedings of the National Academy
of Sciences 115: E6799-E6807.
Milici M, Vital M, Tomasch J, Badewien TH, Giebel HA, Plumeier I et al (2017). Diversity and
community composition of particle-associated and free-living bacteria in mesopelagic and
87
bathypelagic Southern Ocean water masses: Evidence of dispersal limitation in the Bransfield
Strait. Limnology and Oceanography 62: 1080-1095.
Mohit V, Archambault P, Toupoint N, Lovejoy C (2014). Phylogenetic differences in attached
and free-living bacterial communities in a temperate coastal lagoon during summer, revealed
via high-throughput 16S rRNA gene sequencing. Applied and environmental microbiology 80:
2071-2083.
Murillo AA, Ramírez-Flandes S, DeLong EF, Ulloa O (2014). Enhanced metabolic versatility of
planktonic sulfur-oxidizing γ-proteobacteria in an oxygen-deficient coastal ecosystem. Frontiers
in Marine Science 1: 18.
Needham DM, Fuhrman JA (2016). Pronounced daily succession of phytoplankton, archaea and
bacteria following a spring bloom. Nature microbiology 1: 16005.
Needham DM, Fichot EB, Wang E, Berdjeb L, Cram JA, Fichot CG et al (2018). Dynamics and
interactions of highly resolved marine plankton via automated high-frequency sampling. The
ISME journal 12: 2417.
Oksanen J, Blanchet F, Friendly M, Kindt R, Legendre P, D M et al (2020). Vegan: community
ecology package. R package version 2.5-7. https://CRAN.R-project.org/package=vegan.
Ollison GA, Hu SK, Mesrop LY, DeLong EF, Caron DA (2021). Come rain or shine: Depth not
season shapes the active protistan community at station ALOHA in the North Pacific Subtropical
Gyre. Deep Sea Research Part I: Oceanographic Research Papers 170: 103494.
Orsi W, Song YC, Hallam S, Edgcomb V (2012). Effect of oxygen minimum zone formation on
communities of marine protists. The ISME journal 6: 1586-1601.
88
Pachiadaki MG, Brown JM, Brown J, Bezuidt O, Berube PM, Biller SJ et al (2019). Charting the
complexity of the marine microbiome through single-cell genomics. Cell 179: 1623-1635. e1611.
Parada AE, Needham DM, Fuhrman JA (2016). Every base matters: assessing small subunit rRNA
primers for marine microbiomes with mock communities, time series and global field samples.
Environmental microbiology 18: 1403-1414.
Parada AE, Fuhrman JA (2017). Marine archaeal dynamics and interactions with the microbial
community over 5 years from surface to seafloor. The ISME journal 11: 2510-2525.
Parris DJ, Ganesh S, Edgcomb VP, DeLong EF, Stewart FJ (2014). Microbial eukaryote diversity in
the marine oxygen minimum zone off northern Chile. Frontiers in microbiology 5: 543.
Pernice MC, Giner CR, Logares R, Perera-Bel J, Acinas SG, Duarte CM et al (2016). Large
variability of bathypelagic microbial eukaryotic communities across the world’s oceans. The
ISME journal 10: 945-958.
Prokopowich CD, Gregory TR, Crease TJ (2003). The correlation between rDNA copy number
and genome size in eukaryotes. Genome 46: 48-50.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P et al (2013). The SILVA ribosomal
RNA gene database project: improved data processing and web-based tools. Nucleic acids
research 41: D590-D596.
Reji L, Tolar BB, Chavez FP, Francis CA (2020). Depth-differentiation and seasonality of
planktonic microbial assemblages in the Monterey Bay upwelling system. Frontiers in
microbiology 11: 1075.
Rieck A, Herlemann DP, Jürgens K, Grossart H-P (2015). Particle-associated differ from free-
living bacteria in surface waters of the Baltic Sea. Frontiers in microbiology 6: 1297.
89
Salazar G, Cornejo-Castillo FM, Borrull E, Díez-Vives C, Lara E, Vaqué D et al (2015). Particle-
association lifestyle is a phylogenetically conserved trait in bathypelagic prokaryotes. Molecular
ecology 24: 5692-5706.
Santoro AE, Buchwald C, McIlvin MR, Casciotti KL (2011). Isotopic signature of N2O produced by
marine ammonia-oxidizing archaea. Science 333: 1282-1285.
Schnetzer A, Moorthi SD, Countway PD, Gast RJ, Gilg IC, Caron DA (2011). Depth matters:
microbial eukaryote diversity and community structure in the eastern North Pacific revealed
through environmental gene libraries. Deep Sea Research Part I: Oceanographic Research
Papers 58: 16-26.
Skovgaard A, Massana R, Balague V, Saiz E (2005). Phylogenetic position of the copepod-
infesting parasite Syndinium turbo (Dinoflagellata, Syndinea). Protist 156: 413-423.
Steele JA, Countway PD, Xia L, Vigil PD, Beman JM, Kim DY et al (2011). Marine bacterial,
archaeal and protistan association networks reveal ecological linkages. The ISME journal 5:
1414-1425.
Teeling H, Fuchs BM, Bennke CM, Krueger K, Chafee M, Kappelmann L et al (2016). Recurring
patterns in bacterioplankton dynamics during coastal spring algae blooms. elife 5: e11888.
Thrash JC, Temperton B, Swan BK, Landry ZC, Woyke T, DeLong EF et al (2014). Single-cell
enabled comparative genomics of a deep ocean SAR11 bathytype. The ISME journal 8: 1440-
1451.
Treusch AH, Vergin KL, Finlay LA, Donatz MG, Burton RM, Carlson CA et al (2009). Seasonality
and vertical structure of microbial communities in an ocean gyre. The ISME journal 3: 1148-
1163.
90
Tsementzi D, Wu J, Deutsch S, Nath S, Rodriguez-R LM, Burns AS et al (2016). SAR11 bacteria
linked to ocean anoxia and nitrogen loss. Nature 536: 179-183.
Walsh EA, Kirkpatrick JB, Rutherford SD, Smith DC, Sogin M, D'Hondt S (2016). Bacterial
diversity and community composition from seasurface to subseafloor. The ISME journal 10:
979-989.
Wickham H (2016). ggplot2-Elegant Graphics for Data Analysis. Springer International
Publishing. Cham, Switzerland.
Wilson B, Müller O, Nordmann E-L, Seuthe L, Bratbak G, Øvreås L (2017). Changes in marine
prokaryote composition with season and depth over an Arctic polar year. Frontiers in Marine
Science 4: 95.
Yeh Y-C, Needham DM, Sieradzki ET, Fuhrman JA (2018). Taxon Disappearance from
Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every
Sequencing Run. MSystems 3: e00023-00018.
Yeh YC, McNichol J, Needham DM, Fichot EB, Berdjeb L, Fuhrman JA (2021). Comprehensive
single-PCR 16S and 18S rRNA community analysis validated with mock communities, and
estimation of sequencing bias against 18S. Environmental Microbiology.
Zhu F, Massana R, Not F, Marie D, Vaulot D (2005). Mapping of picoeucaryotes in marine
ecosystems with quantitative PCR of the 18S rRNA gene. FEMS microbiology ecology 52: 79-92.
Zinger L, Amaral-Zettler LA, Fuhrman JA, Horner-Devine MC, Huse SM, Welch DBM et al (2011).
Global patterns of bacterial beta-diversity in seafloor and seawater ecosystems. PloS one 6:
e24570.
91
Figures and Tables
Table 1. Sources of overall variation, by PERMANOVA test, of the prokaryotic 16S,
chloroplast 16S, and eukaryotic 18S (excluding Metazoa sequences), considering all
depth and dates together in a single analysis. The R
2
values represent the fraction of
overall variation ascribed to depth, season, and size fraction. Note that chloroplasts were
only evaluated at 5m and DCM depths, and chloroplast 16S and eukaryote 18S were only
evaluated in the large size fraction. The seasons were defined here with March-May as
spring, June-August as summer, September-November as autumn, and December-
February as winter.
Depth Season Size fraction
R
2
p-value R
2
p-value R2 p-value
Prokaryotic 16S 0.48 0.001 0.02 0.001 0.05 0.001
Chloroplast 16S 0.07 0.001 0.06 0.001 - -
Eukaryotic 18S 0.18 0.001 0.02 0.001 - -
92
Table 2. Sources of overall variation, by PERMANOVA test, of the prokaryotic 16S,
chloroplast 16S, and eukaryotic 18S (excluding Metazoa sequences), considering
each depth separately. The R
2
values represent the fraction of overall variation
ascribed to season and size fraction for each depth. For prokaryotic 16S, season and
size fraction contribute similarly to variation in the 5m and DCM samples, but for
samples ≥ 150m, the variation due to size fraction is about 2-3x stronger than it is for
season, and 2-3x stronger than the euphotic zone seasonal effects. For chloroplast
16S in the large size fraction, season explained the same amount of variation at 5m
and DCM. For eukaryotic 18S, seasonal variation is about 2x strong than the other
depths. Note that chloroplasts were only evaluated at 5m and DCM depths, and
chloroplast 16S and eukaryote 18S were only evaluated in the large size fraction.
Depth
group
Prokaryotic 16S Chloroplast 16S Eukaryotic 18S
Season Size fraction Season Season
R2 p-value R2 p-value R2 p-value R2 p-value
5m 0.14 0.001 0.13 0.001 0.10 0.001 0.07 0.001
DCM 0.11 0.001 0.14 0.001 0.10 0.001 0.06 0.001
150m 0.06 0.001 0.29 0.001
0.09 0.001
500m 0.03 0.002 0.36 0.001
0.14 0.001
890m 0.03 0.003 0.36 0.001
0.09 0.001
93
Figure 1. Monthly average sea surface temperature (SST), satellite chlorophyll-a concentration
(Chl-a), and satellite primary productivity (PP) at the SPOT location during 2005-2018. The black
lines within the box plots represent the median values, and the box bottom and top show the
25
th
and 75
th
percentile. The whiskers represent the lower and the upper bounds, and the dots
represent outliers.
94
Figure 2. Principal component analysis (PCA) of environmental variables measured between
2005-2018. The environmental variables analyzed are indicated as vectors: CTD temperature
(temperature), CTD dissolved oxygen (oxygen), CTD chlorophyll fluorescence (fluorescence),
nitrite and nitrate (NO 2+NO 3), and phosphate (PO 4). Samples are color-coded by sampling
depth.
95
Figure 3. Dominance of prokaryotes in both size fractions, as shown by the proportions of 16S,
chloroplast, and 18S reads (including Metazoa sequences) found in 0.2-1 μm and 1-80 μm size
fractions. The corrected values (right column) include a 2-fold adjustment of the 18S sequences
to account for length-based bias in sequencing, determined from mixed mock communities (see
text).
96
Figure 4. Alpha- and beta-diversity patterns for prokaryotic 16S, chloroplast 16S, eukaryotic 18S
(excluding Metazoa sequences). Diversity, as Hill Number extrapolated ASV richness (a) and
Shannon (H’) index (b), showing a diversity maximum at 150 m and 500 m for prokaryotes and
at 5 m and DCM for eukaryotes. (c) Patterns of pairwise Bray-Curtis similarity between all
sampling dates within each size fractions for each sampling depth, showing that temporal
stability increased with depth. (d) nonmetric multidimensional scaling (NMDS), with ordination
computed based on Bray-Cutis distance of prokaryotic 16S, chloroplast 16S, and eukaryotic 18S
communities, showing depth stratification for all three types, and that for prokaryotes there
was a greater differentiation between size fractions in the depths ≥150 m
97
Figure 5. Temporal variation of free-living prokaryotic 16S, particle-associated or large celled
prokaryotic 16S, and 18S (excluding Metazoa sequences) communities by plotting Bray-Curtis
similarity between all sampling months for each sampling depth. Free-living (0.2-1 μm) and
particle-associated or large celled (1-80 μm) prokaryotes at 5 m and DCM exhibited a clear
repeated seasonality, whereas prokaryotes at depth exhibited a relatively steady community
over time (larger Bray-Curtis similarity), especially in the small size fraction. Eukaryotes, on the
other hand, were relatively unstable over time compared to prokaryotes, as shown by much
lower average similarities.
98
Figure 6. Heatmap of monthly average prokaryotic 16S communities at the order level (only
dominant orders were selected if their relative abundance is >2% of any sample). Columns were
clustered based on Bray-Curtis distance. Rows were clustered based on Euclidean distance. The
clustering on top showed that prokaryotic communities were first clustered by sampling depth
(surface vs. depth) and then by size fraction. The numbers in parentheses show the overall
average relative abundances in the 0.2-1 μm and 1-80 μm size fraction respectively.
99
Figure 7. Heatmap of monthly average chloroplast 16S and 18S (excluding Metazoa sequences)
communities in 1-80 μm size fraction. Only dominant classes were selected if their relative
abundance is >2% of any sample. Columns were clustered based on Bray-Curtis distance. Rows
were clustered based on Euclidean distance. (a) Chloroplast 16S communities were mostly
dominated by Prymnesiophyceae in summer/autumn and Bacillariophyta in spring/winter. (b)
18S communities were dominated by Syndiniales throughout the water column. The numbers
in parentheses show the overall average relative abundances.
100
Figure 8. (a) Heatmap of rarefied ASV richness observed within each prokaryotic order at each
sampling depth (rows were clustered based on Euclidean distance). There were endemic
taxonomic groups present at depth, such as Nitrospinales, UBA 10353 marine groups, and
SAR324. Note Flavobacteriales and SAR11 both exhibited high richness at 150 m and 500 m,
thus (b) and (c) were further analyses of richness patterns among subclades. (b) SAR11 clade I
was ubiquitously distributed in the water column, whereas SAR11 clade II showed high richness
at 150 m and 500 m. (c) Flavobacteriales subclades showed that all these groups were
particularly enriched in the large size fraction. Among them, NS9 marine groups were high in
richness at 150 m and 500 m.
101
Supplementary information
Temporal and depth dynamics of microbial communities across three domains over 14 years
at the San-Pedro Ocean Time-series
Yi-Chun Yeh
1
and Jed Fuhrman
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Correspondence:
Jed Fuhrman
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Email: fuhrman@usc.edu
102
Figure S1. The temporal variation of physiochemical variables measured at different depths.
103
Figure S2. Seasonality of chloroplast 16S communities by plotting the Bray-Curtis similarity
between all sampling months within each sampling depth (i.e., 5m and DCM).
104
Figure S3. Heatmap of monthly average 18S communities (including Metazoa sequences) in the
1-80 μm size fraction at the class level (only dominant classes were selected if their relative
abundance is >2% of any samples). The columns were clustered based on Bray-Curtis distance.
Rows were clustered based on Euclidean distance.
105
Figure S4. Heatmap of monthly average chloroplast 16S communities and phototrophic 18S
communities at 5m and DCM in the 1-80 μm size fraction. For phototrophic eukaryotic 18S, we
selected only ASVs corresponding to photosynthetic groups (divisions Chlorophyta,
Cryptophyta, Haptophyta, Katablepharidophyta, Ochrophyta, Rhodophyta, Streptophyta and
class Filosa-Chlorarachnea). Chloroplast 16S and phototrophic 18S communities both showed
that phytoplankton were mostly dominated by Prymnesiophyceae in summer/autumn and
Bacillariophyta in spring/winter
106
Figure S5. Heatmap of monthly average chloroplast 16S communities and phototrophic 18S
communities in the 1-80 μm size fraction both showed that the phototrophic eukaryotes at
150m were mainly sinking diatoms (Bacillariophyta). For phototrophic eukaryotic 18S, we
selected only ASVs corresponding to photosynthetic groups (divisions Chlorophyta,
Cryptophyta, Haptophyta, Katablepharidophyta, Ochrophyta, Rhodophyta, Streptophyta and
class Filosa-Chlorarachnea).
107
Figure S6. The heatmap of depth average prokaryotic communities. Rows were scaled to
examine how taxonomic groups were enriched at certain depth and size fraction, as shown by
warmer colors.
108
Figure S7. The comparison of relative abundance of chloroplast 16S vs. phototrophic eukaryotic
18S in the 1-80 μm size fraction. For phototrophic eukaryotic 18S, we selected only ASVs
corresponding to photosynthetic groups (divisions Chlorophyta, Cryptophyta, Haptophyta,
Katablepharidophyta, Ochrophyta, Rhodophyta, Streptophyta and class Filosa-Chlorarachnea).
There were 22 classes detected in both chloroplast 16S and eukaryotic 18S communities. Blue
lines are linear regression line, and black lines are 1:1.
109
Figure S7. (continued)
110
Figure S8. 8 classes that were only detected in chloroplast 18S communities
111
Figure S9. 13 classes that were only detected in eukaryotic 18S communities
112
Chapter 3: Effects of phytoplankton, viral communities and El Niño warming on
free-living and particle-associated marine prokaryotic community structure
Yi-Chun Yeh
1
, J. Cesar Ignacio-Espinoza
1
, Jed A. Fuhrman
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Correspondence:
Jed Fuhrman
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Email: fuhrman@usc.edu
113
Abstract
The free-living (FL) and particle-associated (PA) marine prokaryotes have physiological,
genomic, and phylogenetic differences. Factors influencing their temporal dynamics remain
poorly constrained. In this study, we aimed to investigate the effects of physicochemical factors
and biotic interactions (with eukaryotes and viruses) on FL and PA prokaryotes. Community
composition from the San-Pedro Ocean Time-series (SPOT) was quantified using universal SSU
rRNA sequencing (14 years monthly) and viral shotgun metagenomics (5 years monthly) and
analyzed using canonical analyses and variation partitioning. Our results showed that in
addition to physicochemical factors, chloroplast 16S (phytoplankton community composition),
eukaryotic 18S (protistan community composition), and dsDNA viral communities all
significantly explained (in a statistical sense) the seasonal succession of prokaryotes. In
particular, the dsDNA viral community is the strongest factor in statistically predicting FL
prokaryotes, explaining 28.1% of variability, whereas the phytoplankton community had the
strongest relationship to PA prokaryotes, explaining 30.9% of variability. Although correlation
does not imply causation, this analysis allows us to generate testable hypotheses about how FL
and PA prokaryotes were driven by eukaryotic phytoplankton, and drive and/or were driven by
viral communities. Our findings suggest that biotic interactions critically determine the
temporal dynamics of prokaryotes, and the relative importance of these interactions varies
depending on their lifestyles (free-living vs. particle-attached). Despite largely predictable
seasonally repeating prokaryotic communities over the prior decade, the warming due to the
strong 2015/2016 El Niño was accompanied by a year-round “summer-like” community and
shift in composition of cyanobacteria and other bacteria.
114
Introduction
Planktonic prokaryotes dominate the oceanic biogeochemical cycling (Azam et al., 1983,
Fuhrman & Caron, 2016, Gasol & Kirchman, 2018), thus understanding how prokaryotic
communities are structured and identifying the underlying processes are crucial to predict their
responses to environmental changes. Generally, community dynamics of prokaryotes are driven
by environmental variables, such as temperature and nutrients availability (Fuhrman et al.,
2006, Fuhrman et al., 2008, Gilbert et al., 2009, Gilbert et al., 2012, Hatosy et al., 2013, Ward et
al., 2017), and biotic interactions (e.g., predation, parasitism, mutualism, or competition) with
other organisms, including phytoplankton, microzooplankton, and viruses (Gonzalez et al.,
1990, Guixa-Boixereu et al., 1999, Šimek et al., 1999, Hewson et al., 2003, Schwalbach et al.,
2004, Winter et al., 2005, Chow et al., 2014). However, studies taking all these components into
account remain limited.
In aquatic ecosystems, prokaryotes can live either freely (as unattached individuals) or
attached to particles, and these two communities have physiological, genomic, and
phylogenetic differences (D'ambrosio et al., 2014, Rieck et al., 2015, Yung et al., 2016, Milici et
al., 2017, Suzuki et al., 2017). Free-living prokaryotes, dominated by groups such as SAR11,
often have streamlined genomes that enable efficient growth under oligotrophic conditions. In
contrast, particle-associated prokaryotes (e.g., Bacteroidetes) generally show more metabolic
diversity to utilize different kinds of particulate organic matter (Buchan et al., 2014). Due to
these differences, free-living and particle-associated prokaryotes should be considered as
independent components, controlled by different drivers. However, only a few studies have
115
differentiated them, and the comparisons have been mostly restricted to the studying how
different size fractions relate to environmental conditions (Crespo et al., 2013, D'ambrosio et
al., 2014, Rieck et al., 2015, Yung et al., 2016, Mestre et al., 2017, Milici et al., 2017, Suzuki et
al., 2017, Duret et al., 2019). The extent that biotic interactions with phytoplankton, protists,
and viruses, influence free-living and particle-associated prokaryotes has been examined far
less.
The lack of comprehensive investigations may be due, in part, to the difficulties of
accessing the diversity across three domains and viruses. The emergence of high-throughput
sequencing techniques allows us to identify the community composition using appropriate
marker genes (e.g., SSU rRNA genes). And recent studies have shown that 16S+18S universal
primer sets can quantitatively survey the diversity from Bacteria, Archaea, and Eukaryota in a
single PCR reaction (Parada et al., 2016, Needham et al., 2018, Yeh et al., 2021) with high
coverage of all three domains (McNichol et al., 2021). Unlike prokaryotes and eukaryotes,
viruses do not contain universal genes, and thus studies of the dynamics of the marine viral
community has tended to focus on subsets of the community, like T4-like phages using the
major capsid protein-encoding gene, such as g23 (Filée et al., 2005, Comeau & Krisch, 2008,
Chow & Fuhrman, 2012, Needham et al., 2013, Pagarete et al., 2013, Chow et al., 2014,
Needham et al., 2017, Ahlgren et al., 2019). But increasingly, shotgun metagenomic sequencing
and virus identification tools (e.g., VirSorter and VirFinder) are providing opportunities to
directly reveal the distributions and dynamics of viral sequences in marine and other
environments (Brum et al., 2015, Roux et al., 2015, Ren et al., 2017, Ignacio-Espinoza et al.,
2020).
116
This study aimed to examine the effects of abiotic (environmental conditions) and biotic
factors (protists, phytoplankton, and viruses) on free-living and particle-associated prokaryotic
community structure. We explored the temporal dynamics of free-living (0.2-1 μm) and
particle-associated or large-celled (1-80 μm) prokaryotes, phytoplankton, and protists from 5-m
depth at the San Pedro Ocean Time-series (SPOT) location during 2005-2018 using universal
SSU rRNA sequencing. Previously published free dsDNA viral contigs collected from the same
water samples in 2009-2014 (Ignacio-Espinoza et al., 2020) were used to investigate the viral
effects on prokaryotic dynamics. In addition, satellite sea surface temperature, chlorophyll-a
concentration, and primary productivity data show that this location has experienced a reduced
amplitude of spring phytoplankton blooms and extremely warm winters in 2014-2015, likely
related to the strong 2015/2016 El Niño (Fig. 1). This ocean warming event due to El Niño can
be used to understand prokaryotic responses to environmental changes in a natural setting. We
thus examined whether prokaryotic community shifted in response to the warming event in
2014-2015 and how warming affects seasonal and interannual variations of microbial dynamics.
Results
The temporal variation of community composition
We explored microbial community composition by SSU rRNA sequencing with 16S+18S
universal primers and denoising these sequences into amplicon sequence variants (ASVs) using
DADA2 (Callahan et al., 2016). The taxonomic composition showed that abundant prokaryotic
taxa in both size fractions were SAR11, Flavobacteriales, Rhodobacterales, and Synechococcales
(Fig. 2a and 2b). In the free-living size fraction (0.2-1 μm), SAR11 dominated most of the time,
117
accounting for on average 31.6% of total community composition (Fig. 2a). In the particle-
associated or large-celled size fraction (1-80 μm), Flavobacteriales dominated during spring
blooms, whereas SAR11 dominated the rest of time (Fig. 2b). In addition, Synechococcales
(cyanobacteria Synechococcus and Prochlorococcus) became particularly abundant during 2014-
2015 in both size fractions.
Phototrophic eukaryotes were identified using the chloroplast 16S rRNA gene in the
large size fraction (1-80 μm). The average relative abundance (Fig. 2c) showed that
phytoplankton identified via chloroplast 16S reads were dominated by Prymnesiophyceae
(40.8%), Bacillariophyta (diatoms) (14.8%), Mamiellophyceae (14%), Dictyochophyceae (6.2%),
and Pelagophyceae (6.1%). Prymnesiophyceae were consistently abundant throughout the
time-series, whereas Bacillariophyta only peaked during spring blooms. Eukaryotic 18S reads in
the large size fraction (Fig. 2d), on the other hand, were dominated by Dinophyceae (28.5%),
and Spirotrichea (ciliates) (18.5%). They showed distinct seasonal patterns; the proportion of
Dinophyceae reads increased in summer, whereas Spirotrichea peaked during spring blooms.
The environmental effects on free-living and particle-associated or larger-celled prokaryotic
communities
The CCA analysis showed that temperature and chlorophyll-a concentration significantly
explained 4.4 and 3.8% of the variance in free-living and particle-associated or large-celled
prokaryotic communities (Fig. 3). The first CCA axis (CCA1) represented primarily a seasonal
change of community structure, which was positively correlated with temperature and
negatively correlated with chlorophyll-a concentration, indicating that communities with
118
positive CCA1 scores were summer-like communities, and communities with negative CCA1
scores were bloom-associated communities (Fig. 3). The temporal dynamics of CCA1 scores
showed the seasonality of both free-living and particle-associated or larger-celled communities
throughout the time series, shifting from positive to negative CCA1 scores, except for 2014 and
2015, which were mostly dominated by summer-like communities the entire year.
The roles of biotic factors in explaining free-living and particle-associated or larger-celled
prokaryotic communities
To test if changes in free-living and particle-associated or large-celled prokaryotic
communities can be explained by biotic factors, including eukaryotic 18S, chloroplast 16S, and
free dsDNA viral communities, a Mantel test was used to ask the question of how closely the
temporal variation in each subset of the microbial community correlated to variation in another
subset (Fig. 4). The results showed that all pairs of beta-diversity matrices were highly
correlated (p<0.001). In general, chloroplast 16S and free dsDNA viruses were correlated with
free-living and particle-associated or large-celled prokaryotes the most closely. However, these
correlations may be caused in part by seasonal reoccurring patterns (i.e., they were all
influenced by the same seasonal changes in environmental conditions). Thus, partial CCA
(pCCA) analysis was used to remove the environmental effect (i.e., temperature and chlorophyl-
a) (Table 1). The pCCA analysis indicated that all the components of the microbial community
(i.e., eukaryotic 18S, chloroplast 16S, and free-dsDNA viral contigs) significantly explained
temporal variation of free-living and particle-associated prokaryotic community structure after
removing the environmental effect (p<0.05). In addition, chloroplast 16S and free dsDNA
119
viruses were strong predictors of free-living prokaryotic communities, explaining 27.4% and
28.1% of the variation. For particle-associated or large-celled prokaryotes, chloroplast 16S
communities significantly predicated the community composition, explaining 30.9% of the
variation, against 9.3% attributed by free dsDNA viruses. The eukaryotic 18S communities
(which has phytoplankton as well as heterotrophic protists), on the other hand, only explained
2.7% and 5.7% of the variation. After testing the significance of each component, we further
determined the relative importance of each component in explaining temporal variation of
free-living and particle-associated prokaryotic community structure via a variation partitioning
method (Fig. 5). Note that a subset of the time-series between 2009 and 2014 was used to
include the free dsDNA viruses into the analysis (when such data were available). The results
showed that for the whole time-series (2005-2018), free-living and particle-associated
prokaryotic communities were significantly explained by unique effects of chloroplast 16S and
environmental variables. Unique chloroplast 16S explained 22.8% and 24.5% of the variation,
whereas the unique environmental effect was only marginally significant and explained <1.5%
of the variation. The unique effects of eukaryotic 18S were insignificant for both size fractions.
For the subset of time-series between 2009-2014, the unique effects of chloroplast 16S and
free dsDNA viruses were observed for particle-associated or large-celled prokaryotes, but not
for the free-living ones.
Shifts in ASV composition within major taxonomic groups
Although the warming event during 2014-2015 did not affect prokaryotic community
composition at the order level (Fig. 2a and 2b), we found shifts in ASV composition within the
120
major taxonomic group in 2014-2015. For SAR11, a total 558 ASVs were observed throughout
the time-series. Of these, 4 major ASVs were persistently abundant (Fig. S1), accounting for
>50% of total SAR11 sequences in both size fractions (Fig 6a). SAR11_ASV1 (clade Ia) showed
maximum abundances during spring, whereas SAR11_ASV2 (clade Ia), SAR11_ASV3 (clade II),
and SAR11_ASV4 (clade Ia) peaked in summer. The seasonal pattern was consistent in both size
fractions, except for SAR11_ASV3 (clade II), which was enriched in the large size fraction,
especially in 2014-2015. There also was a slightly decreasing trend of SAR11_ASV1 (clade Ia)
between 2005-2015. For Synechococcales (Cyanobacteria), 6 out of total 111 ASVs were either
persistently occurred or had become >5% at least once (Fig. S1), and they contributed >42% of
total Synechococcales (Cyanobacteria) sequences (Fig 6b). Cyano_ASV1 (Syn IV), Cyano_ASV2
(Pro HLI), and Cyano_ASV3 (Syn I) were the dominant ASVs most of the time. However, there
was a large increase in the relative abundance of Cyano_ASV2 (Pro HLI) and Cyano_ASV5 (Syn
II/III) and a slight increase of Cyano_ASV6 (Pro HLII) in 2014-2015. In addition, Cyano_ASV4 (Syn
VII) occasionally became abundant in the larger size fraction.
For Rhodobacterales, 7 out of total 1234 ASVs were persistently abundant (Fig. S1),
accounting for >78% of total Rhodobacterales sequences in both size fractions (Fig. S2).
Rhodo_ASV1 (Rhodobacteraceae) was the dominant ASV in both size fraction and peaked in
spring blooms. Rhodo_ASV2 (Rhodobacteraceae) increased in the relative abundance in 2014-
2015. Rhodo_ASV4 (Rhodobacteraceae) was enriched in the small size fraction, whereas
Rhodo_ASV3 (Rhodobacteraceae) and Rhodo_ASV7 (Rhodobacteraceae) was enriched in the
large size fraction. For Flavobacteriales, a total 4298 ASVs were detected throughout the time-
series. Among these, 24 ASVs were either persistently occurred or had become >5% at least
121
once (Fig. S1), accounting for on average 50% of Flavobacteriales sequences. Flavo_ASV2
(Flavobacteriaceae) and Flavo_ASV3 (Flavobacteriaceae) were enriched in the small size
fraction, whereas Flavo_ASV6 (Crocinitomicaceae), Flavo_ASV13 (Flavobacteriaceae),
Flavo_ASV15 (NS9 marine group), Flavo_ASV17 (Flavobacteriaceae) were enriched in the large
size fraction.
Discussion
This study used SSU rRNA amplicon and shotgun metagenomic sequencing to survey the
large majority of microbial food web members (i.e., free dsDNA viruses, free-living and particle-
associated prokaryotes, phytoplankton, and protists). With a combination of canonical analyses
and variation partitioning, our results determined that the seasonal dynamics of free-living and
particle-associated prokaryotes were significantly explained, statistically, by biotic interactions
with phytoplankton (i.e., chloroplast 16S) and free dsDNA viral communities (i.e., viral contig
abundance). Although correlation does not imply causation, the biological interactions we know
of from prior marine research leads us to believe that prokaryotic community composition may
be driven to a large extent by substrates provided by phytoplankton, suggesting a significant
element of bottom-up control (Grossart et al., 2005, Teeling et al., 2012, Teeling et al., 2016,
Chafee et al., 2018, Unfried et al., 2018, Francis et al., 2021). However, the cause and effect
relationship between viruses and prokaryotes is potentially more fluid, and we consider two
non-mutually-exclusive explanations, that may both apply, considering the large number of
virus-host pairs and varied, dynamic conditions over the several years studied. One is that many
viruses may be following their hosts closely, because these obligate parasites require their
122
hosts in order to reproduce, and under some circumstances the viruses and their hosts can
coexist indefinitely (Thingstad & Lignell, 1997, Thingstad et al., 2014, Thingstad et al., 2015). By
this thinking, changes in the virus community would be largely responding to changes in
prokaryotes caused by other factors. The second potential explanation is that for some of the
prokaryotes, their composition could in part be driven by viruses through viral lysis and non-
steady-state “kill the winner” dynamics, where high density of a blooming host prokaryote
leads to rapid spread of a viral infection that knocks down the host population (i.e., top-down
control). Our current analyses cannot directly point to the dominance of either mechanism.
However, the previously reported observation of largely steady populations of most viruses as
detected at the 98% nucleotide level, plus the “Red Queen” like dynamics of rapidly changing
viral strains with many different, though primarily synonymous, nucleotide variations (Ignacio-
Espinoza et al., 2020), is consistent with the hypothesis that at near the species level the viruses
are primarily following their hosts, but viruses are often driving strain level changes in host
composition. If this hypothesis is correct, the effect we are seeing at the ASV level is most likely
the viruses following their hosts.
Viral infection is generally believed to be strongly taxon-specific, and it plays important
roles in the microbial loop. However, previously studies have only focused on the free-living
size fraction. How viruses affect particle-associated marine prokaryotes has rarely been
discussed previously. Given the differences in lifestyle, we expect that infection patterns are
different between free-living and particle-associated prokaryotes. For free-living prokaryotes,
viruses lysing one cell may have a strongly limited ability to infect another cell, as free-living
prokaryotes are widely dispersed in the environment, and susceptible hosts to each viral type
123
(only a small fraction of the community) can be hard to encounter. Particle-associated
prokaryotes, on other hand, are often largely clonal (spreading rapidly from initially infecting
pioneer cells) and ‘geographically’ restricted to the particle, which give lysing viruses ample
opportunities to infect adjacent cells. Our results show that the free dsDNA viral community
significantly explained 28.1% and 9.3% of variation for free-living and particle-associated
prokaryotes respectively, suggesting that free dsDNA viruses may follow the free-living
prokaryotes more closely. We can currently only speculate why the correlations between free
dsDNA viruses and particle-associated prokaryotes were relatively weaker, such as the
possibility that free viruses are dominated by those infecting the free-living planktonic bacteria,
and less well represent those infecting prokaryotes on particles.
It is interesting that we found insignificant effects of 18S communities on prokaryotic
community structure, which suggests that grazing by particular protists may not selectively and
predictably affect prokaryotic communities in both size fractions (Fig. 5 and Table 1). The 18S
communities at SPOT were dominated by Dinophyceae (i.e., dinoflagellates) and Spirotrichea
(i.e., ciliates) (Fig. 2d), though we must keep in mind that it is likely that some of that apparent
dominance is due to high 18S rRNA copy number (Prokopowich et al., 2003, Zhu et al., 2005,
Needham & Fuhrman, 2016), and not necessarily dominant biomass. These groups, and several
others reported in our results and known to be phagotrophs or mixotrophs (e.g.,
Mamiellophyceae and Prymnesiophyceae), have members that are known to effectively graze
on various microorganisms and remove their biomass out of the microbial loop. Previous
studies based on culture experiments have shown that various flagellates and ciliates generally
prefer larger prey and/or prey with high activity, and only some effectively graze
124
bacterioplankton (Andersson et al., 1986, Gonzalez et al., 1990, Del Giorgio et al., 1996, Sintes
& Del Giorgio, 2014). In addition, the ability of prokaryotes to develop grazing resistance
mechanisms (e.g., motility and toxin release, or survival in vacuoles) can protect some
prokaryotes from some grazing losses (Pernthaler, 2005), and this is hard to predict from ASV
data. As different prokaryotic taxonomic groups have distinct properties, protists could
selectively graze on particular prokaryotic populations. However, our results suggest that
protist grazing is not sufficiently taxa-selective to show up as controlling prokaryote
composition. This is in accordance with previously studies (Suzuki, 1999, Yokokawa & Nagata,
2005, Baltar et al., 2016).
Because of our long-term sampling and resolution of SSU rRNA sequences to exact
amplicon sequence variants (ASVs), we were able to find that ASV composition shifted within
major taxonomic groups in response to the warming event between 2014 and 2015 (Fig. 6 and
S2). This confirms the expectation that warming can result in considerable changes in key
microbial populations. For SAR11, 4 major ASVs displayed seasonal variation (Fig. 6).
SAR11_ASV1 (SAR11 clade Ia) dominated during spring, whereas SAR11_ASV2 (SAR11 clade Ia),
SAR11_ASV3 (SAR11 clade II), and SAR11_ASV4 (SAR11 clade Ia) peaked in summer. Previous
studies have subdivided SAR11 clade Ia into cold-water (Ia.1) and warm-water (Ia.3) ecotypes,
which have distinct latitudinal distributions (Brown et al., 2012) and seasonal patterns (Eren et
al., 2013). Although the SAR11 ASVs in this study were not identified to the finest level, their
seasonal patterns imply that SAR11_ASV1 may be cold-water ecotypes (Ia.1), and SAR11_ASV2
and SAR11_ASV_4 might be warm-water ecotypes (Ia.3). Given this assumption, warming
resulted in a decrease in relative abundance of cold-water ecotypes (SAR11_ASV1). In addition,
125
we found that SAR11_ASV3 (SAR11 clade II) was enriched in the large size fraction, especially in
2014-2015, which is surprising. SAR11 members are widely believed to be free-living bacteria
growing on mostly small molecules in dissolved material, and such significant enrichment on
particles have not been discussed before. Our finding suggests that SAR11 clade II might have a
previously unknown niche on particles, which calls for further investigation.
Among the major primary producers, Synechococcales cyanobacteria were found to
exhibit compositional changes at the ASV level during warming. There were 6 major ASVs
dominating the cyanobacterial community, and they were identified as different ecotypes (Fig.
6). Synechococcales can be further classified into genera Prochlorococcus and Synechococcus,
which are different in size, photosynthetic pigments, and general ecological preferences.
Prochlorococcus tends to occur in more oligotrophic conditions, is smaller (0.6-0.8 um) and
possess divinyl chlorophyll a and b, whereas Synechococcus tends to live in more mesotrophic
conditions, is larger (0.6-2 um) and has phycobilisomes (Coleman & Chisholm, 2007, Scanlan et
al., 2009). Within each genus, several ecotypes are reported to have distinct niches in light,
temperature, and iron requirements (Moore et al., 1998, Johnson et al., 2006, Zwirglmaier et
al., 2008, Martiny et al., 2009, Rusch et al., 2010). Our results showed that the warming event
in 2014-2015 resulted in a shift from cold-water ecotypes (i.e., Cyano_ASV1 (Syn IV) and
Cyano_ASV3 (Syn I)) to warm-water and more oligotrophic ecotypes (i.e., Cyano_ASV2 (Pro
HLI), Cyano_ASV5 (Syn II/III), and Cyano_ASV6 (Pro HLII)). Similar patterns were also observed
in the MICRO time-series at Newport Pier, California, which is 44 km away from the SPOT
location (Larkin et al., 2020). This suggests that shifts in the cyanobacterial community due to
126
warming occurred at a regional scale from the coastal to the central of Southern California
Bight.
In contrast to SAR11 and cyanobacteria, Flavobacteriales and Rhodobacterales only
showed modest changes in their ASV composition in 2014-2015. These two taxonomic groups
have been described as major components of bloom-associated communities. Flavobacteriales
are specialized to degrade complex organism matter, whereas Rhodobacterales consume
primarily low molecular weight phytoplankton metabolites. Previous studies have reported
experimental warming increased the relative abundance of bloom-associated communities,
also increasing their chemotaxis ability and metabolism (Arandia-Gorostidi et al., 2017,
Arandia-Gorostidi et al., 2020), suggesting that the ability to detect, pursue, and utilize
phytoplankton-derived substrates may help protect them from environmental variability, such
as warming.
Overall, we found that the warming event in 2014-2015 due to the strong 2015/2016 El
Niño resulted in an increase in relative abundance of warm-water ecotypes. There are two
potential causes of this shift in community composition. First, pre-existing warm-water
ecotypes can gain a competitive advantage under warming conditions and thus outcompete
cold-water ecotypes. Second, the warm-water ecotypes may undergo geographic range
expansion from more oligotrophic offshore warm water into the Southern California Bight
(Fagan et al., 2019). Our results are consistent with both scenarios. For example, the major
ASVs of SAR11 persistently occurred at the SPOT location throughout the whole time-series
(Fig. 6a), indicating that they belong to the local community. Thus, the shift in SAR11 ecotypes
to year-round warmer-water ecotypes in an El Niño year is a potentially expected result from
127
competition. However, cyanobacterial ASV_4 (Pro HLII) was not observed at SPOT in the several
years before 2014, and it appeared and suddenly peaked in 2014-2015 (Fig. 6b); this suggests
ASV_4 (Pro HLII) was probably introduced and took hold due to range expansion from more
oligotrophic offshore Pacific waters.
In summary, our study revealed the bottom-up control of phytoplankton communities
on free-living and particle-associated marine prokaryotes and how closely free dsDNA viruses
were infecting and following their hosts, especially the free-living ones. Protistan communities,
however, did not affect prokaryotic communities, at least in a statistical sense. Overall, our
results suggested the importance of bottom-up control on prokaryotic community structure,
yet whether the community changes influence their functioning remains unknown and needs
further investigations. In addition, we found that warming due to the 2015/2016 El Niño
resulted in significant changes in prokaryotic populations, especially for cyanobacteria, shifting
from cold-water ecotypes to warm-water ecotypes. Although this shift cannot indicate a regime
shift due to warming, it shows the necessity to continuously monitor climate changes using
long-term time series efforts.
Materials and Methods
Sample collection and DNA extraction
Samples were collected monthly from 5 m at San Pedro Ocean Time-series (SPOT)
location during 2005-2018. Approximately 12 L of seawater was sequentially filtered through an
80-μm mesh, a 1-μm A/E filter (Pall, Port Washington, NY), and a 0.2-μm Durapore filter (ED
Millipore, Billerica, MA). Filters were stored at -80°C until DNA extraction. Durapore filters (0.2-
128
1 μm) were used for free-living prokaryotic community analysis, and A/E filters (80-1 μm) were
used to analyze phytoplankton, protists, and particle-associated or large-celled prokaryotic
communities. DNA was extracted from the Durapore filters using a hot SDS,
phenol/chloroform/isoamyl alcohol, ethanol precipitation extraction protocol as described by
Fuhrman et al. (1988). DNA was extracted from the A/E filters using a NaCl/CTAB bead-beating
extraction protocol as described by Lie et al. (2013) with slight modification by adding an
ethanol precipitation step after lysis to reduce the volume of crude extract, which helps
minimize DNA loss during the subsequent purification.
Environmental data
Monthly average sea surface temperature, surface chlorophyll-a concentrations, and
primary productivity were downloaded as satellite data from the Coastwatch browser website
(https://coastwatch.pfeg.noaa.gov/erddap/index.html). Multivariate ENSO index (MEI) was
download from the National Oceanographic and Atmospheric Administration (NOAA).
16S/18S PCR and sequencing
To pool multiple samples in a single Illumina paired-end sequencing platform, a dual-
index sequencing strategy was used with the forward primer A-I-NNNN-barcode-515Y (A-I-
NNNN-barcode-GTGYCAGCMGCCGCGGTAA) and reverse primer A-index-I-926R (A-index-I-
CCGYCAATTYMTTTRAGTTT), where A is the Illumina sequencing adapter, I is the Illumina
primer, and barcode and index are sample specific tag (5-bp barcode and 6-bp index).
515Y/925R primer pairs target the V4-V5 hyper-variable region of the 16S/18S rRNA genes. All
129
DNA samples were amplified using the same conditions described at
doi.org/10.17504/protocols.io.vb7e2rn. PCR products were cleaned using 0.8X Ampure XP
magnetic beads (Beckman Coulter). Purified PCR products from samples were pooled in equal
amount and then sequenced on Illumina HiSeq 2500 in PE250 mode and MiSeq PE300. For each
sequencing run, multiple blanks (i.e., PCR water) and two versions of mock communities (even
and staggered) were included as internal controls, meaning they were amplified, cleaned and
sequenced as environmental samples with the same conditions. This way, results from different
sequencing runs were comparable without significant bias and contamination (Parada et al.,
2016, Yeh et al., 2018, Yeh et al., 2021).
Sequence analysis
Sequences were demultiplexed by forward barcodes and reverse indices allowing no
mismatches using QIIME 1.9.1 split_libraries_fastq.py (Caporaso et al., 2010). The fully
demultiplexed forward and reverse sequences were then split into per-sample fastq files using
QIIME 1.9.1 split_sequence_file_on_sample_ids.py and submitted to the EMBL database under
accession number PRJEB48162 and PRJEB35673.
Demultiplexed amplicon sequences were trimmed with cutadapt, discarding any sequence
pairs not containing the forward or reverse primer (error rate set to 0.2). Amplicon sequences
were then split into 16S and 18S pools using bbsplit.sh from the bbtools package
(http://sourceforge.net/projects/bbmap/) against curated 16S/18S databases derived from
SILVA 132 (Quast et al., 2013) and PR2 (Guillou et al., 2013). The 16S and 18S amplicons were
then analyzed in parallel to amplicon sequence variants (ASVs) using DADA2 (Callahan et al.,
130
2016) via qiime2 q2-dada2 plugin (Amir et al., 2017). 16S ASVs were classified with qiime2
classify-sklearn plugin against the SILVA 132 database (Quast et al., 2013). 16S ASVs identified
as Mitochondria were removed. Then, the ASV table was subdivided into prokaryotic 16S ASV
table and Chloroplast 16S ASV table (including all 16S ASV identified as Chloroplast). Chloroplast
16S ASVs were further classified against PhytoRef database (Decelle et al., 2015). 18S ASVs
were assigned against the PR2 databases (Guillou et al., 2013). Metazoa and Syndiniales ASVs
were removed from the final 18S ASV table because they represent organismal fragments, eggs,
juveniles, and parasites that do not directly interact with prokaryotes. Scripts necessary to
reproduce the sequencing analyses are available at github.com/jcmcnch/eASV-pipeline-for-
515Y-926R.
Free dsDNA viral community
The same seawater samples collected monthly from the 5 m depth at SPOT during 2009-
2014 were used for viral community analysis using metagenomic sequences from the 0.02-0.22
μm size fraction (Ignacio-Espinoza et al., 2020). 19,907 putative viral contigs were generated
from the metagenome assembly, and read recruitment to those contigs was used to assess
their relative abundance, as previously reported (Ignacio-Espinoza et al., 2020)
Statistical analyses
The prokaryotic 16S ASV table was subdivided into two groups, free-living (0.2-1 μm) and
particle-associated (1-80 μm) prokaryotes according to size fraction. The chloroplast 16S ASV
table from the 1-80 μm size fraction was used to represent the phytoplankton (noting it misses
131
most dinoflagellates; Needham & Fuhrman (2016)). The 18S ASV table from the 1-80 μm size
fraction used to represent all protists. The dominant ASVs (i.e., 0.5% of relative abundance
occurred at least once throughout the time-series) were selected for further analysis to reduce
the noise associated with stochastic variations of the rarer species.
As the focus of this study in on explaining the temporal variation in free-living and particle-
associated or large-celled prokaryotic communities, we used canonical analyses with
prokaryotic communities always in the dependent role. The environmental data, chloroplast
16S, eukaryotic 18S, and dsDNA virus communities were used as explanatory data sets. The
compositional data were Hellinger transformed (Legendre & Gallagher, 2001), and
environmental variables were standardized prior to canonical analyses. Detrended
correspondence analysis (DCA) was first used to determine the appropriate response model
(canonical correspondence analysis (CCA) or redundancy analysis (RDA)) for the prokaryotic
communities. The longest gradient lengths from DCA determine a suitable canonical analysis for
a given community composition (CCA for >3; RDA for <3). The results showed that CCA worked
better for prokaryotic communities, thus CCA was directly used to relate standardized
environmental variables to prokaryotic communities. For the compositional data (chloroplast
16S, 18S, and dsDNA virus), a two-step (indirect) approach was used because CCA breaks down
when the number of species (i.e., ASVs/virus contigs) is larger than the number of sampling
months in our case. Thus, we applied correspondence analysis (CA) to the compositional data
(chloroplast 16S, 18S, and dsDNA virus) and took the first few CA axes which retain at least 70%
of variation as explanatory variables. Then, only significant CA axes selected by a stepwise
forward selection procedure were used for the final CCA model. In addition, partial CCA (pCCA)
132
is used to remove the effects from environmental variables. To further determine how much of
variation in prokaryotic composition was solely explained by each component, variation
partitioning analysis was used here.
References
Ahlgren NA, Perelman JN, Yeh YC & Fuhrman JA (2019) Multi-year dynamics of fine-scale
marine cyanobacterial populations are more strongly explained by phage interactions than
abiotic, bottom-up factors. Environmental microbiology 21: 2948-2963.
Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Xu ZZ, Kightley EP, Thompson LR,
Hyde ER & Gonzalez A (2017) Deblur rapidly resolves single-nucleotide community sequence
patterns. MSystems 2.
Andersson A, Larsson U & Hagström Å (1986) Size-selective grazing by a microflagellate on
pelagic bacteria. Marine Ecology Progress Series 51-57.
Arandia-Gorostidi N, Huete-Stauffer TM, Alonso-Sáez L & G. Morán XA (2017) Testing the
metabolic theory of ecology with marine bacteria: different temperature sensitivity of major
phylogenetic groups during the spring phytoplankton bloom. Environmental microbiology 19:
4493-4505.
Arandia-Gorostidi N, Alonso-Sáez L, Stryhanyuk H, Richnow HH, Morán XAG & Musat N (2020)
Warming the phycosphere: Differential effect of temperature on the use of diatom-derived
carbon by two copiotrophic bacterial taxa. Environmental microbiology 22: 1381-1396.
Azam F, Fenchel T, Field JG, Gray J, Meyer-Reil L & Thingstad F (1983) The ecological role of
water-column microbes in the sea. Marine ecology progress series 257-263.
133
Baltar F, Palovaara J, Unrein F, Catala P, Horňák K, Šimek K, Vaqué D, Massana R, Gasol JM &
Pinhassi J (2016) Marine bacterial community structure resilience to changes in protist
predation under phytoplankton bloom conditions. The ISME journal 10: 568-581.
Brown MV, Lauro FM, DeMaere MZ, Muir L, Wilkins D, Thomas T, Riddle MJ, Fuhrman JA,
Andrews-Pfannkoch C & Hoffman JM (2012) Global biogeography of SAR11 marine bacteria.
Molecular systems biology 8: 595.
Brum JR, Ignacio-Espinoza JC, Roux S, Doulcier G, Acinas SG, Alberti A, Chaffron S, Cruaud C, De
Vargas C & Gasol JM (2015) Patterns and ecological drivers of ocean viral communities. Science
348.
Buchan A, LeCleir GR, Gulvik CA & González JM (2014) Master recyclers: features and functions
of bacteria associated with phytoplankton blooms. Nature Reviews Microbiology 12: 686-698.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA & Holmes SP (2016) DADA2: high-
resolution sample inference from Illumina amplicon data. Nature methods 13: 581-583.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG,
Goodrich JK & Gordon JI (2010) QIIME allows analysis of high-throughput community
sequencing data. Nature methods 7: 335-336.
Chafee M, Fernàndez-Guerra A, Buttigieg PL, Gerdts G, Eren AM, Teeling H & Amann RI (2018)
Recurrent patterns of microdiversity in a temperate coastal marine environment. The ISME
journal 12: 237-252.
Chow C-ET, Kim DY, Sachdeva R, Caron DA & Fuhrman JA (2014) Top-down controls on bacterial
community structure: microbial network analysis of bacteria, T4-like viruses and protists. The
ISME journal 8: 816-829.
134
Chow CET & Fuhrman JA (2012) Seasonality and monthly dynamics of marine myovirus
communities. Environmental microbiology 14: 2171-2183.
Coleman ML & Chisholm SW (2007) Code and context: Prochlorococcus as a model for cross-
scale biology. Trends in microbiology 15: 398-407.
Comeau AM & Krisch HM (2008) The capsid of the T4 phage superfamily: the evolution,
diversity, and structure of some of the most prevalent proteins in the biosphere. Molecular
biology and evolution 25: 1321-1332.
Crespo BG, Pommier T, Fernández-Gómez B & Pedrós-Alió C (2013) Taxonomic composition of
the particle-attached and free-living bacterial assemblages in the Northwest Mediterranean Sea
analyzed by pyrosequencing of the 16S rRNA. Microbiologyopen 2: 541-552.
D'ambrosio L, Ziervogel K, MacGregor B, Teske A & Arnosti C (2014) Composition and enzymatic
function of particle-associated and free-living bacteria: a coastal/offshore comparison. The
ISME journal 8: 2167-2179.
Decelle J, Romac S, Stern RF, Bendif EM, Zingone A, Audic S, Guiry MD, Guillou L, Tessier D & Le
Gall F (2015) Phyto REF: a reference database of the plastidial 16S rRNA gene of photosynthetic
eukaryotes with curated taxonomy. Molecular ecology resources 15: 1435-1445.
Del Giorgio PA, Gasol JM, Vaqué D, Mura P, Agustí S & Duarte CM (1996) Bacterioplankton
community structure: protists control net production and the proportion of active bacteria in a
coastal marine community. Limnology and Oceanography 41: 1169-1179.
Duret MT, Lampitt RS & Lam P (2019) Prokaryotic niche partitioning between suspended and
sinking marine particles. Environmental microbiology reports 11: 386-400.
135
Eren AM, Maignien L, Sul WJ, Murphy LG, Grim SL, Morrison HG & Sogin ML (2013) Oligotyping:
differentiating between closely related microbial taxa using 16S rRNA gene data. Methods in
ecology and evolution 4: 1111-1119.
Fagan AJ, Moreno AR & Martiny AC (2019) Role of ENSO conditions on particulate organic
matter concentrations and elemental ratios in the Southern California Bight. Frontiers in Marine
Science 6: 386.
Filée J, Tétart F, Suttle CA & Krisch H (2005) Marine T4-type bacteriophages, a ubiquitous
component of the dark matter of the biosphere. Proceedings of the National Academy of
Sciences 102: 12471-12476.
Francis TB, Bartosik D, Sura T, Sichert A, Hehemann J-H, Markert S, Schweder T, Fuchs BM,
Teeling H & Amann RI (2021) Changing expression patterns of TonB-dependent transporters
suggest shifts in polysaccharide consumption over the course of a spring phytoplankton bloom.
The ISME Journal 1-15.
Fuhrman JA & Caron DA (2016) Heterotrophic planktonic microbes: virus, bacteria, archaea, and
protozoa. Manual of environmental microbiology 4.2. 2-1-4.2. 2-34.
Fuhrman JA, Comeau DE, Hagström Å & Chan AM (1988) Extraction from natural planktonic
microorganisms of DNA suitable for molecular biological studies. Applied and environmental
microbiology 54: 1426-1429.
Fuhrman JA, Hewson I, Schwalbach MS, Steele JA, Brown MV & Naeem S (2006) Annually
reoccurring bacterial communities are predictable from ocean conditions. Proceedings of the
National Academy of Sciences 103: 13104-13109.
136
Fuhrman JA, Steele JA, Hewson I, Schwalbach MS, Brown MV, Green JL & Brown JH (2008) A
latitudinal diversity gradient in planktonic marine bacteria. Proceedings of the National
Academy of Sciences 105: 7774-7778.
Gasol JM & Kirchman DL (2018) Microbial ecology of the oceans. John Wiley & Sons.
Gilbert JA, Field D, Swift P, Newbold L, Oliver A, Smyth T, Somerfield PJ, Huse S & Joint I (2009)
The seasonal structure of microbial communities in the Western English Channel.
Environmental microbiology 11: 3132-3139.
Gilbert JA, Steele JA, Caporaso JG, Steinbrück L, Reeder J, Temperton B, Huse S, McHardy AC,
Knight R & Joint I (2012) Defining seasonal marine microbial community dynamics. The ISME
journal 6: 298-308.
Gonzalez JM, Sherr EB & Sherr BF (1990) Size-selective grazing on bacteria by natural
assemblages of estuarine flagellates and ciliates. Applied and Environmental Microbiology 56:
583-589.
Grossart HP, Levold F, Allgaier M, Simon M & Brinkhoff T (2005) Marine diatom species harbour
distinct bacterial communities. Environmental Microbiology 7: 860-873.
Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L, Boutte C, Burgaud G, De Vargas C &
Decelle J (2013) The Protist Ribosomal Reference database (PR2): a catalog of unicellular
eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic acids research 41:
D579-D604.
Guixa-Boixereu N, Vaque D, Gasol JM & Pedros-Alio C (1999) Distribution of viruses and their
potential effect on bacterioplankton in an oligotrophic marine system. Aquatic Microbial
Ecology 19: 205-213.
137
Hatosy SM, Martiny JB, Sachdeva R, Steele J, Fuhrman JA & Martiny AC (2013) Beta diversity of
marine bacteria depends on temporal scale. Ecology 94: 1898-1904.
Hewson I, Vargo G & Fuhrman J (2003) Bacterial diversity in shallow oligotrophic marine
benthos and overlying waters: effects of virus infection, containment, and nutrient enrichment.
Microbial ecology 46: 322-336.
Ignacio-Espinoza JC, Ahlgren NA & Fuhrman JA (2020) Long-term stability and Red Queen-like
strain dynamics in marine viruses. Nature microbiology 5: 265-271.
Johnson ZI, Zinser ER, Coe A, McNulty NP, Woodward EMS & Chisholm SW (2006) Niche
partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients.
Science 311: 1737-1740.
Larkin AA, Moreno AR, Fagan AJ, Fowlds A, Ruiz A & Martiny AC (2020) Persistent El Niño driven
shifts in marine cyanobacteria populations. PloS one 15: e0238405.
Legendre P & Gallagher ED (2001) Ecologically meaningful transformations for ordination of
species data. Oecologia 129: 271-280.
Lie AA, Kim DY, Schnetzer A & Caron DA (2013) Small-scale temporal and spatial variations in
protistan community composition at the San Pedro Ocean Time-series station off the coast of
southern California. Aquatic Microbial Ecology 70: 93-110.
Martiny AC, Tai AP, Veneziano D, Primeau F & Chisholm SW (2009) Taxonomic resolution,
ecotypes and the biogeography of Prochlorococcus. Environmental microbiology 11: 823-832.
McNichol J, Berube PM, Biller SJ & Fuhrman JA (2021) Evaluating and Improving Small Subunit
rRNA PCR Primer Coverage for Bacteria, Archaea, and Eukaryotes Using Metagenomes from
Global Ocean Surveys. Msystems 6: e00565-00521.
138
Mestre M, Borrull E, Sala M & Gasol JM (2017) Patterns of bacterial diversity in the marine
planktonic particulate matter continuum. The ISME Journal.
Milici M, Vital M, Tomasch J, Badewien TH, Giebel HA, Plumeier I, Wang H, Pieper DH, Wagner-
Döbler I & Simon M (2017) Diversity and community composition of particle-associated and
free-living bacteria in mesopelagic and bathypelagic Southern Ocean water masses: Evidence of
dispersal limitation in the Bransfield Strait. Limnology and Oceanography 62: 1080-1095.
Moore LR, Rocap G & Chisholm SW (1998) Physiology and molecular phylogeny of coexisting
Prochlorococcus ecotypes. Nature 393: 464-467.
Needham DM & Fuhrman JA (2016) Pronounced daily succession of phytoplankton, archaea
and bacteria following a spring bloom. Nature microbiology 1: 16005.
Needham DM, Sachdeva R & Fuhrman JA (2017) Ecological dynamics and co-occurrence among
marine phytoplankton, bacteria and myoviruses shows microdiversity matters. The ISME journal
11: 1614-1629.
Needham DM, Chow C-ET, Cram JA, Sachdeva R, Parada A & Fuhrman JA (2013) Short-term
observations of marine bacterial and viral communities: patterns, connections and resilience.
The ISME journal 7: 1274-1285.
Needham DM, Fichot EB, Wang E, Berdjeb L, Cram JA, Fichot CG & Fuhrman JA (2018) Dynamics
and interactions of highly resolved marine plankton via automated high-frequency sampling.
The ISME journal 12: 2417.
Pagarete A, Chow C-E, Johannessen T, Fuhrman J, Thingstad T & Sandaa R (2013) Strong
seasonality and interannual recurrence in marine myovirus communities. Applied and
environmental microbiology 79: 6253-6259.
139
Parada AE, Needham DM & Fuhrman JA (2016) Every base matters: assessing small subunit
rRNA primers for marine microbiomes with mock communities, time series and global field
samples. Environmental microbiology 18: 1403-1414.
Pernthaler J (2005) Predation on prokaryotes in the water column and its ecological
implications. Nature Reviews Microbiology 3: 537-546.
Prokopowich CD, Gregory TR & Crease TJ (2003) The correlation between rDNA copy number
and genome size in eukaryotes. Genome 46: 48-50.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J & Glöckner FO (2013) The
SILVA ribosomal RNA gene database project: improved data processing and web-based tools.
Nucleic acids research 41: D590-D596.
Ren J, Ahlgren NA, Lu YY, Fuhrman JA & Sun F (2017) VirFinder: a novel k-mer based tool for
identifying viral sequences from assembled metagenomic data. Microbiome 5: 1-20.
Rieck A, Herlemann DP, Jürgens K & Grossart H-P (2015) Particle-associated differ from free-
living bacteria in surface waters of the Baltic Sea. Frontiers in microbiology 6: 1297.
Roux S, Enault F, Hurwitz BL & Sullivan MB (2015) VirSorter: mining viral signal from microbial
genomic data. PeerJ 3: e985.
Rusch DB, Martiny AC, Dupont CL, Halpern AL & Venter JC (2010) Characterization of
Prochlorococcus clades from iron-depleted oceanic regions. Proceedings of the National
Academy of Sciences 107: 16184-16189.
Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR, Post AF, Hagemann M,
Paulsen I & Partensky F (2009) Ecological genomics of marine picocyanobacteria. Microbiology
and Molecular Biology Reviews 73: 249-299.
140
Schwalbach MS, Hewson I & Fuhrman JA (2004) Viral effects on bacterial community
composition in marine plankton microcosms. Aquatic Microbial Ecology 34: 117-127.
Šimek K, Kojecká P, Nedoma J, Hartman P, Vrba J & Dolan JR (1999) Shifts in bacterial
community composition associated with different microzooplankton size fractions in a
eutrophic reservoir. Limnology and Oceanography 44: 1634-1644.
Sintes E & Del Giorgio PA (2014) Feedbacks between protistan single-cell activity and bacterial
physiological structure reinforce the predator/prey link in microbial foodwebs. Frontiers in
microbiology 5: 453.
Suzuki MT (1999) Effect of protistan bacterivory on coastal bacterioplankton diversity. Aquatic
Microbial Ecology 20: 261-272.
Suzuki S, Kaneko R, Kodama T, Hashihama F, Suwa S, Tanita I, Furuya K & Hamasaki K (2017)
Comparison of community structures between particle-associated and free-living prokaryotes in
tropical and subtropical Pacific Ocean surface waters. Journal of Oceanography 73: 383-395.
Teeling H, Fuchs BM, Becher D, Klockow C, Gardebrecht A, Bennke CM, Kassabgy M, Huang S,
Mann AJ & Waldmann J (2012) Substrate-controlled succession of marine bacterioplankton
populations induced by a phytoplankton bloom. Science 336: 608-611.
Teeling H, Fuchs BM, Bennke CM, Krueger K, Chafee M, Kappelmann L, Reintjes G, Waldmann J,
Quast C & Gloeckner FO (2016) Recurring patterns in bacterioplankton dynamics during coastal
spring algae blooms. elife 5: e11888.
Thingstad TF & Lignell R (1997) Theoretical models for the control of bacterial growth rate,
abundance, diversity and carbon demand. Aquatic microbial ecology 13: 19-27.
141
Thingstad TF, Pree B, Giske J & Våge S (2015) What difference does it make if viruses are strain-,
rather than species-specific? Frontiers in microbiology 6: 320.
Thingstad TF, Våge S, Storesund JE, Sandaa R-A & Giske J (2014) A theoretical analysis of how
strain-specific viruses can control microbial species diversity. Proceedings of the National
Academy of Sciences 111: 7813-7818.
Unfried F, Becker S, Robb CS, Hehemann J-H, Markert S, Heiden SE, Hinzke T, Becher D, Reintjes
G & Krüger K (2018) Adaptive mechanisms that provide competitive advantages to marine
bacteroidetes during microalgal blooms. The ISME journal 12: 2894-2906.
Ward CS, Yung C-M, Davis KM, Blinebry SK, Williams TC, Johnson ZI & Hunt DE (2017) Annual
community patterns are driven by seasonal switching between closely related marine bacteria.
The ISME journal 11: 1412-1422.
Winter C, Smit A, Herndl GJ & Weinbauer MG (2005) Linking bacterial richness with viral
abundance and prokaryotic activity. Limnology and oceanography 50: 968-977.
Yeh Y-C, Needham DM, Sieradzki ET & Fuhrman JA (2018) Taxon Disappearance from
Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every
Sequencing Run. MSystems 3: e00023-00018.
Yeh YC, McNichol J, Needham DM, Fichot EB, Berdjeb L & Fuhrman JA (2021) Comprehensive
single-PCR 16S and 18S rRNA community analysis validated with mock communities, and
estimation of sequencing bias against 18S. Environmental Microbiology.
Yokokawa T & Nagata T (2005) Growth and grazing mortality rates of phylogenetic groups of
bacterioplankton in coastal marine environments. Applied and Environmental Microbiology 71:
6799-6807.
142
Yung C-M, Ward CS, Davis KM, Johnson ZI & Hunt DE (2016) Insensitivity of diverse and
temporally variable particle-associated microbial communities to bulk seawater environmental
parameters. Applied and environmental microbiology 82: 3431-3437.
Zhu F, Massana R, Not F, Marie D & Vaulot D (2005) Mapping of picoeucaryotes in marine
ecosystems with quantitative PCR of the 18S rRNA gene. FEMS microbiology ecology 52: 79-92.
Zwirglmaier K, Jardillier L, Ostrowski M, Mazard S, Garczarek L, Vaulot D, Not F, Massana R,
Ulloa O & Scanlan DJ (2008) Global phylogeography of marine Synechococcus and
Prochlorococcus reveals a distinct partitioning of lineages among oceanic biomes.
Environmental microbiology 10: 147-161.
143
Figures and Tables
Table 1. Variance of free-living and particle-associated or large-celled 16S
community explained by each component after partitioning out the
environmental effect (i.e., temperature and chlorophyll-a) using partial CCA.
Free-living
prokaryotes
Particle-associated
prokaryotes
Var P-value Var P-value
Chloroplast 16S community 27.4 0.001 30.9 0.001
Eukaryotic 18S community 2.7 0.027 5.7 0.015
Free virus contig community 28.1 0.001 9.3 0.001
144
Figure 1. Temporal variation of multivariate ENSO index (MEI), sea surface temperature (SST),
monthly average satellite chlorophyll-a concentration (Chla), and monthly average satellite
primary productivity (PP) at the SPOT location. MEI index is used to characterize the intensity of
the El Niño/Southern Oscillation (ENSO) event; large positive MEI values indicate El Niño
conditions, whereas large negative MEI values indicate La Niña conditions. In 2014-2015 (red
box), probably due to El Niño (MEI>0), this location experienced reduced productivity and
unusually warm temperatures (particularly winters), which also persisted to a lesser extent
through the end of the study.
145
Figure 2. Order level taxonomic composition of (a) free-living prokaryotes, (b) particle-
associated or large-celled prokaryotes, (c) chloroplast 16S, and (d) eukaryotic 18S communities
(excluding Metazoa and Syndiniales sequences to better show the phytoplankton and protistan
phagotrophs; see text). Note this shows only abundant taxonomic groups with relative
abundance >10% in any month.
146
Figure 3. (Left) Canonical correspondence analysis (CCA) ordination diagram of the first two
axes for free-living (0.2-1 um) and particle-associated or large-celled (1-80 um) prokaryotes. The
environmental variables analyzed are indicated as vectors: CTD temperature and monthly
average satellite chlorophyll-a concentration (chlorophyll-a). The prokaryotic communities for
each sampling month are represented as circles and color-coded with the sampling months.
(Right) Temporal variation of the first CCA scores, which are measures of the incidence of
particular community components, with higher (positive) numbers reflecting summer-like
(warm, lower chlorophyll) communities, and lower (negative) numbers indicating nominal
winter and spring-like communities (lower temperatures, higher chlorophyll). Note that
prokaryotic communities in both size fractions were mainly summer-like in 2014-2015,
corresponding to warmer years with reduced productivity and no pronounced spring bloom (Fig
1).
147
Figure 4. Mantel test showing correlations among different microbial components. Scatterplots
of each pair of beta-diversity (i.e., Bray-Curtis similarity) are shown on the left part of the figure.
Results of each Mantel test are shown on the right. The distribution of beta-diversity of each
component is shown on the diagonal. The results show that free-living and particle-associated
prokaryotic communities were correlated most closely (r=0.880).
148
Figure 5. Variation partitioning of components influencing prokaryotic community composition.
Percentages indicate the portion of variance in free-living (a and c) and larger or particle-
associated (b and d) prokaryotic community composition statistically explained by the
respective variable. Environmental variables include temperature and chlorophyll-a. Virus data
(in c and d) are available for only 2009-2014.
149
Figure 6. Relative abundance of major ASVs within SAR11 and Synechococcales cyanobacteria.
Grey areas in the top graphs represent rarer ASVs of the clades not specifically included in the
lower graphs.
150
Supplementary information
Effects of phytoplankton and viral communities on free-living and particle-associated marine
prokaryotic community structure
Yi-Chun Yeh
1
, J. Cesar Ignacio-Espinoza
1
, and Jed A. Fuhrman
1
1
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Correspondence:
Jed Fuhrman
Department of Biological Sciences, University of Southern California, Los Angeles, California
90089-0371, USA
Email: fuhrman@usc.edu
151
Figure S1. Occurrence frequency and abundance of ASVs within each major taxonomic group.
The mean relative abundance (left) and maximum relative abundance (right) are plotted against
the occurrence frequency (the proportion of communities/times in which each ASV was
detected). Rhodobacterales and SAR11 were all dominated by a few major ASVs that occurred
>75% of the time. Flavobacteriales and Synechococcales were dominated by major ASVs that
occurred >75% of the time and a few ASVs that were occasionally abundant (>5%).
Synechococcales
SAR11
Rhodobacterales
0.00 0.25 0.50 0.75 1.00
0.00 0.25 0.50 0.75 1.00
0.00 0.25 0.50 0.75 1.00
Flavobacteriales
0.00 0.25 0.50 0.75 1.00
0.000
0.005
0.010
0.015
0.00
0.01
0.02
0.03
0.00
0.05
0.10
0.00
0.01
0.02
0.03
0.04
occurrence
mean relative abundance
Filter AE D
Synechococcales
SAR11
Rhodobacterales
0.00 0.25 0.50 0.75 1.00
0.00 0.25 0.50 0.75 1.00
0.00 0.25 0.50 0.75 1.00
Flavobacteriales
0.00 0.25 0.50 0.75 1.00
0.00
0.05
0.10
0.15
0.20
0.25
0.0
0.1
0.2
0.3
0.0
0.1
0.2
0.3
0.00
0.05
0.10
occurrence
max relative abundance
Filter AE D
152
Figure S2. Relative abundance of major ASVs within Rhodobacterales and Flavobacteriales. Grey
areas in the top graphs represent rarer ASVs of the clades not specifically included in the lower
graphs.
153
Summary
This dissertation explored the spatiotemporal dynamics of free-living and particle-
associated prokaryotes and protists at a monthly scale that provided insights into how
microbial communities respond to climate change in a subtropical marine system. I was able to
examine prokaryotic 16S, chloroplast 16S, and eukaryotic 18S rRNA gene abundances all
together in a single PCR reaction and resolve them to exact amplicon sequences variants
(ASVs), which differ as little as single nucleotide. This dissertation first showed the fidelity and
quantitative nature of this methodology using mock communities. I then applied it to the long-
term microbial ocean time series (i.e., SPOT), which allowed me to survey the whole microbial
communities from two size fractions (0.2-1 and 1-80 μm) along the water column. Finally, I used
this time-series dataset to test ecological questions (i.e., bottom-up vs. top-down controls) and
warming effects by examining a El Niño event occurred during 2014-2015.
Chapter 1 evaluated the application of 3-domain universal primers (515Y/926R) and
sequence analysis pipeline (i.e., DADA2) using mixed mock communities consisting of both 16S
and 18S rRNA genes with known concentrations. This chapter demonstrated that 515Y/926R
were able to recover 16S and 18S mock communities as expected, especially when sequences
matched the primer exactly. However, mock community members with single base mismatch to
the primer resulted in 3 to 8-fold underestimation, suggesting that an evaluation of primer
coverage across three domains in field samples is needed. Fortunately, McNichol et al (2021)
have compared 515Y/926R with SSU rRNA sequences retrieved from several marine
metagenomic datasets (BioGEOTRACES, Malaspina, MBARI and TARA). They found that 96% of
bacteria and archaea 16S, 88% of eukaryotic 18S and 99% of cyanobacteria and chloroplast 16S
154
rRNA sequences perfectly matched 515Y/926R, indicating that this primer set is suitable for
marine samples. In addition, I found a consistent 2-fold underestimation of longer 18S
amplicons due to sequencing bias, suggesting a 2-fold correction factor is required to estimate
the true proportion of 16S and 18S rRNA gene abundances. Taken together, this methodology
allows us to quantitatively survey all the major components of marine microbial food webs in a
single PCR, with biases we could quantify and manage.
In chapter 2, I applied this methodology to SPOT samples, and found that >92% of
sequences in the 0.2-1 μm size fraction were prokaryotic 16S, whereas there were 6-34% 18S,
19-23% chloroplast 16S, and 46-93% prokaryotic 16S rRNA genes in the 1-80 μm size fraction,
indicating that prokaryotic 16S rRNA genes dominated in both size fractions regardless of the
sampling depth. To my knowledge no other study has shown this. I further characterized the
spatiotemporal variation of the whole microbial communities, especially on vertical and
seasonal changes of free-living prokaryotes, particle-associated and/or large-celled
prokaryotes, and protistan communities. I found strong depth-related changes in both
prokaryotic and protistan communities. Seasonal changes, on the other hand, were more
prominent in the euphotic zone. These findings, which frankly were not unexpected based on
previously published work from several labs, suggested that the whole microbial communities
were mostly driven by depth-varying physicochemical components such as light intensity,
organic matter composition, oxygen levels, nutrient availability. Seasonal inputs such as annual
spring blooms primarily affected the microbial communities in the euphotic zone. In addition, I
found distinct alpha- and beta-diversity patterns between prokaryotes and protists. Particle-
associated prokaryotes were more diverse than the free-living ones, especially in the deeper
155
layers, ascribed to endemic taxonomic groups and vertical niche differentiation of SAR11 and
Flavobacteriales. The diversity of protistan communities, on the other hand, peaked at the deep
chlorophyll maximum depth and decreased with depth, probability due to the hypoxia at depth
(a situation which applies to SPOT and only certain ocean habitats). Moreover, the temporal
beta-diversity patterns showed that prokaryotic community composition generally persisted
over time, whereas protists changed dramatically from month to month. The stability of
prokaryotes compared to protists indicates fundamental differences in their ecology, though at
this time we can only speculate about possible causes, including factors like population sizes,
functional redundancy among taxa, and interspecific interactions.
Chapter 3 reported the effects of environmental conditions and biotic factors (i.e., free
dsDNA viruses, phytoplankton, protists) on free-living and particle-associated prokaryotic
community structure at 5-m depth, using SSU rRNA sequencing and viral metagenomics. With a
combination of canonical analysis and variation partitioning, I found that temporal variation of
free-living and particle-associated community structure was significantly explained statistically
by the phytoplankton communities (i.e., chloroplast 16S) and free dsDNA viruses (i.e., viral
contigs). Because we know most prokaryotes depend on phytoplankton for nutrition, we
concluded these results suggested that the dynamics of prokaryotic community structure were
driven significantly by bottom-up control from phytoplankton. But because we know viruses
require hosts for reproduction, and sometimes may also affect communities through “kill the
winner” dynamics, we could not say to what extent the viruses were following their hosts, and
to what extent they may have been driving changes in host communities. In addition, I
examined how the warming effect due to El Niño during 2014-2015 affected prokaryotic
156
communities at 5-m depth. Although prokaryotic community composition at the order level did
not change considerably during 2014-2015, I found a shift within the major taxonomic groups at
the ASV level, especially for cyanobacteria, shifting from cold-water ecotypes to warm-water
ecotypes. These findings showed microbial community changes in response to the warming
event.
To our knowledge, this dissertation work provided the longest time-series ever
published on depth profiles of protists, bacteria, and archaea in two size fractions, which can
serve as a baseline information for future detailed studies on biological oceanography and
microbial ecology. For example, the different alpha- and beta-diversity patterns between
prokaryotes and protists may be due to underlying causes that include differences in population
sizes, abilities to adapt to oxygen-depleted environments, predation/parasitism/virus infection,
and trophic status. All of these are worth examining in future studies, which would further
contribute to understanding fundamental ecology of this system.
References
McNichol J, Berube PM, Biller SJ, Fuhrman JA (2021). Evaluating and Improving Small Subunit
rRNA PCR Primer Coverage for Bacteria, Archaea, and Eukaryotes Using Metagenomes from
Global Ocean Surveys. Msystems 6: e00565-00521.
Abstract (if available)
Abstract
Marine microbes perform a wide range of biogeochemical process, and thus understanding microbial community dynamics is critically important. My dissertation investigates microbial communities at the San Pedro Ocean Time-series (SPOT) location over the period 2005-2018 from the surface to near bottom. Community composition was analyzed by sequencing 16S and 18S rRNA genes from two size fractions (0.2-1 and 1-80 μm), with 3-domain universal primers, which allowed me to analyze eukaryotes as well as free-living and particle-associated prokaryotes simultaneously. My research first validated this methodology with 16S and 18S mock communities, and found that this primer set quantitatively recovered both 16S and 18S mock community composition. However, due to sequencing bias against the longer 18S amplicons, a two-fold correction factor was required to estimate the true proportion of each category. I thus applied this to SPOT samples and found that >92% of sequences in the small size fraction were prokaryotic 16S, and the larger size fraction contained 6-34% 18S, 19-23% chloroplast 16S, and 46-93% prokaryotic 16S rRNA genes. Throughout the time-series, prokaryotes were relatively stable temporally compared to protists. Particle-associated and/or large-celled prokaryotes were more diverse than the smaller free-living ones, especially at the deeper depths, which I ascribed to endemic taxonomic groups and niche differentiation of SAR11 and Flavobacteriales. Further, by also including viral metagenomics into the overall analyses, I concluded that changes in free-living and particle-associated prokaryotic community structure at 5 m depth were to a large extent driven by the bottom-up control of phytoplankton communities, while changes in free dsDNA viruses could have been to some extent following the changes in prokaryotic communities. In addition, a warming event due to El Niño in 2014-2015 resulted in significant changes in prokaryotic populations, especially for cyanobacteria, shifting from cold-water ecotypes to warm-water ecotypes. This dissertation work shows how long-term observations help us better understand microbial community responses to climate change.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Characterizing protistan diversity and quantifying protistan grazing in the North Pacific Subtropical Gyre
PDF
Genetic characterization of microbial eukaryotic diversity and metabolic potential
PDF
Spatial and temporal dynamics of marine microbial communities and their diazotrophs in the Southern California Bight
PDF
Patterns of molecular microbial activity across time and biomes
PDF
Microbe to microbe: monthly microbial community dynamics and interactions at the San Pedro Ocean Time-series
PDF
Temporal variability of marine archaea across the water column at SPOT
PDF
Changes in the community composition of marine microbial eukaryotes across multiple temporal scales of measurement
PDF
Multi-dataset analysis of bacterial heterotrophic variability at the San Pedro Ocean Time-series (SPOT): an investigation into the necessity and feasibility of incorporating a dynamic bacterial c...
PDF
Developing statistical and algorithmic methods for shotgun metagenomics and time series analysis
PDF
Ecological implications of daily-to-weekly dynamics of marine bacteria, archaea, viruses, and phytoplankton
PDF
Clustering 16S rRNA sequences: an accurate and efficient approach
PDF
Technological advancements in microbial analyses of periodontitis patients: focus on Illumina® sequencing using the Miseq system on the 16s rRNA gene: a clinical and microbial study
PDF
Feature engineering and supervised learning on metagenomic sequence data
PDF
Unexplored microbial communities in marine sediment porewater
PDF
Microbial ecology in the deep terrestrial biosphere: a geochemical, metagenomic and culture-based approach
PDF
Marine protistan diversity, spatiotemporal dynamics, and physiological responses to environmental cues
PDF
The molecular adaptation of Trichodesmium to long-term CO₂-selection under multiple nutrient limitation regimes
Asset Metadata
Creator
Yeh, Yi-Chun
(author)
Core Title
Ecological patterns of free-living and particle-associated prokaryotes, protists, and viruses at the San Pedro Ocean Time-series between 2005 and 2018
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Biology
Degree Conferral Date
2022-05
Publication Date
01/12/2024
Defense Date
12/13/2021
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
16s rRNA gene,18S rRNA gene,amplicon sequencing,microbial ecology,OAI-PMH Harvest,SPOT,three-domain universal primers,time-series,warming
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Fuhrman, Jed (
committee chair
), Hutchins, David (
committee member
), Levine, Naomi (
committee member
), Sun, Fengzhu (
committee member
)
Creator Email
yichuny@usc.edu,yichunyeh.tw@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC110455248
Unique identifier
UC110455248
Legacy Identifier
etd-YehYiChun-10337
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Yeh, Yi-Chun
Type
texts
Source
20220112-usctheses-batch-907
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu
Tags
16s rRNA gene
18S rRNA gene
amplicon sequencing
microbial ecology
SPOT
three-domain universal primers
time-series
warming