Using molecular techniques to explore the diversity, ecology and physiology of important protistan species, with an emphasis on the Prymnesiophyceae
USING MOLECULAR TECHNIQUES TO EXPLORE THE DIVERSITY, ECOLOGY
AND PHYSIOLOGY OF IMPORTANT PROTISTAN SPECIES, WITH AN EMPHASIS
ON THE PRYMNESIOPHYCEAE
by
Amy E. Koid
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOLOGICAL SCIENCES)
May 2014
Table of Contents
Acknowledgements
Dissertation abstract
Chapter One
Comparative analysis of eukaryotic marine microbial assemblages from 18S rRNA gene or gene transcript clone libraries using different methods of extraction
Abstract
Introduction
Methods
Sample collection
Nucleic acids extraction
Mechanical lysis
Chemical lysis
DNA and cDNA library construction and sequencing
Mechanical lysis library
Chemical lysis library
Community richness and diversity analyses
Sequence pre-processing
OTU calling and taxonomic assignment
Diversity analyses
Phylogenetic trees
Results
Comparison of taxonomic composition of DNA vs. cDNA libraries
Comparison of taxonomic compositions of mechanical lysis vs. chemical lysis libraries
Phylogenetic affiliations of OTUs
Discussion
Conclusions
Acknowledgements
Tables
Figure Legends
Figures
Supplemental Information
Chapter Two
Comparative transcriptome analysis of four prymnesiophyte algae
Abstract
Introduction
Methods
Culture conditions
RNA Isolation
Library Preparation and Sequencing
Transcriptome Assembly
Transcriptome Annotation and Comparison
Polyketide synthase analysis
KOG analysis
Results
Overview of the transcriptome
Core prymnesiophyte genes
B-vitamin biosynthesis pathways
Polyketide synthase genes
KOG patterns
Discussion
Core, shared and unique genes
B-vitamin biosynthesis genes
Polyketide synthase
Analysis of KOG relative abundances reveals interesting clustering patterns
Acknowledgements
Tables
Figures
Supplemental information
Chapter Three
Changes in gene expression of Prymnesium parvum due to nitrogen and phosphorus limitation
Abstract
Introduction
Methods
Culture conditions
RNA Isolation
Library Preparation and Sequencing
Transcriptome assembly
Transcriptome annotation
Differentially expressed genes
Polyketide synthase analysis
Results
General transcriptome characteristics
Differentially expressed genes
Nitrogen uptake, transport and assimilation
Other nitrogen metabolism genes
Phosphate transporters
Photosynthesis proteins
Protein synthesis and degradation
Polyketide synthase genes
Fatty acid oxidation and tricarboxylic acid (TCA) cycle genes
Discussion
Nitrogen uptake and metabolism genes
Other nitrogen metabolism genes
Phosphate transporters
Photosynthesis
Genes involved in protein production and turnover
Polyketide synthase (PKS) genes
Fatty acid oxidation and TCA cycle
Acknowledgements
Tables
Figure legends
Figures
References
Acknowledgements
This dissertation would not have been completed without the help of numerous people, only
some of whom are given particular mention here. I would like to thank my advisor, Karla
Heidelberg, for her support and confidence in me throughout the years. Thank you for providing
me with opportunities to explore interesting questions and to grow my skill sets. My co-advisor,
David Caron, provided invaluable input and direction on all three chapters of my dissertation. I
am also grateful to my other dissertation committee members, John Heidelberg and David
Bottjer, for their support and guidance. Special thanks to John for taking the time to answer all of
my questions, especially early on, when my bioinformatics knowledge was in its infancy. The
same goes for Bill Nelson, who provided a lot of guidance as I navigated a new field; your
patience and sense of humor were very much appreciated. Thanks to Jay Liu and Ramon Terrado
who were infinitely helpful with all the minutiae of analyses that make up a dissertation. I would
also like to thank my fellow graduate students, Benjamin Tully, Johanna Holm, and Rohan
Sachdeva, for their support and friendship through the years. Other mentors and colleagues who
have made a significant impact during this journey are: Diane Kim, Adriane Jones, Maxine
Chaney, Marina Ramon, Andy Gracey and Peter Countway. I am also extremely grateful to my
friends at Church of the Redeemer; you have been my family, and a bedrock of friendship,
support and prayer. Thanks to my parents, Koid Teng Hye and Heong Siew Lin, and my sisters,
Audrey and Andrea. Though you all live far away, you have been a source of unending love and
support; mere thanks do not suffice. Likewise with my husband, Arthur, and my son, Johan, who
bring so much joy and laughter to my life. Finally, I want to thank God for his provision and
sustenance throughout this entire journey.
Dissertation abstract
Protists are diverse unicellular eukaryotes that are widespread and ecologically important in
aquatic and terrestrial environments. As phototrophs, they can be responsible for a significant
proportion of carbon fixation, and as heterotrophs they transfer organic matter higher up the food
chain and also remineralize nutrients for further use by other organisms. Increasingly, more
protistan species are being recognized as mixotrophs, which are able to function both as
phototrophs and heterotrophs, but very little is known about the underlying physiology of this
nutritional mode. In this dissertation, I investigated the effects of different 18S rRNA or rDNA
sample processing techniques to evaluate both species diversity and activity of marine microbial
eukaryotes at the San Pedro Ocean Time-series (SPOT) off the coast of Los Angeles, CA. The
results of this study indicated that some taxonomic groups that do not comprise significant
fractions of 18S rRNA libraries may well be active and important components of the
environment at sampling time. I also used a comparative transcriptomics approach to evaluate
functional and physiological differences between the transcriptomes of four prymnesiophytes:
Prymnesium parvum, Chrysochromulina brevifilum, Chrysochromulina ericina and Phaeocystis
antarctica, the first three of which are mixotrophic species. My analysis revealed the presence of
a set of core genes as well as differences in genes involved in secondary metabolite pathways. A
larger comparison of gene content with other microbial eukaryotic groups revealed distinct
differences in the distribution of functional genes among phototrophic, heterotrophic and
mixotrophic species. Finally, I undertook a comparative transcriptomic analysis of P. parvum, a
toxigenic, bloom-forming mixotrophic species that is common in both marine and freshwater
ecosystems. This species displayed a robust transcriptomic response when exposed to nitrogen
(N) and phosphorus (P) limitation, relative to a nutrient-replete condition. The response to N-
and P-limitation included the increased relative expression of transporters for nitrogen and
phosphorus respectively. Additionally, genes involved in protein synthesis and turnover, TCA
cycle and fatty acid synthesis were also relatively upregulated under both nutrient limited
conditions. Under N-limitation, genes involved in the intracellular processing of organic nitrogen
were more highly expressed relative to the replete condition. Under P-limitation, photosynthesis
genes and polyketide synthase genes were more highly expressed than in the replete treatment.
Together, these studies show how molecular data in the form of DNA or RNA sequences can
reveal important insights for protistan species diversity and the molecular underpinnings of
different nutritional modes and cellular and physiological responses to different nutrient
conditions.
Chapter One
Comparative analysis of eukaryotic marine microbial assemblages from 18S rRNA gene or
gene transcript clone libraries using different methods of extraction*
*As published in:
Koid, A., W. C. Nelson, A. Mraz, and K. B. Heidelberg. 2012. Comparative analysis of
eukaryotic marine microbial assemblages from 18S rRNA gene and gene transcript clone
libraries by using different methods of extraction. Appl. Environ. Microbiol. 78: 3958–3965.
Abstract
Eukaryotic marine microbes play pivotal roles in biogeochemical nutrient cycling and ecosystem
function, but studies of protistan biogeography and genetic diversity lag behind studies of
other microbes. 18S rRNA PCR amplification and clone library sequencing are commonly used
for culture-independent assessment of diversity. However, molecular methods are not without
potential biases and artifacts. In this study, we compare the community composition of clone
libraries generated from the same water sample collected at the San Pedro Ocean Time-series
(SPOT) station in the eastern North Pacific Ocean. Community composition was assessed after using
different cell lysis methods (chemical vs. mechanical) and extraction of different nucleic acids
(DNA vs. RNA reverse transcribed to cDNA) in Sanger ABI clone libraries. We describe
specific biases for ecologically important phylogenetic groups resulting from differences in
nucleic acid extraction methods that will help inform the design of future eukaryotic diversity
studies regardless of the target sequencing platform planned.
Introduction
Biologists who sample natural microbial communities face challenges in estimating a
community's “true” diversity through population subsamples. Since the late 1980s, culture-
independent polymerase chain reaction (PCR)-amplified clone libraries that target the small
subunit (SSU) 16S rRNA taxonomic genes have served as a proxy for the diversity and
composition of natural microbial bacterial and archaeal populations (Giovannoni et al. 1990;
Fuhrman et al. 1993; Narasingarao et al. 2012). However, studies of microbial eukaryotic
populations have lagged behind those of their generally smaller prokaryotic counterparts (Caron et al.
2009b; Heidelberg et al. 2010). More recently, natural assemblages of microbial eukaryotes have
been assessed and compared using the small subunit (SSU) 18S rRNA gene. These
culture-independent gene surveys have revealed extensive microbial diversity that was previously
undetected with culture-dependent methods and morphological identification (Díez et al. 2001;
López-García et al. 2001; Moon-van der Staay et al. 2001; Worden 2006; Countway et al. 2010).
The sampling of the 18S rRNA gene or gene transcript is commonplace for studies of
microbial eukaryotes, but several factors can complicate or bias diversity assessments
(Wintzingerode and Göbel 1997; Acinas et al. 2005; Richards and Bass 2005; Stoeck et al. 2006).
Resulting analysis of community composition can be dependent on the filter size fraction, lysis
method, whether DNA or RNA is targeted, the PCR primers used as well as the PCR thermal
cycling regime (Aguilera et al. 2006). Each of these different steps can affect library diversity
and skew results away from the “true” community composition. For example, a comparison of
the contribution of five marine stramenopile (MAST) groups to the total sample using a PCR-
amplified library vs. fluorescence in situ hybridization (FISH) showed that the PCR primers that
targeted the full 18S rRNA gene overestimated two groups and underestimated one group,
whereas the primer set that only targeted a portion of the gene gave a more accurate
representation of the actual abundance of each group (Massana et al. 2004). Different primer sets
designed to target a particular group of organisms can also retrieve sequences with very little
overlap (Stoeck et al. 2006). Additionally, both chimeric sequences (Berney et al. 2004) and
intra-individual ribosomal-RNA polymorphisms (reviewed by Richards and Bass 2005) can
artificially increase diversity.
While PCR bias has received some attention and has been studied quantitatively, other
biases in sequencing libraries have not been fully evaluated,
especially for microbial eukaryotes. The purpose of this study is to tease out the biasing effects
on community composition specifically due to the use of different extraction methods
(mechanical vs. chemical) and extraction of different nucleic acids (DNA vs. RNA).
Decisions about extraction methodology are made prior to library construction,
so the results should be informative no matter what type of downstream sequencing platform is
planned.
Methods
Sample collection
Seawater was collected at the San Pedro Ocean Time Series station (SPOTs) off the coast of
Southern California (33°33’N, 118°24’W) on April 17, 2009. This site is the focus of a long-term
USC Microbial Observatory and is well characterized (Countway et al. 2010). 500 L of seawater
was collected from the surface (5 m) and pre-filtered using a 20 μm Nitex mesh into acid-washed
and pre-conditioned carboys. The seawater sample was sequentially filtered through 142 mm
3.0 μm Versapor and 0.8 and 0.1 μm Supor impact membrane filters (Pall Life Sciences). Filters
were frozen at -80 °C immediately after filtration.
Nucleic acids extraction
We used two nucleic acid extraction methods for both the 3.0 and 0.8 μm size fraction filters.
Replicate filters were either extracted by mechanical lysis (ML) or by a gentler chemical lysis
(CL).
Mechanical lysis
DNA and RNA were extracted using the PowerSoil Total RNA Extraction kit and the DNA
Elution Accessory kit (MoBio, Carlsbad, CA) according to the manufacturer’s protocol. Eluted
ML DNA was treated with RNase ONE Ribonuclease (Promega, Madison, WI) in a 10 μl
reaction comprising 1 μl enzyme (10 U), 1 μl reaction buffer and 8 μl extracted DNA. The
reaction was incubated at 37 °C for 10 min. Then, the enzyme, salts and oligos were removed
using the DNA Clean and Concentrator-5 Kit (Zymo, Orange, CA), and the eluted DNA was
used in subsequent PCR reactions.
Chemical lysis
Total nucleic acids were extracted by Amplicon Express using a method based on Miller et al.
(1999) and Howe (http://hg.wustl.edu/hdk_lab_manual/yeast/yeast4.html) with
additional phenol/chloroform extractions and ethanol washes. RNA samples derived from both
mechanical and chemical lysis were treated in a 10 μl reaction containing 1 μl RQ1 RNase-free
DNase (1 U) (Promega, Madison, WI), 1 μl reaction buffer, and 8 μl of extracted RNA and
incubated at 37 °C for 10 min. After adding 1 μl of stop solution, the mixture was incubated for
an additional 10 min at 65 °C to inactivate the enzyme. The treated RNA was purified using the
RNeasy MinElute clean-up kit (Qiagen, Valencia, CA). This procedure was repeated until a PCR
reaction to check for the presence of 18S DNA was negative. The RNA sample was then reverse-
transcribed into cDNA using SuperScript III Reverse Transcriptase (Invitrogen, Carlsbad, CA)
using random decamers (Integrated DNA Technologies) as the primer. The resulting cDNA was
used in the PCR amplification reaction.
DNA and cDNA library construction and sequencing
Mechanical lysis library
18S SSU rRNA genes were amplified from DNA and cDNA using universal eukaryotic primers
Euk-A (5’-AACCTGGTTGATCCTGCCAGT-3’) and Euk-B
(5’-GATCCTTCTGCAGGTTCACCTAC-3’) (Medlin et al. 1988). The manufacturer’s protocol
for GoTaq Polymerase (Promega) was modified in the following way to obtain optimal
amplifications. The final concentrations in each 50 μl PCR reaction were: 0.5 μM of each
primer, 1X Promega buffer B, 2.5 mM Promega MgCl2, 250 μM Promega dNTPs, plus 2.5 U of
Promega Taq in buffer B and 1–2 μl of DNA extract. The thermal
cycling protocol consisted of an initial denaturation step for 2 min at 95 °C followed by 25-35
cycles of 30 s at 95 °C, 30 s at 55 °C, 2 min at 72 °C, and a final extension step for 7 min at 72 °C.
Four to five replicate PCR reactions were pooled for the subsequent cloning reaction.
PCR products were separated on 1.5% agarose gel, and the products of the expected size
(~1,800 bp) were excised from the gel. DNA was purified from the gel slices using the QIAquick
gel purification kit (Qiagen) according to the manufacturer’s protocol. The eluted DNA was
further cleaned and concentrated using Clean and Concentrator-5 Kit (Zymo). The purified PCR
products were ligated into the pCR2.1-TOPO Vector (Invitrogen). Products were then purified
again, mixed with TOP10-competent cells (Invitrogen) and then electroporated in a cuvette using
a BTX ECM 399 electroporator (Holliston, MA). After shocking, sterile SOC media (MP
Biomedicals, Solon, OH) was added to the cell suspension; the mixture was transferred to a 14
ml cell culture tube and was incubated at 37 °C for 1 hr with shaking at 250 rpm. Subsequently,
the bacterial clones were plated and picked according to standard protocol. Two 96-well plates
from each library were sequenced using the Euk-570F primer
(5’-GTAATTCCAGCTCCAATAGC-3’) (Weekers et al. 1994) on an ABI 377 Sanger sequencer.
Chemical lysis library
The CL DNA libraries were constructed by Amplicon Express with the following deviations
from the above. The libraries were transformed into Invitrogen DH10B T1r electrocompetent
cells. 768 clones were picked from each library and arrayed into 3,384-well plates using a
Genetix Qpix. DNA was extracted for 192 of these clones from each library and also sequenced
using an ABI 377 Sanger DNA Sequencer with the Euk-570F primer.
Community richness and diversity analyses
Sequence pre-processing
Raw Sanger sequences were trimmed using Phred (Ewing and Green 1998; Ewing et al. 1998)
with the default error probability cut-off of 0.05. Trimmed sequences were screened for chimeras
using a local implementation of Pintail (Ashelford et al. 2006) that used a subset of the aligned
SILVA 18S dataset (ver. 104) (Knittel et al. 2007) as the reference for calculating the expected
percentage differences. The Deviation from Expectation (DE) statistic was manually examined in
each of the samples to determine the appropriate cut-off value for each sample. Of the initial
1,152 clones sequenced, 209 (18%) were suspected to be chimeras and excluded from
subsequent analyses.
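As an illustration of the trimming step, the sketch below converts Phred quality scores to error probabilities (P = 10^(-Q/10)) and keeps the highest-scoring stretch of bases under the 0.05 cut-off, in the style of modified Mott trimming. It is a minimal sketch under stated assumptions, not the Phred implementation: the function names and the example quality values are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_by_error(quals, cutoff=0.05):
    """Return (start, end) of the retained region, end exclusive.

    Each base scores (cutoff - P_error); the contiguous run with the
    largest running sum is kept. Returns (0, 0) if no base passes.
    Illustrative only; names and defaults are assumptions.
    """
    best_sum, best = 0.0, (0, 0)
    run, start = 0.0, 0
    for i, q in enumerate(quals):
        run += cutoff - phred_to_error(q)
        if run <= 0:
            # The running score dropped to zero or below: restart after this base.
            run, start = 0.0, i + 1
        elif run > best_sum:
            best_sum, best = run, (start, i + 1)
    return best


# Hypothetical read: poor quality at both ends, good quality in the middle.
example_quals = [5, 5, 30, 30, 30, 30, 5, 5]
print(trim_by_error(example_quals))
```

Under this scheme a cut-off of 0.05 corresponds roughly to a quality score of 13, so bases below that score pull the running sum down and bases above it push it up.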
OTU calling and taxonomic assignment
Sequences from the six clone libraries were combined and classified into operational taxonomic
units (OTUs) using the Microbial Eukaryote Species Assignment (MESA) program (Caron et al.
2009a) at a sequence similarity cut-off of 95%. Using BLAST, the sequences in each OTU were
searched against the curated SILVA eukaryotic small subunit database (Knittel et al. 2007) and
assigned to the highest common taxonomic level of the component sequences.
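The MESA algorithm itself is not reproduced here. The sketch below only illustrates the general idea of grouping sequences into OTUs at a 95% similarity cut-off, using a simple greedy, seed-based scheme on pre-aligned sequences of equal length. The function names, the greedy strategy and the toy sequences are assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_otus(seqs, cutoff=0.95):
    """Assign each sequence to the first OTU whose seed it matches at >= cutoff.

    Returns a list of OTUs, each a list of indices into seqs; the first
    index in each OTU is its seed sequence.
    """
    otus = []
    for i, seq in enumerate(seqs):
        for otu in otus:
            if identity(seq, seqs[otu[0]]) >= cutoff:
                otu.append(i)
                break
        else:
            # No existing seed is similar enough: start a new OTU.
            otus.append([i])
    return otus


# Toy 20-base alignment (hypothetical data).
toy = [
    "ACGTACGTACGTACGTACGT",  # seed of the first OTU
    "ACGTACGTACGTACGTACGA",  # 1 mismatch to the seed: 95% identity
    "TCGTACGTACGTACGTACGA",  # 2 mismatches to the seed: 90% identity
]
print(greedy_otus(toy))
```

A greedy scheme of this kind depends on input order, which is one reason dedicated OTU-calling programs are preferred for real data.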
Diversity analyses
The Species Prediction and Diversity Estimation program (SPADE) (Chao et al. 2005) was used
to estimate commonly used non-parametric alpha-diversity statistics (the inverse of the Simpson
index ($D_s^{-1}$), bias-corrected Chao1, and ACE1) for the total sample and for individual libraries.
Phylogenetic trees
Maximum likelihood trees of groups of interest were constructed using the PhyML plug-in
(Guindon and Gascuel 2003) in Geneious. The Hasegawa-Kishino-Yano substitution model was
used (Hasegawa et al. 1985). One hundred bootstraps were generated with the following
parameters: proportion of invariable sites: 0; number of substitution rate categories: 1.
Results
Six libraries were constructed, and a total of 1,152 sequences were obtained. After removing
low-quality and potentially chimeric sequences, and 7 metazoan clones, 936 clones with an
average length of 643 bp were retained for subsequent analysis. When all sequences were pooled,
the 936 clones grouped into 126 microbial eukaryotic OTUs using the MESA OTU-calling
algorithm at a 95% sequence similarity cutoff (Caron et al. 2009a). Diversity (richness) estimates
($D_s^{-1}$, Chao1 and ACE1) for total, individual and grouped libraries were calculated (Colwell and
Coddington 1994; Hughes et al. 2001) (Table 1). The inverse of the Simpson index ($D_s^{-1}$) is a
common diversity statistic that can range from 1 to the maximum number of OTUs in each
sample; the higher the value, the greater the diversity. This simple biodiversity statistic accounts
for taxonomic richness (number of OTUs), evenness (relative abundance of sequences within
each OTU) and the sequencing effort of the clone library (Countway et al. 2010). The Chao1-bc
species richness estimate (also called the rarefaction estimator) uses a bias-corrected method that
estimates the number of missing OTUs from the numbers of singletons and doubletons, and from
this the total number of OTUs in a sample. ACE1 is a non-parametric abundance-based coverage estimator for
highly-heterogeneous communities that uses rare OTUs to estimate the number of missing OTUs.
This estimator corrects from the observed number of OTUs but is not independent of sample size.
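The two simpler statistics described above can be sketched directly from a list of per-OTU clone counts. The example assumes the plug-in form of the Simpson index (the sum of squared relative abundances) and the standard bias-corrected Chao1 formula, S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of singleton and doubleton OTUs. SPADE's estimators may differ in detail, ACE1 is omitted, and the counts shown are made up for illustration.

```python
from collections import Counter


def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i^2), from per-OTU clone counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("no clones in sample")
    return 1.0 / sum((c / total) ** 2 for c in counts)


def chao1_bc(counts):
    """Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))."""
    counts = [c for c in counts if c > 0]
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singleton and doubleton OTUs
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical library: 7 OTUs, 15 clones.
example_counts = [5, 3, 1, 1, 2, 2, 1]
print(inverse_simpson(example_counts))
print(chao1_bc(example_counts))
```

For a perfectly even library the inverse Simpson index equals the number of OTUs, for example `inverse_simpson([4, 4, 4, 4])` gives 4.0, which is the upper bound mentioned in the text.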
Total protistan diversity, as measured by both parametric and non-parametric indexes,
was higher ($D_s^{-1}$ = 22.4, Chao1-bc = 265.2 and ACE1 = 380.7) than that of any individual or
paired sample (Table 1). Rarefaction plots of Chao1 and the number of OTUs indicated that
additional protistan diversity remained undetected (data not shown). However, using the inverse
Simpson’s index ($D_s^{-1}$), our range of values for individual samples processed from one water
sample was 7.2-17.4, which is comparable to other estimates of $D_s^{-1}$ taken at the same site and
17
depth as part of another 18S rRNA gene protistan diversity study over a 4 month period in 2001
(6.0-28.8) (Countway et al. 2010). Combined DNA and cDNA samples that were extracted
using a mechanical lysis (ML) method had lower values than the combined samples using a
chemical lysis (CL) extraction method (D
s
-1
=14.9 and 15.5 vs. 19.1) (Table 1).
Higher-level taxonomic classifications of the OTUs in the clone libraries from each of the 6 samples
revealed differences in the structure of the protistan assemblages (Table 2 and SFig. 1). When all
groups were evaluated together, the Alveolates were the largest group, comprising 36.3% of the
total clones (22.9% Syndiniales, 12.7% ciliates and 0.8% dinoflagellates). The Stramenopile
group had the second highest number of sequences at 26.1%; it consisted of diatoms (15.1%),
marine Stramenopiles (MAST) (8.3%), and other groups (2.7%). The contribution of the
subsequent groups to the combined sample dropped precipitously, with the groups Telonema,
Chlorophytes, Haptophytes and Chrysophytes respectively accounting for 8.5%, 7.8%, 5.2% and
4.1% of the total sample. These were followed by the Picobiliphytes, Cercozoa and
Choanoflagellates, which contributed 3.8%, 3.2% and 2.3%, respectively. Finally, the Cryptophytes and “other”
protists (Centroheliozoa, Cryomonadida, Katablepharidophyta and Rhodophyta) made up the
final 2.6% of the sample (Table 2).
Comparison of taxonomic composition of DNA vs. cDNA libraries
DNA samples extracted by mechanical lysis (ML) yielded 304 clones and 73 OTUs. RNA was
reverse-transcribed to cDNA and yielded 277 clones and 65 OTUs (Table 2). Forty-one OTUs were
shared between the cDNA and DNA samples, while the rest were unique to each sample. Biases
that could be attributed to nucleic acid type (DNA vs. RNA) are shown in Fig. 1A. The biggest
difference between the two datasets was for a dominant Group I Syndiniales. This group was
highly represented in the DNA library (21.4%) when compared to the cDNA library (2.2%; Table 2).
Syndiniales Group II also showed higher representation in the DNA vs. cDNA library (9.5% vs.
0.4%, respectively) (Table 2; Fig. 1A). The ciliates were the next most dominant group and were
almost equally represented in the DNA and cDNA libraries (18.1% vs. 20.9%, respectively).
The MAST Stramenopiles were favored in the cDNA vs. DNA library (10.5% vs.
3.3%, respectively), but other Stramenopiles (including diatoms) were generally equally
represented. Other groups that demonstrated increased representation in the cDNA sample
compared to the DNA sample were the Cercozoa (6.9% vs. 1.3%) and the Picobiliphytes (8.7%
vs. 2.6%). The contribution of the next most numerous groups dropped to less than 10% of the
samples, making comparisons less clear. Chlorophytes, Haptophytes and Telonema were all
fairly equally represented. The remaining groups had clonal representation too low (< 5%) to
compare (Table 2; Fig. 1A).
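The percentages compared in this section follow directly from the clone counts in Table 2. The sketch below recomputes the relative distributions and a DNA:cDNA ratio for three groups. The counts and library totals are taken from Table 2; the function and variable names are hypothetical.

```python
# Clone counts from Table 2 (mechanical lysis libraries only).
library_totals = {"ML cDNA": 277, "ML DNA": 304}
clone_counts = {
    "Ciliates": {"ML cDNA": 58, "ML DNA": 55},
    "Syndiniales group II": {"ML cDNA": 1, "ML DNA": 29},
    "MAST": {"ML cDNA": 29, "ML DNA": 10},
}


def percent(group, library):
    """Relative distribution (%) of a group within one clone library."""
    return 100.0 * clone_counts[group][library] / library_totals[library]


def dna_to_cdna_ratio(group):
    """Ratio > 1 means the group is better represented in the DNA library."""
    return percent(group, "ML DNA") / percent(group, "ML cDNA")


for group in clone_counts:
    print(group,
          round(percent(group, "ML DNA"), 1),
          round(percent(group, "ML cDNA"), 1),
          round(dna_to_cdna_ratio(group), 2))
```

A ratio near 1 (ciliates) indicates similar representation in both libraries, while a ratio far from 1 (Syndiniales group II, MAST) flags a possible nucleic acid bias. Ratios based on very few clones, such as a single cDNA clone, should be read with caution.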
Comparison of taxonomic compositions of mechanical lysis vs. chemical lysis libraries
Taxonomic affiliations of 18S DNA clone libraries generated using a mechanical (ML) lysis
method vs. a chemical lysis (CL) extraction method from replicate filters were compared. There
were 304 and 354 clones and 73 and 70 OTUs in the ML and CL samples, respectively. Among
OTUs, 41 were common to both samples. Although the ML and CL clone libraries were
constructed using the same seawater sample, each method produced a strikingly different picture
of natural protistan diversity (Fig. 1B; Table 2), especially for ecologically important groups. Among
the largest groups, ciliates were overrepresented in ML samples (18.1% vs. 1.7%). Of the
abundant Stramenopiles, both diatoms and MAST groups were overrepresented in CL samples
when compared to ML samples (diatoms, 24.3% vs. 8.2%; MAST, 11.0% vs. 3.3%). The
dominant group in the ML sample, Syndiniales Group I (21.4%), had a slightly decreased
representation in the CL sample (16.9%). Other groups were represented at similar levels in both
ML and CL libraries (Fig. 1B; Table 2).
Phylogenetic affiliations of OTUs
Maximum likelihood trees were constructed for the major taxonomic groups to confirm the
taxonomic affiliations of sequences obtained in this study and to investigate their phylogenetic
diversity. Two are included here. Alveolate sequences belonged to five different groups:
Syndiniales Group I, II, and V, ciliates and dinoflagellates. Although more sequences were
affiliated with Syndiniales Group I (n=131), they only comprised three previously-defined (21)
independent clades: clades 1, 4 and 5 (Fig. 2). Within the more diverse Syndiniales Group II, 82
clones from this study belonged to seven independent clades. Only one sequence was retrieved
that belonged to Syndiniales Group V, which is a less diverse group with fewer sequences
retrieved thus far, relative to Groups I and II (21) (STable 2). While the ML method was more
successful than the CL method at retrieving a diversity of clones, there were nonetheless
a few OTUs that were only retrieved by the CL method.
The marine stramenopile (MAST) groups were also well-represented in our samples.
Sequences from 7 out of the 12 previously defined clusters (30) were retrieved. MAST-1
comprised 2 distinct clades (Fig. 3) with each clade formed by a different OTU. Each of the
other clades comprised 1 OTU. The CL method was successful at retrieving a few more OTUs
than the ML method, which did not produce any unique OTUs (STable 3).
Discussion
Microbial eukaryotes are ubiquitous in marine systems and crucial to the structure and function
of ecosystems (Caron et al. 2009b). They have important roles as part of the microbial loop as
photosynthesizers, grazers and remineralizers of nutrients (Sherr et al. 2007). They may also lend
resiliency to whole ecosystems in the face of changing conditions (Caron and Countway 2009).
The growing awareness of their importance has resulted in increased focus on studying
community composition and distribution, and 18S rRNA gene surveys have provided an
attractive alternative (or complement) to traditional microscopic assessments of eukaryotic
communities. These surveys have revealed the presence of high-level taxonomic groups that
are composed mostly or entirely of uncultured environmental clones, as well as the patchy
distribution of communities.
Different PCR primers can bias results (Massana et al. 2006); however, less is known
about the biases resulting from extraction methods and nucleic acid types (DNA vs. RNA).
Our results will be useful for researchers who may be trying to compare past
studies that have used different methods or want to avoid methodological biases that could mask
diverse or transient assemblages. While many eukaryotic groups appeared to have similar
representation in libraries, we found that a few ecologically significant group-specific biases can
be introduced depending on the extraction method chosen.
The release of nucleic acids during extraction is dependent on the structure
of the membranes. Rigorous lysis procedures may be required for some groups but be
detrimental to the target genes of other groups. For example, in our sample, ciliates were better
represented in libraries generated using the ML method. Conversely, relatively few ciliate clones
were observed in filters that were extracted with the chemical lysis method, while the
Stramenopiles (e.g. diatoms and MAST groups) were much better represented in libraries that had
been extracted using chemical lysis. Diatoms are common members of the phytoplankton
assemblages in Southern California waters near the SPOTs sampling station, but their relative
abundance in clone libraries often does not reflect their typical numerical dominance over
dinoflagellates during the spring bloom (Countway et al. 2010), and it has been
suggested that this could be due to an extraction, amplification or cloning bias. Our results
show a clear bias for extraction method (CL vs. ML), with chemical lysis yielding a higher
proportion of stramenopile (especially diatom) clones. To date, organisms within the MAST groups have not
been cultured, and very little is known about their morphology. However, a pelagic protist
known as Solenicola setigera was observed to form colonies on the diatom Leptocylindrus
mediterraneus (insert citation). This Solenicola-Leptocylindrus consortium has been found to be
ubiquitous in marine environments ranging from tropical to Arctic and Antarctic waters
(Gomez 2007). Recently, the phylogenetic position of Solenicola has been clarified and
shown to branch within the MAST-3 group (Gómez et al. 2010). If a similar lifestyle holds true
for the MAST-3 and other MAST organisms at the study site, then, because diatoms are preferentially lysed
by the CL method, any organism living in consortium with the diatoms may also show up at
greater frequency. Gómez et al. (2010) also observed the co-occurrence of potentially different but closely related
Solenicola strains in their sample of colonized diatoms; therefore, it is possible that ML might
miss strains of commensals or parasites that are more closely attached to the host.
The last group with biased representation was the Syndiniales, a group of novel alveolates
related to dinoflagellates (Harada et al. 2007; Guillou et al. 2008). The Syndiniales can be quite
diverse (Fig. 2) and can contribute up to 50% of sequences retrieved in clone libraries from
coastal and oceanic regions around the world (Romari and Vaulot 2004; Guillou et al. 2008).
Many are thought to be parasitic on other protists, zooplankton and fish. In surface samples from
SPOTs collected over a four month period in 2001, Syndiniales contributed between 3.5 and
8.5% of the community (Countway et al. 2010). Syndiniales groups in our combined samples
made up 22.9% of the sample, but were better represented in the 18S rRNA gene libraries than in the 18S
rRNA gene transcript (cDNA) libraries. Overrepresentation of the Syndiniales in the rRNA gene
versus gene transcript libraries was also seen in a study of Arctic picoeukaryotes (Terrado et al.
2011). However, they appeared to be more equally represented between the chemical and mechanical
lysis extraction methods. The difference between the gene and transcript libraries suggests that
this group may be abundant but in an inactive
state. Alternatively, the overrepresentation of 18S DNA sequences compared to RNA sequences
could be due to multiple copies of the gene in the genome (Zhu et al.). For this group the nucleic
acid type, whether DNA or RNA, is important.
Many microbial eukaryotes can have high levels of gene duplication (Van Dolah et al.
2009). While precise information on the gene copy number of the Syndiniales is not known, the
number of 18S gene copies in microbial eukaryotes can vary by four orders of magnitude and is
generally correlated to genome size (Prokopowich et al. 2003). The Syndiniales groups are
closely related to dinoflagellates, which are renowned for their large genomes (Hackett et al.
2004). In addition, the low and sometimes non-existent branch lengths of the Syndiniales tree
support the multiple gene copy explanation. On the other hand, although trophic mode cannot be
inferred based on phylogenetic affiliation, if the Syndiniales are parasites like the
Duboscquellidae, it makes sense for there to be a lot of dormant cells or cysts in the water
column that are ready to take advantage of changing conditions to begin infecting other cells.
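The gene copy number argument can be made concrete with a toy calculation. The sketch below assumes unbiased extraction, amplification and cloning, so that a taxon's expected share of an 18S rRNA gene library is proportional to cells multiplied by gene copies per cell. The taxon names, cell numbers and copy numbers are hypothetical, not measurements from this study.

```python
def expected_gene_fractions(cells, copies_per_cell):
    """Expected share of 18S rRNA gene copies contributed by each taxon."""
    gene_copies = {taxon: cells[taxon] * copies_per_cell[taxon] for taxon in cells}
    total = sum(gene_copies.values())
    return {taxon: n / total for taxon, n in gene_copies.items()}


# Two hypothetical taxa with equal cell abundance but a 100-fold
# difference in 18S rRNA gene copies per cell.
cells = {"taxon_A": 100, "taxon_B": 100}
copies = {"taxon_A": 1000, "taxon_B": 10}
fractions = expected_gene_fractions(cells, copies)
```

Under these assumptions, taxon_A would account for about 99% of the gene copies despite making up half of the cells, which is the kind of distortion the multiple-copy explanation invokes.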
Picobiliphytes, Cercozoa and Chrysophytes showed slight bias with higher proportions in
samples with cDNA vs. DNA. Our phylogenetic trees showed a greater number and diversity of
transcript clones, including some clades that exclusively contained clones only found in the
transcript samples (trees not shown). This pattern is indicative of rare but highly active
organisms. Alternatively, they may have fewer rRNA gene copies, and may have been swamped
out by organisms with higher gene copy numbers in 18S rRNA gene clone libraries. The
Picobiliphytes are a recently discovered group of pigmented protists that are related to
telonemids, cryptophytes and katablepharids (Not et al. 2007; Yoon et al. 2011). A previous
study found that the Picobiliphytes may be underrepresented using conventional PCR-based
clone libraries (Heywood et al. 2010). Again, a lower gene copy number may have allowed their
presence to be masked by the presence of eukaryotes with higher gene copy numbers.
Fluorescence in situ hybridization (FISH) enumeration estimated this group to be present in low
abundance (<1%) in the Arctic Ocean, Norwegian Sea and other places off the coast of Europe
(Not et al. 2007). The current study indicates that while this group of organisms may be present
in low abundances, they are nonetheless active and could be significant in the environment, an
observation that would have been lost if just looking at conventional 18S rRNA gene clone
libraries.
Conclusions
Our results indicate that certain types of 18S rRNA gene and gene transcript clone libraries
may not always represent abundant organisms, or groups that are ecologically important and
active in the environment at sampling time. An 18S rRNA transcript clone library may be more
likely to reflect the portion of the microbial community that was active at the time that the
sample was taken. Therefore, organisms that are retrieved in low numbers in DNA clone libraries
cannot be assumed to be minor components of the community, as they may not have been
effectively sampled or they may not contribute significantly to the activity of the community. For
example, diatoms have historically been underrepresented in SPOTs libraries, especially during
the spring bloom, when they may dominate the community in microscopic counts. Learning that
diatoms are not well-lysed with the more commonly used mechanical extraction methods may
explain some of this discrepancy. The abundances of siliceous organisms were greater in clone
libraries created using the chemical lysis method. Recognizing that the uncultured Group II
Novel Alveolates (Syndiniales) also tend to be more abundant in chemically lysed samples
provides more insight into their proposed parasitic/commensal lifestyle with diatoms.
This result further underscores the need for caution when using relative abundances alone
in clone libraries to infer natural abundances. While no method is perfect or bias-free, teasing out
the groups that are preferentially amplified by the different methods can inform decisions on the
best method for a target application. We recommend that combinations of methods and 18S
rRNA gene and rRNA gene transcripts be used to optimize targeted sampling. While it is
intuitive that a combination of ML and CL and DNA and cDNA methods will yield a higher
number and diversity of clones, this research provides an important look into group specific
biases caused by different sampling techniques, which may help interpret past studies or
optimize DNA yields for targeted future studies when costs and time are constrained.
Acknowledgements
This work was supported by National Science Foundation (NSF) grant EF 0626526 to K.B.H., J.
Heidelberg and D. Caron. A.E.K acknowledges funding from an NIH Training Grant for salary.
Both authors would like to thank William C. Nelson for his assistance with the data analysis as
well as David A. Caron, Peter D. Countway and John F. Heidelberg for constructive discussions
on the research presented in this paper.
Tables
Table 1. Protistan diversity estimates for each clone library, sample and combined clone libraries^a.

                                        Richness (95% CI)
Library     n     OTUs   $D_S^{-1}$   Chao1-bc                ACE1
Individual clone libraries
1           116   41     17.4         58.3 (46.9, 91.6)       92.4 (59.4, 184.9)
2           159   46     11.9         108.1 (69.6, 209.7)     254.7 (118.2, 649.9)
3           178   42     10.4         126.3 (68.5, 309.9)     116.2 (67.4, 259.2)
4           161   37     8.7          47.5 (40.1, 72.5)       58.2 (43.6, 105.7)
5           146   45     7.2          107.1 (68.6, 208.7)     225.8 (109.6, 550.8)
6           176   44     12.6         94.6 (61.0, 194.3)      105.8 (65.5, 221.7)
Total       936   126    22.4         265.2 (197.3, 398.8)    380.7 (257.7, 619.4)
Combined clone libraries
1+4         277   65     15.5         101.25 (79.6, 155.3)    135.8 (94.8, 232.9)
2+5         305   74     14.9         177.5 (119.8, 307.7)    335.5 (186.1, 684.0)
3+6         354   67     19.1         144.5 (96.0, 273.9)     135.2 (95.0, 233.0)

^a Analyses included the following: 1, 0.8μm filter, mechanical lysis, cDNA; 2, 0.8μm filter,
mechanical lysis, DNA; 3, 0.8μm filter, chemical lysis, DNA; 4, 3.0μm filter, mechanical lysis,
cDNA; 5, 3.0μm filter, mechanical lysis, DNA; 6, 3.0μm filter, chemical lysis, DNA. Combined
libraries are indicated for ML cDNA, ML DNA and CL DNA comparisons. $D_S^{-1}$ is the inverse of
the Simpson’s index. Chao1 and ACE1 are non-parametric richness estimators. Rare OTUs were
defined as those with just singletons and doubletons (bias-corrected Chao1) or those with ≤ 10
members (ACE1) (Chao et al. 2005).
Table 2. Number of microbial eukaryote sequences by group for each sample type^a.

                        ML cDNA           ML DNA            CL DNA            Total
Taxonomic group         No. seq.   %      No. seq.   %      No. seq.   %      No. seq.   %
Alveolates
  Ciliates              58 (3)     20.9   55 (4)     18.1   6 (2)      1.7    119 (6)    12.7
  Dinoflagellates       0 (0)      0.0    2 (2)      0.7    5 (3)      1.4    7 (4)      0.8
  Syndiniales group I   6 (3)      2.2    65 (7)     21.4   60 (5)     16.9   131 (11)   14.0
  Syndiniales group II  1 (1)      0.4    29 (7)     9.5    52 (11)    14.7   82 (12)    8.8
  Syndiniales group V   0 (0)      0.0    1 (1)      0.3    0 (0)      0.0    1 (1)      0.1
Cercozoa                19 (8)     6.9    4 (3)      1.3    7 (5)      2.0    30 (12)    3.2
Chlorophytes            17 (6)     6.1    29 (7)     9.5    27 (5)     7.6    73 (9)     7.8
Choanoflagellates       11 (5)     4.0    8 (5)      2.6    3 (2)      0.8    22 (6)     2.4
Chrysophytes            17 (6)     6.1    7 (4)      2.3    14 (4)     4.0    38 (8)     4.1
Cryptophytes            3 (2)      1.1    2 (2)      0.7    3 (1)      0.8    8 (3)      0.9
Haptophytes             20 (4)     7.2    22 (6)     7.2    7 (2)      2.0    49 (7)     5.2
Stramenopiles
  Diatoms               30 (6)     10.8   25 (8)     8.2    86 (9)     24.3   141 (12)   15.1
  MAST                  29 (7)     10.5   10 (4)     3.3    39 (10)    11.0   78 (11)    8.3
  Other Stramenopiles   9 (8)      3.2    11 (8)     3.6    5 (5)      1.4    25 (17)    2.7
Picobiliphytes          24 (2)     8.7    8 (2)      2.6    4 (1)      1.1    36 (2)     3.9
Telonema                25 (2)     9.0    24 (2)     7.9    30 (2)     8.5    79 (2)     8.5
Other protists          8 (2)      2.9    2 (1)      0.7    6 (3)      1.7    16 (3)     1.7
Total                   277 (65)          304 (73)          354 (70)          935 (126)

^a No. seq., number of sequences; %, relative distribution, provided as percent composition. ML, mechanical lysis
extraction; CL, chemical lysis extraction. Numbers in parentheses indicate the number of OTUs at 95% similarity.
Figure Legends
Figure 1. Taxonomic ratio differences in DNA vs. cDNA (A) and mechanical vs. chemical lysis
extraction methods (B) for specified clone libraries. The dotted line represents a 1:1 line. The
number of sequences are included in the corners corresponding to the library type.
Figure 2. Maximum likelihood tree of Syndiniales groups I, II, and V based on analysis of
partial 18S rRNA gene sequences. Representative sequences from this dataset obtained by
chemical lysis (*) or mechanical lysis (#) are shown. The ciliate Strombidium purpureum was
used as an outgroup.
Figure 3. Maximum likelihood tree of marine stramenopile (MAST) groups based on analysis of
partial 18S rRNA sequences. Only representative sequences from this study (indicated by
asterisks) in each clade are shown on this condensed tree. Clones beginning with EMC and TMC
were extracted by mechanical lysis. Clones beginning with ECD and TCD were extracted by
chemical lysis. Opalina ranarun, an opalinid, was used as an outgroup.
Figures
Figure 1
Figure 2
Figure 3
Supplemental Information
Supplementary Table 1. Number of OTUs by taxa at 95% similarity for each sample (number
of clones).

Taxonomic group         cDNA        ML DNA      CL DNA      Total
Cercozoa                8 (19)      3 (4)       5 (7)       12 (30)
Chlorophytes            6 (17)      7 (29)      5 (27)      9 (73)
Choanoflagellates       5 (11)      5 (8)       2 (3)       6 (22)
Cryptophytes            2 (3)       2 (2)       1 (3)       3 (8)
Haptophytes             4 (2)       6 (22)      2 (7)       7 (49)
Alveolates
  Ciliates              3 (58)      4 (55)      2 (6)       6 (119)
  Dinoflagellates       0 (0)       2 (2)       3 (5)       4 (7)
  Syndiniales Group I   3 (6)       7 (65)      5 (60)      11 (131)
  Syndiniales Group II  1 (1)       7 (29)      11 (52)     12 (82)
  Syndiniales Group V   0 (0)       1 (1)       0 (0)       0 (0)
Stramenopiles
  Diatoms               6 (30)      8 (25)      9 (86)      12 (141)
  Chrysophytes          6 (17)      4 (7)       4 (14)      8 (38)
  MAST                  7 (29)      4 (10)      10 (39)     11 (78)
  Other stramenopiles   8 (9)       8 (11)      5 (5)       17 (25)
Picobiliphytes          2 (24)      2 (8)       1 (4)       2 (36)
Telonema                2 (25)      2 (24)      2 (30)      2 (79)
Other protists          2 (8)       1 (2)       3 (6)       3 (16)
Total                   65 (277)    73 (304)    70 (354)    125 (917)
Supplementary Table 2. Number of OTUs and clones that belong to Syndiniales groups
No. of OTUs EMC EMD ECD
Group I Clade 1 5 6 53 57
Group I Clade 3 1 1
Group I Clade 4 1 1
Group I Clade 5 4 2 3
Group II Clade 1 1 1
Group II Clade 10+11 4 1 22 43
Group II Clade 16 1 1
Group II Clade 39 1 1
Group II Clade 44 1 1
Group II Clade 6 1 3
Group II Clade 7 2 4 4
Group II Clade 8 1 1
Group V 1 1
Supplementary figure 1. Percent microbial eukaryotic taxa from 18S rRNA gene or gene
transcript (cDNA) clone libraries in each of the comparative groups. ML = mechanical lysis;
DNA = 18S rRNA genes; cDNA = RNA converted to cDNA
Supplementary figure 1
Chapter Two
Comparative transcriptome analysis of four prymnesiophyte algae
Abstract
Genomic studies of bacteria, archaea and viruses have provided insights into the microbial world
by unveiling potential functional capabilities and molecular pathways. However, the rate of
discovery has been slower among microbial eukaryotes, whose genomes are larger and more
complex. Transcriptomic approaches provide a cost-effective alternative for examining genetic
potential and physiological responses of microbial eukaryotes to environmental stimuli. In this
study, we generated and compared the transcriptomes of four globally-distributed, bloom-
forming prymnesiophyte algae: Prymnesium parvum, Chrysochromulina brevifilum,
Chrysochromulina ericina and Phaeocystis antarctica. Our results revealed that the four
transcriptomes possess a set of core genes that are similar in number and shared across all four
organisms. The functional classifications of these core genes using the euKaryotic Orthologous
Genes (KOG) database were also similar among the four study organisms. More broadly, when
the frequencies of different cellular and physiological functions were compared with other
protists, the species clustered by both phylogeny and nutritional modes. Thus, these clustering
patterns provide insight into genomic factors relating to both evolutionary relationships as well
as trophic ecology. This paper provides a novel comparative analysis of the transcriptomes of
ecologically important and closely related prymnesiophyte protists and advances an emerging
field of study that uses transcriptomics to reveal ecology and function in protists.
Introduction
Genome sequencing of microorganisms has unveiled a wealth of new information regarding the
ecology, physiology and interactions of organisms in the environment. In contrast to most
bacteria, archaea and viruses, protistan genomes tend to be large (10-200 Mb compared to 1-10
Mb in bacteria) and more complex, factors that obfuscate bioinformatic analyses, and make for a
slower rate of assembly, annotation and gene discovery (Caron et al. 2009b). The lack of well
annotated reference genomes also makes de novo sequence analysis extremely challenging.
Consequently, the current repository of sequenced and annotated eukaryotic genomes covers a
small portion of microbial eukaryotic diversity, and is biased toward model organisms and
parasitic species that cause human diseases (Anantharaman et al. 2007; Pagani et al. 2011).
Transcriptomes contain only the transcribed portions of genomes (mRNA), which
simplifies genetic analyses of eukaryotes by removing complex genetic elements of large
intergenic regions, introns and repetitive DNA. In protists, the poly-A+ tail of mRNA transcripts
can be selected for sequencing, enriching eukaryotic sequences even in a bacterized,
uniprotistan culture. As such, transcriptomes can be used for the molecular study of protists of
interest, circumventing difficult issues such as complicated sequence assembly procedures, to
interrogate metabolic and cellular processes.
A mixotrophic nutritional mode among some photosynthetic flagellates (defined here as
chloroplast-containing protistan species that also possess the ability for phagotrophy) is a
geographically and phylogenetically widespread phenomenon among aquatic protists. A growing
body of literature indicates that mixotrophy, especially the consumption of bacteria by
phototrophic plankton, is a significant ecological strategy in global marine systems (Unrein et al.
2007; Zubkov and Tarran 2008; Jeong et al. 2010; Hartmann et al. 2012). Mixotrophy may
confer a variety of ecological advantages including carbon, macro- or micronutrient acquisition,
and/or supplementation of energy generation (Sanders 1991, 2011).
Within the broad spectrum of taxa and nutritional strategies that have been reported, the
mixotrophic capabilities of prymnesiophyte (haptophyte) algae have been well documented.
Molecular surveys and pigment composition analyses have indicated that prymnesiophytes are
globally distributed and abundant in both marine and freshwater ecosystems (Frada et al. 2009;
Cuvelier et al. 2010; Bielewicz et al. 2011; Kong et al. 2012; Unrein et al. 2013) where they play
key roles in nutrient and organic carbon cycling (Chavez et al. 1990; Green 1991). Among
mixotrophic flagellates studied year-round off the Catalan coast (Mediterranean), for example,
prymnesiophytes were found to be the most important phylogenetic group, accounting for on
average 40% of total bacterivory by mixotrophs and 9-27% of total bacterivory (Unrein et al.
2013; Jones et al. 2013).
The transcriptomes of four prymnesiophyte algae were compared in this study:
Prymnesium parvum, Chrysochromulina brevifilum, Chrysochromulina ericina and Phaeocystis
antarctica. P. parvum is a toxin producer that is capable of developing large, monospecific
blooms that cause massive fish kills, ecosystem disruption, and significant economic losses
(Moestrup 1994). Chrysochromulina spp. are also found globally (Thomsen et al. 1994), with
some species capable of forming blooms and mass mortality events (Hansen et al. 1995). P.
antarctica forms colonies of cells that are embedded in a polysaccharide gel matrix. It is a key
species in the Southern Ocean, and is capable of forming blooms of up to $10^7$ cells L$^{-1}$
(Edvardsen and Imai 2006). P. antarctica may also play a significant role in global carbon and
sulfur cycles (Smith et al. 1991; DiTullio et al. 2000). Three of the four species in this study have
also been reported to exhibit mixotrophic nutrition. P. parvum can ingest bacteria and other
protists, and is capable of taking up organic nutrients (Nygaard and Tobiesen 1993; Tillmann and
Hesse 1999; Carvalho and Granéli 2010). Members of the diverse genus Chrysochromulina,
including C. brevifilum and C. ericina have also been reported to ingest prey (Hansen and Hjorth
2002; Jones et al. 2009).
The purpose of this study was to compare the transcripts of closely related
prymnesiophytes in order to understand commonalities and differences attributable to both
taxonomic relatedness and trophic mode. Our analysis revealed a set of core genes that were
shared among all four targeted organisms. The frequencies of functional gene categories in these
prymnesiophytes were compared to other protistan organisms in publicly available databases,
and indicated that species clustered by genomic information based on both phylogeny and
nutritional mode.
Methods
Culture conditions
Prymnesium parvum (clone Texoma1) was isolated from Lake Texoma, Oklahoma, USA, and
made clonal and axenic by micropipetting single cells through rinses of sterile medium.
Chrysochromulina brevifilum (clone UTEX LB985) and Chrysochromulina ericina (clone
CCMP281) were obtained from UTEX The Culture Collection of Algae and the National Center
for Marine Algae and Microbiota, respectively. Phaeocystis antarctica was isolated from the
Ross Sea, Antarctica by Robert Sanders. P. antarctica was collected under permits issued by the
US Antarctic Program through the US Department of State for work in Antarctica. P. parvum
was collected at the University of Oklahoma Biological Station, Lake Texoma. No collection
permits were required. The four study organisms were grown in optimal replete medium for each
species (Table 1) in 1-2 L volumes in 2800 ml Pyrex glass Fernbach flasks. Irradiance was
measured using a QSL-100 sensor with QSP-170 deckbox (Biospherical Instruments Inc.) and
was approximately 300 μE·m$^{-2}$·s$^{-1}$. Cultures were grown in a 12:12 hour light:dark cycle with
illumination provided by Philips F20T12CW bulbs. P. parvum was grown axenically while the
other three species were grown as uniprotistan but non-axenic (bacterized) cultures.
RNA Isolation
All cultures were harvested during mid-exponential growth phase by centrifugation in an
Eppendorf 5810R centrifuge using the A-4-62 rotor at 3200 rcf for 15 min at 15°C. The
supernatant was carefully decanted, and 1-2 ml of TRI reagent (Ambion) was added to the pellet
and vortexed until the pellet fully dissolved. Homogenates were then either processed
immediately using the Ribopure kit (Ambion), or stored at -80 °C for later processing. The eluted
RNA was treated with DNase (Sigma) to remove DNA contamination. Samples were cleaned
and concentrated using RNA Clean and Concentrator-25 (Zymo Research). The RNA was
quantified using a Qubit 2.0 Fluorometer (Invitrogen) and nucleic acid quality was checked using
an E-gel Gel EX 1% (Invitrogen).
Library Preparation and Sequencing
All samples were quantified again at the sequencing center using Invitrogen Qubit Q32855 and
RNA quality was assessed using the Agilent 2100 Bioanalyzer. Libraries were made from 2 μg
RNA using Illumina’s TruSeq RNA Sample Preparation Kit. The average insert size of each
library ranged from 250 to 350 bp. Libraries were sequenced on an Illumina HiSeq 2000 to
obtain 2 x 50 bp (paired-end) reads. Over 2 Gbp of sequence was generated per library. Library
preparation and sequencing were performed as part of the Marine Microbial Eukaryote
Transcriptome Sequencing Project (MMETSP) supported by the Gordon and Betty Moore
Foundation (http://marinemicroeukaryotes.org/).
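The sequencing figures above imply a minimum number of read pairs per library, which can be checked with simple arithmetic. The sketch below assumes that "2 Gbp" means 2 x 10^9 bases and that every read is a full 50 bp; the constant and function names are hypothetical.

```python
READ_LENGTH_BP = 50              # length of each read
READS_PER_PAIR = 2               # paired-end sequencing
MIN_YIELD_BP = 2_000_000_000     # "over 2 Gbp" per library, taken as 2e9 bases


def min_read_pairs(yield_bp, read_length=READ_LENGTH_BP, reads_per_pair=READS_PER_PAIR):
    """Smallest number of read pairs needed to reach the given yield."""
    bases_per_pair = read_length * reads_per_pair
    return -(-yield_bp // bases_per_pair)  # ceiling division


pairs_per_library = min_read_pairs(MIN_YIELD_BP)
```

Under these assumptions, each library would contain at least 20 million read pairs.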
Transcriptome Assembly
Transcriptome assemblies for P. parvum, C. ericina and P. antarctica were obtained using the
National Center for Genome Resources’ (NCGR) internal pipeline BPA1.0 (Batch Parallel
Assembly version 1.0). Sequence reads were preprocessed using SGA preprocess (Simpson and
Durbin 2012) for quality trimming (swinging average) at Q15. Reads less than 25 nt after
trimming were discarded. Preprocessed sequence reads were assembled into contigs with
ABySS v.1.3.0 (Simpson et al. 2009), using 20 unique kmers between k=26 and k=50. ABySS
was run requiring a minimum kmer coverage of 5, bubble popping at >0.9 branch identity with
the scaffolding flag enabled to maintain contiguity for divergent branching. Paired-end
scaffolding was performed on each kmer. Sequence read pairing information was used in
GapCloser v.1.10 (Li et al. 2008) (part of the SOAPdenovo package) to walk in on gaps created
during scaffolding in each individual kmer assembly. Contigs from all gap-closed kmer
assemblies were combined. The OLC (overlap layout consensus) assembler miraEST (Chevreux
et al. 2004) was used to identify minimum 100 bp overlaps between the contigs and to assemble
larger contigs, while collapsing redundancies. The Burrows-Wheeler Aligner (BWA) (Li and
Durbin 2009) was used to align sequence reads back to the contigs. Alignments were processed
by SAMtools mpileup (http://samtools.sourceforge.net) to generate consensus nucleotide calls at
positions where IUPAC bases were introduced by miraEST (Chevreux et al. 2004), and read
composition showed a predominance of a single base. The consensus contigs were filtered at a
minimum length of 150 nt to produce the final set of contigs.
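As a rough illustration of the read preprocessing step described above, the following Python sketch cuts a read at the first window whose mean Phred quality falls below Q15 and then discards reads shorter than 25 nt. It is not the SGA preprocess implementation: the window size of 4 and the function names are assumptions made only for illustration.

```python
def trim_read(quals, window=4, min_q=15):
    """Return the number of leading bases to keep.

    Scans fixed-size windows from the 5' end and cuts the read at the
    start of the first window whose mean Phred quality is below min_q.
    The window size is an assumed value, not one taken from the text.
    """
    for start in range(0, len(quals) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_q:
            return start
    return len(quals)


def preprocess(reads, min_len=25, window=4, min_q=15):
    """Trim each (sequence, qualities) pair and drop reads shorter than min_len."""
    kept = []
    for seq, quals in reads:
        keep = trim_read(quals, window, min_q)
        if keep >= min_len:
            kept.append((seq[:keep], quals[:keep]))
    return kept
```

The same length-filter pattern would apply to the final 150 nt contig cutoff, with contig lengths in place of trimmed read lengths.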
Transcriptome assembly for C. brevifilum was obtained using NCGR’s internal pipeline
BPA2.0 (Batch Parallel Assembly v.2.0). The differences between BPA1.0 and BPA2.0 are as
follows: preprocessed sequence reads were assembled into contigs with a newer version of
ABySS (Simpson et al. 2009) v.1.3.3, with the scaffolding flag disabled to avoid over-reduction
of divergent regions. Unitigs from all kmer assemblies were combined and redundancies were
removed using CD-HIT-EST (Li and Godzik 2006) with a clustering threshold of 0.98 identity.
The OLC assembler CAP3 (Huang and Madan 1999) was then used to identify minimum 100 bp
overlaps between the resultant contigs and assemble larger sequences. The resulting contigs were
paired-end scaffolded using ABySS (Simpson et al. 2009). Sequence read pairing information
was used in GapCloser v.1.10 (Li et al. 2008) (part of the SOAPdenovo package) to walk in on
gaps created during scaffolding. Redundant sequences were again removed using CD-HIT-EST
(Li and Godzik 2006) at a clustering threshold of 0.98 identity. The consensus contigs were
filtered at a minimum length of 150 nt to produce the final set of contigs.
The sequences for the four transcriptomes used in this study have been deposited in
CAMERA (Community Cyberinfrastructure for Advanced Microbial Ecology Research and
Analysis) with the following accession numbers: CAM_ASM_000151 (P. parvum),
CAM_ASM_000808 (C. brevifilum), CAM_ASM_000453 (C. ericina), CAM_ASM_000460 (P.
antarctica).
Transcriptome Annotation and Comparison
Non-coding ribosomal RNAs and transfer RNAs were detected using RNAmmer (Lagesen et al.
2007) and tRNAscan (Lowe and Eddy 1997), respectively. Coding nucleotide sequences and
corresponding translated peptide sequences were predicted using ESTScan (Iseli et al. 1999;
Lottaz et al. 2003) with a Bacillariophyta scoring matrix. Sequence reads were aligned back to
the nucleotide motifs of the predicted coding sequences using BWA (Li and Durbin 2009).
Peptide predictions over 30 amino acids in length were annotated. Blastp (Altschul et al. 1990)
was used to generate hits against the UniProtKB/Swiss-Prot database. Protein sequences were
also functionally characterized using HMMER3 (Zhang and Wood 2003) against Pfam-A (Finn
et al. 2010), TIGRFAM (Haft et al. 2001), and SUPERFAMILY (Gough et al. 2001) databases.
Only protein sequences greater than 70 amino acids were used in subsequent analyses.
Predicted protein sequences were also grouped into gene clusters using OrthoMCL (Chen et al.
2006). The resulting data were used in a comparative analysis of predicted gene clusters that
were shared and unique among the four transcriptomes.
Polyketide synthase analysis
Predicted proteins corresponding to putative polyketide synthases were initially identified by a local
BLAST against Emiliania huxleyi polyketide synthase sequences obtained from GenBank.
Sequences with HMM annotations to polyketide synthases were further identified using the
NRPS-PKS tool (Bachmann and Ravel 2009) to identify the PKS domains present. Sequences
that contained the ketosynthase (KS) domain were used to construct a maximum likelihood tree
with 100 bootstraps using the software MEGA5 (Tamura et al. 2011).
KOG analysis
Putative protein sequences were functionally annotated using KOG categories by blasting all the
sequences against the KOG database (Tatusov et al. 2003; Koonin et al. 2004). The E-value
cutoff for a positive hit was 10⁻¹⁰. In addition, we used a bit score ratio cut-off, which was
derived by dividing the bit score for a hit by the bit score from a self-self hit. A ratio of 0.2 was
determined to be a good cut-off after manually evaluating the quality of a number of
representative hits.
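The two cut-offs described above can be sketched in Python as follows. This is a minimal sketch, not the script used in the study: the function name is hypothetical, and treating the thresholds as inclusive (E-value at or below 10⁻¹⁰, ratio at or above 0.2) is an assumption, since the text does not say how boundary values were handled.

```python
E_VALUE_CUTOFF = 1e-10        # E-value cut-off stated in the text
BITSCORE_RATIO_CUTOFF = 0.2   # bit score ratio cut-off stated in the text


def passes_cutoffs(evalue, hit_bitscore, self_bitscore):
    """Return True if a BLAST hit meets both cut-offs.

    The ratio is the bit score of the hit divided by the bit score of the
    query aligned against itself (the self-self hit).
    Inclusive comparisons are an assumption.
    """
    if self_bitscore <= 0:
        return False
    ratio = hit_bitscore / self_bitscore
    return evalue <= E_VALUE_CUTOFF and ratio >= BITSCORE_RATIO_CUTOFF
```

Normalizing by the self-self score makes the threshold comparable across proteins of different lengths, which a raw bit score cut-off would not be.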
Predicted protein sequences of 37 other protistan genomes and transcriptomes were
downloaded from the JGI website (Grigoriev et al. 2012) and the CAMERA website (Sun et al.
2011), respectively. The distribution of these sequences among KOG categories was obtained,
as above, by performing a BLAST search against the KOG database and accepting hits that met
the E-value and bit score ratio cut-offs of 10⁻¹⁰ and 0.2, respectively. The KOG category distribution of each
species was obtained by dividing the number of hits in each category by the total number of hits
to the KOG database. The relative frequencies of each KOG functional category were used to
generate a non-metric multidimensional scaling (NMDS) plot using the statistical package R
(R Development Core Team 2005) to compare the overall similarity of the organisms to each other.
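The normalization described above (hits per category divided by total hits to the KOG database) can be sketched as follows. The function name and the input format, one KOG category letter per accepted hit, are assumptions for illustration.

```python
from collections import Counter


def kog_relative_frequencies(hit_categories):
    """Convert a list of KOG category letters (one per accepted hit)
    into relative frequencies that sum to 1 for the species."""
    counts = Counter(hit_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Because each species is scaled by its own total, datasets of very different sizes can be placed in the same ordination.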
A subsequent principal components analysis was performed on the relative frequencies of
KOG categories across the 41 protistan datasets to confirm the results from the NMDS analysis
and to elucidate how much of the KOG frequency pattern could be explained by the principal
axes. Additional statistical analyses were done to investigate the influence of trophic mode
versus phylogeny on the observed KOG frequencies. Since data were not normally distributed,
the influence of phylogeny and trophic mode was tested independently with a non-parametric
Kruskal-Wallis test, followed by a Steel-Fligner test if significant differences were observed. All
statistical analyses and plots were calculated or generated using XLSTAT (v.2013.06.04,
Addinsoft™).
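For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic from first principles, using average ranks and the standard tie correction. It only illustrates the statistic and is not the XLSTAT procedure used here; the function name is hypothetical, and the p-value (from a chi-squared approximation with k − 1 degrees of freedom for k groups) is not computed.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups of numeric values."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction
```

In the analysis described above, each group would hold the relative frequencies of one KOG category for the species in one phylogenetic group or trophic mode.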
Statistical analyses were performed on two datasets: (1) the original dataset of 41 species,
with the exception of the mycetozoan, Dictyostelium purpureum, the choanoflagellate Monosiga
brevicollis and the rhodophyte Cyanidioschyzon merolae, which were excluded for statistical
reasons (no replicates within the phylogenetic group); and (2) a reduced dataset that included
only the stramenopiles and prymnesiophytes. The reduced dataset was used to prevent the strong
phylogenetic signal present within the alveolates and chlorophytes from masking a signal based
on trophic mode.
Results
Overview of the transcriptome
Sequencing of the transcriptomes of the four target prymnesiophyte species generated datasets
ranging in size from 35.6 million to 61.0 million reads, resulting in 36.4 to 52.0 Mbp of
assembled sequence data (Table 2). The number of assembled contigs for P. parvum, C.
brevifilum, C. ericina and P. antarctica ranged from 30,986 to 56,193. Approximately 72% to
83% of these contigs were predicted to be protein-coding sequences. Non-coding RNAs such as
rRNAs and tRNAs formed a small number of the total contigs (Table S1). The transcriptome
sizes in our study were comparable to other publicly available transcriptomes sequenced by the
Gordon and Betty Moore Foundation MMETSP. The complete pathways for essential metabolic
and cellular processes were present in each transcriptome, indicating that the sequencing depth
provided good coverage. In addition, the number of ribosomal protein genes present in our
datasets (94-101) was similar to the number in E. huxleyi (99), another prymnesiophyte alga
(Read et al. 2013).
Core prymnesiophyte genes
We found a set of gene clusters that were common to all four prymnesiophyte species (“core
transcriptome”). This core set of proteins consisted of 3,338 gene clusters (Fig. 1A), which
comprised 37%, 28%, 26% and 29% of the total predicted proteins in P. parvum, C. brevifilum,
C. ericina and P. antarctica, respectively (Fig. 1C). C. brevifilum and C. ericina shared the most
gene clusters (1,021) followed by P. parvum and C. ericina (925) and then P. antarctica and C.
ericina (928) (Fig. 1A). Each transcriptome contained unique gene clusters (each comprising
multi-copy genes) not found in the other three transcriptomes; the numbers of
gene clusters that were unique to each species were 2,032 (P. parvum), 1,366 (C. brevifilum),
4,016 (C. ericina) and 4,412 (P. antarctica). In addition to the unique gene clusters, each species
also had unique single-copy genes (Fig. 1A). The unique gene clusters and single-copy genes
together comprised 37-52% of the transcriptomes (Fig. 1C). C. brevifilum and C. ericina had the
highest proportions of unique genes (each with 52%), while P. antarctica had 49% and P.
parvum had 37% (Fig. 1C). Approximately 20-26% of each transcriptome consisted of
proteins that were shared between at least two of the four prymnesiophyte species. Of the 3,338
gene clusters shared among all four species, 2,752 (82%) were annotated by KEGG or OrthoMCL.
In contrast, a much smaller proportion of the unique genes in each species was annotated: P. parvum:
1,398 (20%); C. brevifilum: 2,828 (20%); C. ericina: 4,205 (25%); P. antarctica: 2,057 (13%)
(Fig. 1B).
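The core, shared and unique categories used above can be expressed as simple set comparisons. The sketch below assumes a hypothetical input that maps each gene cluster identifier to the set of species with at least one member in that cluster; it does not parse OrthoMCL output, and the names are illustrative only.

```python
def summarize_clusters(cluster_species, all_species):
    """Count core, shared and species-unique gene clusters.

    cluster_species: dict mapping cluster id -> set of species in the cluster
    all_species: iterable of all species names in the comparison
    """
    everyone = set(all_species)
    core = 0                                  # present in every species
    shared = 0                                # present in some, but not all
    unique = {species: 0 for species in everyone}
    for members in cluster_species.values():
        present = set(members)
        if present == everyone:
            core += 1
        elif len(present) == 1:
            unique[next(iter(present))] += 1
        else:
            shared += 1
    return core, shared, unique
```

Applied to four species, the core count corresponds to the centre of a four-way Venn diagram and the unique counts to its outermost regions.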
The majority of core prymnesiophyte proteins were mapped by KEGG to essential
metabolic pathways such as the biosynthesis of amino acids, carbon metabolism, fatty acid
metabolism and purine and pyrimidine metabolism. All or nearly all the genes necessary for
these pathways were present. The non-mevalonate pathway or 2-C-methyl-D-erythritol 4-
phosphate/1-deoxy-D-xylulose 5-phosphate pathway (MEP/DOXP pathway) that is part of the
terpenoid backbone biosynthesis pathway was also nearly complete. The core transcriptome also
included genes for the synthesis of ribosomal proteins, proteasomes and spliceosomes. Pathways
with only some of the necessary genes include the biosynthesis of other secondary metabolites,
starch and sucrose metabolism, and the urea cycle. Some genes involved in the metabolism of
cofactors and vitamins were present in all four species; however, the core transcriptome did not
contain the full pathway for the metabolism of any of the cofactors and vitamins.
We also compared the core transcriptome of our four prymnesiophyte species with the
predicted proteins from the E. huxleyi genome. Of the 1,433 proteins with KEGG annotations
that were in our shared core transcriptome, 1,303 (91%) were also present in E. huxleyi. There
were 130 predicted proteins that were present in P. parvum, C. brevifilum, C. ericina and P.
antarctica but not in E. huxleyi. They consisted of a few enzymes in various pathways, such as
nitrogen metabolism (1), cysteine and methionine metabolism (1), and TCA cycle (3), among
others. They also included alternative enzymes for the same pathway, or alternative pathways for
the same substrate/product conversion. In some cases, a protein that was unique to the four target
species might be involved in a different intermediate pathway that feeds into the same main
pathway, with no resulting change in the final product. For example, isocitrate dehydrogenase,
which is present in the core transcriptome of the four target prymnesiophyte species but not in E.
huxleyi, can convert isocitrate to 2-oxoglutarate via oxalosuccinate, whereas the enzyme that is
present in E. huxleyi converts isocitrate directly to 2-oxoglutarate. In both cases, isocitrate is
converted to 2-oxoglutarate; both are major compounds in the TCA cycle.
B-vitamin biosynthesis pathways
A majority of the predicted proteins unique to each species were not annotated, making it more
challenging at this time to understand the differences among the different species. Nonetheless,
some of the differences in the proteins were present in three vitamin biosynthesis pathways,
specifically thiamine, biotin and cobalamin.
The number of predicted proteins involved in thiamine biosynthesis was variable among
the four target species. The P. parvum and P. antarctica transcriptomes contained only the IscS
gene, which is a cysteine desulfurase (Fig. S1). Although this enzyme is part of the thiamine
metabolic pathway, it also functions in the sulfur relay system, which may explain its presence in
the two species that did not otherwise have other key thiamine synthesis enzymes. In contrast,
the two Chrysochromulina species, and C. ericina especially, possessed an interesting
composition of genes that belong to this pathway. The synthesis of thiamine pyrophosphate
(TPP)—the biologically active form of thiamine—begins with the formation of a pyrimidine
moiety (HMP-PP) and a thiazole moiety (HET-P). The former is formed by THIC and THID
from 5-aminoimidazole ribonucleotide (AIR), while the latter is formed by THI4 and THIM
from NAD+, glycine and an unknown sulfur-containing compound. In bacteria, HET-P is
synthesized de novo from 1-deoxy-D-xylulose 5-phosphate (DXP) by a suite of different
enzymes (Jurgenson et al. 2009). The ThiE enzyme catalyzes the condensation of HMP-PP and
HET-P to form thiamine monophosphate. In some organisms, the functions of THID and ThiE are
combined into one enzyme. Thiamine monophosphate is dephosphorylated by an unknown
phosphatase to form thiamine that is then pyrophosphorylated by TPK into thiamine
pyrophosphate. C. ericina has ThiE and TPK, and as such, given both the pyrimidine moiety and
the thiazole moiety, might be able to synthesize TPP. However, C. brevifilum had the ThiDE and
the ThiE gene but not TPK, and might have a reduced and potentially non-functional thiamine
biosynthesis pathway (Fig. S1).
The presence of biotin metabolism genes in the four transcriptomes was also variable. In
bacteria, four enzymes convert pimeloyl CoA to biotin in a sequential fashion, beginning with
BioF, followed by BioA, BioD and finally, BioB (Entcheva et al. 2002). Genomes of protists that
do not need exogenous biotin such as Thalassiosira pseudonana, Chlamydomonas reinhardtii
and Cyanidioschyzon merolae have been found to contain the bioF, bioA and bioB genes (data
culled from the KEGG database). However, because these protists are able to synthesize biotin, it
has been hypothesized that an unknown enzyme might carry out the activity of the missing BioD
enzyme (Croft et al. 2006). Regardless, none of our four prymnesiophytes had all three genes. P.
parvum had bioF and bioA while C. ericina and P. antarctica had bioF and bioB. C. brevifilum
only had the bioF gene.
All four prymnesiophyte species contained the CobW gene sequence, which is annotated
as a cobalamin synthesis protein. The numbers of putative CobW genes differ among the four
transcriptomes, with the smallest number in P. parvum (2 copies) and the largest in C. ericina
(10 copies) (Table S2). In addition, C. brevifilum also had a predicted protein with sequence
similarity to cobyrinic acid a,c-diamide synthase, or CobB. C. ericina and P. antarctica both had
a CobN-like protein, which is a cobaltochelatase. P. parvum, C. brevifilum and C. ericina all
possessed the B12-dependent form of methionine synthase MetH, but not the B12-independent
form, MetE. All four prymnesiophytes also had the methylmalonyl-coA mutase (MCM) gene in
their transcriptomes (Table S2).
Polyketide synthase genes
Our datasets contained 15 putative polyketide synthase genes with a ketosynthase (KS) domain.
These sequences were found in three out of four of the target species: P. parvum, C. brevifilum
and P. antarctica (Table S3). The phylogeny of our prymnesiophyte KS sequences was analyzed
by generating a maximum likelihood tree. This analysis showed that our prymnesiophyte KS
sequences fell into two distinct clusters (Fig. 2). All the sequences from C. brevifilum and one
sequence from P. antarctica (ORF4093) clustered with E. huxleyi KS sequences in a
prymnesiophyte-specific PKS clade. Meanwhile, all the sequences from P. parvum and five
sequences from P. antarctica were dispersed among sequences from a diverse group of species,
including E. huxleyi, Karenia brevis and various bacteria. All the sequences that fell into the
prymnesiophyte-specific PKS clade contained multiple PKS domains (Table S3) with variable
organizations. The longest PKS sequence, ORF106225, contained two ketoreductase (KR), three
ketosynthase (KS) and five acyl-carrier protein (ACP) domains (Fig. 3). Another sequence
contained the KR, KS and ACP domains as well as a leading dehydratase (DH) domain. A
domain that carries out adenylation (A domain) was found at the start of ORF107078, followed
by ACP, KS and KR domains (Fig. 3). In P. antarctica, ORF4093 contained one ACP domain
and one KS domain. In contrast, sequences that clustered with the mixed bacterial/protistan PKS
clade only contained the KS domain.
KOG patterns
EuKaryotic Orthologous Groups (KOGs) is a classification system used to identify orthologous and
paralogous proteins in eukaryotes and to assign them functional categories (Tatusov et al. 2003). The overall
distribution of KOG functions of the four prymnesiophytes in this study was very similar (Fig. 4).
Generally, the KOG category with the greatest number of peptides was O (posttranslational
modification, protein turnover, and chaperones). This was closely followed by R (general
function prediction only). The greatest difference between any two species was observed for
KOG function T, signal transduction mechanisms. C. brevifilum had the greatest number of
peptides belonging to this category relative to the other species, followed by C. ericina (2.1%
less than C. brevifilum).
Comparing the KOG distribution of our four species to that of other protists in an NMDS
analysis resulted in distinct groupings that appeared to reflect both phylogenetic relationships
and nutritional mode (Fig. 5). The stress value of the NMDS was 0.12, indicating that the 2D
plot was a good depiction of the separation among the data. We found that heterotrophic species
such as the choanoflagellate, Monosiga brevicollis, the stramenopile, Paraphysomonas
imperforata, the slime mold, Dictyostelium purpureum and the water molds, Phytophthora capsici
and P. ramorum, occupied a wide area of the plot, but occupied a space separate from species
that are autotrophic or mixotrophic (Fig. 5). The autotrophs formed a group that overlapped
somewhat with photosynthetic species that are known to be capable of phagotrophy (i.e.
mixotrophs). The latter group had a large spread, mostly due to the position of the
dinoflagellates, which occupied an area distinct from other groups. The prymnesiophytes and
chrysophytes clustered together in a region of overlap between non-phagotrophic autotrophic and
mixotrophic protists. Some phylogenetic groups formed distinct clusters, such as the diatoms
Pseudo-nitzschia arenysensis, Pseudo-nitzschia delicatissima, Phaeodactylum tricornutum,
Fragilariopsis cylindrus and Leptocylindrus danicus. Additionally, the prasinophytes,
Ostreococcus and Micromonas, clustered together. In contrast, the other two chlorophytes,
Chlamydomonas reinhardtii and Chlamydomonas sp. CCMP681 did not cluster with each other,
and the chrysophyte congeners, Ochromonas sp. CCMP1899 and Ochromonas sp. BG-1 also did
not group together.
We undertook a more detailed principal components analysis (PCA) to confirm and
elucidate the NMDS results. The spatial distribution of the PCA plot was in concordance with
that of the NMDS (Fig. 6), but the two principal axes accounted for only 54% of the variability,
indicating that 46% was not explained by this representation. The top five variables that
explained up to 40% of the variability in the first axis were: C (energy production and
conversion), D (cell cycle and division), G (carbohydrate transport and metabolism), R (general
function prediction only) and J (translation and ribosomal structure). Along the second axis, up
to 55% of the variability could be accounted for by the following five variables: A (RNA
processing), T (signal transduction mechanisms), L (replication, recombination and repair), Z
(cytoskeleton) and Y (nuclear structure).
Statistical analyses performed on each of the KOG functions were able to tease out
specific functions that were different based on either phylogeny or trophic mode. In the original
large dataset, phylogenetic grouping had a statistically significant effect in ten KOG categories
(Table S4), while trophic mode was significant for six KOG categories. After removing the
alveolates and the chlorophytes, assuming that their strong phylogenetic signal might bias the
results, four KOG categories retained significant differences based on trophic mode, namely: D
(cell cycle control, cell division and chromosome partitioning), G (carbohydrate transport and
metabolism), H (coenzyme transport and metabolism) and K (transcription).
Box plots of the frequencies of the different KOG categories showed interesting patterns
(Fig. 7). Grouped by phylogenetic relatedness, alveolates generally had KOG functions that
differed from chlorophytes, prymnesiophytes and stramenopiles (Fig. 7: column of panels on
left). These patterns were different for taxa grouped by nutritional mode, once the alveolates
were removed to reduce their strong effect (Fig. 7: column of panels on right). For example, the
frequency of KOG function D (cell cycle control, cell division and chromosome partitioning, top
row of panels) was significantly different for mixotrophic species compared to heterotrophs and
autotrophs, which were similar to each other (Fig. 7: top right panel). For KOG category G
(carbohydrate transport and metabolism, right hand column, panel second from top), all three
trophic modes had frequencies that were significantly different from each other. In category H
(coenzyme transport and metabolism, right hand column, panel third from top), the heterotrophic
group was significantly different from the mixotrophic and photosynthetic groups. The frequency
of KOG function K (transcription, bottom right panel) was significantly different in the
mixotrophs compared to the phototrophs, but not significantly different compared to the
heterotrophs. There was no significant difference between the heterotrophs and the phototrophs
for this KOG category.
Discussion
The numbers of predicted peptides for the four species in this study varied greatly, but
were within the range of other MMETSP dataset transcriptomes and the number of predicted
protein coding genes in E. huxleyi (30,569) (Read et al. 2013). The high number of peptides was
not due to bacterial contamination because preparation of the cDNA biased against bacterial
mRNA, and we observed negligible numbers of bacterial genes in the transcriptomes. It is
possible that the large number of peptides in part reflects the existence of fragments of the same
gene called as two different genes in cases where the sequence that joins the two fragments was
not sequenced. However, the N50 values for these four datasets were between 1,297 and 1,612 aa
(Table 2), which is close to the average gene length for eukaryotes (Xu et al. 2006).
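For reference, N50 is the length at which sequences of that length or longer account for at least half of the total assembled length. A minimal sketch of the calculation is given below; the function name is an assumption, and the input is simply a list of sequence lengths in whatever unit is being reported.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

Unlike the mean length, N50 is weighted toward longer sequences, so a large number of short fragments lowers it less than it would lower the mean.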
Core, shared and unique genes
Our results indicate that the four prymnesiophytes in our study share a core set of genes that may
also be more broadly shared among other prymnesiophytes. These genes code for essential
cellular and metabolic functions such as carbon metabolism, amino acid synthesis, DNA
synthesis and fatty acid metabolism. The transcriptomes also contained “shared” genes that were
observed in two or three of the four target species. There are limited data on how much
physiological diversity might be present among congeneric protistan species because these taxa
have been traditionally defined based on morphological features. Therefore, we were interested
in what our data might reveal about relatively closely related taxa. The congeners C. brevifilum
and C. ericina shared more gene clusters with each other than either did with P. parvum or P.
antarctica. The proportion of genes shared between these two species (5%) was greater than the
proportion of genes Thalassiosira pseudonana (a centric diatom) shared with Fragilariopsis
cylindrus (3%) and Pseudo-nitzschia multiseries (2%) (pennate diatoms), but much lower than
the proportion of genes shared between F. cylindrus and P. multiseries (26-28%) (Bender et al.
2014).
The remainder of the genes in each of our four prymnesiophyte transcriptomes were only
found in a single species (Fig. 1A, C). Approximately half of the transcriptome of C. brevifilum,
the species with the largest transcriptome in our study, consisted of genes unique to that species.
However, this percentage is within the range of other studies. The proportion of unique genes in
our four transcriptomes (37-52%) was similar to the results of a study that compared three
diatom transcriptomes: Thalassiosira pseudonana, Fragilariopsis cylindrus and Pseudo-nitzschia
multiseries (39-43%) (Bender et al. 2014). A majority of our unique proteins were not annotated
due to the limited genomic databases for free-living, environmentally relevant microeukaryotes.
Therefore, the genes that were found to be unique to each prymnesiophyte species may contain
functionally similar sequences and the proportion of truly unique genes in each transcriptome
may actually be lower. The proportion of peptides that were annotated in this study (33% to
47%) was similar to recent results that have been obtained for other sequenced protistan
transcriptomes. For example, 41% and 31% of the transcriptomic sequences obtained from two
dinoflagellates within the genus Symbiodinium were annotated (Bayer et al. 2012), while 33% of
the contigs from another dinoflagellate transcriptome, Heterocapsa circularisquama were
annotated (Salcedo et al. 2012). Only 23% of the transcriptome of the heterotrophic
dinoflagellate, Oxyrrhis marina, could be annotated using a variety of databases, including
GenBank’s nr database (Lowe et al. 2011). The same pattern of a highly annotated core genome
compared to poorly annotated unique genes has also been reported previously (Worden et al.
2009).
B-vitamin biosynthesis genes
Each of our four prymnesiophytes showed slightly different abilities to synthesize vitamins. This
result was not surprising as such differences have been observed within genera and even among
different strains of the same species of algae (Tang et al. 2010).
Thiamine is a cofactor for enzymes involved in many different metabolic pathways,
including carbohydrate and amino acid metabolism. Thiamine auxotrophy is widespread among
protistan species, with 20% of eukaryotic phytoplankton surveyed requiring exogenous thiamine
(Croft et al. 2006). The proportion of thiamine auxotrophs was found to be even higher among
harmful algal bloom species, at almost 74% (n=27) (Tang et al. 2010). Hence, the ability to
synthesize this vital molecule, rather than scavenging exogenous thiamine, might confer an
ecological advantage to marine protists. Previous studies have indicated that P. parvum is a
thiamine auxotroph (McLaughlin 1958) and that prymnesiophytes in general tend to require
thiamine for growth (Croft et al. 2006; Tang et al. 2010). Our dataset indicates a difference
between the two Chrysochromulina species and P. parvum and P. antarctica in the number of
enzymes for thiamine synthesis present in their transcriptome, implying that these species may
also differ in their ability to make thiamine. Based on the genes that are present in the
transcriptomes, it would seem that P. parvum and P. antarctica are least likely to be able to
synthesize thiamine while C. brevifilum and C. ericina may be able to synthesize the vitamin,
either de novo, or from an intermediate in the pathway. Past studies have shown that in some
species, the need for exogenous thiamine was alleviated when either the thiazole or pyrimidine
moiety was added to the growth medium (Provasoli and Carlucci 1974). This might provide a
species with some flexibility when competing against other organisms that specifically require
thiamine for growth. It is also possible that the ThiD, ThiE, ThiDE, ThiF and TPK enzymes that
are present in the two Chrysochromulina spp. are remnants of the thiamine pathway and do not
represent a functional thiamine biosynthesis pathway. Some of these genes have also been found
in Ostreococcus tauri and Micromonas pusilla CCMP1545, both of which are thiamine
auxotrophs (Worden et al. 2009; Bertrand and Allen 2012).
Biotin is a cofactor for carboxylase enzymes that are used in fatty acid synthesis, and thus
is required across all domains of life. None of the haptophytes surveyed in a previous study
required biotin (Croft et al. 2006), but the survey was not exhaustive. None of our four target
species contained all three biotin synthesis genes found in C. reinhardtii, T. pseudonana and C.
merolae, three species capable of biotin synthesis. It is, of course, still possible that a functional
biosynthetic pathway is present in our target prymnesiophytes due to the presence of yet-
unidentified enzymes.
So far, only prokaryotes have been shown to synthesize cobalamin, but many protists
require cobalamin for the synthesis of amino acids and deoxyriboses, and for C1 metabolism.
Examples of enzymes that require cobalamin include methionine synthase (METH) and
methylmalonyl coA mutase (MCM). Previous studies have shown P. parvum to have a specific
and non-replaceable requirement for cobalamin in its growth media (Droop 1954; Rahat and
Reich 1963), and P. globosa, which is a congener of P. antarctica, also requires exogenous
cobalamin (Tang et al. 2010). The lack of all but one or two genes in the cobalamin biosynthetic
pathway and the necessity of exogenous cobalamin in the growth media of most microalgae
strongly indicate that all four of our study species are unable to synthesize cobalamin and are
dependent on external cobalamin. Additionally, only METH was present in our datasets, and not
METE, the cobalamin-independent form of methionine synthase. The latter is strongly correlated
with cobalamin independence (Helliwell et al. 2011). We also found putative MCM genes in all
four transcriptomes, further evidence for cobalamin dependence among our four target organisms.
Macronutrients such as nitrogen and phosphorus have long been known to be important
factors structuring species composition and distribution in the ocean, but in recent years
micronutrients such as vitamins have been found to also play an important role (Croft et al. 2006;
Tang et al. 2010; Bertrand and Allen 2012). Our comparative transcriptome analysis of four
prymnesiophytes has revealed potential differences in the ability of these closely related species
to synthesize some B-vitamins, perhaps indicating unique metabolic abilities or dependences that
might explain differences in their autecologies.
Polyketide synthase
Polyketide synthase genes are thought to be involved in the synthesis of at least some of the
toxins that have been found in P. parvum and C. polylepis (Manning and La Claire 2010;
Manning and La Claire 2013). Two of the toxins produced by P. parvum that have been
isolated to date, prym1 and prym2 (Igarashi et al. 1999; John et al. 2010; Manning and La Claire
2010) are ladder-like polycyclic ethers that resemble other algal toxins produced by Type I PKS
genes, such as brevetoxin and okadaic acid, which are produced by marine
dinoflagellates (Monroe and Van Dolah 2008; Perez et al. 2008).
Type I PKS are modular, multi-domain proteins that are similar to proteins involved in
fatty acid synthesis (FAS). These proteins sequentially add acyl units onto a growing carbon
chain via a condensation reaction. The following three domains are required for the synthesis of
polyketide molecules: ketosynthase (KS), acyltransferase (AT) and acyl-carrier protein (ACP).
Additional domains encode ketoreductase (KR), dehydratase (DH) and enoyl reductase (ER)
proteins, which catalyze the reduction of the initial 2-, 3- and 4-carbon skeletons. The
thioesterase (TE) domain releases the polyketide molecule from its attachment site when the
final chain length has been achieved (Manning and La Claire 2010).
In general, these domains are organized into modules, with each module containing the
domains that are required for one round of chain elongation and modification (Manning and La
Claire 2010). However, PKS sequences have been found in K. brevis that contain only one or
two catalytic domains (Monroe and Van Dolah 2008), similar to some of the sequences in our
dataset. A previous study of C. polylepis found KS, KR and AT domains in an EST dataset,
but no information was provided on how these domains were organized, likely due to the short
average sequence length (~600 bp) (John et al. 2010). As such, there is insufficient information to
ascertain what a ‘typical’ PKS gene might look like in a prymnesiophyte. Even within the
transcriptome of a single species, i.e. C. brevifilum, the putative PKS sequences were of different
lengths and had different numbers and organization of domains (Fig. 3). Thus, they may be
responsible for synthesizing polyketide molecules of different lengths and configurations.
Our results also indicated the existence of two different KS gene families within our
prymnesiophyte datasets, one comprising prymnesiophyte-specific sequences and one containing
sequences from diverse bacterial and protistan species (Fig. 2). All of the P. parvum KS
sequences clustered with the latter clade. P. parvum was grown axenically in our study, thus the
sequences could not have been derived from a bacterium. While the mixed bacterial/protistan
clade itself is not well-supported, the subclade containing the P. parvum, P. antarctica (except
one) and E. huxleyi sequences was a well-supported clade in our dataset. A previously sequenced
P. parvum PKS sequence from an EST library (La Claire 2006) was also found within this mixed
bacterial/protistan PKS cluster. It is unknown whether the toxigenic C. polylepis KS sequences would
group with the P. parvum and P. antarctica sequences or with the C. brevifilum sequences because, to
our knowledge, the C. polylepis sequences are not in public databases.
The prymnesiophyte-specific clade was more closely related to the Ostreococcus
sequences than to the dinoflagellate-specific clade, a finding similar to a previous study (John et
al. 2008), and may suggest a common origin for green algal and prymnesiophyte PKS distinct
from that of the dinoflagellates. It may be significant that all of the PKS sequences that clustered
with the prymnesiophyte-specific clade contained multiple domains whereas the sequences in the
mixed bacterial/protistan clade only contained the KS domain. However, it is important to be
cautious when interpreting these results because this and other PKS trees tend to have large
sequence divergences and lack a suitable outgroup, which results in poorly supported branching
order.
Analysis of KOG relative abundances reveals interesting clustering patterns
The functional annotations of the four prymnesiophytes using the KOG database did not differ
markedly, presumably because they share similar core functions (Fig. 4). This similarity may be
a consequence of the close phylogenetic relationship among these four species, or because they
share similar physiologies or nutritional modes, factors which are not mutually exclusive. Three
of the four prymnesiophytes examined in this study exhibit phagotrophic behavior but one, P.
antarctica, is so far not known to be mixotrophic (Moorthi et al. 2009). Nonetheless, there were
no clear differences in the KOG functions between the three known mixotrophs and the non-
mixotrophic P. antarctica in our study (Fig. 4). However, when we included KOG data from
other protistan species in a non-metric multidimensional scaling (NMDS) analysis and a
principal components analysis (PCA), some interesting patterns emerged (Figs. 5, 6). Our
subsequent statistical analysis therefore took into account the effects of both trophic mode and
phylogenetic grouping.
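The comparison of KOG distributions across taxa can be illustrated with a minimal sketch. It converts per-species KOG counts into proportions and computes pairwise dissimilarities of the kind an ordination method such as NMDS takes as input. The taxon names and counts are hypothetical, and the Bray-Curtis measure is an assumption made for illustration; the text does not state which dissimilarity measure was used.

```python
from itertools import combinations


def kog_proportions(counts):
    """Convert raw KOG category counts to proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no annotated genes")
    return {kog: n / total for kog, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two KOG profiles (0 = identical)."""
    kogs = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in kogs)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in kogs)
    return num / den


# Hypothetical counts for three made-up taxa and four KOG categories.
toy_counts = {
    "mixotroph_A": {"C": 120, "G": 90, "H": 40, "J": 250},
    "mixotroph_B": {"C": 110, "G": 95, "H": 45, "J": 250},
    "heterotroph_X": {"C": 60, "G": 150, "H": 15, "J": 275},
}
profiles = {sp: kog_proportions(c) for sp, c in toy_counts.items()}

# Pairwise dissimilarities, keyed by alphabetically ordered species pairs.
distances = {
    (a, b): bray_curtis(profiles[a], profiles[b])
    for a, b in combinations(sorted(profiles), 2)
}
```

In this toy example the two mixotroph profiles are closer to each other than either is to the heterotroph, which is the kind of pattern an ordination plot would display as clustering.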
Phylogenetic identity appeared to be a significant determinant of the species clusters on
the NMDS plot (Fig. 5). For example, alveolate taxa (dinoflagellates and a ciliate) clustered
separately from all other species, presumably indicating a strong phylogenetic signal in their
transcriptomes. Dinoflagellates generally have large genomes, and many genes appear to be
constitutively expressed and modified post-translationally (Van Dolah et al. 2009). This
tendency might result in a greater variety of transcribed genes and hence, larger variations in
their transcriptomes and in their KOG distribution patterns, and explain why these organisms
occupy a location far away from other species on the NMDS plot, as well as being relatively
spread out from each other compared to other phylogenetic groups.
Other apparently phylogeny-based groupings on the NMDS plot included the diatoms,
which all clustered close to one another, and the chlorophytes, with one notable exception. The
outlier from this cluster, C. reinhardtii, showed greater similarity to heterotrophic species than to
its congener, Chlamydomonas sp. CCMP681. Chlamydomonas sp. CCMP681 (Raymond et al.
2009) was isolated from the Southern Ocean near Antarctica while C. reinhardtii is usually
found in freshwater ecosystems and in soil (Guiry and Guiry 2007). We speculate that the
substantial distance between these congeners on the NMDS plot could be due to differences in
physiological adaptations to very different habitats. Interestingly, the two Ochromonas species
were also situated some distance from each other on the NMDS plot. Ochromonas clone
CCMP1899 was isolated from the Ross Sea, Antarctica, while clone BG-1 is a freshwater isolate
from a botanical garden in Malaysia.
Another interesting pattern observed in the NMDS analysis was the tendency for
organisms to cluster based on similar nutritional modes but distant phylogenetic relationships.
Heterotrophic taxa, including the water molds (oomycetes), a slime mold (mycetozoa), a
choanoflagellate and a heterotrophic chrysomonad all clustered away from those taxa possessing
phototrophic ability (including the kleptoplastidic ciliate, Mesodinium pulex). This pattern is
perhaps not unexpected because these heterotrophic taxa would presumably lack the
photosynthetic machinery possessed by phototrophic protists, but it is interesting that the broad
grouping of the transcriptomes of these heterotrophs appears to reflect their nutritional mode.
Aureococcus anophagefferens and Chlamydomonas reinhardtii are labeled as
phototrophs in the NMDS plot, yet these taxa occurred relatively close to the heterotrophic
protists. As noted above, the habitat of C. reinhardtii is quite different from that of most of the
phototrophic protists examined in this study. A. anophagefferens has strong osmotrophic
capabilities (Berg et al. 1997; Gobler and Sañudo-Wilhelmy 2001), which may explain its
proximity to the heterotrophs on the NMDS plot, although it lies farther from them than C.
reinhardtii does.
The non-alveolate mixotrophs in our dataset (chrysophytes and prymnesiophytes,
including three of the four species examined in this study) formed a cluster on the NMDS plot
that occupied a central space between the alveolates on the left side of the plot, chlorophytes and
diatoms above, and heterotrophs to the right (Fig. 5A). Our fourth prymnesiophyte, P. antarctica,
also clustered with these species. Their intermediate position on the plot between purely (or
predominantly) photosynthetic organisms and exclusively heterotrophic species may reflect the
mixed nutritional mode that is characteristic of these organisms. Phagotrophic algae possess the
cellular machinery that allows them to carry out photosynthesis, therefore their KOG distribution
might be expected to exhibit a fair amount of similarity with phototrophic protists. It is
interesting that while a comparison of four closely related species did not reveal large differences
in KOG functions, comparing these four species with a larger set of protistan taxa resulted in
distinct clusters based on phylogeny, or nutritional mode, or both, despite the fact that only a
small fraction of the transcriptomes of these organisms could be assigned KOG annotations at
this time. One might expect that physiological differences between heterotrophs, phototrophs and
mixotrophs could be explained by the presence or absence of particular pathways (e.g.
photosynthetic pathways), but the distribution of annotated genes within certain KOG functions
was also different. Our data provide some good starting points for probing more in-depth
differences among protists with different nutritional modes. For instance, an in-depth
analysis of KOG category G (carbohydrate transport and metabolism) might
unveil a greater diversity of isoenzymes to process and digest different sugars synthesized by
prey. We also observed differences in KOG category H (coenzyme transport
and metabolism), which is not surprising: phototrophs and heterotrophs would likely
differ in this category because prey biomass might supply some of the molecules
necessary for enzymatic reactions.
Our data and analysis have demonstrated the utility of transcriptomic data for analyzing
functional and physiological capabilities of closely-related or nutritionally similar protists.
Although the current paucity of reference databases allows only a small fraction of
the peptides in this study to be annotated, we were nonetheless able to gain insight by comparing
the four transcriptomes to each other, and to other transcriptomes that were available in public
databases. The ability to more fully annotate these datasets will add significantly to the depth of
future analyses, by enabling a fuller elucidation of pathways and functions that are shared or
novel among the species. In this study, we were able to show that four prymnesiophytes share a
set of core genes that mostly comprise the essential metabolic and cellular pathways in the cell.
We also found evidence to suggest that investigations into functional and perhaps, by extension,
ecological differences between closely related species should be focused on “secondary”
pathways such as vitamin biosynthesis or secondary metabolic pathways. Finally, our data
indicated that the nutritional mode of a species, as well as its phylogeny, can influence the
proportion of its genome that is devoted to specific KOG functions.
Acknowledgements
This research was funded in part by the Gordon and Betty Moore Foundation through Grant
#3299 to D.A. Caron and K.B. Heidelberg. The sequencing was funded by the Gordon and Betty
Moore Foundation through Grant #2637 to the National Center for Genome Resources. Samples
MMETSP0006, MMETSP1094, MMETSP1096 and MMETSP1100 were sequenced, assembled
and annotated at the National Center for Genome Resources. We would like to thank K. David
Hambright (University of Oklahoma) for making the culture of Prymnesium parvum available,
Victoria Campbell for harvesting the RNA, and John Heidelberg for his discussions on
bioinformatics analysis procedures. We would also like to acknowledge the two anonymous
reviewers whose comments and suggestions greatly improved this manuscript.
Tables
Table 1. Culture conditions for the four prymnesiophyte species in this study.
Species                       Media                      Temp   L:D cycle   Irradiance
Prymnesium parvum             L1 –silica, 18ppt (a)      18°C   12:12       300 μE·m⁻² s⁻¹
Chrysochromulina brevifilum   Modified F/2 (b), 30ppt    18°C   12:12       300 μE·m⁻² s⁻¹
Chrysochromulina ericina      LKS (c) –silica, 36ppt     18°C   12:12       300 μE·m⁻² s⁻¹
Phaeocystis antarctica        LKS (c) –silica, 36ppt     1°C    12:12       300 μE·m⁻² s⁻¹
(a) Salinity is indicated as parts per thousand (ppt).
(b) Modified F/2 contains the following: NaNO₃ 2.33mM; Na₂HPO₄ 0.067mM; No silica; Soil
extract; L1 Trace Metals; F/2 vitamins.
(c) LKS media is a combination of L1 and K media and soil extract
(https://ncma.bigelow.org/algal-recipes).
Table 2. Basic statistics for the four transcriptomes sequenced in this study.
Species         Transcriptome size (Mbp)   No. of reads (million)   No. of contigs   No. of peptides   N50
P. parvum       36.4                       42.9                     30,986           25,579            1612
C. brevifilum   35.3                       35.6                     40,494           29,229            1397
C. ericina      52.0                       61.0                     50,899           40,488            1297
P. antarctica   50.6                       46.6                     56,193           45,611            1384
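For readers unfamiliar with the N50 column, the following minimal sketch shows how the statistic is conventionally computed from a list of contig lengths. The lengths used here are hypothetical and are not taken from the assemblies in Table 2; the assembly pipeline may have computed N50 with its own tooling.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (total 5500, so half is 2750).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
toy_n50 = n50(toy_lengths)
```

Here the two longest contigs (2000 + 1500 bp) are the first to cover half of the 5500 assembled bases, so the N50 is 1500.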
Figure legends
Figure 1. Core, shared and unique transcriptome genes in four prymnesiophyte species:
Prymnesium parvum, Chrysochromulina brevifilum, Chrysochromulina ericina and Phaeocystis
antarctica. A) Venn diagram showing the number of shared or unique genes (in italics) and gene
clusters (in bold) among the four prymnesiophytes as classified by the OrthoMCL program.
Among the genes unique to each of the four prymnesiophytes, multi-copy genes refer to genes
that were present in gene clusters, whereas single-copy genes refer to genes that did not cluster with any
other gene. Pp: Prymnesium parvum, Cb: Chrysochromulina brevifilum, Ce: Chrysochromulina
ericina, Pa: Phaeocystis antarctica. B) Proportion of annotated and unannotated genes in the
“core” gene set, i.e. genes shared by all four species, and in the gene set unique to each species.
C) Proportion of the transcripts that comprised core, shared and unique genes. Shared genes are
genes present in two or three of the four species. Unique genes are genes that are only present in
one species.
Figure 2. Polyketide synthase maximum likelihood tree with 100 iterated bootstraps using only
the keto-synthase (KS) domain. The tree was inferred using MEGA5 (Tamura et al. 2011) with
the maximum likelihood method based on the Jones-Taylor-Thornton model. The analysis involved 78
amino acid sequences. All positions with less than 95% site coverage were eliminated. There
were 181 total sites in the final dataset. Bootstrap support values, if greater than 50%, are shown
as the percentages of 100 trees inferred in the analysis. The scale bar represents the number of
substitutions per site. The tree is rooted with Aspergillus nidulans polyketide synthase.
Sequences from our dataset are shown in bold. Multiple branches have the same identifying
ORFs, GI or accession numbers due to multiple KS domains on the same gene.
Figure 3. Putative domain organization and length of polyketide synthase sequences in genes
containing more than one domain, as annotated by the NRPS-PKS tool. KR: ketoreductase, ACP:
acyl carrier protein, KS: keto-synthase, DH: dehydratase, A: adenylation.
Figure 4. KOG function distribution of the peptides for the four target species in this study. The
KOG functions are as follows: A: RNA processing and modification; B: chromatin structure and
dynamics; C: Energy production and conversion; D: Cell cycle control, cell division,
chromosome partitioning; E: Amino acid transport and metabolism; F: Nucleotide transport and
metabolism; G: Carbohydrate transport and metabolism; H: Coenzyme transport and
metabolism; I: Lipid transport and metabolism; J: Translation, ribosomal structure and
biogenesis; K: Transcription; L: Replication, recombination and repair; M: Cell
wall/membrane/envelope biogenesis; N: Cell motility; O: Posttranslational modification, protein
turnover, chaperones; P: Inorganic ion transport and metabolism; Q: Secondary metabolites
biosynthesis, transport and catabolism; R: General function prediction only; S: Function
unknown; T: Signal transduction mechanisms; U: Intracellular trafficking, secretion and
vesicular transport; V: Defense mechanisms; W: Extracellular structures; Y: Nuclear structure;
Z: Cytoskeleton.
Figure 5. Nonmetric multidimensional scaling (NMDS) plot of the KOG distributions of the four
prymnesiophytes in this study and of other protistan genomes and transcriptomes. The genomes
were obtained from the Joint Genome Institute database, and the transcriptomes were obtained
from the Marine Microbial Environmental Transcriptome Sequencing Project (MMETSP)
database. The stress value for this plot was 0.12, which indicates that the two-dimensional plot is
a good representation of the data. The four target species in this study are highlighted in boxes.
The trophic modes of each organism are denoted in green (phototrophs), black (heterotrophs) and
red (mixotrophs).
Figure 6. Principal component analysis (PCA) plot of the KOG distributions of the four
prymnesiophytes in this study and of other protistan genomes and transcriptomes. The same
dataset from Fig. 5 was used to generate this figure. The color scheme and species identification
by numbering also correspond to Fig. 5. Explained cumulative variability for this plot was 54.2%,
with eigenvalues of 8.5 (F1) and 4.5 (F2). Only top variables for F1 and F2 are plotted in the
graph.
Figure 7. Box plots of the proportion of genes assigned to KOG functions that had a statistically
significant difference among phylogenetic and trophic modes (see Table S4). A) The full dataset
of 41 species (excluding the mycetozoan Dictyostelium purpureum, the choanoflagellate
Monosiga brevicollis and the rhodophyte Cyanidioschyzon merolae) showing the proportion of
genes annotated with a particular KOG function and grouped by higher-level taxonomic
affiliation; B) A reduced dataset of the proportion of genes assigned to particular KOG function
in the prymnesiophytes and stramenopiles, grouped by trophic modes. The alveolates and
chlorophytes were excluded to reduce phylogenetically based bias in the dataset. Lowercase
letters over each bar summarize the different statistical groups found by multiple pairwise
comparisons. Red crosses indicate the mean for each group and black dots represent the outliers.
Figures
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Supplemental information
Table S1. Number of contigs containing rRNAs and tRNAs in each transcriptome.
Species rRNAs tRNAs
P. parvum 2 3
C. brevifilum 8 0
C. ericina 17 13
P. antarctica 7 1
Table S2. Predicted proteins related to cobalamin biosynthesis. METH: B12-dependent
methionine synthase; MCM: methylmalonyl-CoA mutase; CobB: cobyrinic acid a,c-diamide
synthase; CobNST: CobN subunit of cobaltochelatase; CobW: protein putatively involved in
cobalamin biosynthesis but its specific catalytic role is unclear.
Species         Annotation   ORF
P. parvum       METH         6621_1
                MCM          5553_1
                CobW         19779_1, 17579_1
C. brevifilum   METH         63818_1
                MCM          109570_1
                CobW         26401_1, 110009_1, 78490_1
                CobB         13986_1
C. ericina      METH         13279_1, 32542_1
                MCM          29808_1
                CobW         35493_1, 40785_1, 50658_1, 16219_1, 10713_1, 45973_1, 26167_1,
                             3438_1, 2169_1, 38987_1
                CobNST       39377_1
P. antarctica   METH         27826_1, 46214_1, 11053_1
                MCM          38804_1, 38597_1, 46771_1
                CobW         24567_1, 7519_1, 17091_1, 24039_1, 41170_1, 28986_1
                CobNST       35735_1
Table S3. Proteins containing polyketide synthase ketosynthase (KS) domains.
Protein ID Species PKS domains
18839 P. parvum KS
2015 P. parvum KS
26981 P. parvum KS
30494 P. parvum KS
7797 P. parvum KS
106225 C. brevifilum KS, KR, PP
106729 C. brevifilum KS, DH, KR, PP
107078 C. brevifilum KS, PP, KR
107100 C. brevifilum KS, PP
4093 P. antarctica KS, PP
15245 P. antarctica KS
23625 P. antarctica KS
4182 P. antarctica KS
49699 P. antarctica KS
54160 P. antarctica KS
Table S4. Results of the non-parametric Kruskal-Wallis tests for each KOG function. The
influence of phylogeny and trophic mode was tested independently with a non-parametric
Kruskal-Wallis test followed by a Steel-Fligner test if significant differences were observed. All
calculations were done with XLSTAT (v.2013.06.04, Addinsoft TM) with an alpha of 0.01. Two data
sets were used in this statistical analysis: (1) a dataset with the species present in Fig. 4
except the mycetozoan D. purpureum, the choanoflagellate M. brevicollis and the rhodophyte
C. merolae, which were excluded for statistical reasons; and (2) a reduced dataset considering only the
Stramenopiles and Prymnesiophyta. Abbreviations as follows: NS, not significant; YES,
significant difference detected.
Original dataset Reduced dataset
KOG Phyla Type KOG Phyla Type
A NS NS A NS NS
B YES NS B NS NS
C YES NS C NS NS
D NS YES D NS YES
E NS NS E NS NS
F NS NS F NS NS
G NS YES G NS YES
H NS YES H NS YES
I YES NS I NS NS
J NS NS J NS NS
K YES YES K NS YES
L YES YES L NS NS
M YES NS M NS NS
N NS NS N NS NS
O YES NS O NS NS
P NS NS P NS NS
Q NS NS Q NS NS
R NS NS R NS NS
T NS NS T NS NS
U YES NS U NS NS
V NS NS V NS NS
W NS NS W NS NS
Y YES NS Y NS NS
Z NS YES Z NS NS
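The kind of test summarized in Table S4 can be sketched as follows. This is a minimal illustration of the tie-corrected Kruskal-Wallis H statistic and is not the XLSTAT implementation. The group values are hypothetical proportions for a single KOG category, and the p-value helper assumes exactly three groups (two degrees of freedom) and the usual chi-square approximation, which may differ from how XLSTAT derives its p-values.

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction == 0:  # every observation identical: no evidence of difference
        return 0.0
    return h / correction


def p_value_three_groups(h):
    """Upper tail of the chi-square distribution with 2 degrees of freedom.
    Valid only when exactly three groups are compared."""
    return math.exp(-h / 2)


# Hypothetical proportions of genes in one KOG category for three trophic modes.
toy_groups = [
    [0.031, 0.029, 0.033, 0.030],  # phototrophs (made up)
    [0.045, 0.048, 0.044, 0.047],  # heterotrophs (made up)
    [0.038, 0.037, 0.040, 0.039],  # mixotrophs (made up)
]
h_stat = kruskal_wallis_h(toy_groups)
p_val = p_value_three_groups(h_stat)
significant = p_val < 0.01  # alpha of 0.01, as stated in the table caption
```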
Figure S1. Key components of the thiamine biosynthesis pathway. Colored squares represent
presence in Prymnesium parvum (brown), Chrysochromulina brevifilum (red),
Chrysochromulina ericina (blue), and Phaeocystis antarctica (green). In some organisms, the
functionalities of ThiD and ThiE are combined into a single enzyme, such as ThiDE.
Abbreviations: HMP-P, 4-amino-2-methyl-5-hydroxymethylpyrimidine phosphate; HMP-PP, 4-
amino-2-methyl-5-hydroxymethylpyrimidine pyrophosphate; HET-P, hydroxyethylthiazole
phosphate; DXP, 1-deoxy-D-xylulose 5-phosphate.
Figure S1
Chapter Three
Changes in gene expression of Prymnesium parvum due to nitrogen and phosphorus
limitation
Abstract
Prymnesium parvum is a globally distributed prymnesiophyte alga found predominantly in
brackish water and inshore lakes. It possesses a suite of toxins with ichthyotoxic, cytotoxic and
hemolytic effects, which, along with its mixotrophic nutritional capabilities, allows it to form
massive ecosystem-disrupting blooms. While blooms of high density coincide with high levels of
nitrogen (N) and phosphorus (P), field and laboratory studies have shown that P. parvum
toxicity is augmented when the N:P ratio is high or when P is limiting. This interesting
interaction between ambient macronutrient concentration and bloom formation as well as
toxicity prompted us to investigate changes in gene expression in P. parvum under N- and P-
limitation using RNA-seq. The analysis of our transcriptomic dataset revealed diverse
physiological responses to the limitation of the two different macronutrients. When either N
or P was limiting, genes encoding transporters for the respective nutrient were more highly
expressed relative to the replete condition. Under both nutrient-limited conditions, ribosomal and
lysosomal protein genes were also expressed at higher levels compared to the replete condition.
Differential responses under N and P limitation were seen in photosynthesis genes, which were
more highly expressed under P-limitation but not under N-limitation.
Introduction
Prymnesium parvum is a unicellular haptophyte alga that is found predominantly in brackish
waters and inshore lakes, as well as in some coastal marine systems (Moestrup 1994; Edvardsen
and Paasche 1998). P. parvum can produce an array of toxins with hemolytic, cytotoxic,
ichthyotoxic and possibly neurotoxic activities (Yariv and Hestrin 1961; Shilo 1971, 1981;
Cichewicz and Hambright 2010; Manning and La Claire 2010). It has been responsible for
recurrent ecosystem-disrupting blooms in different parts of the world that resulted in significant
economic losses in the aquaculture industry (Moestrup 1994; Edvardsen and Paasche 1998;
Sunda et al. 2006). In recent years, the spread of this species into southern states like Texas and
Oklahoma and along the northeast coast of the United States has caused concern (Aguiar and
Kugrens 2000; Watson 2001; Hargraves and Maranda 2002; Baker and Grover 2009; Hambright
et al. 2010).
Availability of nutrients such as nitrogen (N) and phosphorus (P) is of decisive
importance for the physiology of microbial eukaryotes, impacting their temporal and spatial
distributions. P. parvum is able to grow and bloom in a wide range of N and P concentrations,
although blooms dense enough to result in fish kills tend to require high levels of both nutrients
(Lindholm et al. 1999; Granéli and Salomon 2010). Laboratory studies have shown that while P.
parvum may produce toxins constitutively (Granéli and Johansson 2003b), its toxicity increases
under induced N- and P-limitation (Uronen et al. 2005); nevertheless, in natural environments
toxicity is observed to be the highest under high N:P ratios (Kaartvedt et al. 1991; Aure and Rey
1992; Lindholm et al. 1999). P. parvum toxins likely constitute a suite of compounds with
diverse cellular origins and biological activities, collectively termed ‘prymnesins’ (Manning and
La Claire 2010), of which only two have been isolated, prym-1 and prym-2 (Igarashi et al. 1999).
The configuration of these cyclic polyethers has led researchers to postulate that their
synthesis, like that of the dinoflagellate toxins brevetoxin and okadaic acid, might
involve polyketide synthase genes (Manning and La Claire 2010; Freitag et al. 2011; Manning
and La Claire 2013).
P. parvum is a mixotrophic alga with the ability to carry out both phagotrophy and
phototrophy (Stoecker 1998; Sanders 2011). The ability to ingest other organisms or take up
dissolved organic matter is hypothesized to provide a competitive advantage when essential
nutrients such as nitrogen and phosphorus, as well as iron, vitamins and other trace elements are
limiting in the environment. Mixotrophy is a widespread phenomenon in both freshwater and
marine aquatic protists, and can account for up to 50% of the bacterivory in some systems
(Stoecker 1998; Burkholder et al. 2008; Mitra and Flynn 2010). P. parvum is considered a model
IIA mixotroph (per Stoecker 1998), an alga that is primarily photosynthetic but turns to
phagotrophy when dissolved inorganic nutrients are limiting.
In recent years, the transcriptomics field has developed rapidly. The study of
transcriptomes, defined by C. Auffray in 1996 as “the complete complement of mRNA
molecules generated by a cell or population of cells” (McGettigan 2013), provides an
opportunity for probing the molecular underpinnings of the function and ecology of protists.
Changes in gene expression of ecologically important species have been studied using methods
such as expressed sequence tag (EST) libraries, microarrays or tag-based sequencing such as
long-serial analysis of gene expression (long-SAGE) (von Dassow et al. 2009; Moustafa et al.
2010; Park et al. 2010; Wurch et al. 2010). More recently, RNA-seq has become the technology
of choice for these gene expression analyses. The ability to manipulate levels of light, nutrients,
salinity, life cycle, co-occurring taxa, etc. while the cells are growing allows researchers to
investigate cellular responses to specific conditions, depending on what is interesting about the
ecology and physiology of the organism. These studies have led to insights into physiological or
cellular processes of protists whose genomes have not yet been sequenced.
Given the importance of P. parvum as a recurring bloom-former and toxin-producer, it
has been the subject of several transcriptomic studies. An initial study using an EST library
approach provided a first insight into the genetic basis of bloom formation and
toxicity in P. parvum, revealing the presence of phosphate transport genes and genes that might be involved in
polyketide synthesis (La Claire 2006). A subsequent study using a combination of EST libraries
and microarray approaches investigated the differential expression of genes under N- and P-
limitation, showing that P-limited cells responded strongly by upregulating genes involved in the
uptake of P (Beszteri et al. 2012). More recently, an RNA-seq study has focused on the
mixotrophic nutrition of this alga, showing that different genes and pathways were upregulated
depending on whether P. parvum was grown in the presence of bacteria or ciliates as prey (Liu et
al., submitted). A later comparative transcriptome analysis, based on RNA-seq of P. parvum
and other prymnesiophyte species, found that the frequencies of functional gene categories in
mixotrophic species differed from those in heterotrophic and phototrophic species (Koid et
al., submitted).
Our study aimed to investigate changes in P. parvum gene expression under N- and P-
limitation using RNA-seq. We explored how nutrient limitation impacts gene expression, with an
emphasis on the genes that respond uniquely to N or P limitation and on those that respond to
both. Our results indicate that ribosomal and lysosomal proteins were more highly expressed
when either N or P was limiting. In addition, photosynthesis genes were more highly expressed
under P-limitation but not under N-limitation. The expression of transporters for nitrogen and
phosphorus was induced when the respective nutrient was limiting.
Methods
Culture conditions
Prymnesium parvum (clone Texoma1) was isolated from Lake Texoma, Oklahoma, USA, by K.
David Hambright, University of Oklahoma, and made clonal and axenic by micropipetting single
cells through rinses of sterile medium. The nutrient replete culture was grown in 1-2 L volumes
in 2800 ml Pyrex glass Fernbach flasks in L1 media minus silica at 18ppt salinity and at 18°C.
The N- and P-limited cultures were grown under the same conditions but the concentrations for
nitrate and phosphate were at f2/200 concentrations. Irradiance was measured using a QSL-100
sensor with QSP-170 deckbox (Biospherical Instruments Inc.) and was approximately 300 μE·m⁻² s⁻¹.
Cultures were grown in 12:12 hour Light:Dark cycles with illumination provided by Philips
F20T12CW bulbs.
RNA Isolation
The replete treatment was harvested during mid-exponential growth phase and the nutrient-
limited treatments during stationary phase. RNA was isolated from the P. parvum cultures as
described previously (Koid et al., submitted). Briefly, cultures were spun down and the
supernatant was decanted. The pellet was dissolved in TRI reagent (Ambion) and RNA was
extracted from the homogenates using the Ribopure kit (Ambion). The eluted RNA was treated
with DNase (Sigma) and cleaned and concentrated. The RNA was quantified using a Qubit 2.0
Fluorometer (Invitrogen) and run on an E-gel iBase with E-gel Gel EX 1% (Invitrogen) to check
for nucleic acid quality.
Library Preparation and Sequencing
RNA quality was assessed using the Agilent 2100 Bioanalyzer. Libraries were made using a
previously published protocol (Koid et al., submitted). Briefly, Illumina’s TruSeq RNA Sample
Preparation Kit was used with 2 μg of RNA. The average insert size of each library ranged from
250 to 350 bp. Libraries were sequenced on an Illumina HiSeq 2000 to obtain 2 x 50 bp (paired-
end) reads. Over 2 Gbp of sequence was generated per library. Library preparation and
sequencing were performed as part of the Marine Microbial Eukaryote Transcriptome
Sequencing Project (MMETSP) (https://www.marinemicroeukaryotes.org) supported by the
Gordon and Betty Moore Foundation.
Transcriptome assembly
Transcriptome assembly procedures were adapted and developed from the general guidelines
established by MMETSP, as described in Liu et al. (submitted). Briefly, only sequences that
passed the quality criteria were retained; sequences containing more than 10 low-quality nucleotides
(Phred quality score lower than 20) were removed. Then, the replete, P-limited and N-
limited transcriptomes were combined and assembled de novo using a combination of a de Bruijn
graph (ABySS) and overlap-based algorithms (CAP3), with ABySS run first followed by
further assembly of the ABySS output using CAP3 (Huang and Madan 1999; Simpson et al.
2009). Redundant contigs were removed using CD-HIT-EST (Li and Godzik 2006). Contigs
shorter than 150 bp were discarded.
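The two filtering rules described above (removing reads with more than 10 nucleotides below Phred 20, and discarding contigs shorter than 150 bp) can be sketched as follows. This is an illustration only and not the MMETSP pipeline. It assumes Phred+33 quality encoding, which the text does not state, and the function names and toy inputs are hypothetical.

```python
def count_low_quality(quality_string, min_phred=20, offset=33):
    """Count bases whose Phred score is lower than min_phred.
    Assumes Phred+33 ASCII encoding (an assumption, not stated in the text)."""
    return sum(1 for ch in quality_string if ord(ch) - offset < min_phred)


def passes_quality(quality_string, max_low=10, min_phred=20, offset=33):
    """A read is kept unless it has more than max_low low-quality bases."""
    return count_low_quality(quality_string, min_phred, offset) <= max_low


def filter_contigs(contigs, min_length=150):
    """Discard contigs shorter than min_length bp."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


# Toy quality strings for 50 bp reads: 'I' encodes Phred 40, '#' encodes Phred 2.
good_read = "I" * 50
borderline_read = "I" * 40 + "#" * 10  # exactly 10 low-quality bases: kept
bad_read = "I" * 39 + "#" * 11         # 11 low-quality bases: removed

# Hypothetical contigs keyed by made-up names.
toy_contigs = {"contig_1": "A" * 149, "contig_2": "A" * 150, "contig_3": "A" * 900}
kept_contigs = filter_contigs(toy_contigs)
```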
Transcriptome annotation
The annotation of the transcriptome was done as previously described (Liu et al., submitted).
Briefly, ribosomal RNA sequences were first removed by BLASTN searches against the SILVA
database (Knittel et al. 2007). Protein-coding genes longer than 150 bp were predicted from the
assembled transcriptome using ESTScan (Iseli et al. 1999). Genes were then annotated using
HMMER3 (http://hmmer.janelia.org) against the Pfam (http://pfam.sanger.ac.uk) and Tigrfam
(http://www.jcvi.org/cgi-bin/tigrfams/index.cgi) databases and BLASTP against NCBI’s nr
database. The e-value cutoff used in both cases was 1e-5. Additional functional annotations were
obtained using the KEGG annotation server (http://www.genome.jp/tools/kaas/) and
BLAST2GO (Altschul et al. 1990; Haft et al. 2001; Zhang and Wood 2003; Kanehisa et al. 2004;
Conesa et al. 2005; Finn et al. 2010). Finally, the automated annotations of selected genes in our
datasets were manually inspected and curated.
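As an illustration of how an e-value cutoff of 1e-5 might be applied, the sketch below keeps the best-scoring hit per gene from tabular search output. The 12-column BLAST tabular layout (e-value in the 11th column), the inclusive treatment of the cutoff, and the function name are assumptions; the pipeline actually used is the one described in Liu et al. (submitted).

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue)} for the lowest-e-value hit per query.

    Assumes 12-column BLAST tabular output (e-value in the 11th column).
    Hits with an e-value above max_evalue are ignored.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

Genes with no hit at or below the cutoff are simply absent from the result, which corresponds to the "no annotation" category.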
Differentially expressed genes
Genes that were differentially expressed between any two datasets were calculated according to
Liu et al. (submitted). Briefly, sequences of all three datasets were aligned to the annotated genes
of the assembled transcriptome using BWA (Li and Durbin 2009) and then processed with a
custom Perl script to assign read pairs to genes. Read pairs were only assigned to a gene, and
retained, if both reads of the pair aligned to the same gene in the correct orientation and at the
expected distance. The percentage of total reads that mapped to the annotated genes was 48% in the
replete sample, 49% in the nitrogen-limited sample and 23% in the phosphorus-limited sample. A
larger proportion of reads was not retained in the P-limited transcriptome due to an increased
presence of rRNA gene sequences in the dataset. The number of read pairs assigned to each gene
in each of the three samples was counted and normalized by the total number of assigned read pairs in
each sample. Statistical analyses of the read counts of each gene were carried out in edgeR
(Robinson et al. 2010) and p-values were adjusted to control the false discovery rate using p.adjust in R
(Benjamini and Hochberg 1995). Only genes with adjusted p-values smaller than 0.05 were
accepted as having significantly different expression levels between different samples.
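The normalization and multiple-testing steps can be sketched as follows. This is a simplified stand-in, not the custom Perl script or the edgeR analysis used here: the function names are hypothetical, and the adjustment is a plain reimplementation of the Benjamini-Hochberg procedure rather than a call to R.

```python
def normalize_counts(pair_counts):
    """Divide each gene's read-pair count by the sample's total assigned pairs."""
    total = sum(pair_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in pair_counts}
    return {gene: n / total for gene, n in pair_counts.items()}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(genes, p_values, alpha=0.05):
    """Genes whose adjusted p-value is smaller than alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [g for g, q in zip(genes, adjusted) if q < alpha]
```

With four hypothetical genes and raw p-values of 0.01, 0.04, 0.03 and 0.5, only the first gene remains below 0.05 after adjustment, which shows why the adjusted rather than raw p-values determine the final gene list.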
Polyketide synthase analysis
Putative polyketide synthase genes were identified by the automated annotation pipeline and a
local BLAST search against Emiliania huxleyi polyketide synthase sequences obtained from
GenBank. The NRPS-PKS tool (Bachmann and Ravel 2009) was used to identify the PKS
domains present.
Results
General transcriptome characteristics
The assembled Prymnesium parvum transcriptome of the replete, N-limited and P-limited
samples contained 51,580 contigs and 42,862 genes for a total assembled transcriptome size of
43.96 Mbp (Table 1). The number of putative protein-coding genes is comparable to other
transcriptomes in the Moore Foundation MMETSP database and to the number of predicted
genes in Emiliania huxleyi, a related haptophyte (Read et al. 2013). All necessary genes involved
in glycolysis/gluconeogenesis, the TCA cycle, biosynthesis of amino acids, and nitrogen
metabolism were present in the assembled transcriptome, indicating that the depth of sequencing
was sufficient to cover most of the biologically essential transcribed genes.
Differentially expressed genes
Approximately 25% (10,525) of the genes in our assembled transcriptome were differentially
expressed in the N- and P-limited treatments relative to the replete treatment (Fig. 1). More
genes were uniquely up- or down-regulated in the P-limited transcriptome (4,714) compared to
the N-limited transcriptome (2,295). In the former, 3,103 genes were expressed at higher levels
while 1,611 genes were expressed at lower levels relative to the replete treatment. Under N-
limitation, 1,995 and 301 genes had higher and lower relative expression, respectively, compared
to the replete condition. There were 715 genes and 2,800 genes that were jointly over- and
under-expressed, respectively, in both nutrient-limited treatments relative to the replete treatment.
Approximately 43% (4,559) of the differentially expressed genes had no annotations,
while 89 and 1,141 genes were respectively annotated as predicted proteins (sequences
determined to be proteins based on homology to ESTs or other proteins in the database) and
hypothetical proteins (open reading frames without a characterized protein homolog). The
remaining 4,736 genes (45%) had some annotation ascribed to the sequence, although these
annotations included general domain and function designations.
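The annotation breakdown can be checked with a short calculation using the counts reported above. The variable names are illustrative only.

```python
# Counts of differentially expressed genes by annotation status (from the text).
counts = {
    "no annotation": 4559,
    "predicted protein": 89,
    "hypothetical protein": 1141,
    "other annotation": 4736,
}
total_de = sum(counts.values())   # all differentially expressed genes
total_genes = 42862               # genes in the assembled transcriptome

# Share of each category among the differentially expressed genes.
percent_of_de = {k: round(100 * v / total_de) for k, v in counts.items()}
# Share of differentially expressed genes among all genes.
percent_de_of_all = round(100 * total_de / total_genes)
```

The four categories sum to the 10,525 differentially expressed genes, and the rounded shares agree with the approximate percentages given in the text.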
Nitrogen uptake, transport and assimilation
Under N-limitation an overall change in relative gene expression is observed for genes related to
the N-uptake, transport and assimilation capabilities (Fig. 2). Genes coding for nitrate reductase,
nitrite reductase, glutamine synthetase (GS) and glutamate synthase (GOGAT) were upregulated
in the N-limited condition compared to the replete (Fig. 3A, D). There were three isoforms of GS
in our dataset, two annotated as glutamine synthetase III (GSIII), most likely cytosolic isoforms;
and the third presented a signal peptide indicating a chloroplastidic localization, thus most likely
being a putative glutamine synthetase II (GSII). One of the GSIII genes showed a marked
response under N-limitation, with expression ~33 times higher than in the replete treatment.
In the P-limited treatment, this gene was expressed at 3x the level of the replete condition. The
other GSIII gene and the GSII gene were expressed at 1x and 3x the level of the replete treatment,
respectively. The two GOGAT genes were expressed at 1-2-fold compared to the replete
treatment (Fig. 3A, D). Under P-limitation, most of the GS and GOGAT transcripts had
relatively lower expression levels compared to the replete treatment, with the exception of the
GSIII gene mentioned previously.
In addition to nitrogen metabolism genes, we found 20 putative inorganic nitrogen
transporter genes: eight ammonium transporters, one nitrate transporter, and genes annotated as
formate/nitrite transporters. The ammonium transporters were expressed at ~1.2 to 6-fold under
N-limitation compared to the replete condition and had relatively lower expression in the
P-limited condition (Fig. 3B, D). The nitrate transporter was expressed at 2.9-fold in the N-limited
treatment compared to the replete but had relatively lower expression in the P-limited condition. In
contrast, genes annotated as formate/nitrite transporters did not show the same pattern of
regulation (Fig. 3C, D) and had lower expression levels in both the N- and P-limited treatments
compared to the replete treatment. Additionally, two urea transporters were detected in the
transcriptome (Table 2). The first corresponded to a urea permease of the UT family that
appeared to be targeted to the mitochondria and was not differentially expressed in either the N- or
P-limited treatment compared to the replete; the second corresponded to an active urea transporter
(DUR3) with a 3.7-fold higher relative expression compared to the replete treatment.
Other nitrogen metabolism genes
One of the most highly expressed genes in the N-limited treatment was a uracil-xanthine
permease; this gene was upregulated 5.8-fold in the N-limited treatment relative to the replete
treatment (Table 2). A xanthine dehydrogenase, which catalyzes the conversion of xanthine to urate
(or uric acid), also had higher relative expression, approximately 3.8-fold, in the N-limited
treatment compared to the replete (Table 2).
Genes related to the biosynthesis of purines also had a higher relative expression level in
the N-limited treatment compared to the replete, including the whole pathway converting ribose-
5-phosphate to inosine monophosphate (IMP) and then interconverting IMP to xanthine
monophosphate (XMP), guanine monophosphate (GMP) and adenosine monophosphate (AMP).
Four of these genes, purL, purM, purB and purH, were expressed at levels 29 to 110 times higher
in the N-limited sample than in the replete sample (Fig. 7A). These four genes were
expressed at about 2x in the P-limited treatment compared to the replete.
We detected a transcript coding for a copper amine oxidase (CuAO) that was
downregulated by approximately 12x under P-limitation compared to the replete treatment
(Table 2). The translated protein presents an α-helix at the N-terminus, suggesting that it
corresponds to a membrane-bound CuAO (Fig. 2). This gene was not differentially expressed under
N-limitation.
There were also differences in expression levels of genes involved in intracellular
nitrogen processing. Two types of carbamoyl phosphate synthase (CPS) genes were present in
the combined transcriptome, previously designated as unCPS and pgCPS (Allen et al. 2011) (Fig.
3D). The former is localized to the mitochondria, is involved in the urea cycle and uses
ammonium as a substrate while the latter is likely cytosolic and uses glutamine as a substrate in
the first committed step of pyrimidine synthesis (Allen et al. 2011). There were six copies of
pgCPS in the transcriptome, three of which were not differentially expressed in any of the
treatments. The other three were not differentially expressed under N-limitation but had
relatively lower expression under P-limitation (Fig. 3D). Of the two unCPS genes found in the
transcriptome, one was downregulated in the P-limited treatment and the other was not found in it.
Neither of these genes was differentially expressed under N-limitation.
Two other genes involved in the urea cycle were also upregulated under N-limitation,
argininosuccinate synthase and argininosuccinate lyase. The former was expressed at 8-fold and
the latter at 2-fold relative to the replete condition (Table 2). In the P-limited condition,
argininosuccinate synthase was not differentially expressed while argininosuccinate lyase was
downregulated compared to the replete treatment (Table 2). The other urea cycle genes, arginine
and ornithine transcarbamoyltransferase, were not differentially expressed among the three
treatments.
Phosphate transporters
Overall, phosphate transporters had higher relative expression levels in both nutrient-limited
treatments, although under N-limitation this represented only an average 2.2-fold
increase relative to the replete treatment. There were two types of phosphate transporters in our
dataset, the phosphate transporter family pho4 and sodium-dependent inorganic phosphate
transporters (Fig. 4). The most highly expressed gene (gene ID: 135755_1) in the P-limited
transcriptome was one of three genes annotated as belonging to the pho4 family, which is a
high-affinity phosphate transporter family. This gene was expressed ~55-fold more in the P-limited
treatment compared to the replete treatment. The second pho4 gene was upregulated ~35-fold
in the P-limited sample, while the third pho4 gene was not differentially expressed (Fig. 4).
Of the four sodium-dependent inorganic phosphate transporters, three were more highly
expressed in the P-limited condition (8-49x) and one was not differentially expressed. In the N-
limited condition, two of these transporters were expressed at approximately 4x compared to
replete while two were not differentially expressed (Fig. 4).
Photosynthesis proteins
Photosynthesis-related proteins found in the P. parvum transcriptome consisted of chlorophyll A-
B binding proteins and light-harvesting proteins. Other photosynthesis-related genes are encoded
in the chloroplast genome, and thus were not recovered in our datasets due to the poly-A+
enrichment technique used.
We found 38 chlorophyll A-B binding protein genes that were differentially expressed
among all three treatments (Fig. 5A, C). Twenty-six of those 38 genes were upregulated in the
P-limited treatment compared to the replete, with a range in the levels of overexpression. Seven
genes were highly upregulated (8 to 64-fold), six genes were moderately upregulated
(2 to 8-fold), and thirteen genes were expressed at 1-2 fold the level of
the replete sample. The remaining twelve genes had relatively lower expression
compared to the replete sample. In the N-limited sample, three chlorophyll A-B binding proteins
were expressed at 1-2x compared to the replete, two were expressed at 5-10x while the
remaining 33 were expressed at levels lower than the replete treatment.
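The expression categories used above amount to a simple binning of fold changes relative to the replete sample. The sketch below follows the bin boundaries given in the text; the function name and the assignment of values that fall exactly on a boundary are assumptions.

```python
def expression_category(fold_change):
    """Classify a fold change (treatment / replete) into the bins used in the text."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    if fold_change < 1:
        return "lower than replete"
    if fold_change < 2:
        return "1-2 fold"
    if fold_change < 8:
        return "moderately upregulated (2-8 fold)"
    return "highly upregulated (8 fold or more)"
```

Applying such a function to every chlorophyll A-B binding protein gene and tallying the results would reproduce the per-category counts reported for each treatment.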
Relative expression levels of genes coding for light-harvesting proteins had opposing
patterns under P- and N-limitation. That is, these genes were generally upregulated in the P-
limited treatment relative to the replete treatment, but downregulated in the N-limited treatment
relative to the replete treatment (Fig. 5B, C). There were seven genes that were more highly
expressed in the P-limited sample at levels ranging from 2-11.7 times that of the replete sample.
All of the light harvesting protein genes were downregulated in the N-limited sample compared
to the replete, at levels approximately one-half to one-eighth of the expression in the replete
sample.
Protein synthesis and degradation
The genes involved in protein synthesis and degradation in our dataset coded for ribosomal
proteins, which are involved in the creation of new proteins, lysosomal proteolytic enzymes, and
protein-folding proteins (Fig. 6). A total of 153 genes were annotated as 90 different ribosomal
proteins in the assembled P. parvum transcriptome, with some genes annotated as the same
ribosomal protein. Of these 153
ribosomal protein genes, 98 were differentially expressed. Eighty genes were upregulated 2 to
34-fold in the P-limited treatment, while 91 genes were upregulated 2 to 30-fold in the N-limited
treatment (Fig. 6A, D).
A group of proteolytic enzymes that were annotated by KEGG as lysosomal proteins
were similarly upregulated under both P- and N-limitation. These genes code for three aspartyl
proteases, three cysteine proteases and two serine carboxypeptidases. Six of these eight genes
were upregulated under P-limitation by 1.7-16.5 times, and seven of the eight were upregulated
under N-limitation by 3-10 times (Fig. 6B, D).
We found 29 differentially expressed genes that were annotated as involved in protein-
folding. Thirteen genes were expressed at more than 2-fold in the N-limited treatment compared
to the replete. This group of genes included three cyclophilin-type peptidyl prolyl cis-trans
isomerases, four FKBP-type peptidyl prolyl cis-trans isomerases, one heat shock protein, three
chaperone proteins and one protein disulfide isomerase. One of the FKBP-type peptidyl prolyl
cis-trans isomerases was expressed at 28x in the N-limited condition compared to the replete
condition (Fig. 6C, D). Seventeen genes were more highly expressed in the P-limited treatment
compared to the replete treatment. This group of genes included six cyclophilin-type peptidyl
prolyl cis-trans isomerases, eight FKBP-type peptidyl prolyl cis-trans isomerases, two chaperone
proteins and one protein disulfide isomerase. One of the cyclophilin-type peptidyl prolyl
cis-trans isomerases was expressed 27x more highly in the P-limited condition than in the replete.
The genes that were under-expressed in both nutrient-limited conditions were all peptidyl-prolyl
cis-trans isomerases (n=7).
Polyketide synthase genes
There were fifteen differentially expressed polyketide synthase (PKS) genes in our dataset. Five
of them were only found in the P-limited transcriptome. An additional four PKS genes were
upregulated 4-15x under P-limitation while the remaining six were downregulated under P-
limitation. Under N-limitation, three genes were expressed at 2 to 4-fold compared to the replete
condition, and seven genes were underexpressed compared to replete (Fig. 7B).
The PKS domains found in the fifteen genes include the ketosynthase (KS),
ketoreductase (KR), dehydratase (DH) and acyl-carrier-protein (ACP) domains. Of these
domains, the KS, KR and ACP domains are the minimum required domains for polyketide synthesis.
The gene with the greatest number of domains, containing a KR domain followed by two ACP
domains, was upregulated under both N and P limitation (PID: 135954_2). The two genes with
KS domains were relatively more highly expressed in the replete treatment compared to both the
N and P treatments (PID: 9418_1, 135098_1). One gene with an ACP domain (PID: 2474) and
another with two ACP domains (PID: 13979_1) were also more highly expressed in the replete
treatment relative to the nutrient-limited treatments. The other genes annotated as PKS genes did
not have domain annotations.
Fatty acid oxidation and tricarboxylic acid (TCA) cycle genes
Most of the genes present in the fatty acid oxidation pathway were upregulated 5-9-fold under P-
and N-limitation, while some were upregulated over 20-fold (Fig. 7C). There were five copies of
a long-chain acyl-CoA synthetase, one of which was upregulated 17x in the P-limited condition
and 6x in the N-limited condition compared to the replete condition. The next step,
dehydrogenation by FAD, is catalyzed by acyl-CoA dehydrogenases, enzymes that catalyze
the initial step in each cycle of fatty acid β-oxidation in the mitochondria (Thorpe and Kim 1995).
There were three of these genes in our transcriptome, two of which were upregulated when either
P or N was limited. One of the four genes putatively coding for enoyl-CoA hydratase (ECH) was
upregulated under N- and P-limitation, and one was upregulated in N-limitation but
downregulated under P-limitation.
Acetyl-CoA formed through fatty acid oxidation can be used for different processes
within the cell, such as the generation of energy through the TCA or glyoxylate cycle or
biosynthetic pathways such as the formation of fatty acids or polyketides. The two genes
particular to the glyoxylate cycle, malate synthase and isocitrate lyase, were not upregulated in
the N-limited sample. In contrast, many of the genes involved in the TCA cycle were
upregulated under N-limitation. The greatest difference in gene expression was seen in pyruvate
carboxylase, which converts pyruvate to oxaloacetate; this gene was expressed at 67x in the N-
limited treatment compared to the replete treatment (Fig. 7D). This gene was expressed five-fold
more in the P-limited treatment compared to the replete treatment. There were two putative
isocitrate dehydrogenase genes, one of which was expressed at 55-fold and 63-fold in the N- and
P-limited treatments, respectively, compared to the replete treatment. Aconitate hydratase, which
catalyzes the interconversion of citrate and isocitrate, was upregulated 28-fold in the N-limited
treatment and 6-fold in the P-limited treatment compared to the replete treatment. Fumarase was
upregulated 11-fold and 5-fold relative to the replete in the N- and P-limited treatments,
respectively. Malate dehydrogenase, 2-oxoglutarate dehydrogenase and succinyl-CoA synthetase were
all upregulated approximately 3-fold in the N-limited treatment relative to the replete treatment.
In the P-limited treatment, malate dehydrogenase and succinyl-CoA synthetase were expressed at 2-3x compared
to the replete treatment while 2-oxoglutarate dehydrogenase was underexpressed relative to the
replete treatment. Citrate synthase and succinate dehydrogenase were not upregulated in the N-
limited treatment compared to the replete condition but the latter was upregulated in the P-
limited treatment (4-fold compared to the replete treatment).
Discussion
P. parvum is a recurring bloom-forming toxic alga and thus the study of its gene expression to
better understand its physiology and ecology has been the focus of increased interest in recent
years. The first study of P. parvum using a transcriptomic approach was based on EST libraries
of nutrient replete cultures, identifying approximately 3,400 putative expressed genes that
perform a diversity of metabolic and cellular functions such as photosynthesis, protein synthesis,
folding and degradation, carbon metabolism and fatty acid synthesis (La Claire 2006). A
subsequent study based on a combination of EST libraries and microarrays studied changes of
expression under N- and P-limited conditions. The number of putative genes identified increased
to 6,381 in this study, which showed an upregulation of phosphorus acquisition genes under P-
limitation (Beszteri et al. 2012). More recently, approaches using RNA-seq techniques have
recovered a much higher number of putative genes and have given insights into the mixotrophic
characteristics of this alga (Liu et al., Koid et al., submitted). Here we use the much higher
resolution of RNA-seq to understand how P. parvum cells respond to nitrogen and phosphorus
limitation. Similar to previous transcriptome comparison studies (Dyhrman et al. 2012; Bender et
al. 2014), we investigated the relative changes in gene expression between our nutrient-limited
conditions and the nutrient-replete condition.
Nitrogen uptake and metabolism genes
Nitrogen is an important substrate for phytoplankton that is used in essential molecules for cell
maintenance, growth and proliferation, such as proteins, DNA, RNA and photosynthetic
pigments. In addition to incorporating nitrogen into these molecules, cells also contain
unassimilated nitrogen in the form of nitrate, ammonium and amino acids. Based on our results,
we hypothesize that under N-limitation, P. parvum cells meet their nitrogen requirements by
increasing the expression of genes that enable them to acquire more exogenous nitrogen and by
repartitioning their intracellular pools of nitrogen.
All nitrogen transporters in the assembled transcriptome - seven for ammonium and one
each for nitrate, nitrite and urea - were more highly expressed under N-limitation compared to
the nutrient-replete condition. Beszteri et al. (2012) did not observe a similar response in
nitrogen transporters in P. parvum, but a nitrate transporter and an ammonium transporter were
upregulated in a related haptophyte, E. huxleyi, and the pelagophyte A. anophagefferens under
nitrogen stress (Dyhrman et al. 2006; Wurch et al. 2010).
The general, but non-universal preference among the phytoplankton for ammonium over
nitrate seems to apply to P. parvum as well. This is indicated by the presence of seven
ammonium transporter genes in our assembled transcriptome, compared to just one nitrate
transporter. Diatoms, which tend to be competitive assimilators of nitrate (Sarthou et al. 2005),
have also been shown to have more ammonium transporters compared to nitrate transporters,
although the ratio is closer to 2:1 (Hildebrand 2005). Preference for ammonium is energetically
favorable as it can be incorporated directly into organic matter via the glutamine-synthetase-
glutamate synthase (GS-GOGAT) cycle, compared to nitrate which requires energy to undergo
reduction to ammonium before it can be utilized in biological processes. Interestingly, the seven
ammonium transporters in our P. parvum transcriptome were not all regulated similarly. Three
transporters were more highly upregulated at 4-6x compared to levels in the replete treatment,
while four others were only slightly upregulated, between 1.2-2.1x compared to the replete
condition. The difference in expression levels of these genes might stem from the transporters
having different affinities for ammonium. Alternatively, they may play different physiological
roles, as was hypothesized for the diatom Cylindrotheca fusiformis, in which ammonium
transporters that are very similar at the DNA level may play different physiological roles and,
in addition to their primary function of transporting ammonium, may also have a regulatory
function (Hildebrand 2005).
Genes belonging to the formate/nitrite transporter family were not more highly expressed
under nitrogen limitation (Fig. 3A). Only three transporters within this family have been
characterized, each transporting different molecules including nitrite, hydrosulfide and a range of
anions, including formate (Lü et al. 2013). Our sequences had the highest similarity with the
nitrite transporter. As these transporters are localized to the chloroplast membrane and likely
transport nitrite from the cytosol to the chloroplast, their activity might be regulated by the
presence of nitrite in the cytosol.
In addition to increasing exogenous uptake of nitrogen, our results indicate an increase in
transcription of nitrogen assimilation genes, specifically for the reduction of nitrate to
ammonium and in the GS-GOGAT cycle. Interestingly, the cytosolic GSIII genes were
upregulated under N-limitation, while the chloroplastic GSII was not. The former converts
ammonium that is taken up from outside the cell into glutamine while the latter uses the
ammonium that is reduced from nitrate within the cell. The upregulation of ammonium
transporters and cytosolic GSIII genes could be due to the cell taking up ammonium released by
lysed dead cells.
Other nitrogen metabolism genes
We observed relatively higher expression of purine transporters under N-limited conditions,
which is in accordance with the report by Palenik and Henson (1997) that the haptophyte E.
huxleyi was able to grow using different purines, as their transporters were
induced by N-limitation (Shah and Syrett 1984). Furthermore, at least two important enzymes in
the catabolic process of purines, xanthine dehydrogenase and urease, also appear to have relatively
higher expression levels under N-limited conditions. Together with the relatively higher expression
of the active urea transporter DUR3 (Fig. 2), our results suggest that under N-limited
conditions P. parvum presents a coordinated response to scavenge N from organic substrates (Fig.
2), in contrast to a previous report that did not observe this response (Beszteri et al. 2012). On
the other hand, and as observed by Beszteri and collaborators (2012), a transcript coding for a
copper amine oxidase had relatively lower expression in the P-limited treatment compared
to the replete (CuAO in Fig. 2). This CuAO presented an α-helix at the N-terminus, suggesting that it
corresponds to the same membrane-bound enzyme described in P. parvum by Palenik and Morel
(1991) that oxidizes primary amines to produce extracellular ammonium. The relatively lower
expression of this enzyme is thus in line with the lower relative expression of ammonium
transporters in the P-limited treatment.
Phosphate transporters
The most highly expressed gene under P-limitation was a phosphate transporter gene belonging
to the PHO4 family of phosphate permeases. The PHO4 superfamily includes both high- and
low-affinity phosphate transporters. In yeast, the low-affinity phosphate transport system—
thought to be constitutively expressed—satisfies the cellular requirement for inorganic phosphate
when external levels of phosphate are high or sufficient. In contrast, the high-affinity phosphate
transport system is used when ambient phosphate levels are limiting to growth (Persson et al.
2003). In microbial eukaryotes, genes with homology to pho4 have been found in species from
different phylogenetic groups including the chlorophytes Tetraselmis chui and Dunaliella viridis
(Chung et al. 2003; Li et al. 2006), several prasinophytes (Monier et al. 2012), the pelagophyte A.
anophagefferens (Wurch et al. 2010), and the haptophyte E. huxleyi (Dyhrman et al. 2006).
Experimental evidence and the phylogenetic placements of these pho4 genes suggest that they
are all high-affinity transporters, and P-depletion induces an upregulation of these genes.
In addition to the two putative pho4 genes, other transporters that were annotated as
sodium-dependent inorganic phosphate transporters were detected. These transporters couple the
uptake of inorganic phosphorus to the ionic gradient formed by the movement of Na+ (Persson et
al. 2003). While three of these transporters were upregulated under P-limitation, one was
downregulated compared to the replete condition. One possible explanation for the differential
regulation of these genes with putatively similar function is that they may have different
affinities for transporting inorganic phosphorus. Of the two phosphate permeases and two acid
phosphatases found in another study of P. parvum gene expression, one of each type was
upregulated under P-limitation (Beszteri et al. 2012), indicating that different phosphorus
acquisition systems might be at work depending on the nutrient status of the cell. Taken together,
the general upregulation of two different phosphate transporter families in this and previous
studies indicates an ability to mount a cellular response to acquire more phosphorus when the
nutrient is limited.
Photosynthesis
Photosynthesis and nitrogen metabolism are tightly coupled processes, with N deficiency usually
resulting in a general decrease in photosynthetic activity in a cell. Physiological changes under
N-deficiency have been well-studied in algae, and include changes in the amount of chlorophyll
in the cell, the ratio of chlorophyll to accessory pigments and the shape and size of photosystem
and antenna proteins (as reviewed by Turpin 1991). While there is also a relationship between
photosynthetic efficiency (Fv/Fm) and phosphorus availability, the two are much less tightly
coupled, and tend to be variable across different species and even among different strains of the
same species (Litchman et al. 2003; Shen and Song 2007). In the unicellular cyanobacterium
Synechococcus spp., components of the cell’s photosynthetic machinery such as chlorophyll,
phycocyanin and phycobilisomes responded differently to N- and P-deprivation, generally with a
greater negative effect seen under the former (Collier and Grossman 1992). In fact, synthesis of
new phycobilisomes was found to take place even after P-deprivation. In the green alga,
Chlamydomonas sp., the decline in the number of photosystem II reaction centers in the cell was
also found to be greater under N-deprivation compared to P-deprivation (Grossman 2000).
There are relatively few studies comparing photosynthesis-related gene expression under
N- and P-limitation in algae, and the results do not show a clear correlation between expression
levels and general nutrient limitation. The coccolithophore, E. huxleyi, yielded results similar to
our study, wherein five putative fucoxanthin chlorophyll a-b binding protein genes were found to
be upregulated ~2-7-fold under P-limitation and down-regulated under N-limitation (Dyhrman et
al. 2006). In the pelagophyte, A. anophagefferens, light harvesting proteins and chlorophyll a-c
proteins were found to be upregulated between 3-4-fold under both N- and P-limitation (Wurch
et al. 2010). In Chlamydomonas reinhardtii, genes involved in photosynthesis were
downregulated under N-limitation (Miller et al. 2010).
P. parvum in our study responded differently to the two nutrient limitations. Under P-
limitation, a relatively higher expression level of photosynthesis genes was observed, whereas
these genes were relatively underexpressed in the N-limited treatment. We speculate that under
P-limitation, P. parvum is upregulating photosynthesis genes as a directed effort to obtain energy
in order to address the phosphorus-stress within the cell, as could be the case under induced Fe-
stress in a diatom (Thamatrakoln et al. 2012). As such, future gene expression studies should also
find the coordinated increased expression of genes involved in photosynthesis, protein synthesis
and phosphorus acquisition such as phosphate transporters.
Genes involved in protein production and turnover
Our results indicate that P. parvum increased the expression of genes involved in protein
synthesis and degradation under both N- and P-limitation. The former result is surprising,
particularly under N-limitation, as we anticipated that protein synthesis might be downregulated
under N-limitation because protein is a significant component of total cellular nitrogen. The
result was slightly less surprising for the P-limited condition, as a previous study found that
multiple rRNA transcripts were upregulated under P-limitation in E. huxleyi (Dyhrman et al.
2006). However, a T. pseudonana gene expression study found that genes encoding ribosomal
proteins were downregulated when phosphorus was deficient in the media (Dyhrman et al. 2012).
The parallel increased expression of genes involved in protein degradation (i.e. lysosomal
proteins) might represent an internal recycling of compounds (especially nitrogen) needed for the
stress response to either N- or P-limitation. In fact, without new sources of N or P, the degradation
of extraneous or non-essential proteins is likely tightly linked with the synthesis of new proteins.
For instance, under N-limitation, an increase in the expression of transporters for ammonium,
nitrite, nitrate and urea suggests that the cell might be breaking down proteins to produce more
of these transporters. Under P-limitation, the cell might be synthesizing phosphate transporters
and/or polyketide compounds.
Polyketide synthase (PKS) genes
PKS compounds have been implicated as the basis for toxins in many dinoflagellates (Kellmann
et al. 2010) and have been thought to be responsible for some of the toxins present in
prymnesiophytes (Freitag et al. 2011). Two P. parvum toxins have been isolated to date, known
as prym1 and prym2 (Igarashi et al. 1999; Manning and La Claire 2010). These are polyether
compounds similar to other algal toxins such as brevetoxin and okadaic acid, which are produced
by Type I PKS genes. A previous study found PKS genes in the transcriptomes of P. parvum and
other related prymnesiophytes (Koid et al., submitted). The ketosynthase (KS) domains of those
PKS genes clustered into two separate clades, one comprising prymnesiophyte-specific
sequences and one containing sequences of diverse bacterial and protistan origin (Koid et al.,
submitted).
While P. parvum displays toxicity under nutrient-replete conditions, its toxic ability has
been shown to increase under N- and P-limited conditions (Granéli and Johansson 2003; Uronen
et al. 2005). The presence of PKS genes in the transcriptome of cells grown in nutrient-replete
medium supports the observation that toxins are present during exponential growth. Our dataset
contained many different PKS genes that exhibited different expression patterns. Some were not
differentially expressed among the three treatments, some were more highly expressed under
replete conditions, while others were more highly expressed under P-limitation. The presence of
multiple PKS genes in our dataset supports the hypothesis that P. parvum toxicity consists of a
suite of different toxic components (Manning and La Claire 2010). The presence of some PKS
genes that are constitutively active is consistent with P. parvum being toxic throughout its life
cycle. However, the expression of other PKS genes that do appear to be environmentally
regulated is also consistent with the high toxicity seen in P. parvum under nutrient-limitation
(Granéli and Johansson 2003a; Uronen et al. 2005), and implies a more complex underlying
story.
It has been suggested that under P-limitation, the toxic effect of the P. parvum toxins may
be a strategy to obtain more extracellular phosphorus from lysed cells of other organisms
(Granéli and Salomon 2010). The higher expression of some PKS genes under P-limitation in our study supports the
idea that P-limited cells have higher toxicity than nutrient-replete cells, which might be part of a
coordinated physiological response to P-limitation.
Fatty acid oxidation and TCA cycle
Fatty acid breakdown involves multiple cycles of β-oxidation (the number depending on the length
of the fatty acid) and yields acetyl-CoA, NADH and FADH2. We saw an increase in the expression of genes
involved in fatty acid (FA) oxidation, which was an unexpected result as phytoplankton such as
Chlamydomonas are thought to synthesize fatty acids under nutrient limitation. While many of
these enzymes function bidirectionally, acyl-CoA dehydrogenase is unidirectional. Therefore, its
upregulation under P- and N-limitation might suggest that the other enzymes are also catalyzing
reactions that result in an increase in coenzyme A or acetyl-CoA. Acetyl-CoA can feed into a
number of different downstream pathways: one is the TCA cycle, and another is its use as a starter
unit for polyketide synthesis (Dewick 2001). Our data showed genes involved in both processes being
upregulated (TCA cycle genes under both P- and N-limitation, and PKS genes under
P-limitation), suggesting that FA oxidation is being increased to produce substrates for those two
processes. Additionally, under N-limitation, the TCA cycle might be upregulated to produce
2-oxoglutarate for the GS/GOGAT cycle. Under P-limitation, FA oxidation might still be
feeding into the TCA cycle to generate ATP for general cellular metabolism.
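To make the stoichiometry behind the fatty acid oxidation argument concrete, the following is a minimal sketch of the textbook yield of complete β-oxidation. It assumes a saturated fatty acid with an even number of carbons; the function name and the palmitate example are illustrative and are not taken from this study's analysis.

```python
def beta_oxidation_yield(carbons):
    """Products of complete beta-oxidation of a saturated, even-chain fatty acid.

    Assumption: textbook stoichiometry. Each cycle removes one two-carbon unit
    as acetyl-CoA and produces one NADH and one FADH2; the final cycle splits
    the remaining four-carbon unit into two acetyl-CoA.
    """
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("expected an even carbon count of at least 4")
    cycles = carbons // 2 - 1
    return {
        "cycles": cycles,
        "acetyl_CoA": carbons // 2,
        "NADH": cycles,
        "FADH2": cycles,
    }

# Illustrative example: palmitate, a 16-carbon saturated fatty acid.
palmitate = beta_oxidation_yield(16)
print(palmitate)
```

Under these assumptions, longer fatty acids supply proportionally more acetyl-CoA, the substrate that can enter either the TCA cycle or polyketide synthesis.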
P. parvum exhibited two types of responses under N- and P-limitation: responses that are
specific to either phosphorus or nitrogen stress, and a general stress response common to both
nutrient stress conditions. P-limitation resulted in the increased expression of phosphate
transporters, photosynthesis genes and polyketide synthase genes. The increased expression of
these particular sets of genes hints at the possibility that toxin synthesis might be related to the
acquisition of phosphorus when that nutrient is limiting, with the relative upregulation of
photosynthesis genes providing the energy necessary to fuel the increased cellular activity. N-
and P-limitation both resulted in increased expression of genes involved in protein synthesis and turnover,
fatty acid oxidation and the TCA cycle. Under nitrogen limitation, cells prioritized the
acquisition and assimilation of nitrogen through a myriad of different transporters and nitrogen-
related pathways. This robust cellular response included the processing or reallocation of
intracellular nitrogen, such as seen in the de novo purine biosynthesis pathway. Taken together,
the results of this study highlight the ability of P. parvum to mount a coordinated and varied
cellular and physiological response to nutrient limitation, which supports the ecological success
of this species.
Acknowledgements
This research was funded in part by the Gordon and Betty Moore Foundation through Grant
#3299 to D.A. Caron and K.B. Heidelberg. The sequencing was funded by the Gordon and Betty
Moore Foundation through Grant #2637 to the National Center for Genome Resources. We
would like to thank K. David Hambright (University of Oklahoma) for making the culture of
Prymnesium parvum available, Victoria Campbell for harvesting the RNA, and John Heidelberg
for his discussions on bioinformatics analysis procedures.
Tables
Table 1. Summary statistics of the three P. parvum transcriptomes in this study.
                                                                 Replete      P-limited    N-limited
No. of read pairs*                                               19,277,859   18,785,652   11,404,203
No. of read pairs mapped to coding region of the transcriptome†  9,329,823    4,288,716    5,604,472
No. of CDS with at least 1 read pair                             41,699       36,174       39,211
*After quality filtering.
† Read pairs were only counted if both reads aligned to the same gene at the appropriate distance
and orientation. See Materials and Methods for details.
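The counts in Table 1 can be turned into a mapping fraction per treatment, which makes the libraries easier to compare. The sketch below is illustrative only: the variable and function names are hypothetical, and the numbers are copied directly from Table 1.

```python
# Read-pair counts copied from Table 1 (after quality filtering).
total_read_pairs = {
    "Replete": 19_277_859,
    "P-limited": 18_785_652,
    "N-limited": 11_404_203,
}
# Read pairs mapped to coding regions of the transcriptome (Table 1).
mapped_read_pairs = {
    "Replete": 9_329_823,
    "P-limited": 4_288_716,
    "N-limited": 5_604_472,
}

def mapped_fraction(total, mapped):
    """Return the fraction of read pairs mapped to coding regions per treatment."""
    return {name: mapped[name] / total[name] for name in total}

fractions = mapped_fraction(total_read_pairs, mapped_read_pairs)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of read pairs mapped to coding regions")
```

By this calculation, roughly half of the read pairs in the replete and N-limited libraries mapped to coding regions, compared with a little under a quarter in the P-limited library.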
Table 2. Single-copy genes that were differentially expressed under phosphorus (P) or nitrogen
(N) limitation, relative to the replete condition. DE: differentially expressed.
Gene ID    Annotation                        Relative expression     Function
                                             P          N
57610      Xanthine dehydrogenase            -2.4       3.8          Purine metabolism
18242      Xanthine/uracil permease (XUV)    -4.1       9.7          Purine metabolism
55569      Copper amine oxidase              -12.4      not DE       Amine metabolism
62477_2    Urease                            not DE     3.8          Urea metabolism
63323      Urea transporter (DUR3)           -4.6       3.7          Urea transport
92575      Argininosuccinate synthase        not DE     8.1          Urea cycle
63924      Argininosuccinate lyase           -2.1       2.1          Urea cycle
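The values in Table 2 can be summarized as a response pattern per gene. The following sketch is illustrative: it assumes that the values are listed in gene order within each column, and that negative values denote lower expression than in the replete condition while positive values denote higher expression. The function names are hypothetical.

```python
# Signed relative expression values copied from Table 2; None stands for "not DE".
# Assumption: negative = lower than replete, positive = higher than replete.
table2 = {
    "57610": ("Xanthine dehydrogenase", -2.4, 3.8),
    "18242": ("Xanthine/uracil permease (XUV)", -4.1, 9.7),
    "55569": ("Copper amine oxidase", -12.4, None),
    "62477_2": ("Urease", None, 3.8),
    "63323": ("Urea transporter (DUR3)", -4.6, 3.7),
    "92575": ("Argininosuccinate synthase", None, 8.1),
    "63924": ("Argininosuccinate lyase", -2.1, 2.1),
}

def direction(value):
    """Classify one relative expression value as up, down, or not DE."""
    if value is None:
        return "not DE"
    return "up" if value > 0 else "down"

def response_pattern(p_value, n_value):
    """Combine the P- and N-limitation responses into one label."""
    return f"P: {direction(p_value)}, N: {direction(n_value)}"

patterns = {
    gene_id: response_pattern(p, n) for gene_id, (_, p, n) in table2.items()
}
for gene_id, pattern in patterns.items():
    print(gene_id, table2[gene_id][0], "->", pattern)
```

Under these assumptions, none of the seven genes is higher under P-limitation, and all but copper amine oxidase are higher under N-limitation.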
Figure legends
Figure 1. Distribution of genes that are differentially expressed in the phosphorus- and nitrogen-
limited treatments relative to the replete treatment. Arrows indicate higher or lower
expression relative to replete.
Figure 2. Overview of differentially expressed pathways and genes. Squares indicate response
under N-limitation compared to the replete condition while circles indicate response under P-
limitation compared to the replete condition. Green and red indicate up- and down-regulation,
respectively. AMT: ammonium transporter, CPT: carnitine palmitoyl transferase, CuAO:
Copper amine oxidase, DUR3 and UT: urea transporter, FNT: formate/nitrite transporter,
GOGAT: glutamate synthase, GS: glutamine synthetase, Na/P: sodium-dependent inorganic
phosphate transporter, NiR: nitrite reductase, NR: nitrate reductase, NT: nitrate transporter,
pgCPS: carbamoyl phosphate synthase (involved in pyrimidine synthesis, uses glutamine as
substrate), PHO4: phosphate transporter of the pho4 family, unCPS: carbamoyl phosphate
synthase (involved in the urea cycle, uses ammonium as substrate), URE: urease, XDH: xanthine
dehydrogenase, XUV: xanthine-uracil permease.
Figure 3. Average normalized expression levels of genes involved in ammonium
transport (A), nitrogen assimilation (B) (nitrite and nitrate reductase, glutamate synthase and
glutamine synthetase genes), and formate/nitrite transport (C) under P- and N-
limited conditions compared to the replete condition. Error bars indicate SEM. Expression levels
of individual genes involved in different aspects of nitrogen uptake and assimilation (D). The
value on the x-axis indicates the fold difference between the P-limited and replete sample while
the value on the y-axis indicates the fold difference between the N-limited and replete sample.
Points above the x-axis represent genes that are overexpressed under N-limitation relative to the
replete condition while points to the right of the y-axis represent genes that are overexpressed
under P-limitation relative to the replete condition. CPS: carbamoyl-phosphate synthase.
Figure 4. Average normalized expression levels of phosphate transporter genes under P- and N-
limited conditions compared to the replete condition. Error bars indicate SEM.
Figure 5. Average normalized expression levels of chlorophyll A-B binding protein (A) and
light harvesting protein (B) genes under P- and N-limited conditions compared to the replete
condition. Error bars indicate SEM. Expression levels of individual genes involved in
photosynthesis (C). The value on the x-axis indicates the fold difference between the P-limited
and replete sample while the value on the y-axis indicates the fold difference between the N-
limited and replete sample. Points above the x-axis represent genes that are overexpressed under
N-limitation relative to the replete condition while points to the right of the y-axis represent
genes that are overexpressed under P-limitation relative to the replete condition.
Figure 6. Average normalized expression levels of ribosomal protein (A), lysosomal protein (B)
and protein-folding (C) genes under P- and N-limited conditions compared to the replete
condition. Error bars indicate SEM. Expression levels of individual genes involved in protein
processing (D). The value on the x-axis indicates the fold difference between the P-limited and
replete sample while the value on the y-axis indicates the fold difference between the N-limited
and replete sample. Points above the x-axis represent genes that are overexpressed under N-
limitation relative to the replete condition while points to the right of the y-axis represent genes
that are overexpressed under P-limitation relative to the replete condition.
Figure 7. Average normalized expression levels of genes involved in purine metabolism (A),
polyketide synthesis (B), fatty acid degradation (C) and TCA cycle (D) under P- and N-limited
conditions compared to the replete condition. Error bars indicate SEM.
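Several of the legends above report average normalized expression with SEM error bars. The legends do not state how the SEM was computed. The sketch below assumes the common convention of the sample standard deviation divided by the square root of the number of values, taken here across the genes in one group. The function name and the example values are hypothetical and are not data from this study.

```python
import math
import statistics

def mean_and_sem(values):
    """Return the mean and standard error of the mean (SEM) of a list of values.

    Assumption: SEM = sample standard deviation / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two values")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem

# Hypothetical normalized expression values for one gene group in one treatment.
example_values = [2.0, 4.0, 6.0, 8.0]
mean, sem = mean_and_sem(example_values)
print(f"mean = {mean:.2f}, SEM = {sem:.2f}")
```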
Figures
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
References
Acinas, S. G., R. Sarma-Rupavtarm, V. Klepac-Ceraj, and M. F. Polz. 2005. PCR-induced
sequence artifacts and bias: insights from comparison of two 16S rRNA clone libraries
constructed from the same sample. Appl. Environ. Microbiol. 71: 8966–8969.
Aguiar, R., and P. Kugrens. 2000. New and rare chrysophytes from Wyoming and Colorado
lakes. J. Phycol. 36: 1–2.
Aguilera, A., F. Gómez, E. Lospitao, and R. Amils. 2006. A molecular approach to the
characterization of the eukaryotic communities of an extreme acidic environment: methods
for DNA extraction and denaturing gradient gel electrophoresis analysis. Syst. Appl.
Microbiol. 29: 593–605.
Allen, A. E., C. L. Dupont, M. Oborník, A. Horák, A. Nunes-Nesi, J. P. McCrow, H. Zheng, D.
A. Johnson, H. Hu, A. R. Fernie, and C. Bowler. 2011. Evolution and metabolic
significance of the urea cycle in photosynthetic diatoms. Nature 473: 203–7.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment
search tool. J. Mol. Biol. 215: 403–10.
Anantharaman, V., L. Iyer, and L. Aravind. 2007. Comparative genomics of protists: new
insights into the evolution of eukaryotic signal transduction and gene regulation. Annu.
Rev. Microbiol. 61: 453–475.
Ashelford, K. E., N. A. Chuzhanova, J. C. Fry, A. J. Jones, and A. J. Weightman. 2006. New
screening software shows that most recent large 16S rRNA gene clone libraries contain
chimeras. Appl. Environ. Microbiol. 72: 5734–5741.
Aure, J., and F. Rey. 1992. Oceanographic conditions in the Sandsfjord system, western Norway,
after a bloom of the toxic prymnesiophyte Prymnesium parvum Carter in August 1990.
Sarsia 76: 247–254.
Bachmann, B. O., and J. Ravel. 2009. Methods for in silico prediction of microbial polyketide
and nonribosomal peptide biosynthetic pathways from DNA sequence data. Methods
Enzymol. 458: 181–217.
Baker, J., and J. Grover. 2009. Growth at the edge of the niche: an experimental study of the
harmful alga Prymnesium parvum. Limnol. Oceanogr. 54: 1679–1687.
Bayer, T., M. Aranda, S. Sunagawa, L. K. Yum, M. K. Desalvo, E. Lindquist, M. A. Coffroth, C.
R. Voolstra, and M. Medina. 2012. Symbiodinium transcriptomes: genome insights into the
dinoflagellate symbionts of reef-building corals. PLoS One 7: e35269.
Bender, S. J., C. A. Durkin, C. T. Berthiaume, R. Morales, and E. V. Armbrust. 2014.
Transcriptional responses of three model diatoms to nitrate limitation of growth. Front. Mar.
Sci. In press.
Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: a practical and
powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57: 289–300.
Berg, G. M., P. M. Glibert, M. W. Lomas, and M. A. Burford. 1997. Organic nitrogen uptake
and growth by the chrysophyte Aureococcus anophagefferens during a brown tide event.
Mar. Biol. 129: 377–387.
Berney, C., J. Fahrni, and J. Pawlowski. 2004. How many novel eukaryotic “kingdoms”? Pitfalls
and limitations of environmental DNA surveys. BMC Biol. 2: 13.
Bertrand, E. M., and A. E. Allen. 2012. Influence of vitamin B auxotrophy on nitrogen
metabolism in eukaryotic phytoplankton. Front. Microbiol. 3: 375.
Beszteri, S., I. Yang, N. Jaeckisch, U. Tillmann, S. Frickenhaus, G. Glöckner, A. Cembella, and
U. John. 2012. Transcriptomic response of the toxic prymnesiophyte Prymnesium parvum
(N. Carter) to phosphorus and nitrogen starvation. Harmful Algae 18: 1–15.
Bielewicz, S., E. Bell, W. Kong, I. Friedberg, J. C. Priscu, and R. M. Morgan-Kiss. 2011. Protist
diversity in a permanently ice-covered Antarctic lake during the polar night transition.
ISME J. 5: 1559–64.
Burkholder, J. M., P. M. Glibert, and H. M. Skelton. 2008. Mixotrophy, a major mode of
nutrition for harmful algal species in eutrophic waters. Harmful Algae 8: 77–93.
Caron, D. A., and P. D. Countway. 2009. Hypotheses on the role of the protistan rare biosphere
in a changing world. Aquat. Microb. Ecol. 57: 227–238.
Caron, D. A., P. D. Countway, P. Savai, R. J. Gast, A. Schnetzer, S. D. Moorthi, M. R. Dennett,
D. M. Moran, and A. C. Jones. 2009a. Defining DNA-based operational taxonomic units for
microbial-eukaryote ecology. Appl. Environ. Microbiol. 75: 5797–5808.
Caron, D. A., A. Z. Worden, P. D. Countway, E. Demir, and K. B. Heidelberg. 2009b. Protists
are microbes too: a perspective. ISME J. 3: 4–12.
Carvalho, W. F., and E. Granéli. 2010. Contribution of phagotrophy versus autotrophy to
Prymnesium parvum growth under nitrogen and phosphorus sufficiency and deficiency.
Harmful Algae 9: 105–115.
Chao, A., R. L. Chazdon, R. K. Colwell, and T.-J. Shen. 2005. A new statistical approach for
assessing similarity of species composition with incidence and abundance data. Ecol. Lett.
8: 148–159.
Chavez, F. P., K. R. Buck, and R. T. Barber. 1990. Phytoplankton taxa in relation to primary
production in the equatorial Pacific. Deep Sea Res. 37: 1733–1752.
Chen, F., A. J. Mackey, C. J. Stoeckert, and D. S. Roos. 2006. OrthoMCL-DB: querying a
comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 34: D363–8.
Chevreux, B., T. Pfisterer, B. Drescher, A. J. Driesel, W. E. G. Müller, T. Wetter, and S. Suhai.
2004. Using the miraEST assembler for reliable and automated mRNA transcript assembly
and SNP detection in sequenced ESTs. Genome Res. 14: 1147–59.
Chung, C., S. Hwang, and J. Chang. 2003. Identification of a high-affinity phosphate transporter
gene in a prasinophyte alga, Tetraselmis chui, and its expression under nutrient limitation.
Appl. Environ. Microbiol. 69: 754–759.
Cichewicz, R. H., and K. D. Hambright. 2010. A revised amino group pKa for prymnesins does
not provide decisive evidence for a pH-dependent mechanism of Prymnesium parvum’s
toxicity. Toxicon 55: 1035–1037.
La Claire, J. W. 2006. Analysis of expressed sequence tags from the harmful alga, Prymnesium
parvum (Prymnesiophyceae, Haptophyta). Mar. Biotechnol. (NY). 8: 534–546.
Collier, J. L., and A. R. Grossman. 1992. Chlorosis induced by nutrient deprivation in
Synechococcus sp. strain PCC 7942: not all bleaching is the same. J. Bacteriol. 174: 4718–
4726.
Colwell, R. K., and J. A. Coddington. 1994. Estimating terrestrial biodiversity through
extrapolation. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 345: 101–18.
Conesa, A., S. Götz, J. M. García-Gómez, J. Terol, M. Talón, and M. Robles. 2005. Blast2GO: a
universal tool for annotation, visualization and analysis in functional genomics research.
Bioinformatics 21: 3674–6.
Countway, P. D., P. D. Vigil, A. Schnetzer, S. D. Moorthi, and D. A. Caron. 2010. Seasonal
analysis of protistan community structure and diversity at the USC Microbial Observatory
(San Pedro Channel, North Pacific Ocean). Limnol. Oceanogr. 55: 2381–2396.
Croft, M. T., M. J. Warren, and A. G. Smith. 2006. Algae need their vitamins. Eukaryot. Cell 5:
1175–1183.
Cuvelier, M. L., A. E. Allen, A. Monier, J. P. McCrow, M. Messié, S. G. Tringe, T. Woyke, R.
M. Welsh, T. Ishoey, J.-H. Lee, B. J. Binder, C. L. DuPont, M. Latasa, C. Guigand, K. R.
Buck, J. Hilton, M. Thiagarajan, E. Caler, B. Read, R. S. Lasken, F. P. Chavez, and A. Z.
Worden. 2010. Targeted metagenomics and ecology of globally important uncultured
eukaryotic phytoplankton. Proc. Natl. Acad. Sci. U. S. A. 107: 14679–14684.
Von Dassow, P., H. Ogata, I. Probert, P. Wincker, C. Da Silva, S. Audic, J.-M. Claverie, and C.
de Vargas. 2009. Transcriptome analysis of functional differentiation between haploid and
diploid cells of Emiliania huxleyi, a globally significant photosynthetic calcifying cell.
Genome Biol. 10: R114.
Dewick, P. M. 2001. Medicinal Natural Products: A Biosynthetic Approach, 2nd ed. John Wiley
& Sons, Ltd.
Díez, B., C. Pedrós-Alió, and R. Massana. 2001. Study of genetic diversity of eukaryotic
picoplankton in different oceanic regions by small-subunit rRNA gene cloning and
sequencing. Appl. Environ. Microbiol. 67: 2932–2941.
DiTullio, G. R., J. M. Grebmeier, K. R. Arrigo, M. P. Lizotte, D. H. Robinson, A. Leventer, J. P.
Barry, M. L. VanWoert, and R. B. Dunbar. 2000. Rapid and early export of Phaeocystis
antarctica blooms in the Ross Sea, Antarctica. Nature 404: 595–8.
Van Dolah, F. M., K. B. Lidie, E. A. Monroe, D. Bhattacharya, L. Campbell, G. J. Doucette, and
D. Kamykowski. 2009. The Florida red tide dinoflagellate Karenia brevis: new insights into
cellular and molecular processes underlying bloom dynamics. Harmful Algae 8: 562–572.
Droop, M. R. 1954. Cobalamin Requirement in Chrysophyceae. Nature 174: 520–521.
Dyhrman, S. T., B. D. Jenkins, T. A. Rynearson, M. A. Saito, M. L. Mercier, H. Alexander,
L. P. Whitney, A. Drzewianowski, V. V Bulygin, E. M. Bertrand, Z. Wu, C. Benitez-
Nelson, and A. Heithoff. 2012. The transcriptome and proteome of the diatom Thalassiosira
pseudonana reveal a diverse phosphorus stress response. PLoS One 7: e33768.
Dyhrman, S. T., S. T. Haley, S. R. Birkeland, L. L. Wurch, M. J. Cipriano, and A. G. McArthur.
2006. Long serial analysis of gene expression for gene discovery and transcriptome
profiling in the widespread marine coccolithophore Emiliania huxleyi. Appl. Environ.
Microbiol. 72: 252–260.
Edvardsen, B., and I. Imai. 2006. The ecology of harmful flagellates within Prymnesiophyceae
and Raphidophyceae, p. 67–79. In E. Granéli and J. Turner [eds.], Ecology of Harmful
Algae. Springer.
Edvardsen, B., and E. Paasche. 1998. Bloom dynamics and physiology of Prymnesium and
Chrysochromulina, p. 193–208. In D. Anderson, A. Cembella, and G. Hallegraeff [eds.],
Physiological ecology of harmful algal blooms. Springer.
Entcheva, P., D. A. Phillips, and W. R. Streit. 2002. Functional analysis of Sinorhizobium
meliloti genes involved in biotin synthesis and transport. Appl. Environ. Microbiol. 68:
2843–8.
Ewing, B., and P. Green. 1998. Base-Calling of Automated Sequencer Traces Using Phred. II.
Error Probabilities. Genome Res. 8: 186–194.
Ewing, B., L. D. Hillier, and M. C. Wendl. 1998. Base-calling of automated sequencer traces
using Phred. I. Accuracy assessment. Genome Res. 8: 175–185.
Finn, R. D., J. Mistry, J. Tate, P. Coggill, A. Heger, J. E. Pollington, O. L. Gavin, P.
Gunasekaran, G. Ceric, K. Forslund, L. Holm, E. L. L. Sonnhammer, S. R. Eddy, and A.
Bateman. 2010. The Pfam protein families database. Nucleic Acids Res. 38: D211–22.
Frada, M., F. Not, and C. de Vargas. 2009. Extreme diversity in noncalcifying haptophytes
explains a major pigment paradox in open oceans. Proc. Natl. Acad. Sci. U. S. A. 106:
12803–12808.
Freitag, M., S. Beszteri, H. Vogel, and U. John. 2011. Effects of physiological shock treatments
on toxicity and polyketide synthase gene expression in Prymnesium parvum
(Prymnesiophyceae). Eur. J. Phycol. 46: 193–201.
Fuhrman, J. A., K. McCallum, and A. A. Davis. 1993. Phylogenetic diversity of subsurface
marine microbial communities from the Atlantic and Pacific Oceans. Appl. Environ.
Microbiol. 59: 1294.
Giovannoni, S. J., T. B. Britschgi, C. L. Moyer, and K. G. Field. 1990. Genetic diversity in
Sargasso Sea bacterioplankton. Nature 345: 60–63.
Gobler, C., and S. Sañudo-Wilhelmy. 2001. Effects of organic carbon, organic nitrogen,
inorganic nutrients, and iron additions on the growth of phytoplankton and bacteria during a
brown tide bloom. Mar. Ecol. Prog. Ser. 209: 19–34.
Gomez, F. 2007. The consortium of the protozoan Solenicola setigera and the diatom
Leptocylindrus mediterraneus in the Pacific Ocean. Acta Protozool. 46: 15–24.
Gómez, F., D. Moreira, K. Benzerara, and P. López-García. 2010. Solenicola setigera is the first
characterized member of the abundant and cosmopolitan uncultured marine stramenopile
group MAST-3. Environ. Microbiol. 13: 193–202.
Gough, J., K. Karplus, R. Hughey, and C. Chothia. 2001. Assignment of homology to genome
sequences using a library of hidden Markov models that represent all proteins of known
structure. J. Mol. Biol. 313: 903–19.
Granéli, E., and N. Johansson. 2003a. Effects of the toxic haptophyte Prymnesium parvum on the
survival and feeding of a ciliate: the influence of different nutrient conditions. Mar. Ecol.
Prog. Ser. 254: 49–56.
Granéli, E., and N. Johansson. 2003b. Increase in the production of allelopathic substances by
Prymnesium parvum cells grown under N- or P-deficient conditions. Harmful Algae 2: 135–
145.
Granéli, E., and P. Salomon. 2010. Factors influencing allelopathy and toxicity in Prymnesium
parvum. J. Am. Water Resour. Assoc. 46: 108–120.
Green, J. 1991. Phagotrophy in prymnesiophyte flagellates, p. 401–414. In D.J. Patterson and J.
Larsen [eds.], Systematics Association Special Volume Series; The Biology Of Free-Living
Heterotrophic Flagellates. Clarendon Press.
Grigoriev, I. V, H. Nordberg, I. Shabalov, A. Aerts, M. Cantor, D. Goodstein, A. Kuo, S.
Minovitsky, R. Nikitin, R. A. Ohm, R. Otillar, A. Poliakov, I. Ratnere, R. Riley, T.
Smirnova, D. Rokhsar, and I. Dubchak. 2012. The genome portal of the Department of
Energy Joint Genome Institute. Nucleic Acids Res. 40: D26–32.
Grossman, A. 2000. Acclimation of Chlamydomonas reinhardtii to its nutrient environment.
Protist 151: 201–24.
Guillou, L., M. Viprey, A. Chambouvet, R. M. Welsh, A. R. Kirkham, R. Massana, D. J.
Scanlan, and A. Z. Worden. 2008. Widespread occurrence and genetic diversity of marine
parasitoids belonging to Syndiniales (Alveolata). Environ. Microbiol. 10: 3349–3365.
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood. Syst. Biol. 52: 696–704.
Guiry, M., and G. Guiry. 2007. AlgaeBase version 4.2. Electron. Publ. Natl. Univ. Irel.
Hackett, J. D., D. M. Anderson, D. L. Erdner, and D. Bhattacharya. 2004. Dinoflagellates: a
remarkable evolutionary experiment. Am. J. Bot. 91: 1523–1534.
Haft, D. H., B. J. Loftus, D. L. Richardson, F. Yang, J. A. Eisen, I. T. Paulsen, and O. White.
2001. TIGRFAMs: a protein family resource for the functional identification of proteins.
Nucleic Acids Res. 29: 41–3.
Hambright, K., R. Zamor, and J. Easton. 2010. Temporal and spatial variability of an invasive
toxigenic protist in a North American subtropical reservoir. Harmful Algae 9: 568–577.
Hansen, P. J., and M. Hjorth. 2002. Growth and grazing responses of Chrysochromulina ericina
(Prymnesiophyceae): the role of irradiance, prey concentration and pH. Mar. Biol. 141:
975–983.
Hansen, P. J., T. G. Nielsen, and H. Kaas. 1995. Distribution and growth of protists and
mesozooplankton during a bloom of Chrysochromulina spp. (Prymnesiophyceae,
Prymnesiales). Phycologia 34: 409–416.
Harada, A., S. Ohtsuka, and T. Horiguchi. 2007. Species of the parasitic genus Duboscquella are
members of the enigmatic Marine Alveolate Group I. Protist 158: 337–347.
Hargraves, P. E., and L. Maranda. 2002. Potentially toxic or harmful microalgae from the
Northeast coast. Northeast. Nat. 9: 81–120.
Hartmann, M., C. Grob, G. A. Tarran, A. P. Martin, P. H. Burkill, D. J. Scanlan, and M. V
Zubkov. 2012. Mixotrophic basis of Atlantic oligotrophic ecosystems. Proc. Natl. Acad.
Sci. U. S. A. 109: 5756–5760.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular
clock of mitochondrial DNA. J. Mol. Evol. 22: 160–174.
Heidelberg, K. B., J. A. Gilbert, and I. Joint. 2010. Marine genomics: at the interface of marine
microbial ecology and biodiscovery. Microb. Biotechnol. 3: 531–43.
Helliwell, K. E., G. L. Wheeler, K. C. Leptos, R. E. Goldstein, and A. G. Smith. 2011. Insights
into the evolution of vitamin B12 auxotrophy from sequenced algal genomes. Mol. Biol.
Evol. 28: 2921–33.
Heywood, J. L., M. E. Sieracki, W. Bellows, N. J. Poulton, and R. Stepanauskas. 2010.
Capturing diversity of marine heterotrophic protists: one cell at a time. ISME J. 1–11.
Hildebrand, M. 2005. Cloning and functional characterization of ammonium transporters from
the marine diatom Cylindrotheca fusiformis. J. Phycol. 41: 105–113.
Huang, X., and A. Madan. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9:
868–77.
Hughes, J. B., J. J. Hellmann, T. H. Ricketts, and B. J. M. Bohannan. 2001. Counting the
uncountable: statistical approaches to estimating microbial diversity. Appl. Environ.
Microbiol. 67: 4399–4406.
Igarashi, T., M. Satake, and T. Yasumoto. 1999. Structures and partial stereochemical
assignments for Prymnesin-1 and Prymnesin-2: Potent hemolytic and ichthyotoxic
glycosides isolated from the red tide alga Prymnesium parvum. J. Am. Chem. Soc. 121:
8499–8511.
Iseli, C., C. V Jongeneel, and P. Bucher. 1999. ESTScan: a program for detecting, evaluating,
and reconstructing potential coding regions in EST sequences. Proc. Int. Conf. Intell. Syst.
Mol. Biol. 138–48.
Jeong, H. J., Y. Du Yoo, J. S. Kim, K. A. Seong, N. S. Kang, and T. H. Kim. 2010. Growth,
feeding and ecological roles of the mixotrophic and heterotrophic dinoflagellates in marine
planktonic food webs. Ocean Sci. J. 45: 65–91.
John, U., B. Beszteri, E. Derelle, Y. Van de Peer, B. Read, H. Moreau, and A. Cembella. 2008.
Novel insights into evolution of protistan polyketide synthases through phylogenomic
analysis. Protist 159: 21–30.
John, U., S. Beszteri, G. Glöckner, R. Singh, L. Medlin, and A. Cembella. 2010. Genomic
characterisation of the ichthyotoxic prymnesiophyte Chrysochromulina polylepis, and the
expression of polyketide synthase genes in synchronized cultures. Eur. J. Phycol. 45: 215–
229.
Jones, A. C., T. S. V. Liao, F. Z. Najar, B. A. Roe, K. D. Hambright, and D. A. Caron. 2013.
Seasonality and disturbance: annual pattern and response of the bacterial and microbial
eukaryotic assemblages in a freshwater ecosystem. Environ. Microbiol. 15: 2557–72.
Jones, H. L. J., B. S. C. Leadbeater, and J. C. Green. 2009. Mixotrophy in marine species of
Chrysochromulina (Prymnesiophyceae): ingestion and digestion of a small green flagellate.
J. Mar. Biol. Assoc. UK 73: 283–296.
Jurgenson, C. T., T. P. Begley, and S. E. Ealick. 2009. The structural and biochemical
foundations of thiamin biosynthesis. Annu. Rev. Biochem. 78: 569–603.
Kaartvedt, S., T. M. Johnsen, D. L. Aksnes, U. Lie, and H. Svendsen. 1991. Occurrence of the
toxic phytoflagellate Prymnesium parvum and associated fish mortality in a Norwegian fjord
system. Can. J. Fish. Aquat. Sci. 48: 2316–2323.
Kanehisa, M., S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. 2004. The KEGG resource for
deciphering the genome. Nucleic Acids Res. 32: D277–80.
Kellmann, R., A. Stüken, R. J. S. Orr, H. M. Svendsen, and K. S. Jakobsen. 2010. Biosynthesis
and molecular genetics of polyketides in marine dinoflagellates. Mar. Drugs 8: 1011–1048.
Knittel, K., B. M. Fuchs, W. Ludwig, E. Pruesse, C. Quast, J. Peplies, and F. O. Glockner. 2007.
SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA
sequence data compatible with ARB. Nucleic Acids Res. 35: 7188–7196.
Kong, W., D. C. Ream, J. C. Priscu, and R. M. Morgan-Kiss. 2012. Diversity and expression of
RubisCO genes in a perennially ice-covered Antarctic lake during the polar night transition.
Appl. Environ. Microbiol. 78: 4358–66.
Koonin, E. V, N. D. Fedorova, J. D. Jackson, A. R. Jacobs, D. M. Krylov, K. S. Makarova, R.
Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, I. B. Rogozin, S. Smirnov, A. V
Sorokin, A. V Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, and D. A. Natale. 2004. A
comprehensive evolutionary classification of proteins encoded in complete eukaryotic
genomes. Genome Biol. 5: R7.
Lagesen, K., P. Hallin, E. A. Rødland, H.-H. Staerfeldt, T. Rognes, and D. W. Ussery. 2007.
RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res.
35: 3100–8.
Li, H., and R. Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25: 1754–60.
Li, Q., X. Gao, Y. Sun, Q. Zhang, R. Song, and Z. Xu. 2006. Isolation and characterization of a
sodium-dependent phosphate transporter gene in Dunaliella viridis. Biochem. Biophys. Res.
Commun. 340: 95–104.
Li, R., Y. Li, K. Kristiansen, and J. Wang. 2008. SOAP: short oligonucleotide alignment
program. Bioinformatics 24: 713–4.
Li, W., and A. Godzik. 2006. Cd-hit: a fast program for clustering and comparing large sets of
protein or nucleotide sequences. Bioinformatics 22: 1658–9.
Lindholm, T., P. Öhman, K. Kurki-Helasmo, B. Kincaid, and J. Meriluoto. 1999. Toxic algae
and fish mortality in a brackish-water lake in Åland, SW Finland. Hydrobiologia 397: 109–
120.
Litchman, E., D. Steiner, and P. Bossard. 2003. Photosynthetic and growth responses of three
freshwater algae to phosphorus limitation and daylength. Freshw. Biol. 46: 2141–2148.
López-García, P., F. Rodríguez-Valera, and C. Pedrós-Alió. 2001. Unexpected diversity of small
eukaryotes in deep-sea Antarctic plankton. Nature 409: 603–607.
Lottaz, C., C. Iseli, C. V. Jongeneel, and P. Bucher. 2003. Modeling sequencing errors by
combining Hidden Markov models. Bioinformatics 19: ii103–ii112.
Lowe, C. D., L. V Mello, N. Samatar, L. E. Martin, D. J. S. Montagnes, and P. C. Watts. 2011.
The transcriptome of the novel dinoflagellate Oxyrrhis marina (Alveolata: Dinophyceae):
response to salinity examined by 454 sequencing. BMC Genomics 12: 519.
Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer
RNA genes in genomic sequence. Nucleic Acids Res. 25: 955–64.
Lü, W., J. Du, and N. Schwarzer. 2013. The formate/nitrite transporter family of anion channels.
Biol. Chem. 394: 715–727.
Manning, S. R., and J. W. La Claire. 2010. Prymnesins: toxic metabolites of the golden alga,
Prymnesium parvum Carter (Haptophyta). Mar. Drugs 8: 678–704.
Manning, S. R., and J. W. La Claire II. 2013. Isolation of polyketides from Prymnesium
parvum (Haptophyta) and their detection by liquid chromatography/mass spectrometry
metabolic fingerprint analysis. Anal. Biochem. 442: 189–195.
Massana, R., J. Castresana, V. Balagué, L. Guillou, K. Romari, A. Groisillier, and K. Valentin.
2004. Phylogenetic and ecological analysis of novel marine stramenopiles. Appl. Environ.
Microbiol. 70: 3528–3534.
Massana, R., R. Terrado, I. Forn, C. Lovejoy, and C. Pedrós-Alió. 2006.
Distribution and abundance of uncultured heterotrophic flagellates in the world oceans.
Environ. Microbiol. 8: 1515–1522.
McGettigan, P. A. 2013. Transcriptomics in the RNA-seq era. Curr. Opin. Chem. Biol. 17: 4–11.
McLaughlin, J. J. A. 1958. Euryhaline chrysomonads: nutrition and toxigenesis in Prymnesium
parvum, with notes on Isochrysis galbana and Monochrysis lutheri. J. Protozool. 5: 75–81.
Medlin, L., H. J. Elwood, S. Stickel, and M. L. Sogin. 1988. The characterization of
enzymatically amplified eukaryotic 16S-like rRNA-coding regions. Gene 71: 491–499.
Miller, D. N., J. E. Bryant, E. L. Madsen, and W. C. Ghiorse. 1999. Evaluation and optimization
of DNA extraction and purification procedures for soil and sediment samples. Appl. Environ.
Microbiol. 65: 4715–4724.
Miller, R., G. Wu, and R. Deshpande. 2010. Changes in transcript abundance in Chlamydomonas
reinhardtii following nitrogen deprivation predict diversion of metabolism. Plant Physiol.
154: 1737–1752.
Mitra, A., and K. J. Flynn. 2010. Modelling mixotrophy in harmful algal blooms: More or less
the sum of the parts? J. Mar. Syst. 83: 158–169.
Moestrup, Ø. 1994. Economic aspects: blooms, nuisance species, and toxins, p. 265–285. In J.
Green and B. Leadbeater [eds.], The Haptophyte Algae. Clarendon Press.
Monier, A., R. M. Welsh, C. Gentemann, G. Weinstock, E. Sodergren, E. V. Armbrust, J. A.
Eisen, and A. Z. Worden. 2012. Phosphate transporters in marine phytoplankton and their
viruses: cross-domain commonalities in viral-host gene exchanges. Environ. Microbiol. 14:
162–76.
Monroe, E. A., and F. M. Van Dolah. 2008. The toxic dinoflagellate Karenia brevis encodes
novel type I-like polyketide synthases containing discrete catalytic domains. Protist 159:
471–482.
Moon-van der Staay, S. Y., R. De Wachter, and D. Vaulot. 2001. Oceanic 18S rDNA sequences
from picoplankton reveal unsuspected eukaryotic diversity. Nature 409: 607–610.
Moorthi, S. D., D. A. Caron, R. J. Gast, and R. W. Sanders. 2009. Mixotrophy: a widespread and
important ecological strategy for planktonic and sea-ice nanoflagellates in the Ross Sea,
Antarctica. Aquat. Microb. Ecol. 54: 269–277.
Moustafa, A., A. N. Evans, D. M. Kulis, J. D. Hackett, D. L. Erdner, D. M. Anderson, and D.
Bhattacharya. 2010. Transcriptome profiling of a toxic dinoflagellate reveals a gene-rich
protist and a potential impact on gene expression due to bacterial presence. PLoS One 5:
e9688.
Narasingarao, P., S. Podell, J. A. Ugalde, C. Brochier-Armanet, J. B. Emerson, J. J. Brocks, K.
B. Heidelberg, J. F. Banfield, and E. E. Allen. 2012. De novo metagenomic assembly
reveals abundant novel major lineage of Archaea in hypersaline microbial communities.
ISME J. 6: 81–93.
Not, F., K. Valentin, K. Romari, C. Lovejoy, R. Massana, K. Tobe, D. Vaulot, and L. K. Medlin.
2007. Picobiliphytes: A marine picoplanktonic algal group with unknown affinities to other
eukaryotes. Science 315: 253–255.
Nygaard, K., and A. Tobiesen. 1993. Bacterivory in algae: a survival strategy during nutrient
limitation. Limnol. Oceanogr. 38: 273–279.
Pagani, I., K. Liolios, J. Jansson, I. M. A. Chen, T. Smirnova, B. Nosrat, V. M. Markowitz, and
N. C. Kyrpides. 2011. The Genomes OnLine Database (GOLD) v.4: status of genomic and
metagenomic projects and their associated metadata. Nucleic Acids Res. 40: D571–D579.
Palenik, B. 1997. The use of amides and other organic nitrogen sources by the phytoplankton
Emiliania huxleyi. Limnol. Oceanogr. 42: 1544–1551.
Park, S., G. Jung, Y. Hwang, and E. Jin. 2010. Dynamic response of the transcriptome of a
psychrophilic diatom, Chaetoceros neogracile, to high irradiance. Planta 231: 349–360.
Perez, R., L. Liu, J. Lopez, T. An, and K. S. Rein. 2008. Diverse bacterial PKS sequences
derived from okadaic acid-producing dinoflagellates. Mar. Drugs 6: 164–79.
Persson, B. L., J. O. Lagerstedt, J. R. Pratt, J. Pattison-Granberg, K. Lundh, S. Shokrollahzadeh,
and F. Lundh. 2003. Regulation of phosphate acquisition in Saccharomyces cerevisiae.
Curr. Genet. 43: 225–44.
Prokopowich, C. D., T. R. Gregory, and T. J. Crease. 2003. The correlation between rDNA copy
number and genome size in eukaryotes. Genome 46: 48–50.
Provasoli, L., and A. Carlucci. 1974. Vitamins and growth regulators, p. 741–787. In W. Stewart
and M. Abbot [eds.], Algal Physiology and Biochemistry. Blackwell Science.
Rahat, M., and K. Reich. 1963. The B₁₂ vitamins and methionine in the metabolism of
Prymnesium parvum. J. Gen. Microbiol. 31: 203–9.
Raymond, J. A., M. G. Janech, and C. H. Fritsen. 2009. Novel ice-binding proteins from a
psychrophilic Antarctic alga (Chlamydomonadaceae, Chlorophyceae). J. Phycol. 45: 130–
136.
Read, B. A., J. Kegel, M. J. Klute, A. Kuo, S. C. Lefebvre, F. Maumus, C. Mayer, J. Miller, A.
Monier, A. Salamov, J. Young, M. Aguilar, J.-M. Claverie, S. Frickenhaus, K. Gonzalez, E.
K. Herman, Y.-C. Lin, J. Napier, H. Ogata, A. F. Sarno, J. Schmutz, D. Schroeder, C. de
Vargas, F. F. Verret, P. von Dassow, K. Valentin, Y. Van de Peer, G. Wheeler, Emiliania huxleyi
Annotation Consortium, J. B. Dacks, C. F. Delwiche, S. T. Dyhrman, G. Glöckner, U. John, T.
Richards, A. Z. Worden, X. Zhang, and I. V Grigoriev. 2013. Pan genome of the
phytoplankton Emiliania underpins its global distribution. Nature 499: 209–213.
Richards, T. A., and D. Bass. 2005. Molecular screening of free-living microbial eukaryotes:
diversity and distribution using a meta-analysis. Curr. Opin. Microbiol. 8: 240–52.
Robinson, M. D., D. J. McCarthy, and G. K. Smyth. 2010. edgeR: a Bioconductor package for
differential expression analysis of digital gene expression data. Bioinformatics 26: 139–40.
Romari, K., and D. Vaulot. 2004. Composition and temporal variability of picoeukaryote
communities at a coastal site of the English Channel from 18S rDNA sequences. Limnol.
Oceanogr. 49: 784–798.
Salcedo, T., R. J. Upadhyay, K. Nagasaki, and D. Bhattacharya. 2012. Dozens of toxin-related
genes are expressed in a nontoxic strain of the dinoflagellate Heterocapsa circularisquama.
Mol. Biol. Evol. 29: 1503–1506.
Sanders, R. W. 1991. Mixotrophic protists in marine and freshwater ecosystems. J. Protozool.
38: 76–81.
Sanders, R. W. 2011. Alternative nutritional strategies in protists: symposium introduction and a
review of freshwater protists that combine photosynthesis and heterotrophy. J. Eukaryot.
Microbiol. 58: 181–4.
Sarthou, G., K. R. Timmermans, S. Blain, and P. Tréguer. 2005. Growth physiology and fate of
diatoms in the ocean: a review. J. Sea Res. 53: 25–42.
Shah, N., and P. J. Syrett. 1984. The uptake of guanine and hypoxanthine by marine microalgae.
J. Mar. Biol. Assoc. United Kingdom 64: 545–556.
Shen, H., and L. Song. 2007. Comparative studies on physiological responses to phosphorus in
two phenotypes of bloom-forming Microcystis. Hydrobiologia 592: 475–486.
Sherr, B. F., E. Sherr, D. A. Caron, D. Vaulot, and A. Z. Worden. 2007. Oceanic Protists.
Oceanography 20: 130–134.
Shilo, M. 1971. Toxins of Chrysophyceae, p. 67–103. In S. Ajl, A. Ciegler, S. Kadis, G. Montie,
and G. Weinbaum [eds.], Microbial Toxins. Academic Press.
Shilo, M. 1981. The toxic principles of Prymnesium parvum, p. 37–47. In W. Carmichael [ed.],
The Water Environment. Plenum Press.
Simpson, J. T., and R. Durbin. 2012. Efficient de novo assembly of large genomes using
compressed data structures. Genome Res. 22: 549–56.
Simpson, J. T., K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, and I. Birol. 2009. ABySS:
a parallel assembler for short read sequence data. Genome Res. 19: 1117–23.
Smith, W. O., L. A. Codispoti, D. M. Nelson, T. Manley, E. J. Buskey, H. J. Niebauer, and G. F.
Cota. 1991. Importance of Phaeocystis blooms in the high-latitude ocean carbon cycle.
Nature 352: 514–516.
Stoeck, T., B. Hayward, G. T. Taylor, R. Varela, and S. S. Epstein. 2006. A multiple PCR-primer
approach to access the microeukaryotic diversity in environmental samples. Protist 157: 31–
43.
Stoecker, D. K. 1998. Conceptual models of mixotrophy in planktonic protists and some
ecological and evolutionary implications. Eur. J. Protistol. 34: 281–290.
Sun, S., J. Chen, W. Li, I. Altintas, A. Lin, S. Peltier, K. Stocks, E. E. Allen, M. Ellisman, J.
Grethe, and J. Wooley. 2011. Community cyberinfrastructure for Advanced Microbial
Ecology Research and Analysis: the CAMERA resource. Nucleic Acids Res. 39: D546–51.
Sunda, W., E. Graneli, and C. Gobler. 2006. Positive feedback and the development and
persistence of ecosystem disruptive algal blooms. J. Phycol. 42: 963–974.
Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei, and S. Kumar. 2011. MEGA5:
molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance,
and maximum parsimony methods. Mol. Biol. Evol. 28: 2731–9.
Tang, Y. Z., F. Koch, and C. J. Gobler. 2010. Most harmful algal bloom species are vitamin B₁
and B₁₂ auxotrophs. Proc. Natl. Acad. Sci. U. S. A. 107: 20756–61.
Tatusov, R. L., N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin, E. V Koonin, D. M.
Krylov, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, S. Smirnov, A. V
Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, and D. A. Natale. 2003. The COG database:
an updated version includes eukaryotes. BMC Bioinformatics 4: 41.
R Development Core Team. 2005. R: A language and environment for statistical computing.
Terrado, R., E. Medrinal, C. Dasilva, M. Thaler, W. F. Vincent, and C. Lovejoy. 2011. Protist
community composition during spring in an Arctic flaw lead polynya. Polar Biol. 34: 1901–
1914.
Thamatrakoln, K., O. Korenovska, A. K. Niheu, and K. D. Bidle. 2012. Whole-genome
expression analysis reveals a role for death-related genes in stress acclimation of the diatom
Thalassiosira pseudonana. Environ. Microbiol. 14: 67–81.
Thomsen, H. A., K. R. Buck, and F. P. Chavez. 1994. Haptophytes as components of marine
phytoplankton. In J. Green and B. Leadbeater [eds.], The Haptophyte Algae. Clarendon Press.
Thorpe, C., and J. J. Kim. 1995. Structure and mechanism of action of the acyl-CoA
dehydrogenases. FASEB J. 9: 718–25.
Tillmann, U., and K. J. Hesse. 1999. Large-scale parasitic infection of diatoms in the
Northfrisian Wadden Sea. J. Sea Res. 42: 255–261.
Turpin, D. H. 1991. Effects of inorganic N availability on algal photosynthesis and carbon
metabolism. J. Phycol. 27: 14–20.
Unrein, F., J. M. Gasol, F. Not, I. Forn, and R. Massana. 2013. Mixotrophic haptophytes are key
bacterial grazers in oligotrophic coastal waters. ISME J. 8: 164–176.
Unrein, F., R. Massana, and L. Alonso-Sáez. 2007. Significant year-round effect of small
mixotrophic flagellates on bacterioplankton in an oligotrophic coastal system. Limnol.
Oceanogr. 52: 456–469.
Uronen, P., S. Lehtinen, and C. Legrand. 2005. Haemolytic activity and allelopathy of the
haptophyte Prymnesium parvum in nutrient-limited and balanced growth conditions. Mar.
Ecol. Prog. Ser. 299: 137–148.
Watson, S. 2001. Literature review of the microalga Prymnesium parvum and its associated
toxicity.
Weekers, P. H., R. J. Gast, P. A. Fuerst, and T. J. Byers. 1994. Sequence variations in small-
subunit ribosomal RNAs of Hartmannella vermiformis and their phylogenetic implications.
Mol. Biol. Evol. 11: 684–690.
Wintzingerode, F. von, U. B. Göbel, and E. Stackebrandt. 1997. Determination of microbial diversity in
environmental samples: pitfalls of PCR-based rRNA analysis. FEMS Microbiol. Rev. 21:
213–229.
Worden, A. Z. 2006. Picoeukaryote diversity in coastal waters of the Pacific Ocean. Aquat.
Microb. Ecol. 43: 165–175.
Worden, A. Z., J. H. Lee, T. Mock, P. Rouzé, M. P. Simmons, and A. L. Aerts. 2009. Green
evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes
Micromonas. Science 324: 268–272.
Wurch, L. L., S. T. Haley, E. D. Orchard, C. J. Gobler, and S. T. Dyhrman. 2010. Nutrient-
regulated transcriptional responses in the brown tide-forming alga Aureococcus
anophagefferens. Environ. Microbiol. 13: 468–481.
Xu, L., H. Chen, X. Hu, R. Zhang, Z. Zhang, and Z. W. Luo. 2006. Average gene length is
highly conserved in prokaryotes and eukaryotes and diverges only between the two
kingdoms. Mol. Biol. Evol. 23: 1107–1108.
Yariv, J., and S. Hestrin. 1961. Toxicity of the extracellular phase of Prymnesium parvum
cultures. J. Gen. Microbiol. 24: 165–175.
Yoon, H. S., D. C. Price, R. Stepanauskas, V. D. Rajah, M. E. Sieracki, W. H. Wilson, E. C.
Yang, S. Duffy, and D. Bhattacharya. 2011. Single-cell genomics reveals organismal
interactions in uncultivated marine protists. Science 332: 714–717.
Zhang, Z., and W. I. Wood. 2003. A profile hidden Markov model for signal peptides generated
by HMMER. Bioinformatics 19: 307–8.
Zhu, F., R. Massana, F. Not, D. Marie, and D. Vaulot. 2005. Mapping of picoeucaryotes in marine
ecosystems with quantitative PCR of the 18S rRNA gene. FEMS Microbiol. Ecol. 52: 79–
92.
Zubkov, M. V, and G. A. Tarran. 2008. High bacterivory by the smallest phytoplankton in the
North Atlantic Ocean. Nature 455: 224–226.
Asset Metadata
Creator: Koid, Amy E. (author)
Core Title: Using molecular techniques to explore the diversity, ecology and physiology of important protistan species, with an emphasis on the Prymnesiophyceae
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Marine and Environmental Biology
Publication Date: 06/18/2016
Defense Date: 04/03/2014
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: bioinformatics, comparative transcriptomics, nutrient limitation, OAI-PMH Harvest, prymnesiophytes, transcriptomics
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Heidelberg, Karla B. (committee chair), Bottjer, David J. (committee member), Caron, David A. (committee member), Heidelberg, John F. (committee member)
Creator Email: amyyarthur@gmail.com, koid@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-421640
Unique identifier: UC11285929
Identifier: etd-KoidAmyE-2559.pdf (filename), usctheses-c3-421640 (legacy record id)
Legacy Identifier: etd-KoidAmyE-2559.pdf
Dmrecord: 421640
Document Type: Dissertation
Rights: Koid, Amy E.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA