USING SEQUENCING TECHNIQUES TO EXPLORE THE MICROBIAL
COMMUNITIES ASSOCIATED WITH FERROMANGANESE NODULES AND
SEDIMENT FROM THE SOUTH PACIFIC GYRE
by
Benjamin J Tully
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOLOGICAL SCIENCES)
August 2013
Table of Contents
List of Tables
List of Figures
Chapter One
Introduction, Background, and Context of Microbial Environmental Genomics
1.1 Environmental Microbiology
1.2 Environmental Genomics
1.3 Future Directions
1.4 Thesis Chapter Titles and Abstracts
1.4.1 Chapter 2 – Metagenomic analysis of a complex marine planktonic thaumarchaeal community from the Gulf of Maine
1.4.2 Chapter 3 – Comparative genomics of planktonic Flavobacteriaceae from the Gulf of Maine using environmentally derived genomes
1.4.3 Chapter 4 – Microbial communities associated with ferromanganese nodules and the surrounding sediments
1.4.4 Chapter 5 – Metagenomic analysis of microbial communities from the South Pacific Gyre reveals putative mechanisms for biological generation of ferromanganese nodules and complex biogeochemical interactions
1.5 References
Chapter Two
Metagenomic analysis of a complex marine planktonic thaumarchaeal community from the Gulf of Maine
2.1 Introduction
2.2 Results and Discussions
2.2.1 Environmental Sequencing and Binning
2.2.2 Ammonia Oxidation and Carbon Fixation
2.2.3 Comparative genomics of Ca. Nitrosopumilus SCM1
2.2.4 Metagenomic population diversity
2.2.5 Functional diversity
2.3 Experimental Procedures
2.3.1 Sample Collection
2.3.2 Sequencing and Assembly
2.3.3 Identification of thaumarchaeal signature
2.3.4 Functional Gene Assignment
2.3.5 Within-Bin Diversity
2.3.6 Coverage
2.3.7 Thaumarchaeal scaffold analysis
2.4 Availability of sequences
2.5 Acknowledgements
2.6 References
Chapter Three
Comparative genomics of planktonic Flavobacteriaceae from the Gulf of Maine using environmentally derived genomes
3.1 Introduction
3.2 Materials and Methods
3.2.1 Metagenome Sequencing
3.2.2 Metagenomic binning, annotation, bin refinement and estimation of completeness and duplication
3.2.3 Identification of catalytic domains
3.2.4 Population analysis of variable sites
3.2.5 Bin functionality variation
3.2.6 Phylogenetic and proteorhodopsin marker trees
3.3 Results and Discussions
3.3.1 Sequencing, community structure, assembly and binning assessment
3.3.2 Spatial and temporal differentiation
3.3.3 Phylogenetic bin functionality
3.3.4 Comparative genomics
3.3.5 Population variation
3.4 Conclusion
3.5 Acknowledgements
3.6 Abbreviations
3.7 Sequence data
3.8 References
3.9 SI Materials and Methods
Chapter Four
Microbial communities associated with ferromanganese nodules and the surrounding sediments
4.1 Introduction
4.2 Materials and Methods
4.2.1 Sample collection
4.2.2 DNA extraction
4.2.3 Sample amplification and pyrosequencing
4.2.4 Data analysis
4.2.5 Phylogenetic tree construction
4.3 Results and Discussions
4.3.1 Alpha-diversity
4.3.2 Beta-diversity
4.3.3 Community composition
4.4 Conclusion
4.5 Acknowledgements
4.6 References
Chapter Five
Metagenomic analysis of the microbial communities associated with South Pacific Gyre sediment and ferromanganese nodules
5.1 Introduction
5.1.1 Background
5.1.2 Previous results
5.1.3 Current research
5.2 Materials and Methods
5.2.1 Sample collection
5.2.2 DNA extraction
5.2.3 DNA amplification, Sequencing, and Quality Control
5.2.4 Reconstruction and Taxonomy of Full-Length 16S rRNA Genes
5.2.7 Sequence Assembly
5.2.8 MG-RAST Annotations and Sequence Analysis
5.2.9 Phylogenomic Trees
5.2.10 Taxonomic Binning
5.3 Results and Discussions
5.3.1 Sequencing and Assembly Results
5.3.2 Full-length 16S rRNA Gene Analysis
5.3.2.1 EMIRGE Output Statistics
5.3.2.2 Archaeal Phylogenetic Diversity
5.3.2.3 Bacterial Phylogenetic Diversity
5.3.3 Functional Assignments of Putative Coding Sequences
5.3.3.1 Putative Metal Biochemistry CDSs
5.3.3.2 Phylogenomics of Putative Metal Biochemistry CDS
5.3.3.3 Identification of Putative Magnetosome Genes
5.3.3.4 N & C Cycling
5.3.4 Putative Phylogenetic Bins
5.3.4.1 Unsupervised Hierarchical Clustering
5.3.4.2 Supervised Phylogenetic Binning
5.4 Concluding Remarks
5.5 References
List of Tables
Table 2.1. Sample location and sequencing effort
Table 2.2. Sequence recruitment for genes related to the ammonia oxidation pathway
Table 2.3. Supplemental 1. Sequence recruitment for genes related to the carbon fixation pathway
Table 2.4. Supplemental 2. Location of gaps after reads are recruited to the Ca. N. maritimus SCM1 genome
Table 2.5. Supplemental 3. Comparison of genes of interest between assembled scaffolds and Ca. N. maritimus SCM1
Table 3.1. Summary of the assembly data and the numbers of identified peptidases and glycoside hydrolases (GH) from each phylogenetic bin
Table 3.2. Data pertaining to the seasonality and spatial heterogeneity of sequences from each bin. * Percentage (%) of the total number of sequences that compose all assemblies within each bin by season. # Percent (%) abundance of sequences identified in each bin from corresponding libraries or season. (No. of sequences from bin ÷ total sequences from library [or season])
Table 3.3. The number of peptidases and glycoside hydrolases identified within each phylogenetic bin. Peptidases are broken down into subcategories, based on predicted cellular location. * Cell exterior peptidases include peptidases predicted to be associated with the outer membrane or to be extracellular. # Unique refers to a lack of significant similarity when searched against the GenBank NR
Table 3.4. Breakdown of variant data for each phylogenetic bin
Table 3.5. Supplemental 1. Site summary data for samples collected from the Gulf of Maine in 2006.
Table 3.6. Supplemental 2. Data pertaining to assembly efficiency, the number of predicted coding sequences, and various values for the comparison of identified peptidases and glycoside hydrolases, including the number of unique genes after 1) bin-to-bin comparisons, 2) comparison to the pangenome, & 3) comparison to NR
Table 3.7. Supplemental 3. Data pertaining to the seasonality and spatial heterogeneity of sequences from each scaffold within each bin. “By season” is the percentage (%) of the total number of sequences that compose all assemblies within each scaffold by season. “Perc. Abundance of Reads by Season and Sampling Location” is the percent (%) abundance of sequences identified in each bin from corresponding libraries or season. (No. of sequences from bin ÷ total sequences from library [or season]) * denotes scaffolds identified as outliers using Iglewicz and Hoaglin (1993)
Table 3.8. Supplemental 4. Functions of interest used to examine features common in marine microbes and Flavobacteria. + denotes the presence of genes representing that function. – represents processes lacking identified genes
Table 3.9. Supplemental 5. The seven genomes used in the pangenome
Table 3.10. Supplemental 6. The genes identified in each bin with discernible variants used in the dN/dS analysis. The name “hypothetical protein” is used multiple times and denotes the annotation of the CDS and not a unique identifier.
Table 3.11. Supplemental 7. Values used as cutoffs for the various BLAST analyses, including a brief description to link with the Materials and Methods of the main manuscript
Table 4.1. Sampling site locations and depths
Table 4.2. Sequence effort and diversity measures for each sample
Table 4.3. Putative phylogenetic assignments using the SILVA taxonomy, to the highest discernible phylogenetic level
Table 4.4. Appendix Table 1. Beta-Diversity Significance Tests for Various Beta-Diversity Calculators and Sample Designations (+ = statistical significance)
Table 4.5. Appendix Table 2. AMOVA Results (+ = statistical significance)
Table 5.1. Results of Illumina Sequencing and IDBA-UD Metagenomic Assembly
Table 5.2. Top 10 full-length 16S rRNA EMIRGE Results
Table 5.3. Predicted Localization of Putative CDS with Possible Metal-related Biochemistries
Table 5.4. Results of Unsupervised Binning on Sediment Contigs
Table 5.5. Results of Supervised Binning of South Pacific Gyre Contigs
List of Figures
Figure 2.1. 3-Hydroxypropionate/4-Hydroxybutyrate cycle. Green boxes indicate genes with reads that recruit at 30% amino acid identity. Yellow boxes indicate genes with reads that recruit at 20% to < 30% amino acid identity. Red boxes indicate genes without recruited reads. Numbers with each box correspond to Enzyme Commission (EC) numbers. Enzymes represented by each box: (1) acetyl-CoA carboxylase; (2) malonyl-CoA reductase (NADPH); (3) malonate semialdehyde reductase (NADPH); (4) 3-hydroxypropionyl-CoA synthetase (AMP-forming); (5) 3-hydroxypropionyl-CoA dehydratase; (6) acryloyl-CoA reductase (NADPH); (7) propionyl-CoA carboxylase; (8) methylmalonyl-CoA epimerase; (9) methylmalonyl-CoA mutase; (10) succinyl-CoA reductase (NADPH); (11) succinate semialdehyde reductase (NADPH); (12) 4-hydroxybutyryl-CoA synthetase (AMP-forming); (13) 4-hydroxybutyryl-CoA dehydratase; (14) crotonyl-CoA hydratase; (15) 3-hydroxybutyryl-CoA dehydrogenase (NAD+); (16) acetoacetyl-CoA β-ketothiolase
Figure 2.2. Recruitment plot of the GOM metagenome and GOS Phase 1 metagenomes against the ‘Ca. N. maritimus SCM1’ reference genome. Red boxes indicate integrase regions (IR), genomic islands with an annotated integrase gene. Black boxes indicate large islands present in both GOM and GOS. Purple coloured reads are from the various GOM libraries. GOS metagenomes are represented by all other colours
Figure 2.3. Representation of integrase region 1 (IR1). Orange dashed lines indicate genes with > 30% amino acid identity. Black dashed lines indicate syntenic genes with the same PFAM assignment. Black arrows are hypothetical genes
Figure 2.4. Maximum likelihood tree (bootstrap: 1000) of a 422 bp region of the amoA gene from 16 GOM sequences (ID starting 110-), environmental sequences obtained from the GenBank nt database (black; Accession No.), sequences from ‘water column group A’ (Francis et al., 2005) (blue; Accession No.), and sequences from ‘water column group B’ (Francis et al., 2005) (purple; Accession No.)
Figure 2.5. Maximum likelihood tree (bootstrap: 1000) of a 388 bp region of the methylmalonyl-CoA synthetase gene from 15 GOM sequences (ID starting 110-) and environmental sequences obtained from the CAMERA database (CAMERA ID No.)
Figure 2.6. MAUVE alignments of environmental ‘Scaffold A’ and ‘Scaffold B’ compared with ‘Ca. N. maritimus SCM1’. White bars indicate annotated genes. Black bars represent internal sequencing gaps. Orange bars indicate genes unique to the scaffold. Colours on the horizontal axis illustrate aligned regions
Figure 2.7. MAUVE alignments of environmental ‘Scaffold C’ and ‘Scaffold D’ compared with ‘Ca. N. maritimus SCM1’. White bars indicate annotated genes. Green bar indicates tRNA sequence. Colours on the horizontal axis illustrate aligned regions
Figure 2.8. MAUVE alignments of environmental ‘Scaffold E’ and ‘Scaffold F’ compared with ‘Ca. N. maritimus SCM1’. White bars indicate annotated genes. Black bars represent internal sequencing gaps. Orange bars indicate genes unique to the scaffold. Turquoise bars indicate genes in the urease operon. Colours on the horizontal axis illustrate aligned regions
Figure 2.9. Supplemental 1. Maximum likelihood tree (bootstrap: 100) of a 403 bp region of the 4-OH-butyryl-CoA gene from 19 GOM sequences (110XXXXXXXXXX) and 2 environmental sequences obtained from the CAMERA database (GOS ID #)
Figure 2.10. Supplemental 2. Maximum likelihood tree (bootstrap: 100) of a 419 bp region of the radA gene from 21 GOM sequences (110XXXXXXXXXX)
Figure 3.1. Maximum likelihood tree generated using PHYML (bootstrap: 1,000) of DNA primase (DnaG) identified in the four phylogenetic bins and other members of the family Flavobacteriaceae (489 amino acids). Only bootstrap values greater than 50% are shown
Figure 3.2. Gene synteny map of a ~21 kbp homologous region identified using the progressiveMauve aligner. Annotations were generated using the RAST service. Arrows indicate putative CDS and the direction indicates strand orientation. CDS are color-coded to indicate genes common between phylogenetic bins (green = identified in all 4 bins; black = only present in corresponding bin; purple = identified in FlavI and FlavG; grey = identified in FlavH, FlavI, and FlavG; orange = identified in FlavA, FlavH, and FlavI; blue = identified in FlavA and FlavH)
Figure 3.3. Gene synteny map of a ~64 kbp homologous region identified using the progressiveMauve aligner. Annotations were generated using the RAST service. Arrows indicate putative CDS, the direction of which indicates strand orientation. CDS are color-coded to indicate genes common between phylogenetic bins (green = identified in all 4 bins; black = only present in corresponding bin; purple = identified in FlavI and FlavG; yellow = identified in FlavA, FlavG, and FlavH). Red asterisks denote putative CDS with identified glycoside hydrolase domains.
Figure 3.4. Diversity of the underlying reads for putative CDS annotated as phospho-N-acetylmuramoyl-pentapeptide transferase. Each variant has a final protein sequence (grey bars). Below are the nucleotide sequences used to reconstruct the protein. The nucleotide sequences are color-coded based on season and sampling location (yellow colors = summer; blue colors = winter; gold = GOM13; yellow = GOM12; blue-green = GOM04; blue = GOM06). Unique sequence reads were aligned to the assembled nucleotide sequence using ClustalW. The colors represent SNP locations in relation to the assembled nucleotide sequence (green = A; blue = C; black = G; red = T). The three protein variants were aligned to each other using ClustalW. Colors along the length of the protein represent amino acid changes
Figure 3.5. dN/dS ratio plotted against dS. dN/dS and dS values were calculated by comparing the identified proteins and nucleotide sequences for each variant pair using the PAL2NAL software and the codeml program within PAML. dS values are a sufficient indicator of time since divergence, with lower values suggesting less time since divergence. A) All variants and corresponding dS and dN/dS values. B) Enlargement of the indicated section of A. C) Enlargement of the indicated section of B.
Figure 3.6. Supplemental 1. MODIS satellite data showing the surface chlorophyll-a levels in pre-bloom conditions, averaged over January (winter) and August (summer) of 2006. The locations of the sampling sites are noted.
Figure 3.7. Supplemental 2. Number of identified 16S rRNA gene fragments from the Gulf of Maine metagenome from each sampling season for several phylum level groups in the Bacteria and Archaea, and the class level groups within the Proteobacteria (plotted on a different scale). Numbers above each bar indicate the percent of the total sequences each bar represents for each season
Figure 3.8. Supplemental 3. Maximum likelihood tree generated using PHYML (bootstrap: 1,000) of: A) DNA helicase (RecG), B) DnaG-FusA, C) DnaG-DnaE-RecG, D) DnaG-RpoC, and E) proteorhodopsin. These genes were identified in the four phylogenetic bins and other members of the family Flavobacteriaceae. Only bootstrap values greater than 50% are shown
Figure 4.1. Dendrogram representing the community similarity of the 20 different sediment and nodule samples using the θYC calculator in the mothur package. Samples are classified by sample site and source
Figure 4.2. Three-dimensional plot of the 20 different sediment and nodule samples using the θYC calculator. Sediment = purple; SPG2 = red; SPG9 = green; SPG10 = blue. Stress: 0.1055. R²: 0.933
Figure 4.3. Rank abundance curves for the 30 most abundant OTUs (by total sequences assigned) for the (a) sediment samples and the nodules from (b) SPG2, (c) SPG9, and (d) SPG10. The OTU assignments are the same for each graph. Note that the Y-axis scale differs between graphs to capture the correct values for each sample set
Figure 4.4. Maximum likelihood phylogenetic tree constructed using 16S rRNA gene sequences (289 bp) covering the V4 hypervariable region from representatives of each OTU putatively assigned as belonging to the Thaumarchaea MG-1 group (Sequence ID OTU No.|No. of total sequences assigned to OTU), sequences from Durbin and Teske (2010) (Accession No.), and sequenced members of the Phylum. Colored circles correspond to the samples represented by the OTU (at least one subsample must have > 0.5% abundance). In fixed order: sediment = purple, SPG2 = red, SPG9 = green, SPG10 = blue. Scale bar: 0.02 changes per site. Bootstraps: 1,000
Figure 5.1. Maximum likelihood phylogenetic tree of the sediment and nodule archaeal 16S rRNA sequences generated by EMIRGE. Black circles (●): uncultured 16S rRNA sequences from the SILVA database. Blue circles (●): sediment 16S rRNA sequences. Red circles (●): nodule 16S rRNA sequences. Alignment length = 783 bp. 1,000 bootstraps. Only bootstrap values ≥ 50 are shown
Figure 5.2. Maximum likelihood phylogenetic tree of the sediment bacterial 16S rRNA sequences generated by EMIRGE. Blue circles (●): sediment 16S rRNA sequences. Tree tips without labels are uncultured 16S rRNA sequences from SILVA. 1,000 bootstraps. (A) 64 16S rRNA sequences, from 10 phylogenetic groups. Alignment length = 737 bp. (B) 57 16S rRNA sequences, from 10 phylogenetic groups. Alignment length = 809 bp. (C) Proteobacterial tree. Alignment length = 1,162 bp. Only bootstrap values ≥ 50 are shown
Figure 5.3. Maximum likelihood phylogenetic tree of the nodule bacterial 16S rRNA sequences generated by EMIRGE. Red circles (●): nodule 16S rRNA sequences. Tree tips without labels are uncultured 16S rRNA sequences from SILVA. Alignment length = 867 bp. 1,000 bootstraps. Only bootstrap values ≥ 50 are shown
Figure 5.4. Maximum likelihood phylogenomic tree of putative CDS with homology determined by HMMER3 to molybdopterin oxidoreductases (MOBs) from Fe²⁺-oxidizing organisms. Blue circles (●): sediment sequences. Red circles (●): nodule sequences. 1,000 bootstraps. Only bootstrap values ≥ 50 are shown. Abbr.: TTR = tetrathionate reductase; FDH - A = formate dehydrogenase (FdhA); FDH - G = formate dehydrogenase (FdnG); PSR = polysulfide reductase; NAR = nitrate reductase; DSM = dimethylsulfide reductase; ACIII = alternative complex III. Bord par = Bordetella parapertussis; Vibr par = Vibrio parahaemolyticus; Salm typ = Salmonella typhimurium; Hyph MC1 = Hyphomicrobium sp. MC1; Shew one = Shewanella oneidensis MR-1; Mari ELB17 = Marinobacter sp. ELB17; Rose 217 = Roseovarius sp. 217; Phae gal = Phaeobacter gallaeciensis 2.10; Mari pro = Mariprofundus ferrooxydans PV-1; Geo M18, uri, met = Geobacter sp. M18, G. uraniireducens, G. metallireducens; Salm ent = Salmonella enterica; Geo lov = Geobacter lovleyi; Esch col = Escherichia coli; Baci art = Bacillus anthracis; Rhod mar = Rhodothermus marinus; Myxo xan = Myxococcus xanthus; Chlo aur = Chloroflexus aurantiacus; Lept orc = Leptothrix ochracea; Side lit = Sideroxydans lithotrophicus ES-1; Gall cap = Gallionella capsiferriformans
Figure 5.5. Maximum likelihood phylogenomic tree of putative CDS with homology determined by HMMER3 to fungal laccases (MCOs). Blue circles (●): sediment sequences. Red circles (●): nodule sequences. Tree tips without labels are database sequences used to anchor either fungal laccase activity or MCOs related to Cu resistance. 1,000 bootstraps. Only bootstrap values ≥ 50 are shown. Accession numbers represent fungal laccases that fall outside of the fungal laccase group. Abbr.: MoxA1 & MoxA2 = MCOs from Aurantimonas sp. SI85-9A1; Mox = MCO from Pedomicrobium sp. ACM 3067; Xanth euv = putative Cu resistance gene from Xanthomonas euvesicatoria; CumA = MCO from Pseudomonas putida; MnxG = MCO from Bacillus sp. SG-1; CueO = Cu resistance gene from Escherichia coli
Figure 5.6. Schematic of possible nitrogen cycling mechanisms identified for putative CDSs within the South Pacific Gyre metagenome samples. Red X (X): gene in the denitrification pathway without similarity to putative CDSs. Purple names: lowest taxonomic name assigned to putative CDS based on top hits in NCBI nr. Blue numbers: no. of sediment sequences. Red numbers: no. of nodule sequences. Abbr.: AOA = ammonia oxidizing Archaea; AOB = ammonia oxidizing Bacteria; amoA = ammonia monooxygenase; narGH = nitrate reductase, alpha and beta subunits; nirK = nitrite reductase; norB = nitric oxide reductase; nosZ = nitrous oxide reductase; PON = particulate organic nitrogen
Chapter One
Introduction, Background, and Context of Microbial Environmental Genomics
1.1. Environmental Microbiology
Microorganisms play an enormous role in fundamentally shaping the world around us.
All of the biologically relevant elements on the planet cycle through the biochemical processes
and metabolisms of the Bacteria and Archaea. Microorganisms are responsible for a significant
portion of oxygen production, carbon fixation, and nitrogen fixation on the planet and are
present in every environment. Moving away from an anthropocentric view of microbiology,
modern environmental microbiology explores the composition and physiology of all
environments, which, while including the human body, encompass everything between the
deepest oceans and the highest points of the atmosphere. Studying the interactions between
microbes and their environments has increased our understanding of Earth’s life history and
generated countless new branches of research and sources of biotechnology.
The transition from human- and medicine-related microbiology to environmental
microbiology was limited by the available technologies. The most effective ways of studying
microbes were culturing isolates randomly from the environment and measuring
processes that occurred in those environments. The discoveries were huge: researchers developed
genetically tractable populations of Escherichia coli that expanded our understanding of cellular
biology, and discovered carbon-fixing organisms far removed from the photosynthetic world that
extended the known limits of life. But these techniques had their limits, and it is generally
accepted that less than 1% of the total microbial diversity has been categorized through these
methods.
The advent of molecular methods has allowed microbiologists to explore the expanse of
microbial diversity to a degree previously impossible. The diversity is immense. Starting from
11 documented bacterial phyla in 1987 (Woese, 1987), current estimates suggest that there are
over 100 bacterial phyla, many of which lack cultivated representatives (Schloss and
Handelsman, 2004). The exploration of this diversity continues with the
use of classic microbiological techniques, i.e., culturing, metabolic testing, and genetic studies,
but advances in genomic techniques have produced a revolution in understanding and elucidating
the hidden diversity within the microbial world.
1.2. Environmental Genomics
Environmental genomics is the direct sequencing of genomic material from the
environment without isolation of distinct microbial populations via traditional methods
(culturing). This term encompasses several different techniques: 1) single-cell
microfluidic sorting and genomic amplification, 2) metagenomics, the shotgun sequencing of the
total environmental microbial DNA (Chapters 2, 3, & 5), and 3) amplification, sequencing, and
analysis of the 16S rRNA gene (or other phylogenetic or functional marker) (Chapter 4). While
the analysis of 16S rRNA amplicon data is ideal for understanding microbial diversity at an
operational level and useful in comparing the community composition of different environments,
it cannot answer questions about the potential underlying genomic function of a microbial
community. Single-cell genomics is a new technology that relies on the amplification of genomic
DNA from 1-10 identical cells and can elucidate the genome of single organisms with varying
degrees of success (Stepanauskas, 2012). Metagenomics has been used to explore almost all types
of biomes, also with varying degrees of success (Abulencia et al., 2006; Debroas et al., 2009;
Grzymski et al., 2008; Kunin et al., 2008; Morgan et al., 2010; Palenik et al., 2009; Yokouchi et
al., 2006), but it offers a whole community snapshot that has yet to be achieved with single-cell
genomics and is not possible with 16S rRNA amplicon studies.
Metagenomics is an interdisciplinary field of research. It relies on microbial genetics,
genomics, and proteomics to generate the relevant biological data, and on bioinformatics and
computational biology to analyze and compare the environmentally generated sequence data.
Directly sequencing from the environment helps to alleviate the limitations that accompany
traditional culturing techniques by capturing genomic material from all organisms present,
not only those that can grow in artificial environments. Metagenomics can be, and
has been, used as a hypothesis-generating technique. For many of the environments to which
metagenomics has been applied, microbiologists lacked the essential knowledge to understand
the microbial processes in the system. The complexity of microbial interactions and of the
biochemical pathways they utilize further confounds the ability to tease apart dominant or
keystone processes, making it difficult to predict which organisms are the most important for
understanding the complete system.
Metagenomic research has predominantly come in one of two varieties: “whole
community functional analysis” or “environmental genome reconstruction”. Whole community
functional analysis accounts for many of the results from the early metagenomic studies
and is currently widely used to characterize environmental microbial communities with low
levels of sequencing effort (Bodaker et al., 2009; Deleon-Rodriguez et al.; Fierer et al.;
Stepanauskas, 2012; Tringe et al., 2005; Wilkins et al.). The first effort at sequencing from the
environment, in the Sargasso Sea, revealed complex functional diversity and a previously
unknown level of diversity and abundance of bacterial proteorhodopsins (Venter et al., 2004).
Similar approaches have been applied across the ocean with the Global Ocean Sampling
expedition (Rusch et al., 2007), used to compare microbial function across nine diverse biomes
(Dinsdale et al., 2008), and used to explore anaerobic deep-sea sediments (Biddle et al., 2008).
These studies have illustrated how little is known about genomic potential. Many of the recovered
putative coding sequences have never been characterized and, as such, their functions remain
unknown, with unforeseeable impacts. We have begun to understand the redundancy that
occurs in microbial communities, where the community composition can vary sharply but the
underlying functions and biochemical pathways remain constant, providing stability in
biogeochemical processes across spatial and temporal scales (Rodriguez-Brito et al., 2010).
Successful whole community functional analyses can provide novel insight into the microbial
processes of environments whose microbial communities have not been studied and help to
generate clear, concise hypotheses for future studies.
Environmental genome reconstruction has been a primary goal of metagenomics since its
inception. Unlike 16S amplicon studies, which can only inform us about the identity of the
organisms present, and whole community functional studies, which can provide a total view of function
with limited identity, environmental genome reconstruction allows researchers to link
phylogenies with potential physiologies. Generally, these studies require either (a) simple
microbial systems (Tyson et al., 2004) or (b) extensive sequencing and computational analysis
(Hess et al., 2011). The reconstruction of environmental genomes has become more common in
published research, but few metagenomic projects have achieved high degrees of success. Again,
the first success of this nature came from the Sargasso Sea and demonstrated for the first time
that an ammonia monooxygenase gene was linked with an archaeal phylogenetic marker (Venter et al., 2004).
This discovery has since shaped our understanding of the Archaea and provided insight into the
physiology of one of the most abundant microbial groups on the planet. Dozens of near-complete
genomes have been generated from the complex community within the cow rumen (Hess et al., 2011)
and a complete genome has provided functional identity to the previously uncultivated Group II
Euryarchaea (Iverson et al., 2012). Sequencing to the necessary depth, previously determined by
cost, is no longer the limiting step. Understanding how the microbial sequences govern the
results of assembly and binning techniques has become the limiting step in reaching significant
genome reconstruction, as none of the available computational processes are directly
interchangeable. As it becomes easier to produce multiple high quality genomes from
metagenomic datasets, the emphasis will shift from results that explore the most basic questions
about the microbial community towards asking relevant biological questions.
1.3. Future Directions
The future of environmental genomics and metagenomics is dependent on researchers
addressing biologically relevant questions. Three major areas for hypothesis development and
testing involve different levels of analysis.
(1) Analysis at the whole community genomics level will involve the successful
reconstruction of multiple (if not all) of the dominant organisms in an environment. The
questions at this level address how microorganisms interact with different species. There have
been many examples from environmental genomics and enrichment experiments illustrating tight
coupling between organisms from different phylogenetic classes and domains. For the
phototrophic "Chlorochromatium aggregatum" (Overmann, 2010) and anaerobic methane
oxidation consortia (Pernthaler et al., 2008), genomic evidence suggests that each partner has
undergone selective pressure in their respective genomes to remove redundant processes. Such
tight coupling of microbial organisms offers a complex multi-species view of evolutionary
pressures that requires a more robust understanding. These mutualistic relationships may be
much more common than previously thought. Further, how these mutualistic relationships
balance with the competitive niche selective pressures is unknown and drastically influences our
understanding of microbial community structure.
(2) Analysis at the population level will involve the reconstruction of multiple closely-
related organisms at the same species classification. Questions should be structured to try and
answer what constitutes a true microbial species. Microbial genomics has revealed the high
degree of gene content dissimilarity between organisms within the same species. And just as in
E. coli, where gene content differentiates between pathogenic and non-pathogenic strains (Perna
et al., 2001), so too in the environment could significant functional differences be present in
organisms with similar 16S rRNA genes and DNA hybridization levels. There is a copious
amount of evidence to illustrate this complexity and species-level functional variation, with
specific examples from the environmental genomes generated from the Gulf of Maine (GoMA)
within the Thaumarchaea and Flavobacteriaceae (Chapter 2 & 3). An understanding of how
much gene content varies between microbial species is extremely relevant in terms of shaping
conclusions from 16S rRNA amplicon studies. Conclusions drawn from nearest phylogenetic
neighbors may prove useless if the environments are fundamentally different with entirely
unique selective pressures.
Further confounding this understanding of microbial species is the unknown rate of
horizontal gene transfer events and gene gain/loss in variable genetic islands. The GoMA
Flavobacteriaceae provide a clear example of closely related organisms with highly conserved
genomic synteny that also contain variable genetic islands with ecologically relevant putative
coding sequences (Chapter 3). Future studies that address these questions could examine closed,
semi-closed, or in situ environments over time to understand how often gene transfer, gain, and
loss events occur and how fundamentally they affect community function.
(3) Analysis at an even finer scale is possible using metagenomic techniques. These
questions begin to address the evolutionary pressures of individual genes and operons within,
what is currently classified as, subspecies. The flexible nature of a microorganism’s genome
implies that genes with disparate evolutionary histories from the rest of the genome are
potentially being inserted and removed continually. Using measures of selective pressure from
the GoMA Flavobacteriaceae, it was possible to elucidate genes with potential varying
evolutionary histories (Chapter 3). As much as gene transfer events play a role determining
minute differences between organisms, further variation could occur with selective pressures on
individual genes. Nonsynonymous mutations in coding regions can have unknown effects on
gene specificity and activity. For many point mutations within a gene, the effects may only be
slightly deleterious or beneficial and it is only over multiple generations that selection acts on
these variations. As multiple variations accumulate, the direct impact such changes have on the
environment and microbial community is completely unknown, but even strains with identical
genomic content can have niche-partitioning mutations within individual genes (Connor et al., 2010).
This aspect of microbiology has never been examined for uncultivated organisms and thus our
understanding is limited to organisms with different selective pressures.
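The distinction between synonymous and nonsynonymous changes discussed above can be illustrated with a minimal sketch. It assumes two aligned, in-frame, gap-free coding sequences of equal length and the standard genetic code; the function name and the example sequences are hypothetical and are not taken from the analyses in Chapter 3. It only counts codon differences by type and does not estimate selective pressure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(ref, alt):
    """Count synonymous and nonsynonymous codon differences.

    Both inputs are assumed to be aligned, in-frame, gap-free coding
    sequences of equal length (hypothetical inputs for illustration).
    """
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(ref), 3):
        ref_codon = ref[i:i + 3].upper()
        alt_codon = alt[i:i + 3].upper()
        if ref_codon == alt_codon:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# GCT -> GCC keeps alanine; AAA -> GAA changes lysine to glutamate.
counts = classify_codon_changes("ATGGCTAAA", "ATGGCCGAA")
```

A full measure of selective pressure would additionally normalize these counts by the number of synonymous and nonsynonymous sites, which this sketch does not attempt.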
Analyzing current metagenomics datasets with these types of questions in mind is an
important first step in applying these questions to future, possibly more ideal datasets and allows
for the development of tools and techniques while total sequencing output is still relatively
manageable.
Metagenomics can be used in conjunction with a number of developing technologies and
methodologies to further characterize microbial communities. There is an ever-growing number
of “–omic” related techniques, including (meta)transcriptomics, (meta)proteomics, and
(meta)metabolomics. Each has been used successfully with metagenomics to map expression and
phenotypic information back to gene content and putative function. Other complementary
methodologies include environmental genomics through single-cell genomics and isolation of
nucleotides from target organisms performing a specific process using stable isotope probing
(SIP) techniques. Single-cell genomics allows specific organisms or groups of organisms to be
targeted based on a desired trait, such as size/back scatter profile, natural fluorescence, 16S
rRNA sequences, or functional gene sequences (Stepanauskas, 2012). It has a number of
deficiencies including incomplete reconstruction of environmental genomic sequences, but could
be used to generate simplified metagenomic datasets where the number of organisms for each
group is known and assembly algorithms are more efficient. SIP techniques can be used to
target organisms performing specific functions by monitoring the incorporation of labeled
substrates in the genomic material (Friedrich, 2006). Such experiments could be used to better
understand the organisms that play key roles in organic matter recycling, nitrogen fixation, and
carbon fixation in the environment.
1.4. Thesis Chapter Titles and Abstracts
The chapters of this thesis explore environmental microbiology using environmental
genomic techniques. Chapters 2 and 3 utilize metagenomics to reconstruct environmental
genomes of organisms from the phylum Thaumarchaea and the family Flavobacteriaceae.
Chapter 4 uses 16S rRNA gene amplification methodologies to explore the microbial
communities associated with deep-sea ferromanganese nodules and the neighboring sediment
communities from the South Pacific Gyre. Finally, Chapter 5 applies whole community
functional analysis to the microbial communities of a single nodule and the corresponding
sediments for a first examination of potential metabolic pathways involved in nodule formation
and maintenance and other globally relevant biogeochemical cycles.
1.4.1. Chapter 2 – Metagenomic analysis of a complex marine planktonic thaumarchaeal
community from the Gulf of Maine
Thaumarchaea, which represent as much as 20% of prokaryotic biomass in the open
ocean, have been linked to environmentally relevant biogeochemical processes, such as ammonia
oxidation (nitrification) and inorganic carbon fixation. We have used culture-independent
methods to study this group because current cultivation limitations have proved a hindrance in
studying these organisms. From a metagenomic dataset obtained from surface waters from the
Gulf of Maine, we have identified 36,111 sequence reads (containing 30 Mbp) likely derived
from environmental planktonic Thaumarchaea. Metabolic analysis of the raw sequences and
assemblies identified copies of the catalytic subunit required in aerobic ammonia oxidation. In
addition, genes that comprise a nearly complete carbon assimilation pathway in the form of the
3-hydroxypropionate/4-hydroxybutyrate cycle were identified. Comparative genomics contrasting
the putative environmental thaumarchaeal sequences and ‘Candidatus Nitrosopumilus maritimus
SCM1’ revealed a number of genomic islands absent in the Gulf of Maine population. Analysis
of these genomic islands revealed an integrase-associated island also found in distantly-related
microbial species, variations in the abundance of genes predicted to be important in
the thaumarchaeal respiratory chain, and the absence of a high-affinity phosphate uptake operon.
Analysis of the underlying sequence diversity suggests the presence of at least two dominant
environmental populations. Attempts to assemble complete environmental genomes were
unsuccessful, but analysis of scaffolds revealed two diverging populations, including a
thaumarchaeal-related scaffold with a full urease operon. Ultimately, the analysis revealed a
number of insights into the metabolic potential of a predominantly uncultivated lineage of
organisms. The predicted functions in the thaumarchaeal metagenomic sequences are directly
supported by historic measurements of nutrient concentrations and provide new avenues of
research with regard to understanding the role Thaumarchaea play in the environment.
1.4.2. Chapter 3 – Comparative genomics of planktonic Flavobacteriaceae from the Gulf of
Maine using environmentally derived genomes
The Gulf of Maine is an important biological province of the Northwest Atlantic with
high productivity year round. We used a Sanger-based metagenome to assemble and explore the
environmental genomes of uncultured members of the class Flavobacteria. Each of the
environmental genomes represents organisms that compose less than 1% of the total microbial
metagenome. Analysis was performed at the sub-population level to tease apart both spatial and
temporal genetic heterogeneity within the four different populations, revealing gene variants with
disparate presence within the total community. Comparative genomics has revealed potentially
important niche partitioning genomic variations, including iron transporters and genes associated
with cell attachment and polymer degradation. Analysis of large syntenic regions helped to
reveal potential, ecologically relevant metabolisms in the Gulf of Maine, such as the urea cycle,
and identify possible sites for the incorporation of novel exogenous genes from the environment.
1.4.3. Chapter 4 – Microbial communities associated with ferromanganese nodules and the
surrounding sediments
The formation and maintenance of deep-sea ferromanganese/polymetallic nodules still
remains a mystery 140 years after their discovery. The wealth of rare metals concentrated in
these nodules has spurred global interest in exploring the mining potential of these resources.
The prevailing theory of abiotic formation has been called into question and the role of microbial
metabolisms in nodule development is now an area of active research. To understand the
community structure of microbes associated with nodules and their surrounding sediment, we
performed targeted sequencing of the V4 hypervariable region of the 16S rRNA gene from three
nodules collected from the central South Pacific. Results have shown that the microbial
communities of the nodules are significantly distinct from the communities in the surrounding
sediments, and that the interiors of the nodules harbor communities different from the exterior.
This suggests not only differences in potential metabolisms between the nodule and sediment
communities, but also differences in the dominant metabolisms of interior and exterior
communities. We identified several operational taxonomic units (OTUs) unique to both the
nodule and sediment environments. The identified OTUs were assigned putative taxonomic
identifications, including two OTUs only found associated with the nodules, which were
assigned to the !-Proteobacteria. Finally, we explored the diversity of the most assigned
taxonomic group, the Thaumarchaea MG-1, which revealed novel OTUs compared to previous
research from the region and suggests a potential role as a source of fixed carbon for ammonia
oxidizing archaea in the environment.
1.4.4. Chapter 5 – Metagenomic analysis of microbial communities from the South Pacific
Gyre reveals putative mechanisms for biological generation of ferromanganese nodules and
complex biogeochemical interactions
Oligotrophic ocean gyres compose ~50% of the marine environment. The South Pacific
Gyre (SPG) is the most oligotrophic of all the central gyres. The oligotrophic surface waters
directly correlate to deep-sea sediment with low organic carbon content. Low microbial biomass
(<10^6 cells·cm^-3) and aerobic sediment conditions provide evidence to support the hypothesis that
microbial activity is minimal in the SPG sediments due to energy limitation. Like the sediment
of other central ocean gyres, the SPG has an abundance of ferromanganese (FeMn) nodules
covering up to 70% of the exposed sediment surface. Limited research has been performed on
understanding the microbial community composition associated with FeMn nodules and
oligotrophic sediments, and no research has been performed on potential community function. In
order to understand potential community function, a metagenomic dataset was generated from
DNA extracted from a nodule and sediment sample from the SPG water-sediment interface. This
research constitutes the first dataset to examine the microbial community function of both FeMn
nodules and oligotrophic sediments. Analysis was performed in an attempt to understand the
interaction between microorganisms and FeMn nodules, as Fe and Mn related metabolic
reactions could provide an alternate energy source in this energy-limited environment. Results
reveal the presence of several putative coding sequences with homology to known and putative
metal reactive enzymes, including multicopper oxidases, peroxidases, multiheme c-type
cytochromes, and molybdopterin oxidoreductases. The evidence suggests that the microbial
community may have the metabolic potential to interact with FeMn nodules. Further, analysis
was performed to better understand the carbon and nitrogen cycles within the SPG environment.
Microbial community gene content reveals the presence of lithoautotrophic metabolisms from
organisms related to the MG-1 Thaumarchaea and the Nitrosomonadales within the
β-Proteobacteria. Both groups of organisms are capable of aerobic ammonia oxidation and carbon
fixation. Lastly, genomic evidence reveals several anaerobic metabolisms, including
denitrification. These results suggest that SPG sediments may have anaerobic microniches, a trait
common in other aerobic environments. The identified denitrification pathway lacks the gene
necessary to convert nitrous oxide to dinitrogen gas, suggesting that the SPG sediment could be a
source of nitrous oxide gas, a potent greenhouse gas. Collectively, this research provides a
number of possible avenues for further study of the microbial community associated with FeMn
nodules and oligotrophic sediments.
1.5. References
Abulencia CB, Wyborski DL, Garcia JA, Podar M, Chen W, Chang SH et al (2006).
Environmental whole-genome amplification to access microbial populations in contaminated
sediments. Appl Enviro Micro 72: 3291-3301.
Biddle JF, Fitz-Gibbon S, Schuster SC, Brenchley JE, House CH (2008). Metagenomic
signatures of the Peru Margin subseafloor biosphere show a genetically distinct environment.
Proc Nat Acad Sci USA 105: 10583-10588.
Bodaker I, Sharon I, Suzuki MT, Feingersch R, Shmoish M, Andreishcheva E et al (2009).
Comparative community genomics in the Dead Sea: an increasingly extreme environment.
ISMEJ 4: 399-407.
Connor N, Sikorski J, Rooney A, Kopac S, Koeppel A, Burger A et al (2010). Ecology of Speciation in
the Genus Bacillus. Appl Enviro Micro 76: 1349-1358.
Debroas D, Humbert J, Enault F, Bronner G, Faubladier M, Cornillot E (2009). Metagenomic
approach studying the taxonomic and functional diversity of the bacterial community in a
mesotrophic lake (Lac du Bourget–France). Enviro Microbiol Rep 11: 2412-2424.
Deleon-Rodriguez N, Lathem T, Rodriguez-R L, Barazesh J, Anderson B, Beyersdorf A et al
(2013). Microbiome of the upper troposphere: Species composition and prevalence, effects of
tropical storms, and atmospheric implications. Proc Nat Acad Sci USA 110: 2575-2580.
Dinsdale E, Edwards R, Hall D, Angly F, Breitbart M, Brulc J et al (2008). Functional
metagenomic profiling of nine biomes. Nature 452: 629-632.
Fierer N, Leff J, Adams B, Nielsen U, Bates S, Lauber C et al (2012). Cross-biome metagenomic
analyses of soil microbial communities and their functional attributes. Proc Nat Acad Sci USA
109: 21390-21395.
Friedrich MW (2006). Stable-isotope probing of DNA: insights into the function of uncultivated
microorganisms from isotopically labeled metagenomes. Curr opin Biotech 17: 59-66.
Grzymski JJ, Murray AE, Campbell BJ, Kaplarevic M, Gao GR, Lee C et al (2008).
Metagenome analysis of an extreme microbial symbiosis reveals eurythermal adaptation and
metabolic flexibility. Proc Nat Acad Sci USA 105: 17516-17521.
Hess M, Sczyrba A, Egan R, Kim T, Chokhawala H, Schroth G et al (2011). Metagenomic Discovery of
Biomass-Degrading Genes and Genomes from Cow Rumen. Science 331: 463-467.
Iverson V, Morris R, Frazar C, Berthiaume C, Morales R, Armbrust E (2012). Untangling
Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota. Science
335: 587-590.
Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P (2008). A Bioinformatician's
Guide to Metagenomics. Microbiol Mol Biol Rev 72: 557-578.
Morgan J, Darling A, Eisen J (2010). Metagenomic sequencing of an in vitro-simulated
microbial community. PLoS One 5: e10209.
Overmann J (2010). The phototrophic consortium "Chlorochromatium aggregatum" - a model
for bacterial heterologous multicellularity. Adv Exp Med Biol 675: 15-29.
Palenik B, Ren Q, Tai V, Paulsen I (2009). Coastal Synechococcus metagenome reveals major
roles for horizontal gene transfer and plasmids in population diversity. Environ Microbiol Rep
11: 349-359.
Perna NT, Plunkett G, Burland V, Mau B, Glasner JD, Rose DJ et al (2001). Genome sequence
of enterohaemorrhagic Escherichia coli O157: H7. Nature 409: 529-533.
Pernthaler A, Dekas AE, Brown CT, Goffredi SK, Embaye T, Orphan VJ (2008). Diverse
syntrophic partnerships from deep-sea methane vents revealed by direct cell capture and
metagenomics. Proc Nat Acad Sci USA 105: 7052-7057.
Rodriguez-Brito B, Li L, Wegley L, Furlan M, Angly F, Breitbart M et al (2010). Viral and
microbial community dynamics in four aquatic environments. ISMEJ 4: 739-751.
Rusch D, Halpern AL, Sutton G, Heidelberg K, Williamson S (2007). The Sorcerer II Global
Ocean Sampling expedition: Northwest Atlantic through eastern tropical Pacific. PLoS Biol 5: e77.
Schloss P, Handelsman J (2004). Status of the Microbial Census. Microbiol Mol Biol Rev 68:
686-691.
Stepanauskas R (2012). Single cell genomics: an individual look at microbes. Curr Opin Micro
15: 1-8.
Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW et al (2005).
Comparative metagenomics of microbial communities. Science 308: 554-557.
Tyson G, Chapman J, Hugenholtz P, Allen E, Ram R, Richardson P et al (2004). Community
structure and metabolism through reconstruction of microbial genomes from the environment.
Nature 428: 37-43.
Venter JC, Remington K, Heidelberg J, Halpern AL, Rusch D, Eisen J et al (2004).
Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66-74.
Wilkins D, Lauro F, Williams T, Demaere M, Brown M, Hoffman J et al (2012). Biogeographic
partitioning of Southern Ocean microorganisms revealed by metagenomics. Enviro Micro 15:
1318-1333.
Woese CR (1987). Bacterial Evolution. Microbiol Rev 51: 221-271.
Yokouchi H, Fukuoka Y, Mukoyama D, Calugay R, Takeyama H, Matsunaga T (2006). Whole-
metagenome amplification of a microbial community associated with scleractinian coral by
multiple displacement amplification using phi29 polymerase. Environ Microbiol Rep 8: 1155-
1163.
Chapter Two
Metagenomic analysis of a complex marine planktonic thaumarchaeal community from the
Gulf of Maine*
*As published in:
Tully, BJ, et al. Metagenomic analysis of a complex marine planktonic thaumarchaeal
community from the Gulf of Maine. Environ Micro Vol. 14 (1): 254-267 (2012)
2.1. Introduction
Thaumarchaea comprise a large percentage of the marine planktonic microbe
community, with as much as 40-50% of all prokaryotes in the deep ocean, and about 20% of all
marine planktonic prokaryotes (Karner et al., 2001). All known members of the marine 1.1a
group Thaumarchaea are obligately chemoautotrophic, deriving energy through aerobic
ammonia oxidation (Blainey et al., 2011; de la Torre et al., 2008; Hallam et al., 2006b;
Hatzenpichler et al., 2008b; Walker et al., 2010) and fixing bicarbonate through the 3-
hydroxypropionate/4-hydroxybutyrate cycle (Berg et al., 2007; Berg et al., 2010; Walker et al.,
2010). Recent work has shown that an isolate of the 1.1b group Thaumarchaea, Nitrososphaera
viennensis, has enhanced growth when grown in the presence of pyruvate, but still generates a
majority of cellular carbon through autotrophy (Tourna et al., 2011). To date, five species from
this phylum have had their genomes fully sequenced – a symbiont of the sponge Axinella
mexicana, Candidatus Cenarchaeum symbiosum A (Hallam et al., 2006a); a marine isolate from
the Seattle aquarium, Candidatus Nitrosopumilus maritimus SCM1 (Walker et al., 2010); a
moderately thermophilic enrichment culture, Candidatus Nitrososphaera gargensis (Hatzenpichler
et al., 2008a); a thermophilic isolate from Yellowstone National Park, Candidatus Nitrosocaldus
yellowstonii (de la Torre et al., 2008); and a low-salinity enrichment culture from San Francisco
Bay, Candidatus Nitrosoarchaeum limnia SFB1 (Blainey et al., 2011). Additionally, there are a
number of environmental sequences (Béjá et al., 2002; Konstantinidis and DeLong, 2008;
Lopez-Garcia et al., 2004), and an increasing number of isolates and enrichment cultures of
Thaumarchaea undergoing sequencing and annotation. Extensive molecular ecology has
demonstrated the global importance of the Thaumarchaea; their 16S rRNA gene sequence and
the gene of ammonia monooxygenase subunit A (amoA) have been amplified from various
habitats around the world (Bernhard et al., 2010; Church et al., 2010; Francis et al., 2005;
Labrenz et al., 2010; Mincer et al., 2007; Molina et al., 2010a; Park et al., 2006; Santoro et al.,
2010; Venter et al., 2004). In the oceans, Thaumarchaea play a critical role in the carbon and
nitrogen cycles; calculations of in situ ammonia oxidation and carbon fixation estimate that this
group of organisms is capable of producing enough nitrite/nitrate to account for all “new”
nitrogen in the upper ocean and 1% of total global carbon fixation, respectively (Berg et al.,
2007; Ingalls et al., 2006). Despite their importance to the global marine ecosystem, the isolation
and sequencing of a planktonic marine thaumarchaeote has yet to be achieved.
Without a planktonic marine isolate, many of the assumptions regarding the genomic
potential and physiology of marine planktonic thaumarchaea are derived from molecular studies
of 16S rRNA gene sequences and amoA genes. Previous studies have shown that the
Thaumarchaea in the planktonic marine environment are capable of fixing carbon (Ingalls et al.,
2006; Wuchter et al., 2003), yet molecular studies have not been used to explore the presence or
abundance of genes implicated in thaumarchaeal carbon fixation (Berg et al., 2007; Walker et al.,
2010). Furthermore, it can only be assumed that the current marine genomes and isolates are
good approximations of genomic diversity and physiology in the planktonic marine environment,
since Ca. C. symbiosum A is an obligate symbiont and Ca. N. maritimus SCM1 was isolated
from an artificial environment.
As an alternative to isolation, we have analyzed thaumarchaeal sequences identified
within a large-scale marine metagenome derived from summer and winter samplings from the
Gulf of Maine (GOM), an environment with high abundance of marine thaumarchaea. From the
dataset of 2.4 Gbp of sequence, we were able to identify (bin) 36,111 reads (30 Mbp) with
putative thaumarchaeal origin. Key genes required for ammonia oxidation and carbon fixation
were identified in the putative thaumarchaeal bin and environmentally derived sequences were
compared to Ca. N. maritimus SCM1 (Walker et al., 2010) to explore the population and
functional diversity of the Gulf of Maine Thaumarchaea.
2.2. Results and Discussions
2.2.1. Environmental Sequencing and Binning
An initial assembly of the GOM metagenomic dataset was screened for contigs and
scaffolds (clone-linked contigs) that originated from planktonic thaumarchaea. The initial GOM
assembly produced 117,673 scaffolds comprised of 124,610 contigs, of which 738 scaffolds
(0.59% of the total scaffolds) and 3,662 contigs (2.9% of the total contigs) appeared
thaumarchaeal-like. As the scaffolds are larger and thus contain more phylogenetic signal, initial
work to support the binning process was performed on the scaffolds. A comparison of the
tetranucleotide frequency of these scaffolds to the Ca. N. maritimus SCM1 genome yielded a z-
score correlation of 0.93, supporting our assignment of the scaffolds to the Thaumarchaea
(Teeling et al., 2004). Furthermore, two scaffolds were identified that contain the small subunit
rRNA genes. The two corresponding 16S rRNA genes are nearly identical (99.5% nucleic acid
identity (NAID)). The divergence between the 16S rRNA sequences is much greater than the
0.001% error rate estimated for Sanger-derived sequences processed using the phred base-calling
program (Ewing and Green, 1998), supporting that the scaffolds represent two distinct groups.
Additionally, they have high identity to the Ca. N. maritimus SCM1 16S rRNA genes (98.7%
and 98.6% NAID). While it is unknown to what degree Thaumarchaea follow the microbial
species definition (>97% 16S rRNA gene sequence identity (Hagström et al., 2000)), this
similarity suggests that the environmental dataset contains at least two closely related organisms
which may represent species of the genus Nitrosopumilus.
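The tetranucleotide comparison described above can be sketched as follows. This is a minimal illustration and not the pipeline used in this study: it assumes that z-scores are derived from a maximal-order Markov model (expected tetranucleotide counts estimated from tri- and dinucleotide counts) and that two sequences are compared by the Pearson correlation of their 256 z-scores. The function names are hypothetical, the demonstration sequence is random rather than real genomic data, and reverse-complement strands are ignored for brevity.

```python
import math
import random
from collections import Counter
from itertools import product

VALID = set("ACGT")


def kmer_counts(seq, k):
    """Count overlapping k-mers composed only of A, C, G and T."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:
            counts[kmer] += 1
    return counts


def tetra_zscores(seq):
    """Return a z-score for each of the 256 tetranucleotides.

    Expected counts follow a maximal-order Markov model; this is an
    assumed reading of the approach, not the exact implementation.
    """
    seq = seq.upper()
    n2, n3, n4 = (kmer_counts(seq, k) for k in (2, 3, 4))
    zscores = []
    for tetra in map("".join, product("ACGT", repeat=4)):
        left, right, mid = n3[tetra[:3]], n3[tetra[1:]], n2[tetra[1:3]]
        if mid == 0:
            zscores.append(0.0)
            continue
        expected = left * right / mid
        variance = expected * (mid - left) * (mid - right) / mid ** 2
        if variance > 0:
            zscores.append((n4[tetra] - expected) / math.sqrt(variance))
        else:
            zscores.append(0.0)
    return zscores


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Random stand-in sequence; a sequence compared with itself correlates at 1.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
self_correlation = pearson(tetra_zscores(genome), tetra_zscores(genome))
```

In practice, the z-scores of the concatenated scaffolds would be correlated against those of the reference genome, with values near 1 indicating similar tetranucleotide usage.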
In total, the thaumarchaeal metagenomic assemblies (scaffolds and contigs) are
composed of 36,111 sequences, averaging 837bp in length, containing a total of 30.2Mbp.
Almost all of these sequences were derived from the winter libraries (99.3%). For the three
libraries generated from the winter samples (see Experimental Procedures), the average
percentage of thaumarchaeal-like reads per library is 2.81% (range, 2.11-3.61%) (Table 2.2.1),
suggesting that the Thaumarchaea may make up a similar percentage of the planktonic
prokaryotic community at all three sites. About two-thirds (66.3%) of the assemblies are a
composite of sequences from at least two different GOM sites indicating that there is an overlap
in the populations between the sample sites.
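As a quick consistency check on the figures reported above, the total size of the thaumarchaeal bin can be recomputed from the number of sequences and their average length. All three input values are taken from the text; the variable names are illustrative only.

```python
n_sequences = 36_111     # sequences in the thaumarchaeal bin (from the text)
mean_length_bp = 837     # average sequence length in bp (from the text)
dataset_bp = 2.4e9       # total GOM dataset size, 2.4 Gbp (from the text)

bin_bp = n_sequences * mean_length_bp
bin_mbp = bin_bp / 1e6
fraction_of_dataset = bin_bp / dataset_bp

print(f"{bin_mbp:.1f} Mbp, {fraction_of_dataset:.2%} of the dataset")
```

The product agrees with the reported 30.2 Mbp, and the bin corresponds to roughly 1.3% of the total sequence in the dataset.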
2.2.2. Ammonia oxidation and carbon fixation
The gene content of the GOM sequences suggests that, like other members of the
Thaumarchaea, the populations in the GOM are capable of chemoautotrophy. Functional
annotation shows that the ammonia monooxygenase subunits (amoA, amoB, & amoC),
ammonium permease (amt), the urease operon, urease transporter, a putative nitrite reductase
(nirK), and a putative nitric-oxide reductase Q (norQ) are all present within the metagenomic
sequences (Table 2.2). There was no evidence of hydroxylamine oxidoreductase, which catalyzes
the second step of aerobic ammonia oxidation in bacteria; however, three multicopper oxidases
(MCO) were identified which could convert hydroxylamine/nitroxyl to nitrite (Walker et al.,
2010). Furthermore, like Ca. N. maritimus SCM1, a number of transporters were detected which
are capable of amino acid uptake. The presence of this suite of genes indicates that the GOM
planktonic thaumarchaea are putatively involved in generating energy through the oxidation of
ammonia and harvesting amino acids from the environment. The presence of both heterotrophic
and chemoautotrophic genes in the environment suggests a potential for mixotrophy, but this has not
been seen in experiments with the thaumarchaeal isolates (Blainey et al., 2011; de la Torre et al.,
2008; Walker et al., 2010).
Our observations suggest the possibility that only a minority of the planktonic
thaumarchaeal population in the GOM is capable of using urea as a nitrogen source. For the
GOM metagenome, the genes that comprise the urease operon recruit fewer homologous
sequences (1-10 sequences) than the genes that encode ammonium transporters (135 sequences)
and the ammonia oxidation pathway (45-79 sequences) (Table 2.2). The number of metagenomic
sequences homologous to any particular gene potentially provides information regarding the
underlying abundance of that gene within an environmental population, though this correlation
breaks down if there are varying copy numbers between genomes or if cloning bias causes the
genes to be undersampled. In terms of environmental support for a thaumarchaeal subpopulation
capable of using urea as a nitrogen source, during the stratified summer months, urea
concentration exceeds the ammonium concentration in the surface water, generating a large pool
of urea that could be used by organisms (Christensen et al., 1996; Dyhrman and Anderson,
2003).
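The comparison of recruited read counts can be made explicit with a small sketch. The counts are those reported in the text (Table 2.2); the helper function is hypothetical, and the raw ratios are deliberately not normalized for gene length or copy number, which are the caveats noted above.

```python
# Read counts reported in the text (ranges given as (min, max)).
urease_reads = (1, 10)     # genes of the urease operon
amo_reads = (45, 79)       # ammonia oxidation pathway genes
amt_reads = (135, 135)     # ammonium transporters


def ratio_range(numerator, denominator):
    """Smallest and largest possible ratio of two (min, max) count ranges."""
    low = numerator[0] / denominator[1]
    high = numerator[1] / denominator[0]
    return low, high


urease_vs_amo = ratio_range(urease_reads, amo_reads)
urease_vs_amt = ratio_range(urease_reads, amt_reads)

print(f"urease/amo: {urease_vs_amo[0]:.3f} to {urease_vs_amo[1]:.3f}")
print(f"urease/amt: {urease_vs_amt[0]:.3f} to {urease_vs_amt[1]:.3f}")
```

Even at the upper bound, urease genes recruit fewer than a quarter as many reads as the ammonia oxidation genes, which is consistent with the interpretation that only a subset of the population carries the urease operon.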
All isolated members of the Thaumarchaea have the ability to grow autotrophically using
the 3-hydroxypropionate/4-hydroxybutyrate cycle (Berg et al., 2007; Berg et al., 2010; Walker et
al., 2010). Using the sequences for each step in the pathway as identified in Metallosphaera
sedula DSM 5348, all of the genes were present in the putative GOM Thaumarchaea, except for
malonate semialdehyde reductase and 4-OH-butyryl-CoA synthetase (Figure 2.2.1; SI Table 2.3)
(Berg et al., 2007). Fifty-six sequences from the planktonic thaumarchaea bin appear to be
homologous to the malonate semialdehyde reductase of marine γ-Proteobacterium HTCC2080,
with the highest scoring sequence in this search (Read ID: 1106160064005) having only 27%
amino acid identity (AAID). However, no putative CDS from the GOM metagenome, Ca. C.
symbiosum A, or Ca. N. maritimus SCM1 has significant similarity (E-value ≤ 10⁻³) to the
4-OH-butyryl-CoA synthetase from either M. sedula DSM 5348 (Msed_1456) or from Sulfolobus
tokodaii 7 (ST0783) (Alber et al., 2008). It is possible that the malonate semialdehyde reductase
and the 4-OH-butyryl-CoA synthetase functions have been replaced by marine
thaumarchaeal-specific enzymes. Several putative CDS were identified with similarity to a locus
identified in Ca. N. maritimus SCM1 as being 4-OH-butyryl-CoA synthetase (Nmar_206) (Walker et al.,
2010); however, we found no evidence, either experimental or computational, to support or
refute this annotation. The thaumarchaeal bin was also searched for key genes in the other
carbon fixation pathways, and no orthologs were detected. Thus, despite the incomplete gene set
identified, it is likely that these planktonic thaumarchaea are able to fix bicarbonate through the
3-OH-propionate/4-OH-butyrate cycle, though, as has been shown in the 1.1b group
Thaumarchaea, some marine thaumarchaea may experience enhanced growth using some form
of mixotrophy (Tourna et al., 2011).
2.2.3. Comparative genomics to Ca. Nitrosopumilus maritimus SCM1
The 36,111 sequences that composed the thaumarchaeal bin were recruited against the
Ca. N. maritimus SCM1 genome. A total of 33,037 sequences (91.49%) aligned. In comparison,
of all the reads generated from the Global Ocean Survey (GOS) phases I and II (14,274,550
reads), only 31,504 reads recruit to the Ca. N. maritimus SCM1 genome. For the GOM
thaumarchaeal bin, the average pairwise nucleotide identity to the Ca. N. maritimus SCM1
genome was 76.6% and average coverage depth was 17.3X (range, 0-54; standard deviation,
12.0). The GOM metagenome more than doubles the depth of coverage available for analysis in
comparing the Ca. N. maritimus SCM1 genome to environmental organisms.
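The recruitment percentages in this paragraph follow directly from the read counts given. A minimal Python sketch of that arithmetic is shown here; the function name is an illustrative assumption, and the counts are those quoted in the text.

```python
def percent_recruited(recruited_reads, total_reads):
    """Percentage of reads that aligned to the reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * recruited_reads / total_reads


# Counts quoted in the text for the GOM thaumarchaeal bin and for GOS phases I and II.
gom_bin = percent_recruited(33_037, 36_111)
gos_all = percent_recruited(31_504, 14_274_550)

print(f"GOM thaumarchaeal bin: {gom_bin:.2f}% of sequences recruited")
print(f"GOS phases I and II:   {gos_all:.2f}% of reads recruited")
```

Note that the two figures are not directly comparable as abundance estimates: the GOM value is computed on a bin that was already selected for similarity to thaumarchaeal sequences, whereas the GOS value is computed on all reads.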
The J. Craig Venter Institute Annotation Pipeline identified 44,701 putative coding
sequences (CDSs) in the thaumarchaeal bin. A vast majority of the putative CDSs (87.6%) had
highest similarity to genes from the Ca. N. maritimus SCM1 genome. An additional 10.0% of
the putative CDSs had best similarity to genes from other organisms, but still had similarity to a
gene from Ca. N. maritimus SCM1. These CDSs represent genetic material that is more
divergent from the Ca. N. maritimus SCM1 orthologs. Additionally, 1.4% of the putative CDSs
only had similarity to genes from other marine thaumarchaeal sources (see Experimental
Procedures) and 1.0% only had similarity to genes from non-thaumarchaeal sequences. These
CDSs may represent genes with a deep thaumarchaeal lineage that were lost from Ca. N.
maritimus SCM1, or may represent genetic material that was horizontally transferred into the
population.
There were 69 regions in the Ca. N. maritimus SCM1 genome for which no metagenomic
sequences from the GOM were recruited. Most of the
regions (43) were ≤2 kbp in length. These gaps could be due to the random distribution of
coverage common in genomic shotgun sequencing or due to genomic variation between Ca. N.
maritimus SCM1 and the GOM Thaumarchaea. Of the 26 coverage gaps that were >2kbp in
length, 16 also fail to recruit sequences from the GOS phase I metagenomes (Figure 2.2; SI
Table 2.4). Although we do not have sufficient sequence depth to ensure complete coverage of
the environmental thaumarchaeote, the length of these regions and the fact that independent
samples show similar patterns of absence suggest that these gaps represent islands in Ca. N.
maritimus SCM1 that are not common in the planktonic thaumarchaea. It should be noted,
however, that even large gaps found in reference genomes compared to metagenomic sequences
have been found to be artifacts (Bhaya et al., 2007).
Four of these large islands include annotated integrase genes, suggesting that Ca. N.
maritimus SCM1 acquired these regions through lateral gene transfer. Integrase region 1 (IR1)
(Figure 2.3) is relatively short (~16kbp) and contains 13 genes; seven are predicted hypothetical
proteins, five genes encode restriction/modification functions, and one is the annotated integrase.
The genes adjacent to the integrase gene are syntenic to regions found in the genomes of the
Euryarchaeota Ferroplasma acidarmanus Fer1 (Allen et al., 2007) and the γ-Proteobacterium
Klebsiella pneumoniae 342 (Fouts et al., 2008) (Figure 2.3). The wide phylogenetic distance
between these organisms suggests that this is likely a mobile genetic element that can move
between the prokaryotic domains and classes, although it is possible that this region is a remnant
of the last common ancestor of prokaryotes.
IR2-4 range in size from ~22-45kbp (18-37 genes). Many of these genes are annotated as
hypothetical proteins. However, some of the genes with a functional annotation in IR2, IR4 and
in other coverage gaps (SI Table 2.5) are related to the proposed thaumarchaeal respiratory chain
for Ca. N. maritimus SCM1 (Walker et al., 2010), specifically, genes annotated as DsbA
oxidoreductase, hypothesized to play a role in alleviating copper or nitric oxide toxicity, blue
copper proteins (BCP), hypothesized to act as plastocyanin-like electron shuttles, and
multicopper oxidases (MCO), predicted to mediate the second step of ammonia oxidation
(HₓNOₓ → NO₂⁻) (Walker et al., 2010). In the Ca. N. maritimus SCM1 genome, these genes are
numerous – 11 DsbA oxidoreductases, 18 BCPs, and 4 MCOs – but in the GOM and GOS
metagenomes only a subset of these genes recruit reads – 6 DsbA oxidoreductases, 12 BCPs, and
3 MCOs. Distinct selective pressures due to differences in copper concentrations between the
Seattle Aquarium and the marine environment might have resulted in the variation in gene
content.
Another island includes a high-affinity phosphate uptake operon (pstSCAB). This region
does not recruit sequences from the GOM metagenome (Island-22, SI Table 2.2), but does recruit
sequences (at 58-75% NAID) from the GOS metagenomes from the Sargasso Sea, the east coast
of the United States, the open ocean near Cuba, and the Smithsonian Tropical Research Institute
in Panama (Rusch et al., 2007). Loss of this operon in the GOM Thaumarchaea may be due to
the higher phosphate levels found in the GOM. In the winter, phosphate levels range from ~0.5-1.0
µM in the surface waters of the GOM (World Ocean Database; http://www.nodc.noaa.gov),
approximately 100-times more available phosphate than is found in the Sargasso Sea, where
phosphate levels average 7.9 nM (Van Mooy et al., 2006). These results highlight the importance
of comparing isolate genomes and environmental metagenomes. The lack of the thaumarchaeal
pstSCAB operon from the GOM metagenome would have likely gone unnoticed had the
recruitment to Ca. N. maritimus SCM1 not been performed. Such comparisons can thus increase
our understanding of how the biology is responding to the system in both environments.
2.2.4. Metagenomic population diversity
The environmental shotgun sequencing allowed for an analysis of the underlying
population. Due to the high abundance of microbial cells in the marine environment (~10⁶/mL),
it is assumed that each environmental sequence generated comes from a unique individual. The
overall population diversity can then be examined at the individual level by comparing the
sequence variations between reads. Variation in the thaumarchaeal population in the GOM was
examined using genes from the ammonia oxidation (amoA) and carbon fixation pathways, as
well as the archaeal DNA repair recombinases (radA). The GOM amoA sequences diverged by
7.9% to 16.1% at the nucleotide level from the Ca. N. maritimus SCM1 sequence. These values are
comparable to the divergence in the amoA database generated through environmental molecular
studies (Bernhard et al., 2010; Church et al., 2010; Francis et al., 2005; Labrenz et al., 2010;
Mincer et al., 2007; Molina et al., 2010a; Park et al., 2006; Santoro et al., 2010; Venter et al.,
2004). The thaumarchaeal bin sequences were used to construct a maximum likelihood (ML)
phylogenetic tree (Figure 2.4). The sequences cluster into two groups, closely related to Ca. N.
maritimus SCM1. This pattern of two dominant groups is present in the ML trees constructed for
methylmalonyl-CoA mutase (Figure 2.5), 4-OH-butyryl-CoA dehydratase (SI Figure 2.9)
(representing the carbon fixation pathway), and radA (SI Figure 2.10). Maximum nucleotide
divergence of these sequences is never greater than 20.2%, and all sequences are more closely
related to Ca. N. maritimus SCM1 than other Archaea. Sequences within each of the two groups
are greater than 90% identical, suggesting that further assembly may result in two distinct
dominant genomes.
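The identity and divergence values in this section rest on pairwise comparison of aligned reads. The following Python sketch shows one simple way such a value could be computed from two rows of an alignment. It is an illustration only: the function name and the choice to skip gapped columns are assumptions, and the thesis does not state how gaps were treated.

```python
def nucleotide_identity(aligned_a, aligned_b, gap="-"):
    """Percent nucleotide identity between two rows of an alignment.

    Columns where either sequence has a gap are skipped (an assumption made
    for this sketch), so identity is computed over shared positions only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical 10-bp aligned fragments that differ at one position.
identity = nucleotide_identity("ACGTACGTAC", "ACGTTCGTAC")
divergence = 100.0 - identity
print(f"identity = {identity:.1f}%, divergence = {divergence:.1f}%")
```

With this definition, divergence is simply 100% minus identity, which is how the divergence ceiling of 20.2% and the within-group identity of more than 90% can be read against each other.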
Environmental amoA clones from various studies were added to the ML tree, including
sequences from the GOS database and sequences that partitioned into the “water column clusters
A and B” from Francis et al. (2005) (Figure 2.4). Sequences from the GOM metagenome are
interspersed with sequences from the eastern South Pacific (Molina et al., 2010b), the Benguela
Upwelling off the coast of Namibia (Moraru et al., 2010), Antarctica (Hallam et al., 2006b),
particle associated communities in the Gulf of Mexico (unpublished), and the coastal Arctic
Ocean (unpublished). The wide dispersal of amoA sequences closely related to those in the GOM
metagenome may suggest that the specific thaumarchaeal gene is either under restrictive
evolutionary pressure or rapidly dispersed through the environment. A similar pattern is seen in
the methylmalonyl-CoA mutase ML tree, with sequences from Block Island (coastal north
eastern USA), coastal waters off of South Carolina (south eastern USA), and Newcomb Bay
(Antarctica) interspersed with GOM sequences (Figure 2.5). Furthermore, all but two of the amoA
sequences retrieved from the GOM metagenome do not cluster with the depth-partitioned water
column groups identified in Francis et al. (2005). The two new groups identified in the GOM
metagenome may represent further diversity in the amoA gene.
2.2.5. Functional diversity
The underlying population diversity suggests that the Thaumarchaea present in the GOM are
part of two dominant populations. The presence of two different, but closely related 16S rRNA
sequences in the original GOM assembly further supports this hypothesis. Assuming each
organism has a genome of a similar size to Ca. N. maritimus SCM1 (~2Mbp), and both variants
have even representation in the community, there is sufficient sequence data in the
thaumarchaeal bin to assemble each genome at ~7X coverage. The 36,111 sequences within the
bin were assembled using the Celera assembler (Myers et al., 2000), with the parameter
specification file modified to assemble microbial-sized genomes (see Experimental Procedures).
The assembly process generated 1,584 scaffolds (longest = 162,170 bp) spanning over 6.4 Mbp
of sequence. Only about 14% of the sequences did not assemble (5,026 sequences), suggesting
that either the underlying diversity of thaumarchaea in the GOM is more complex than indicated
by our gene-centric analysis or that the binning process was inaccurate (e.g. lacking some
thaumarchaeal sequences or containing non-thaumarchaeal sequences).
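The ~7X coverage estimate above can be reproduced with a short calculation. The Python sketch below assumes, as the text does, two equally represented genomes of ~2 Mbp each, and uses the total bin size of 30,258,762 bp reported in the Experimental Procedures; the function name is hypothetical.

```python
def expected_coverage(total_bases, genome_size, n_genomes):
    """Average depth if the bases are spread evenly over n genomes of equal size."""
    if genome_size <= 0 or n_genomes <= 0:
        raise ValueError("genome_size and n_genomes must be positive")
    return total_bases / (genome_size * n_genomes)


bin_bases = 30_258_762      # total length of the 36,111 binned sequences
genome_size = 2_000_000     # ~2 Mbp, the genome size assumed in the text
n_populations = 2           # two dominant populations, assumed equally abundant

coverage = expected_coverage(bin_bases, genome_size, n_populations)
print(f"expected coverage per genome: ~{coverage:.1f}X")
```

The result is an upper-level average only; uneven abundance between the two populations or additional minor variants would lower the depth available for any one genome.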
The scaffolds were used to compare genomic and functional variations between the two
dominant GOM populations and Ca. N. maritimus SCM1. Specifically, the regions containing
the ammonia monooxygenase subunits, 4-OH-butyryl-CoA dehydratase, and the environmental-based
urease gene were analyzed. For each of these regions (and for most of the other regions
visually inspected), at least two “long” (>15kbp) scaffolds were present, putatively representing
the two dominant populations in the environment, along with several smaller scaffolds (~3-7kbp).
The “long” scaffolds were further analyzed because they offer the greatest opportunity to
study variations in genomic synteny and single nucleotide polymorphisms (SNPs).
The alignments of the long scaffolds illustrated several different genomic
rearrangements. Two “long” environmental scaffolds contained the genes associated with
ammonia oxidation. “Scaffold A” (38,931 bp) had complete synteny with the region on the Ca.
N. maritimus SCM1 genome associated with ammonia oxidation (1,367,240-1,378,730 bp)
except for two internal gaps (sequencing gaps determined during the assembly process) (Figure
2.6). “Scaffold B” (19,904 bp) possessed two insertions when compared to both Scaffold A and
Ca. N. maritimus SCM1. The three putative CDSs identified in the insertion sequences had ≤72%
AAID to genes identified in Ca. N. maritimus SCM1, Ca. C. symbiosum A, and a fosmid clone
derived from an uncultivated deep-sea Thaumarchaea (HF4000_APKG3E18) (Konstantinidis
and DeLong, 2008). amoB and amoC on “Scaffold A” had 89.6% and 88.5% NAID to Ca. N.
maritimus SCM1. amoB and amoC on “Scaffold B” had 89.1% and 87.3% NAID to Ca. N.
maritimus SCM1. Despite similar degrees of divergence on both scaffolds for amoB and amoC
when compared to Ca. N. maritimus SCM1, there is still substantial divergence between the two
scaffolds for both genes, 97.3% and 95.4% NAID, respectively. This indicates that the
differences on each scaffold are variations unique to that scaffold (SI Table 3).
Two scaffolds contained the region with 4-OH-butyryl-CoA dehydratase. “Scaffold C”
(17,950 bp) and “Scaffold D” (166,197 bp) are syntenic, yet contain an inversion relative to the
Ca. N. maritimus SCM1 genome, and possess a non-protein coding region (Figure 2.7). Despite
this high degree of synteny, the two scaffolds are divergent from each other, in a similar fashion
to the genes on “Scaffold A” and “Scaffold B”.
One scaffold was identified to contain the full urease operon. “Scaffold E” (16,428 bp)
contains the urease operon (urease subunits α, β, γ, and accessory proteins E-H) and has the
highest nucleotide similarity to Ca. C. symbiosum A. “Scaffold E” had the least overall synteny
to the Ca. N. maritimus SCM1 genome, with only four syntenic genes (Figure 2.8). The other
genes present on the scaffold had 31-71% AAID to genes identified in Ca. N. maritimus SCM1,
Ca. C. symbiosum A, and a fosmid clone derived from an uncultivated deep-sea Thaumarchaea
(HF4000_APKG8I13) (Konstantinidis and DeLong, 2008), supporting the thaumarchaeal origin
of the scaffold. The syntenic genes on “Scaffold E” were present on another environmental
scaffold. “Scaffold F” (41,439 bp) maintains the synteny of the Ca. N. maritimus SCM1 genes
for this region, but has several deletions, and has no synteny over the rest of the scaffold. The
pattern of gene similarity seen between Ca. N. maritimus SCM1 and the other environmental
scaffolds is present for the three genes that “Scaffold E” and “Scaffold F” have in common (SI
Table 3).
Collectively, the analysis of these three different regions, and six different scaffolds,
show that the putative dominant thaumarchaeal populations in the GOM are divergent from each
other, just as the ML tree and 16S rRNA sequences suggested. This divergence manifests in two
different ways: 1) sequence divergence and 2) functional divergence. In terms of sequence
divergence, for each of the genes analyzed, divergence between the scaffolds and Ca. N.
maritimus SCM1 was nearly constant, but the sequences on the scaffolds had moderate
nucleotide divergence from each other, suggesting that the two populations are drifting in
different directions, in relation to Ca. N. maritimus SCM1. Results from dN/dS calculations
indicate that all of the analyzed genes were undergoing purifying selection, suggesting that the
functionality of the genes may be remaining constant. Furthermore, only “Scaffold C” and
“Scaffold D” were syntenic for sequence order. The four other environmental scaffolds had
several insertions and deletions relative to each other, suggesting that these populations are
becoming more divergent through environmental gene loss and gain. The scaffolds illustrate one
clearly defined example of functional divergence (“Scaffold B” may also be functionally different
from “Scaffold A”, depending on the activity of the two hypothetical proteins): “Scaffold E”
has the genomic potential to utilize urea in the environment as a nitrogen source, while
“Scaffold F” does not.
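For readers less familiar with the dN/dS ratio used in the paragraph above, the following Python sketch shows the conventional reading of the ratio: values well below 1 are taken as purifying selection, values near 1 as neutral evolution, and values above 1 as positive selection. The function name, the tolerance, and the example rates are all hypothetical; the thesis obtained its rates from codeml, which is not reproduced here.

```python
def classify_selection(dn, ds, tolerance=0.05):
    """Return (omega, label) for a gene given its dN and dS estimates.

    omega = dN / dS. The tolerance band around 1 is an arbitrary choice made
    for this sketch and is not a statistical test.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega < 1 - tolerance:
        label = "purifying"
    elif omega > 1 + tolerance:
        label = "positive"
    else:
        label = "neutral"
    return omega, label


# Hypothetical rates for three genes, used only to exercise the function.
for name, dn, ds in [("geneA", 0.02, 0.40), ("geneB", 0.50, 0.50), ("geneC", 0.90, 0.30)]:
    omega, label = classify_selection(dn, ds)
    print(f"{name}: dN/dS = {omega:.2f} ({label} selection)")
```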
The population diversity that is part of the GOM thaumarchaeal community is not unlike the
diversity seen in other globally distributed planktonic marine prokaryotes. The high 16S rRNA
gene sequence NAID, indicating a single species, masks a great deal of underlying complexity.
The analysis provided suggests that the putative dominant populations in the environment are
closely related to each other and Ca. N. maritimus SCM1. The scaffolds show that much of the
underlying gene content remains constant for the dominant environmental populations, and
identical features, such as rearrangements and insertions, are present in both. Yet, despite this
high degree of similarity on the gene content level and evolutionary selection pressures, the
environmental populations are clearly unique, each possessing divergent nucleotide identity from
Ca. N. maritimus SCM1 and from each other. The presence of the full urease operon likely results in
a distinct ecological niche for the population possessing this island, a niche that allows for
divergent evolution between closely related populations.
2.3. Experimental Procedures
2.3.1. Sample Collection
Sea surface water samples were collected from the Gulf of Maine at three sites (GOM03,
GOM04, and GOM06) in January of 2006 from the R/V Delaware II and three sites (GOM12,
GOM13, and GOM14) in August of 2006 from the R/V Albatross IV (Table 2.2.1). Samples
were collected using the JCVI standard operating procedure (Rusch et al., 2007). Briefly, 200 L
of surface water, from approximately 1.5 m depth, was passed through a 25 µm Nytex pre-filter.
The sample was then size-fractionated by filtering sequentially through 3.0 µm, 0.8 µm, and 0.1
µm pore size filters (Supor membrane disc filter, Pall Life Sciences). Filters used for genomic
extractions were placed in buffer and immediately frozen in liquid N₂ on the vessel, and
transferred to a -80°C freezer until DNA isolation could be performed.
2.3.2. Sequencing and Assembly
Sample processing proceeded as described in Rusch et al. (2007). In brief,
DNA was collected via a freeze-thaw method in an EDTA lysis buffer followed by a
phenol/chloroform extraction from the 0.1 µm filter, fragmented via nebulization, ligated to
BstXI adapters, inserted into BstXI-linearized medium copy pBR322 plasmid vectors with a
medium range insert size, and electroporated into Escherichia coli. Following cloning, single
colonies were grown overnight in 2 mL of liquid media, lysed by an alkaline lysis miniprep, and
DNA was collected by isopropanol precipitation. Paired-end Sanger sequencing was performed
from the plasmids using standard M13 forward and reverse primers. A total of 2,827,702 reads
were returned (453,807 from GOM03, 957,738 from GOM04, 10,041 from GOM06, 470,592
from GOM12, 925,795 from GOM13, and 9,729 from GOM14), containing over 2,235 Mbp of
sequence data. To reduce redundancy, the sequences were assembled with the Celera assembler
at the J. Craig Venter Institute as described by Rusch et al. (2007). In total, about
15.5% of sequences assembled in scaffolds greater than 10kbp.
Assembly of the thaumarchaeal bin (see below) was performed using the Celera
assembler using a specification file designed to construct microbial-sized genomes from
metagenomic data. The following settings were changed:
utgErrorRate = 0.08
ovlErrorRate = 0.10
cnsErrorRate = 0.10
cgwErrorRate = 0.10
merSize = 14
utgGenomeSize = 2000000
Determination of the total length spanned by assembled scaffolds is possible due to
approximations made of internal sequencing gaps generated using paired-end reads and the
Celera assembler.
2.3.3. Identification of thaumarchaeal signature
The GOM metagenome scaffolds and degenerate contigs were searched (BLASTN
(Altschul et al., 1997)) against the Ca. N. maritimus SCM1 genome (1,645,259bp, 1,795 putative
CDS, Accession number: NC_010085) using varying levels of stringency. All scaffolds
containing any alignment region with an E-value ≤10⁻³ (929 scaffolds totaling 9,208,528 bp)
were submitted to the J. Craig Venter Institute Annotation Service and produced 10,990 CDS.
All CDSs returned through the annotation pipeline were searched (BLASTP; E-value cutoff ≤
10⁻¹⁰) against the NCBI nr protein database and assigned a putative taxonomic and gene
assignment based on the HSP with informative annotation. A two-step screening process refined
the bin assignments. The first step removed all scaffolds where less than 50% of the CDS had a
best BLASTP hit (based on bit score) to a sequenced marine thaumarchaeal genome or fosmid,
including Ca. N. maritimus SCM1, Ca. C. symbiosum A, uncultured Crenarchaeota 74A7 (Béjá et
al., 2002), 4B7 (Béjá et al., 2002), DeepAnt-EC39 (Lopez-Garcia et al., 2004), and HF4000
(Konstantinidis and DeLong, 2008). This initial screen yielded 742 scaffolds containing >2.4
Mbp. A final quality control step temporarily removed from consideration all CDS which had
<50% identity to sequenced marine thaumarchaeal genomes or fosmids across the length of the
BLASTP alignment, and scaffolds were then reassessed using the first step curation criteria.
This step filtered out 4 additional scaffolds that contained only sequences with poor alignments
to known thaumarchaeal sequences. The small size (e.g. <3 putative CDSs) and poor alignments
of these scaffolds suggested high divergence from the marine Thaumarchaeota. The initial
planktonic thaumarchaeal bin contained 1,324 of 1,795 genes (73.7%) annotated in the Ca. N.
maritimus SCM1 genome. To assess the quality of our binning strategy, TETRA (Teeling et al.,
2004) was used to compare the tetranucleotide frequency z-scores between the binned scaffolds
and the Ca. N. maritimus SCM1 genome.
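The first screening step described above (discarding scaffolds on which fewer than 50% of the CDSs have a best BLASTP hit to a marine thaumarchaeal reference) can be sketched in Python as follows. This is an illustration of the stated rule, not the pipeline actually used: the function name, the input layout, and the example records are assumptions.

```python
from collections import defaultdict


def screen_scaffolds(best_hits, reference_sources, min_fraction=0.5):
    """Keep scaffolds where at least min_fraction of CDSs hit a reference source.

    best_hits: iterable of (scaffold_id, cds_id, best_hit_source) tuples, one
    per CDS, where best_hit_source names the organism of the top BLASTP hit.
    """
    total_cds = defaultdict(int)
    reference_cds = defaultdict(int)
    for scaffold_id, _cds_id, source in best_hits:
        total_cds[scaffold_id] += 1
        if source in reference_sources:
            reference_cds[scaffold_id] += 1
    return sorted(
        scaffold_id
        for scaffold_id, n_cds in total_cds.items()
        if reference_cds[scaffold_id] / n_cds >= min_fraction
    )


# Hypothetical best-hit records for two scaffolds.
references = {"Ca. N. maritimus SCM1", "Ca. C. symbiosum A"}
hits = [
    ("scf1", "cds1", "Ca. N. maritimus SCM1"),
    ("scf1", "cds2", "Ca. C. symbiosum A"),
    ("scf1", "cds3", "other"),
    ("scf2", "cds1", "other"),
    ("scf2", "cds2", "other"),
    ("scf2", "cds3", "Ca. N. maritimus SCM1"),
]
kept = screen_scaffolds(hits, references)
print(kept)
```

The second, quality-control step in the text could reuse the same function after first dropping CDS records whose alignment identity falls below 50%.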
The assembled scaffolds likely did not contain all the possible thaumarchaeal diversity.
Metagenomic assemblies generated using the Celera Assembler can result in the most abundant
organisms being assigned a degenerate contig flag. Therefore, the degenerate contigs of the
initial GOM metagenomic assembly were of interest due to the unique population structure of the
planktonic thaumarchaeal scaffolds.
The resulting Thaumarchaea-like scaffolds (738) and contigs (3,662) (36,111 sequences
totaling 30,258,762bp) were recruited with the Geneious (V.4.8.3) (Drummond et al., 2009)
assembly program with the High Sensitivity parameter and using the Ca. N. maritimus SCM1 as
a reference genome. In brief, the Geneious assembly algorithm determines all pairwise distances
in a BLAST-like search and progressively aligns the highest scoring pairs. The High Sensitivity
parameter increases the time necessary to perform the assembly, but results in a more accurate
alignment of sequences to each other after initial alignment to the reference sequence.
Fragment recruitment plots, as described in Rusch et al. (2007), were generated comparing all
reads in the GOM metagenome and all publicly available GOS sequences against the Ca. N.
maritimus SCM1 genome. In brief, all sequences are BLASTN compared against the reference
genome, such that all sequences with ≥55% nucleotide identity are displayed along the length of
the genome and color coded to represent specific sampling sites or mate-pair relationships
(Rusch et al., 2007). Fragment recruitment plots were assessed for regions divergent from the
reference genome.
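The data behind a fragment recruitment plot can be sketched as a filter over BLASTN hits. The Python example below assumes the hits are available in the 12-column tab-separated layout produced by BLAST+ with `-outfmt 6` (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the thesis does not state which output format was used, and the function name and example lines are hypothetical.

```python
def recruitment_points(tabular_lines, min_identity=55.0):
    """Return sorted (genome_position, percent_identity) points, one per read.

    For each read, only the hit with the highest bit score among those at or
    above min_identity is kept. The position is the leftmost subject
    coordinate, so hits on either strand are placed consistently.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        read_id = fields[0]
        identity = float(fields[2])
        subject_start, subject_end = int(fields[8]), int(fields[9])
        bit_score = float(fields[11])
        if identity < min_identity:
            continue
        if read_id not in best or bit_score > best[read_id][2]:
            best[read_id] = (min(subject_start, subject_end), identity, bit_score)
    return sorted((position, identity) for position, identity, _ in best.values())


# Hypothetical hits against the reference genome.
example = [
    "read1\tNC_010085\t76.5\t800\t188\t0\t1\t800\t1000\t1799\t1e-50\t300",
    "read1\tNC_010085\t60.0\t200\t80\t0\t1\t200\t5000\t5199\t1e-5\t50",
    "read2\tNC_010085\t50.0\t300\t150\t0\t1\t300\t7000\t7299\t1e-3\t40",
    "read3\tNC_010085\t90.0\t500\t50\t0\t1\t500\t2500\t2001\t1e-80\t400",
]
points = recruitment_points(example)
print(points)
```

Plotting position on the x-axis against identity on the y-axis then gives the recruitment plot; stretches of the genome with no points correspond to the coverage gaps discussed in section 2.2.3.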
2.3.4. Functional Gene Assignment
Amino acid sequences of proteins for the 3-hydroxypropionate/4-hydroxybutyrate cycle
from M. sedula DSM 5348 (Berg et al., 2007) were obtained from Integrated Microbial Genomes
(IMG) (Markowitz et al., 2006) and searched (BLASTX) against all thaumarchaeal reads (E-value
cutoff = 10⁻³) and (BLASTP) against the Ca. C. symbiosum A and Ca. N. maritimus SCM1
genomes (E-value cutoff = 10⁻²). All reads with similarity were considered as putative matches
for further consideration. Amino acid sequences of proteins of the ammonia oxidation pathway
described in Ca. C. symbiosum (Hallam et al., 2006b) were obtained from IMG (Markowitz et al.,
2006) and NCBI and searched (BLASTX) against all thaumarchaeal reads. All read matches with
an E-value ≤10⁻³ were considered as putative matches for further consideration. Reads identified
as possible matches were assessed by comparing the location of identity between the read and
amino acid sequences. If a disagreement in the location of the identity was identified (i.e. identity
in the middle of the read was matched to identity in the middle of the amino acid sequences, with
no homology to the end of the read), reads were no longer considered as a putative match.
2.3.5. Within-Bin Diversity
To analyze the diversity within the planktonic Thaumarchaea, key genes in DNA
recombination and repair, aerobic ammonia oxidation, and the 3-OH-propionate/4-OH-butyrate
cycle (Huber et al., 2008) were compared (BLASTX) against all thaumarchaeal-like reads.
Alignments for each gene were generated using CLUSTALW (Thompson et al., 1994) (Cost
matrix: IUB; Gap open cost: 16; Gap extend cost: 6.66) and trimmed with Geneious (Drummond
et al., 2009) to maximize the length of the overlapping region and the number of sequences
included in the alignment. ML trees of sequences with homology to radA (21 reads; 476bp
region), amoA (16 reads; 417bp region), methylmalonyl-CoA mutase (15 reads; 394 bp region)
and 4-OH-butyryl-CoA dehydratase (20 reads; 527bp region) were constructed using PHYML
(Guindon and Gascuel, 2003) (Kimura (K80) model and all other default settings).
Representative environmental sequences were gathered from online databases; the NCBI nt
database was used for ammonia monooxygenase and the CAMERA database (V.1.3.2.31;
http://camera.calit2.net/) was used for methylmalonyl-CoA mutase and 4-OH-butyryl-CoA
dehydratase.
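The Kimura (K80) model named above distinguishes transitions from transversions. PHYML applies it inside a maximum likelihood tree search, which is not reproduced here; the Python sketch below only shows the closed-form pairwise distance under the same model, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The function name and the example sequences are hypothetical.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k80_distance(aligned_a, aligned_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Columns containing anything other than A, C, G, or T in either sequence
    are skipped. math.log raises ValueError if the sequences are too
    divergent for the correction to be defined.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    sites = transitions = transversions = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a not in "ACGT" or base_b not in "ACGT":
            continue
        sites += 1
        if base_a == base_b:
            continue
        if (base_a, base_b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical pair with one transition (A/G) and one transversion (A/C) in 10 sites.
distance = k80_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(f"K80 distance = {distance:.4f}")
```

The corrected distance is larger than the raw proportion of differing sites because the model accounts for multiple substitutions at the same position.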
2.3.6. Coverage
The coverage depth of each base pair on 738 thaumarchaeal scaffolds was determined
using the sum of the length of each read used to construct a scaffold divided by the length of
scaffold consensus sequence (read to scaffold coverage). The coverage depth of each base pair in
the Ca. N. maritimus SCM1 genome was obtained from the results of the Geneious (Drummond
et al., 2009) assembly program.
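The read-to-scaffold coverage defined above is the summed length of the reads in a scaffold divided by the length of its consensus sequence. A minimal Python sketch of that definition follows; the function name and the example lengths are hypothetical.

```python
def read_to_scaffold_coverage(read_lengths, consensus_length):
    """Average coverage depth: total read bases divided by consensus length."""
    if consensus_length <= 0:
        raise ValueError("consensus_length must be positive")
    return sum(read_lengths) / consensus_length


# Hypothetical scaffold built from three Sanger reads over a 1,000-bp consensus.
depth = read_to_scaffold_coverage([800, 750, 900], 1000)
print(f"read-to-scaffold coverage: {depth:.2f}X")
```

This gives one average value per scaffold; it does not capture position-by-position variation in depth, which for the reference genome was taken from the Geneious assembly output.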
2.3.7. Thaumarchaeal scaffold analysis
Scaffolds constructed using the thaumarchaeal sequence bin were aligned against the Ca.
N. maritimus SCM1 genome using the program Mauve, and the progressiveMauve alignment
algorithm (Darling et al., 2010). Scaffold sequences with homology to coding regions of Ca. N.
maritimus SCM1 were compared for pairwise identity in Geneious (Drummond et al., 2009),
translated, and assayed for codon alignment (Suyama et al., 2006), which would also calculate
synonymous (dS) and non-synonymous (dN) substitution rates for the genes using the codeml
program in PAML (Yang, 2007).
2.4. Availability of sequences
The complete Gulf of Maine metagenome has been deposited into NCBI GenBank as raw reads
in the Trace Archive (TA) (ID no.: 2307942905-2310786347). The scaffolds generated from the
putative thaumarchaeal sequence bin have been deposited at DDBJ/EMBL/GenBank as a Whole
Genome Shotgun project, under the accession AGBE00000000. The version described in this
paper is the first version, AGBE01000000.
2.5. Acknowledgements
The authors gratefully acknowledge NOAA ecosystem process division scientists Jon Hare and
Jerry Prezario for ship time on NOAA Fisheries R/Vs Delaware II (Cruise No. DE 06-02) and
the R/V Albatross IV (Cruise no. AL 06-07). Sequencing was supported by National Science
Foundation Microbial Sequencing grant 0412119. We thank Drs. Karla Heidelberg and Shannon
Williamson for collecting samples. We thank Robert Friedman, and Yu-Hui Rogers for technical
and scientific support in the sequencing efforts. We thank Matt Lewis and Dr. Aaron Halpern,
who processed the GOM samples. We'd like to thank JCVI for providing the JCVI Annotation
Service, which provided us with automatic annotation data and the manual annotation tool
Manatee.
2.6. References
Alber, B.E., Kung, J.W., and Fuchs, G. (2008) 3-Hydroxypropionyl-Coenzyme A Synthetase
from Metallosphaera sedula, an Enzyme Involved in Autotrophic CO₂ Fixation. J Bacteriol 190:
1383-1389.
Allen, E.E., Tyson, G.W., Whitaker, R., Detter, J.C., Richardson, P.M., and Banfield, J.F. (2007)
Genome dynamics in a natural archaeal population. Proc Natl Acad Sci USA 104: 1883-1888.
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. (1997)
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res 25: 3389-3402.
Béjá, O., Suzuki, M., Heidelberg, J., Nelson, W., Preston, C., Hamada, T. et al. (2002)
Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature 415: 630-633.
Berg, I.A., Kockelkorn, D., Buckel, W., and Fuchs, G. (2007) A 3-Hydroxypropionate/4-
Hydroxybutyrate Autotrophic Carbon Dioxide Assimilation Pathway in Archaea. Science 318:
1782-1786.
Berg, I.A., Ramos-Vera, W.H., Petri, A., and Huber, H. (2010) Study of the distribution of
autotrophic CO₂ fixation cycles in Crenarchaeota. Microbiology 156: 256-269.
Bernhard, A., Landry, Z., Blevins, A., De La Torre, J., Giblin, A., and Stahl, D. (2010)
Abundance of Ammonia-Oxidizing Archaea and Bacteria along an Estuarine Salinity Gradient in
Relation to Potential Nitrification Rates. Appl Environ Microbiol 76: 1285-1289.
Bhaya, D., Grossman, A., Steunou, A.-S., Khuri, N., Cohan, F., Hamamura, N. et al. (2007)
Population level functional diversity in the microbial community revealed by comparative
genomic and metagenomic analyses. ISME J 1: 703-713.
Blainey, P., Mosier, A., Potanina, A., Francis, C., and Quake, S. (2011) Genome of a Low-
Salinity Ammonia-Oxidizing Archaeon Determined by Single-Cell and Metagenomic Analysis.
PLoS ONE 6: e16626.
Christensen, J., Townsend, D.W., and Montoya, J. (1996) Water column nutrients and
sedimentary denitrification in the Gulf of Maine. Cont Shelf Res 16: 489-515.
Church, M.J., Wai, B., Karl, D.M., and Delong, E.F. (2010) Abundances of crenarchaeal amoA
genes and transcripts in the Pacific Ocean. Environ Microbiol 12: 679-688.
Darling, A.E., Mau, B., and Perna, N.T. (2010) progressiveMauve: Multiple Genome Alignment
with Gene Gain, Loss and Rearrangement. PLoS One 5: e11147.
de la Torre, J., Walker, C., Ingalls, A., Könneke, M., and Stahl, D. (2008) Cultivation of a
thermophilic ammonia oxidizing archaeon synthesizing crenarchaeol. Environ Microbiol 10:
810-818.
Drummond, A., Ashton, B., Cheung, M., Heled, J., Kearse, M., Moir, R. et al. (2009) Geneious
v4.6. In: Available from http://www.geneious.com/.
Dyhrman, S.T., and Anderson, D.M. (2003) Urease activity in cultures and field populations of
the toxic dinoflagellate Alexandrium. Limnol Oceanogr 48: 647-655.
Ewing, B., and Green, P. (1998) Base-calling of automated sequencer traces using Phred. II.
error probabilities. Genome Res 8: 186-194.
Fouts, D.E., Tyler, H.L., DeBoy, R.T., Daugherty, S., Ren, Q., Badger, J.H. et al. (2008)
Complete Genome Sequence of the N₂-Fixing Broad Host Range Endophyte Klebsiella
pneumoniae 342 and Virulence Predictions Verified in Mice. PLoS Genet 4: e1000141.
Francis, C.A., Roberts, K.J., Beman, J.M., Santoro, A.E., and Oakley, B.B. (2005) Ubiquity and
diversity of ammonia-oxidizing archaea in water columns and sediments of the ocean. Proc Natl
Acad Sci USA 102: 14683-14688.
Guindon, S., and Gascuel, O. (2003) A simple, fast and accurate algorithm to estimate large
phylogenies by maximum likelihood. Syst Biol 52: 696-704.
Hagström, Ä., Pinhassi, J., and Zweifel, U.L. (2000) Biogeographical diversity among marine
bacterioplankton. Aquat Microb Ecol 21: 231-244.
Hallam, S.J., Mincer, T.J., Schleper, C., Preston, C.M., Roberts, K., Richardson, P.M., and
DeLong, E.F. (2006a) Pathways of Carbon Assimilation and Ammonia Oxidation Suggested by
Environmental Genomic Analyses of Marine Crenarchaeota. PLoS Biology 4: 520-536.
Hallam, S.J., Konstantinidis, K.T., Putnam, N., Schleper, C., Watanabe, Y.-i., Sugahara, J. et al.
(2006b) Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum.
Proc Natl Acad Sci USA 103: 18296-18301.
Hatzenpichler, R., Lebedeva, E.V., Spieck, E., Stoecker, K., Richter, A., Daims, H., and Wagner,
M. (2008) A moderately thermophilic ammonia-oxidizing crenarchaeote from a hot spring. Proc
Natl Acad Sci USA 105: 2134-2139.
Huber, H., Gallenberger, M., Jahn, U., Eylert, E., Berg, I.A., Kockelkorn, D. et al. (2008) A
dicarboxylate/4-hydroxybutyrate autotrophic carbon assimilation cycle in the hyperthermophilic
Archaeum Ignicoccus hospitalis. Proc Natl Acad Sci USA 105: 7851-7856.
Ingalls, A.E., Shah, S.R., Hansman, R.L., Aluwihare, L.I., Santos, G.M., Druffel, E.R.M., and
Pearson, A. (2006) Quantifying archaeal community autotrophy in the mesopelagic ocean using
natural radiocarbon. Proc Natl Acad Sci USA 103: 6442-6447.
Karner, M.B., DeLong, E.F., and Karl, D.M. (2001) Archaeal dominance in the mesopelagic
zone of the Pacific Ocean. Nature.
Konstantinidis, K.T., and DeLong, E.F. (2008) Genomic patterns of recombination, clonal
divergence and environment in marine microbial populations. ISME J 2: 1052-1065.
Labrenz, M., Sintes, E., Toetzke, F., Zumsteg, A., Herndl, G.J., Seidler, M. et al. (2010)
Relevance of a crenarchaeotal subcluster related to Candidatus Nitrosopumilus maritimus to
ammonia oxidation in the suboxic zone of the central Baltic Sea. ISME J 4: 1496-1508.
Lopez-Garcia, P., Brochier, C., Moreira, D., and Rodriguez-Valera, F. (2004) Comparative
analysis of a genome fragment of an uncultivated mesopelagic crenarchaeota reveals multiple
horizontal gene transfers. Environ Microbiol 6: 19-34.
Markowitz, V.M., Korzeniewski, F., Palaniappan, K., Szeto, E., Werner, G., Padki, A. et al.
(2006) The integrated microbial genome (IMG) system. Nucleic Acids Res 34: D344-D348.
Mincer, T., Church, M., Taylor, L., Preston, C., Karl, D.M., and DeLong, E. (2007) Quantitative
distribution of presumptive archaeal and bacterial nitrifiers in Monterey Bay and the North
Pacific Subtropical Gyre. Environ Microbiol 9: 1162-1175.
Molina, V., Belmar, L., and Ulloa, O. (2010) High diversity of ammonia-oxidizing archaea in
permanent and seasonal oxygen-deficient waters of the eastern South Pacific. Environ Microbiol
12: 2450-2465.
Moraru, C., Lam, P., Fuchs, B.M., Kuypers, M.M.M., and Amann, R. (2010) Gene-FISH - an in
situ technique for linking gene presence and cell identity in environmental microorganisms.
Environ Microbiol 12: 3057-3073.
Myers, E.W., Sutton, G., Delcher, A.L., Dew, I.M., and Fasulo, D.P. (2000) A whole-genome
assembly of Drosophila. Science 287: 2196-2204.
Park, H., Wells, G.F., Bae, H., Criddle, C.S., and Francis, C.A. (2006) Occurrence of ammonia-
oxidizing archaea in wastewater treatment plant bioreactors. Appl Environ Microbiol 72: 5643-
5647.
Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B., Williamson, S., Yooseph, S. et al.
(2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern
Tropical Pacific. PLoS Biology 5: 398-431.
Santoro, A., Casciotti, K., and Francis, C. (2010) Activity, abundance and diversity of nitrifying
archaea and bacteria in the central California Current. Environ Microbiol 12: 1989-2006.
Suyama, M., Torrents, D., and Bork, P. (2006) PAL2NAL: robust conversion of protein
sequence alignments into the corresponding codon alignments. Nucleic Acids Res 34: W609-
W612.
Teeling, H., Meyerdierks, A., Bauer, M., Amann, R., and Glöckner, F.O. (2004) Application of
tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol 6: 938-
947.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity
of progressive multiple sequence alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
Tourna, M., Stieglmeier, M., Spang, A., Könneke, M., Schintlmeister, A., Urich, T. et al. (2011)
Nitrososphaera viennensis, an ammonia oxidizing archaeon from soil. Proc Natl Acad Sci USA
108: 8420-8425.
Van Mooy, B.A.S., Rocap, G., Fredricks, H.F., Evans, C.T., and Devol, A.H. (2006) Sulfolipids
dramatically decrease phosphorus demand by picocyanobacteria in oligotrophic marine
environments. Proc Natl Acad Sci USA 103: 8607-8612.
Venter, J.C., Remington, K., Heidelberg, J., Halpern, A.L., Rusch, D., Eisen, J. et al. (2004)
Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66-74.
Walker, C.B., de la Torre, J.R., Klotz, M.G., Urakawa, H., Pinel, N., Arp, D.J. et al. (2010)
Nitrosopumilus maritimus genome reveals unique mechanisms for nitrification and autotrophy in
globally distributed marine crenarchaea. Proc Natl Acad Sci USA 107: 8818-8823.
Wuchter, C., Schouten, S., Boschker, H., and Damste, J.S. (2003) Bicarbonate uptake by marine
Crenarchaeota. FEMS Microbiol Lett 219: 203-207.
Yang, Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:
1586-1591.
Alber BE, Kung JW, Fuchs G (2008). 3-Hydroxypropionyl-Coenzyme A Synthetase from
Metallosphaera sedula, an Enzyme Invloved in Autotrophic CO2 Fixation. J Bacteriol 190:
1383-1389.
Allen EE, Tyson GW, Whitaker R, Detter JC, Richardson PM, Banfield JF (2007). Genome
dynamics in a natural archaeal population. Proc Natl Acad Sci USA 104: 1883-1888.
Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W et al (1997). Gapped BLAST
and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:
3389-3402.
Béjá O, Suzuki M, Heidelberg J, Nelson W, Preston C, Hamada T et al (2002). Unsuspected
diversity among marine aerobic anoxygenic phototrophs. Nature 415: 630-633.
Berg IA, Kockelkorn D, Buckel W, Fuchs G (2007). A 3-Hydroxypropionate/4-Hydroxybutyrate
Autotrophic Carbon Dioxide Assimilation Pathway in Archaea. Science 318: 1782-1786.
Berg IA, Ramos-Vera WH, Petri A, Huber H (2010). Study of the distribution of autotrophic
CO2 fixation cycles in Crenarchaeota. Microbiology 156: 256-269.
Bernhard A, Landry Z, Blevins A, De La Torre J, Giblin A, Stahl D (2010). Abundance of
Ammonia-Oxidizing Archaea and Bacteria along an Estuarine Salinity Gradient in Relation to
Potential Nitrification Rates. Appl Environ Microbiol 76: 1285-1289.
Bhaya D, Grossman A, Steunou A-S, Khuri N, Cohan F, Hamamura N et al (2007). Population
level functional diversity in the microbial community revealed by comparative genomic and
metagenomic analyses. ISME J 1: 703-713.
Blainey P, Mosier A, Potanina A, Francis C, Quake S (2011). Genome of a Low-Salinity
Ammonia-Oxidizing Archaeon Determined by Single-Cell and Metagenomic Analysis. PLoS
ONE 6: e16626.
Christensen J, Townsend DW, Montoya J (1996). Water column nutrients and sedimentary
denitrification in the Gulf of Maine. Cont Shelf Res 16: 489-515.
Church MJ, Wai B, Karl DM, Delong EF (2010). Abundances of crenarchaeal amoA genes and
transcripts in the Pacific Ocean. Environ Microbiol 12: 679-688.
Darling AE, Mau B, Perna NT (2010). progressiveMauve: Multiple Genome Alignment with
Gene Gain, Loss and Rearrangement. PLoS One 5: e11147.
34
de la Torre J, Walker C, Ingalls A, Könneke M, Stahl D (2008). Cultivation of a thermophilic
ammonia oxidizing archaeon synthesizing crenarchaeol. Environ Microbiol 10: 810-818.
Drummond A, Ashton B, Cheung M, Heled J, Kearse M, Moir R et al. (2009). Available from
http://www.geneious.com/.
Dyhrman ST, Anderson DM (2003). Urease activity in cultures and field populations of the toxic
dinoflagellate Alexandrium. Limnol Oceanogr 48: 647-655.
Ewing B, Green P (1998). Base-calling of automated sequencer traces usingPhred. II. error
probabilities. Genome Res 8: 186-194.
Fouts DE, Tyler HL, DeBoy RT, Daugherty S, Ren Q, Badger JH et al (2008). Complete
Genome Sequence of the N2-Fixing Broad Host Range Endophyte Klebsiella pneumoniae 342
and Virulence Predictions Verified in Mice. PLoS Genet 4: e1000141.
Francis CA, Roberts KJ, Beman JM, Santoro AE, Oakley BB (2005). Ubiquity and diversity of
ammonia-oxidizing archaea in water columns and sediments of the ocean. Proc Natl Acad Sci
USA 102: 14683-14688.
Guindon S, Gascuel O (2003). A simple, fast and accurate algorithm to estimate large
phylogenies by maximum likelihood. Systemic Biology 52: 696-704.
Hagström Ä, Pinhassi J, Zweifel UL (2000). Biogeographical diversity among marine
bacterioplankton. Aquat Micro Ecol.
Hallam SJ, Konstantinidis KT, Putnam N, Schleper C, Watanabe Y-i, Sugahara J et al (2006a).
Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. Proc Natl
Acad Sci USA 103: 18296-18301.
Hallam SJ, Mincer TJ, Schleper C, Preston CM, Roberts K, Richardson PM et al (2006b).
Pathways of Carbon Assimilation and Ammonia Oxidation Suggested by Environmental
Genomic Analyses of Marine Crenarchaeota. PLoS Biology 4: 520-536.
Hatzenpichler R, Lebedeva EV, Spieck E, Stoecker K, Richter A, Daims H et al (2008a). A
moderately thermophilic ammonia-oxidizing crenarchaeote from a hot spring. Proceedings of the
National Academy of Sciences of the United States of America 105: 2134-2139.
Hatzenpichler R, Lebedeva EV, Spieck E, Stoecker K, Richter A, Daims H et al (2008b). A
moderately thermophilic ammonia-oxidizing crenarchaeote from a hot spring. Proc Natl Acad
Sci USA 105: 2134-2139.
Huber H, Gallenberger M, Jahn U, Eylert E, Berg IA, Kockelkorn D et al (2008). A
dicarboxylate/4-hydroxybutyrate autrophic carbon assimilation cycle in the hyperthermophilic
Archaeum Ignicoccus hospitalis. Proc Natl Acad Sci USA 105: 7851-7856.
35
Ingalls AE, Shah SR, Hansman RL, Aluwihare LI, Santos GM, Druffel ERM et al (2006).
Quantifying archaeal community autotrophy in the mesopelagic ocean using natural radiocarbon.
Proc Natl Acad Sci USA 103: 6442-6447.
Karner MB, DeLong EF, Karl DM (2001). Archaeal dominance in the mesopelagic zone of the
Pacific Ocean. Nature.
Konstantinidis KT, DeLong EF (2008). Genomic patterns of recombination, clonal divergence
and environment in marine microbial populations. ISME J 2: 1052-1065.
Labrenz M, Sintes E, Toetzke F, Zumsteg A, Herndl GJ, Seidler M et al (2010). Relevance of a
crenarchaeotal subcluster related to Candidatus Nitrosopumilus maritimus to ammonia oxidation
in the suboxic zone of the central Baltic Sea. ISME J 4: 1496-1508.
Lopez-Garcia P, Brochier C, Moreira D, Rodriguez-Valera F (2004). Comparative analysis of a
genome fragment of an uncultivated mesopelagic crenarchaeota reveals multiple horizontal gene
transfers. Environ Microbiol 6: 19-34.
Markowiz VM, Korzeniewski F, Palaniappan K, Szeto E, Werner G, Padki A et al (2006). The
integrated microbial genome (IMG) system. Nucleic Acids Res 34: D344-D348.
Mincer T, Church M, Taylor L, Preston C, Karl DM, DeLong E (2007). Quantitative distribution
of presumptive archaeal and bacterial nitrifiers in Monterey Bay and the North Pacific
Subtropical Gyre. Environ Microbiol 9: 1162-1175.
Molina V, Belmar L, Ulloa O (2010a). High diversity of ammonia-oxidizing archaea in
permanent and seasonal oxygen-deficient waters of the eastern South Pacific. Environ Microbiol
12: 2450-2465.
Molina V, Belmar L, Ulloa O (2010b). High diversity of ammonia-oxidizing archaea in
permanent and seasonal oxygen-deficient waters of the eastern South Pacific. Environmental
microbiology 12: 2450-2465.
Moraru C, Lam P, Fuchs BM, Kuypers MMM, Amann R (2010). Gene-FISH - an in situ
technique for linking gene presence and cell identity in environmental microorganisms.
Environmental microbiology: 1-17.
Myers EW, Sutton G, Delcher AL, Dew IM, Fasulo DP (2000). A whole-genome assembly of
Drosophila. Science 287: 2196-2204.
Park H, Wells GF, Bae H, Criddle CS, Francis CA (2006). Occurrence of ammonia-oxidizing
archaea in wastewater treatment plant bioreactors. Appl Environ Microbiol 72: 5643-5647.
36
Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S et al (2007). The
Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical
Pacific. PLoS Biology 5: 398-431.
Santoro A, Casciotti K, Francis C (2010). Activity, abundance and diversity of nitrifying archaea
and bacteria in the central California Current. Environ Microbiol 12: 1989-2006.
Suyama M, Torrents D, Bork P (2006). PAL2NAL: robust conversion of protein sequence
alignments into the corresponding codon alignments. Nucleic Acids Research 34: W609-W612.
Teeling H, Meyedierks A, Bauer M, Amann R, Glockner FO (2004). Application of
tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol 6: 938-
947.
Thompson JD, Higgins DG, Gibson TJ (1994). CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through sequence weighting, position-specific gap
penalities and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
Tourna M, Stieglmeier M, Spang A, Konneke M, Schintlmeister A, Urich T et al (2011).
Nitrososphaera viennensis, an ammonia oxidizing archaeon from soil. Proc Natl Acad Sci USA
108: 8420-8425.
Van Mooy BAS, Rocap G, Fredricks HF, Evans CT, Devol AH (2006). Sulfolipids dramatically
decrease phosphorus demand by picocyanobacteria in oligotrophic marine environments.
Proceedings of the National Academy of Sciences of the United States of America 103: 8607-
8612.
Venter JC, Remington K, Heidelberg J, Halpern AL, Rusch D, Eisen J et al (2004).
Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66-74.
Walker CB, de la Torre JR, Klotz MG, Urakawa H, Pinel N, Arp DJ et al (2010). Nitrosopumilus
maritimus genome reveals unique mechanisms for nitrification and autotrophy in globally
distributed marine crenarchaea. Proc Natl Acad Sci USA 107: 8818-8823.
Wuchter C, Schouten S, Boschker H, Damste JS (2003). Bicarbonate uptake by marine
Crenarchaeota. FEMS Microbiol Lett 219: 203-207.
Yang Z (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:
1586-1591.
37
Table 2.1. Gulf of Maine sample sites and sequencing effort.
Site Name   Date Sampled   Latitude        Longitude       Temp. (°C)   Salinity (ppt)   Number of sequences   Insert Size Range (kb)
GOM03       25-Jan-06      42° 46' 9" N    68° 40' 8" W    6.4          33               453,805               3-5 / 2-3
GOM04       27-Jan-06      44° 07' 5" N    67° 58' 3" W    5.1          32.2             957,737               4-6
GOM06       1-Jan-06       41° 28' 7" N    69° 6' 0" W     6.2          32.6             10,040                4-6
GOM12       1-Aug-06       41° 08' 6" N    66° 53' 3" W    17.4         32               470,591               6-8
GOM13       1-Aug-06       43° 23' 3" N    67° 41' 9" W    16.6         32               925,793               6-8 / 8-10
GOM14       1-Aug-06       42° 21' 6" N    69° 23' 8" W    19.6         31.3             9,728                 6-8
Latitude and longitude together give the site location.
Table 2.2. Sequence recruitment for genes related to ammonia oxidation pathway.
Gene Name    Gene Source Genome    Range of %ID (recruited sequences)    Total number of sequences
Urease ! subunit Cenarchaeum symbiosum A 52-79% 6
Urease " subunit Cenarchaeum symbiosum A 64% 1
Urease ! subunit Cenarchaeum symbiosum A 69-73% 2
Urease Accessory E Cenarchaeum symbiosum A 44-57% 3
Urease Accessory F Cenarchaeum symbiosum A 35-46% 5
Urease Accessory G Cenarchaeum symbiosum A 56-74% 6
Urease Accessory H Cenarchaeum symbiosum A 41-58% 10
Urea transporter Cenarchaeum symbiosum A 45-83% 9
Hydroxylamine oxidoreductase Desulfobacterium autotrophicum HRM2 -- --
Nitrate reductase [nar] Haloarcula marismortui ATCC 43049 -- --
Ferredoxin nitrite/sulfite reductase [nirA] Cenarchaeum symbiosum A 21-32% 79
Ferredoxin nitrite reductase NAD(P)H [nirB] Vibrio fischeri MJ11 26-27% 4‡
Formate-dependent nitrite reductase [nrfA] Vibrio fischeri MJ11 -- --
Nitrite reductase [nirK] Haloarcula marismortui ATCC 43049 26-40% 25‡
Nitric-oxide reductase
norB Haloarcula marismortui ATCC 43049 -- --
norC Silicibacter pomeroyi DSS-3 -- --
norQ Cenarchaeum symbiosum A 27-33% 70
norD Silicibacter pomeroyi DSS-3 -- --
Nitrous-oxide reductase [norZ] Geobacillus thermodenitrificans NG80-2 -- --
Ammonium transporter Cenarchaeum symbiosum A 33-82% 135
ammonia monooxygenase subunit A Cenarchaeum symbiosum A 72-96% 63
ammonia monooxygenase subunit B Cenarchaeum symbiosum A 58-92% 46
ammonia monooxygenase subunit C Cenarchaeum symbiosum A 69-98% 45
Table 2.3. Sequence recruitment of the 3-Hydroxypropionate/4-Hydroxybutyrate cycle.
Gene name    Metallosphaera sedula locus tag    Cenarchaeum Locus Tag (CENSYa_)    Percent Identity (Cenarchaeum)    Nitrosopumilus Locus Tag (Nmar_)    Percent Identity (Nitrosopumilus)    Total number of sequences
Acetyl-CoA carboxylase Msed_1375/Msed_0147/Msed_0148 1660/1661/1662 53.5/43.0/38.5 0272/0273/0274 53.1/43.3/35.3 63/66/53
Malonyl-CoA reductase Msed_0709 1993 33.0 1586 34.0 59‡
Malonate semialdehyde reductase (marine gamma proteobacterium HTCC2080) -- -- -- -- 56‡
3-Hydroxypropionyl-CoA synthetase Msed_1456 -- -- 0701 or 0866 46.6/36.0 116
3-Hydroxypropionyl-CoA dehydratase Msed_2001 0116 42.1 1308 42.1 41
Acryloyl-CoA reductase Msed_1426 -- -- -- -- 35‡
Propionyl-CoA carboxylase Msed_1375/Msed_0147/Msed_0148 1660/1661/1662 53.5/43.0/38.5 0272/0273/0274 53.1/43.3/35.3 63/66/53
Methylmalonyl-CoA epimerase Msed_0639 0475 37.1 0953 37.9 20
Methylmalonyl-CoA mutase Msed_0638 0476 51.2 0954 52.4 48
cobalamin B12-binding domain protein (Methylmalonyl-CoA mutase) Msed_2055 0482 53.3 0958 53.0 20
Succinyl-CoA reductase Msed_0709 1993 33.0 1586 34.0 59‡
Succinate semialdehyde reductase Msed_1424 1407 or 0510 31.9/31.8 0523 30.3 131‡
4-Hydroxybutyryl-CoA synthetase Msed_1422 -- -- -- -- --
4-Hydroxybutyryl-CoA dehydratase (a) Msed_1220 -- -- -- -- 41
4-Hydroxybutyryl-CoA dehydratase (b) Msed_1321 0022 42.4 0207 42.5 56
Crotonyl-CoA hydratase - Candidate A Msed_0399 0413 40.0 1028 40.6 88
Candidate B Msed_0384 0166 40.8 1308 41.4 36
Candidate C Msed_0385 -- 1308 32.4 28‡
Candidate D Msed_0336 -- 1308 32.7 29
Candidate E Msed_0389 0413 42.3 1028 40.8 88
(S)-3-Hydroxybutyryl-CoA dehydrogenase - Candidate A Msed_1423 0413 40.7 1028 42.2 48
Candidate B Msed_0399 0413 40.0 1028 40.6 88
Candidate C Msed_1993 0413 38.5 1028 39.0 29
Candidate D Msed_0389 0413 42.3 1028 40.8 88
Acetoacetyl-CoA β-ketothiolase - Candidate A Msed_0656 -- -- -- -- --
Candidate B Msed_1647 -- -- -- -- 66
Candidate C Msed_1290 1780 or 0587 36.7/30.6 1631 34.6 77
Candidate D Msed_0396 1780 or 0587 37.8/32.4 1631 36.9 71
Candidate E Msed_0386 1780 or 0587 35.2/31.8 1631 35.7 69
Candidate F Msed_0271 -- -- -- -- 49‡
Candidate G Msed_0270 1780 30.9 -- -- 48
Table 2.4. Location of gaps in recruitment along N. maritimus SCM1 genome.
Gap #    Approximate Range on the N. maritimus genome (Kbp)    Approximate Length (Kbp)    %G+C    Notes
1 12.8-13.5 0.716
2 15.4-15.8 0.360
3 27.7-27.9 0.151
4 105-159 54.2 27.8 Not one contiguous gap. Low coverage. Average coverage = 1.3X; Range = 0-12X coverage
5 171-172 1.0
6 178-178 0.150
7 192-194 2.0
8 220-222 2.0
9 294-297 2.6
10 318-319 0.755
11 325-330 4.4
12 339-340 1.0
13 342-345 2.4 32.4 A 369 bp region recruits GOM reads at 1X coverage and 48.6% nucleotide identity
14 400-418 17.4 34.6 Underrepresented in GOS metagenomes. 4 reads, 409300-412400 bp, 77-80% ID, hypothetical proteins Nmar_0456, _0457, _0458, _0549, Mangrove in Isabella Island, Galapagos Islands. 2 reads, 415400-416900 bp, 87-90% ID, hypothetical proteins Nmar_0461, _0462, DUF protein, Mangrove in Isabella Island, Galapagos Islands.
15 423-425 2.0
16 429-437 7.8 35.3 Pst Operon. 58-75% ID, Sargasso Sea, coastal US East coast, near Cuba, & STRI, Panama
17 497-498 1.7
18 504-506 1.6 33.3 A 778 bp region recruits GOM reads at 0.6X coverage and 60.0% nucleotide identity
19 520-521 1.1
20 538-545 7.1
21 546-553 7.0
22 600-607 6.9 26.2
23 611-611 0.106
24 614-615 0.697
25 628-629 0.455
26 644-647 2.4
27 664-664 0.675
28 675-675 0.069
29 680-680 0.441
30 718-720 1.4 36.8 Blue (type I) copper protein (Nmar_0815).
31 724-724 0.155
32 733-733 0.080
33 790-790 0.633
34 792-794 2.1
35 801-802 1.4
36 804-805 0.839
37 871-892 21.3 34.2 Underrepresented in GOS metagenomes. 11 reads, 880300-884600 bp, 63-84% ID, Magnesium & Cobalt transporter, protein DUF, D-alanine--D-alanine ligase, Sargasso Sea, near nuclear plant in Delaware Bay, Mangrove in Isabella Island, Galapagos Islands
38 914-915 1.5 31.5 Gap present in GOS metagenomes. A 1,661 bp region recruits GOM reads at 3.1X coverage and 65.9% nucleotide identity. Integrase [1].
39 919-923 4.1 33.3 Underrepresented in GOS metagenomes. 7 reads, 917089-918855 bp, 65-81% ID, short chain dehydrogenase, Sargasso Sea, Newfoundland Coast
40 934-936 1.9
41 946-981 35.3 40.0 Underrepresented in GOS metagenomes. 2 reads, 949100-950600 bp, 80-84% ID, Fibronectin type III domain, Mangrove in Isabella Island, Galapagos Islands. 1 read, 980700-981700 bp, 91% ID, hypothetical Nmar protein, Lagoon Rangiroa.
42 991-992 0.750
43 998-1000 2.0
44 1030-1031 1.3
45 1032-1033 0.519
46 1036-1072 36.6 43.9 Underrepresented in GOS metagenomes. Integrase [2]. 3X DSBA oxidoreductase. 2X Blue (type I) copper protein (Nmar_1161 & _1142). Multicopper oxidase (Nmar_1135). 1 read, 1036000-1036300 bp, 63% ID, iron (metal) dependent repressor, Block Island, NY. 3 reads, 1046700-1049300 bp, 62-69%, cytochrome c biogenesis protein, glutaredoxin, 2-alkenal reductase, Off Fernandina Island and Mangrove in Isabella Island, Galapagos Island.
47 1095-1096 0.606
48 1098-1120 22.0 33.4 Underrepresented in GOS metagenomes. A 1,122 bp region recruits GOM reads at 3.7X coverage and 37.7% nucleotide identity. Integrase [3]. 1 read, 1097400-1098100 bp, 66% ID, hypothetical protein Nmar_1201, Sargasso Sea.
49 1144-1147 3.6
50 1167-1174 7.6 34.8 Gap present in GOS metagenomes. 2X UspA proteins. Blue (type I) copper protein (Nmar_1273).
51 1189-1190 1.5
52 1194-1195 1.0
53 1211-1251 40.3 36.1 Underrepresented in GOS metagenomes. Integrase [4]. UspA protein. Multicopper oxidase (Nmar_1354). 4 reads, 1232100-1233300 bp, 55-58% ID, diaminobutyrate--2-oxoglutarate aminotransferase, Flamingo Pond, Galapagos Island.
54 1255-1262 6.6 33.8
55 1304-1313 8.1 33.7 Underrepresented in GOS metagenomes. 6 reads, 1303600-1304600 bp, 62-76% ID, TatD-related deoxyribonuclease, transcriptional factor CBF/NF-Y, Sargasso Sea, Block Island, NY
56 1325-1333 7.7 34.5 Gap present in GOS metagenomes.
57 1336-1344 7.7 32.2 Gap present in GOS metagenomes
58 1350-1350 0.150
59 1391-1391 0.298
60 1413-1416 2.6 41.4 Gap present in GOS metagenomes
61 1455-1456 0.928
62 1475-1476 0.398
63 1491-1498 6.2 32.4 Underrepresented in GOS metagenomes. Blue (type I) copper protein (Nmar_1637). 1 read, 1491000-1492100 bp, 62% ID, hypothetical protein Nmar_1636, Block Island, NY. 2 reads, 1497000-1497800, 72-75% ID, hypothetical protein Nmar_1639, Mangrove in Isabella Island, Galapagos Islands
64 1516-1541 24.9 32.8 Underrepresented in GOS metagenomes. A 1,661 bp region recruits GOM reads at 2.6X coverage and 77.1% nucleotide identity (likely NirK, Nmar_1667). A 779 bp region recruits GOM reads at 1X coverage and 70.2% nucleotide identity (likely multicopper oxidase, Nmar_1663). 2X DSBA oxidoreductase. 2X blue (type I) copper proteins (Nmar_1678 & _1665). UspA protein.
65 1542-1543 0.911
66 1555-1556 1.5
67 1612-1612 0.060
68 1615-1616 1.0
69 1625-1626 0.161
Plain text: indicates information regarding underrepresented reads that recruited from the GOM metagenome
Italics: indicates information regarding underrepresented reads that recruited from the GOS metagenomes
Bold: Genes of interest from the predicted archaeal respiratory chain.
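The %G+C column above reports the guanine plus cytosine content of each gap region. A minimal Python sketch of that calculation follows. The function name gc_percent and the decision to ignore ambiguous bases are illustrative assumptions and are not taken from the thesis.

```python
def gc_percent(seq: str) -> float:
    """Return the %G+C of a nucleotide sequence.

    Assumption for illustration: ambiguous bases (e.g. N) are ignored,
    so the percentage is computed over A, C, G and T only.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy sequence, not data from the N. maritimus genome.
    print(f"{gc_percent('ATGCGCNNAT'):.1f}")
```

Applied to the sequence of a gap region, a value well below or above the genome average may hint at a different origin for that region, although the thesis should be consulted for how the reported values were actually obtained.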
Table 2.4. Comparison of genes of interest between assembled scaffolds and Ca. N.
maritimus SCM1
Scaffold ID   Compared To         Gene Name                                                     %NAID   dS        dN       dN/dS
scaffold A    N. maritimus SCM1
                                  amoA                                                          -       -         -        -
                                  amoB                                                          89.6    0.4204    0.0444   0.1057
                                  amoC                                                          88.5    0.6376    0.0146   0.0228
                                  ribosomal protein S3Ae                                        83.3    1.5964    0.0628   0.0393
                                  protein of unknown function                                   72.6    59.285    1.6234   0.0274
                                  geranylgeranyl reductase                                      81.3    73.8197   0.5019   0.0068
                                  norQ                                                          81.4    2.1429    0.0685   0.032
scaffold B    N. maritimus SCM1
                                  amoA                                                          91.2    0.3952    0.0036   0.009
                                  amoB                                                          89.1    0.4622    0.0459   0.0994
                                  amoC                                                          87.3    0.725     0.0168   0.0232
                                  ribosomal protein S3Ae                                        -       -         -        -
                                  protein of unknown function                                   75.8    14.8162   12.314   0.8311
                                  geranylgeranyl reductase                                      80.2    72.4272   0.5119   0.0071
                                  norQ                                                          81.5    2.0661    0.0707   0.0342
scaffold A    scaffold B
                                  amoA                                                          -       -         -        -
                                  amoB                                                          97.3    0.2203    0.0324   0.1468
                                  amoC                                                          95.4    0.1165    0.0024   0.0203
                                  ribosomal protein S3Ae                                        -       -         -        -
                                  protein of unknown function                                   80.1    1.3785    0.1714   0.1243
                                  geranylgeranyl reductase                                      90.1    0.8085    0.0315   0.0389
                                  norQ                                                          89.2    1.3398    0.023    0.0172
scaffold C    N. maritimus SCM1
                                  ribonuclease H                                                -       -         -        -
                                  CoA-binding protein                                           -       -         -        -
                                  4-OH-butyryl-CoA dehydratase                                  86.6    1.9889    0.0283   0.0142
                                  signal peptidase                                              70.2    3.0736    0.1057   0.0344
                                  hypothetical protein                                          76.8    2.7413    0.053    0.0193
scaffold D    N. maritimus SCM1
                                  ribonuclease H                                                85.3    1.9208    0.0872   0.0454
                                  CoA-binding protein                                           85.5    1.7517    0.0448   0.0256
                                  4-OH-butyryl-CoA dehydratase                                  85.1    1.8474    0.0353   0.0191
                                  signal peptidase                                              80.3    2.8659    2.1979   0.7669
                                  hypothetical protein                                          84.7    2.4258    0.0572   0.0236
scaffold C    scaffold D
                                  ribonuclease H                                                -       -         -        -
                                  CoA-binding protein                                           -       -         -        -
                                  4-OH-butyryl-CoA dehydratase                                  91.5    0.8466    0.0314   0.0371
                                  signal peptidase                                              81.7    0.7853    0.0988   0.1258
                                  hypothetical protein                                          86.4    0.7433    0.0256   0.0344
scaffold E    N. maritimus SCM1
                                  FAD-dependent pyridine nucleotide disulphide oxidoreductase   75.5    63.2881   0.5098   0.0081
                                  hypothetical protein                                          80.9    2.2775    0.0761   0.0334
                                  heat shock HSP20                                              81.3    3.1732    1.3729   0.4327
scaffold F    N. maritimus SCM1
                                  FAD-dependent pyridine nucleotide disulphide oxidoreductase   74.6    10.6881   0.545    0.051
                                  hypothetical protein                                          81.6    2.3603    0.0698   0.0296
                                  heat shock HSP20                                              80.3    2.9081    1.2942   0.445
scaffold E    scaffold F
                                  FAD-dependent pyridine nucleotide disulphide oxidoreductase   91.1    0.7166    0.0395   0.0551
                                  hypothetical protein                                          91.6    0.7428    0.0282   0.038
                                  heat shock HSP20                                              88.3    0.4701    0.0818   0.1739
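The dN/dS column in the table above is the ratio of the nonsynonymous rate (dN) to the synonymous rate (dS) for each pairwise gene comparison. The short Python sketch below recomputes that ratio from the tabulated dS and dN values. The function name dn_ds_ratio and the cutoff max_reliable_ds are hypothetical and chosen only for illustration; they are not part of the thesis or of PAML. Small differences from the tabulated ratios are expected because the table values are rounded.

```python
def dn_ds_ratio(dn: float, ds: float, max_reliable_ds: float = 3.0):
    """Return (ratio, reliable) for one pairwise gene comparison.

    max_reliable_ds is a hypothetical cutoff used only in this sketch:
    very large dS estimates are commonly treated as a sign that
    synonymous sites are saturated, which makes the ratio hard to interpret.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds, ds <= max_reliable_ds


# (gene, dS, dN) copied from the scaffold A vs. N. maritimus SCM1 rows.
rows = [
    ("amoB", 0.4204, 0.0444),
    ("amoC", 0.6376, 0.0146),
    ("geranylgeranyl reductase", 73.8197, 0.5019),
]

for gene, ds, dn in rows:
    ratio, reliable = dn_ds_ratio(dn, ds)
    note = "" if reliable else " (dS very large; interpret with caution)"
    print(f"{gene}: dN/dS = {ratio:.4f}{note}")
```

Ratios well below 1, as in most rows of the table, are conventionally read as evidence of purifying selection.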
Figure 2.1. 3-Hydroxypropionate/4-Hydroxybutyrate cycle.
Figure 2.2. Recruitment plot of GOM metagenome and GOS Phase I metagenomes against
'Ca. N. maritimus SCM1' reference genome.
Figure 2.3. Representation of integrase region 1 (IR1).
[Figure labels. Genomes compared: Ferroplasma acidarmanus Fer1 (Euryarchaea); Ca. Nitrosopumilus maritimus SCM1 (Thaumarchaea); Klebsiella pneumoniae 342 (γ-Proteobacteria). Annotated genes: Type 1 restriction modification system; restriction modification system, DNA specificity; Type 1 site specific DNase; viral integrase; hypothetical protein; Appr-1-p processing enzyme family protein.]
Figure 2.4. Maximum likelihood tree of a 422 bp region of the archaeal amoA gene.
Figure 2.5. Maximum likelihood tree of a 388 bp region of the methylmalonyl-CoA
synthetase gene.
Figure 2.6. MAUVE alignment of environmental 'Scaffold A' and 'Scaffold B' compared
with 'Ca. N. maritimus SCM1'.
Figure 2.7. MAUVE alignments of environmental 'Scaffold C' and 'Scaffold D' compared
with 'Ca. N. maritimus SCM1'.
Figure 2.8. MAUVE alignments of environmental 'Scaffold E' and 'Scaffold F' compared
with 'Ca. N. maritimus SCM1'.
Figure 2.9.
Figure 2.10.
Chapter Three
Comparative genomics of planktonic Flavobacteriaceae from the Gulf of Maine using
environmentally derived genomes
3.1. Introduction
The Flavobacteriaceae have been indicated as major constituents of microbial communities attached to
detritus and planktonic organisms (Grossart et al., 2005; Kirchman, 2002) and through this interaction
are presumed to be important players in the “microbial loop”, breaking down large organic molecules
(e.g. proteins, chitin and other polysaccharides, etc.) and making them accessible to other microbes
(Cottrell and Kirchman, 2000). The microbial loop in the surface ocean is estimated to degrade and
recycle about 50% of surface primary production (Fuhrman and Azam, 1982). Within the microbial loop
individual species typically play specialized roles in polymer degradation; they frequently express
enzymes that target only specific compounds and so can degrade only a limited range of
compounds (Martinez et al., 1996). Therefore, a change in the substrates within an environment
influences microbial composition (Teeling et al., 2012). Many particle-associated microorganisms thrive
in limited, ephemeral niches attached to marine particles, with the plankton possibly acting as a source
from which individuals are drawn to compose future assemblages (Heidelberg et al., 2002).
Metagenomics has previously been used to reconstruct microbial genomic potential and to
predict microbial structure from diverse environments (Baker et al., 2006; DeLong et al., 2006; Engel et
al., 2012; Venter et al., 2004). Many metagenomic studies involved the reconstruction of single
organisms (or species-level populations), linking metabolic functions with phylogeny (Baker et al.,
2006; Tripp et al., 2010; Tully et al., 2012). However, organisms with highly similar 16S rRNA genes
have large genomic variations, as much as 60% dissimilarity in gene content (Coleman et al., 2006;
Tenaillon et al., 2010). These variations in genomic content play a key role in determining the
ecological niche that a given species can occupy and are likely the underpinning of microbial speciation
events (Coleman and Chisholm, 2010).
The Gulf of Maine (GoMA) is an oceanic province of high biological significance, with a high
degree of primary productivity and historically important commercial fisheries (Townsend et al., 2004)
(SI Figure 3.6). We used a metagenome from the GoMA to reconstruct four environmental genomes and
explore the functional capacity of Flavobacteria present in the microbial community. Each of the four
environmental genomes was a rare member of the total community; derived from organisms that
collectively compose about 1% of the total microbial metagenome. Several previous studies for which
environmental genomes were reconstructed from a metagenome required the population to be near
clonal and/or at least 5% of the microbial population (Iverson et al., 2012; Tyson et al., 2004). Previous
work has shown that bacterioplankton communities can change over short distances (10 km)
(Hewson et al., 2006) and show the largest differences in community structure at the 6-month temporal
scale (Fuhrman et al., 2006). Our results demonstrate a complex population structure on both spatial and
temporal scales, presenting an increased understanding of the persistence of planktonic microbial
organisms. Further, we are able to highlight genomic variations that could potentially allow each
of the Flavobacteria to occupy specific niches within the microbial community, such as the
presence of novel peptidases and glycoside hydrolases, and other ecologically relevant genes related to
specific metabolisms in the environmentally derived genomes. These sorts of comparative genomic
techniques have been predominantly performed on cultured isolates (Kettler et al., 2007; Tuanyok et al.,
2008), and have rarely been performed on environmental microbial metagenomic datasets. Lastly, we
examined the diversity inherent in the complex underlying population for each environmental
genome. To date, few environmental datasets (Konstantinidis et al., 2009; Simmons et al., 2008) have
been used to assess in situ population-level genomics of microorganisms by combining analysis of
selection pressure and comparative genomics of related organisms that occupy the same spatial and
temporal environments.
3.2. Materials and Methods
3.2.1. Metagenome Sequencing. As previously described (Tully et al., 2012), three 200 L water
samples were collected from 1.5-m depth from the GoMA in January and August of 2006,
size-fractionated by sequential filtering, immediately frozen in liquid N2, and stored at -80°C. Sequencing of
the metagenome has been described previously (Tully et al., 2012) and follows established protocols
(Rusch et al., 2007). In brief, DNA was extracted from the 0.1 µm filter, inserted into medium-range
insert vectors, and sequenced via paired-end Sanger sequencing. A total of 2,827,702 reads were
generated for 6 samples, containing over 2,235 Mbp of sequence data. All sequences were subjected to
quality screening and initial assembly with the Celera assembler at the J. Craig Venter Institute (Rusch
et al., 2007).
3.2.2. Metagenomic binning, annotation, bin refinement and estimation of completeness and
duplication. Briefly, single copy, non-16S rRNA phylogenetic markers were searched against all
GoMA scaffolds using BLASTN (all values used in BLAST searches can be found in SI Table 3.12).
Tri-, tetra-, penta-, and hexanucleotide frequency clusters were generated for scaffold assemblies from
the GoMA metagenome. Bins with spans at least as long as previous Flavobacteriales genomes were re-
assembled using the Celera assembler and grouped using the aforementioned clustering method. The
reads comprising these scaffolds were again re-assembled with the Celera assembler. If the estimated
percent duplication (see below) was high, the sub-bin was further sub-divided until the duplication
estimation was < 10%.
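As an illustration of the oligonucleotide frequency step, the sketch below computes a relative k-mer frequency vector for a single scaffold. It is a minimal example under stated assumptions, not the clustering procedure used in this study: the function name kmer_frequencies is hypothetical, windows containing ambiguous bases are simply skipped, and reverse-complement merging and the downstream clustering are omitted.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return relative frequencies of all 4**k unambiguous k-mers in a sequence.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, or T are ignored.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


# Tetranucleotide profile of a toy scaffold (k = 3, 5, or 6 work the same way).
profile = kmer_frequencies("ACGTACGT", k=4)
```

Vectors of this kind, one per scaffold, are what a frequency-based clustering step would take as input.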
Phylogenetic bins were annotated using the RAST annotation server (Aziz et al., 2008). Each
putative CDS was assigned a phylogenetic identity based on a top BLASTP hit to GenBank NR
database. If less than 60% of the CDSs on a scaffold had top hits to the phylum Bacteroidetes the
scaffold was removed from the bin. Duplicated scaffolds were identified by a BLASTP of all bin CDS
against the CDSs of Flavobacteriales sp. ALC-1 (GOLD ID: Gi01424). The smaller of the two scaffolds
was removed from the bin. Duplicated scaffolds were confirmed by progressiveMauve alignments
(Darling et al., 2010).
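The 60% phylum-level filter described above can be sketched as follows. This is a simplified illustration: the function name keep_scaffold and the input format (one phylum label per CDS, taken from its top BLASTP hit) are assumptions made for the example, not the scripts used in this study.

```python
def keep_scaffold(top_hit_phyla, target="Bacteroidetes", min_fraction=0.60):
    """Return True if at least min_fraction of CDS top hits match the target phylum.

    top_hit_phyla: list with one phylum label per CDS on the scaffold
    (hypothetical input format).
    """
    if not top_hit_phyla:
        return False
    matches = sum(1 for phylum in top_hit_phyla if phylum == target)
    return matches / len(top_hit_phyla) >= min_fraction


# A scaffold with 3 of 5 CDSs assigned to Bacteroidetes is retained (60%).
retained = keep_scaffold(
    ["Bacteroidetes", "Bacteroidetes", "Bacteroidetes", "Proteobacteria", "Firmicutes"]
)
```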
Estimation of bin completeness and duplication was adapted from ref. (Hess et al., 2011). Using
HMMER V3.0 (Finn et al., 2011) (E-value ≤ 1E-5), all CDSs from each of the genomes within the
pangenome were searched against the TIGRFAM database V12.0. The minimum number of shared
individual TIGRFAMs was determined to be the core genome. Percent completeness for each bin was
determined using the equation: est. % Complete = (No. of identified core genes ÷ No. of expected core
genes) × 100. Percent duplication was determined using a pangenome constructed for this study to
identify conserved single copy genes (CSCGs) via BLASTP and calculated using the equation: est. %
Duplication = (No. of duplicate genes ÷ No. of CSCGs) × 100.
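The two estimates above are simple ratios and can be written out directly. In the sketch below the function names and the example counts are hypothetical and are not values from this study.

```python
def percent_complete(identified_core_genes, expected_core_genes):
    """est. % Complete = (identified core genes / expected core genes) x 100."""
    if expected_core_genes <= 0:
        raise ValueError("expected_core_genes must be positive")
    return 100.0 * identified_core_genes / expected_core_genes


def percent_duplication(duplicate_genes, cscg_count):
    """est. % Duplication = (duplicate genes / conserved single copy genes) x 100."""
    if cscg_count <= 0:
        raise ValueError("cscg_count must be positive")
    return 100.0 * duplicate_genes / cscg_count


# Hypothetical bin: 84 of 100 expected core genes found, 7 of 100 CSCGs duplicated.
completeness = percent_complete(84, 100)
duplication = percent_duplication(7, 100)
```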
3.2.3. Identification of catalytic domains. Catalytic domains for glycoside hydrolases (GH) and
carbohydrate-binding modules (CBM) were identified from ref. (Engel et al., 2012), and peptidases were
identified by comparing the MEROPS database (Rawlings et al., 2012) of catalytic peptidase units
against PFAM V26.0 to identify the best PFAM model for each peptidase unit. These lists were used
to generate libraries from PFAM V26.0. Each library was used to search the putative CDS from each bin
using HMMER. The peptidases of each bin were compared via BLASTP against each other, the
pangenome, and NR. The predicted destinations for the peptidases were determined using PSORTb
(v3.0.2) (Yu et al., 2010), signal peptides determined by SignalP (v4.1) (Peterson et al., 2011), and
lipoprotein signals determined by LipoP (v1.0) (Juncker et al., 2003).
3.2.4. Population analysis of variable sites. Using Geneious V5.6.2 (Biomatters;
http://www.geneious.com), the underlying sequences for each scaffold were trimmed and mapped to the
respective scaffolds (see SI Materials and Methods for settings). SNPs were identified and counted for
sites with ≥ 4X read coverage, where at least 2 of the reads contained a mutation at that site. Alignments
were performed using CLUSTAL W and checked manually for bases with low quality scores (cutoff =
30). Base pairs without supporting neighbors were corrected to reflect the consensus base at these sites.
Putative CDS of interest were identified with coverage > 7X. Each CDS was manually checked for
different variants based on SNP patterns. For CDS with more than two variants, pairwise calculations
were made using the variant with the most supporting reads as the reference. If two variants had the
same number of supporting reads, the reference was determined to be the sequence with the highest
nucleotide similarity to the consensus. Amino acid alignments were performed using ClustalW
(Thompson et al., 1994) and the PAL2NAL (V14) (Suyama et al., 2006) software was used to align the
corresponding nucleotide sequences. Codon Table 11 (bacterial, archaeal, and plant plastid) was used for
translations. The “Remove gaps, inframe stop codons” setting was turned on. dS, dN, and dN/dS values
were computed in PAL2NAL using the codeml program within PAML (Yang, 2007).
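The SNP counting rule in this section (a minimum read coverage of 4X, with at least 2 reads carrying a mutation) can be sketched as below. This is an illustration only: the analysis itself was performed in Geneious, the function names and the per-site input format are hypothetical, and "mutation" is interpreted here as any base differing from the consensus.

```python
VALID_BASES = {"A", "C", "G", "T"}


def is_snp_site(read_bases, consensus_base, min_coverage=4, min_variant_reads=2):
    """Return True if a site meets the coverage and variant-read thresholds.

    read_bases: the bases observed at one site, one per mapped read.
    Gaps and ambiguous characters do not count towards coverage.
    """
    bases = [b.upper() for b in read_bases if b.upper() in VALID_BASES]
    if len(bases) < min_coverage:
        return False
    variant_reads = sum(1 for b in bases if b != consensus_base.upper())
    return variant_reads >= min_variant_reads


def count_snps(pileup, consensus):
    """Count qualifying SNP sites given per-site read bases and the consensus."""
    return sum(is_snp_site(column, ref) for column, ref in zip(pileup, consensus))


# Four toy sites: only the second has >= 4X coverage and >= 2 variant reads.
snp_total = count_snps(["AAAA", "AAGG", "AAG", "AGAA"], "AAAA")
```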
3.2.5. Bin functionality variation. Putative CDS from a single bin were compared via BLASTP to each
other bin, the pangenome, and NR to identify unique genes. The two separate lists were
further reduced by identifying uniquely annotated genes and then manually curated for potential genes
of interest. progressiveMauve (Darling et al., 2010) was used to compare the scaffolds of the four
phylogenetic bins and identify large regions of synteny and homology.
3.2.6. Phylogenetic and proteorhodopsin marker trees. Putative proteorhodopsin nucleotide
sequences were identified for each bin and aligned with CLUSTAL W (Thompson et al., 1994) to
sequences obtained from GenBank. Putative DNA primase (DnaG), ATP-dependent DNA helicase
(RecG), DNA polymerase III alpha subunit (DnaE), translation
elongation factor (EF-G) (FusA), and DNA-directed RNA polymerase subunit beta’ (RpoC) protein
sequences were identified for each bin and aligned with CLUSTAL W to sequences obtained from IMG
Genomes (Markowitz et al., 2006). It should be noted that RpoC is not as robust a phylogenetic
marker as the corresponding subunit beta (RpoB). Alignments were trimmed, concatenated in the case of
the DnaG-FusA, DnaG-DnaE-RecG, and DnaG-RpoB trees, and a maximum likelihood tree was
constructed using PHYML with 1,000 bootstraps (Guindon et al., 2005) (see SI Materials and Methods).
3.3. Results and Discussion
3.3.1. Sequencing, community structure, assembly and binning assessment. Over 2.8 million Sanger
sequences from January (winter) and August (summer) of 2006 were analyzed (SI Fig S1, SI Table 3.6,
(Tully et al., 2012)). In total, 2,238 16S rRNA genes were identified, of which 2,213 could be classified
using the RDP classifier (see SI Materials and Methods). The Bacteroidetes represented the third largest
group in both the summer and winter (15.0% and 15.6% of sequences, respectively), following the α- and
γ-Proteobacteria. From within the Bacteroidetes, 75 sequences were identified as members of the class
Flavobacteria (3.4%). When viewed at higher taxonomic levels, most groups showed similar
abundances in both the summer and winter (SI Figure 3.7). As the taxonomic resolution increases, many
of the sub-phyla and sub-class populations show some seasonal variation.
In brief, the binning and assessment of the putative Flavobacterial scaffolds utilized tri-, tetra-, penta-, and
hexanucleotide frequency pattern correlations (Teeling et al., 2004). Four bins of sufficient size for
analysis were confirmed to possess Flavobacterial phylogenetic markers. These bins (FlavA, FlavG,
FlavH and FlavI) were assembled, annotated, and screened for duplicated scaffolds. Based on the
conserved set of genomic functions and single copy genes, we estimated the degree of completeness to
be 37-84% and duplication 0-7% (Table 3.1; SI Table 3.7). Due to the size of some of the bins, the
number of commonly used phylogenetic markers was limited. For example, only a single bin (FlavA)
had a copy of the 16S rRNA gene, and only two bins had a copy of the 23S rRNA gene. Two
phylogenetic markers common to all four bins were DNA primase (DnaG) and ATP-dependent DNA
helicase (RecG) and several other markers were present in three of the four bins. The phylogenetic
relationship of these bins within the Flavobacteriaceae is similar for all of the marker genes; FlavA and
-H cluster with Flavobacteria sp. MS024-2A (Figure 3.3.1; SI Figure 3.8A-E), a single cell amplified
Flavobacteria from the GoMA (Woyke et al., 2009) and are basal to the rest of Flavobacteriaceae in
four of the five constructed trees. FlavH and -I form a more divergent clade, are still closely related to
the Flavobacteria from the GoMA, and are always basal to the FlavA, -H and MS024-2A clade. FlavH,
which had an estimated degree of duplication of 7%, possessed two copies of RecG that share 77%
amino acid identity, which is likely a result of the binning method incorporating closely related organisms
into a single bin.
3.3.2. Spatial and temporal differentiation. The GoMA metagenome spans several sites and two
seasons, allowing for comparison within and between bins on both temporal and spatial scales. As a
whole, flavobacterial sequences are more than twice as abundant during the summer compared to the
winter (Table 3.2). Surface chlorophyll-a concentrations (and consequently standing-biomass) are higher
in the GoMA during August (SI Figure 3.6). For the individual scaffolds, sequence abundances were
similar across sites and season (SI Table 3.8). However, several outliers are apparent (SI Table 3.8; as
determined using (Iglewicz and Hoaglin, 1993)). Outliers may be due to a combination of three possible
explanations: 1) the presence of discrete seasonal ecotypes; 2) stochastic variation (i.e., random genomic
segments recovered at this sequencing depth); and, 3) a varying impact of the cloning and sequence
biases involved with Sanger sequencing methodologies. Scaffolds considered to be outliers based on
seasonal composition were not included in the niche differentiation analysis, but were included for all
other aspects of analysis.
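The text does not state which procedure from Iglewicz and Hoaglin (1993) was applied. One commonly used method from that reference is the modified Z-score, which is based on the median and the median absolute deviation (MAD), with values above 3.5 in absolute value flagged as outliers. The sketch below assumes that method; the function names and abundance values are hypothetical.

```python
from statistics import median


def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD for each value."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def find_outliers(values, threshold=3.5):
    """Return the values whose absolute modified Z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


# Hypothetical per-sample abundances (%) for one scaffold.
abundances = [0.21, 0.22, 0.20, 0.23, 0.21, 1.20]
outliers = find_outliers(abundances)
```

Because the median and MAD are themselves insensitive to extreme values, this approach remains usable with the small number of samples available per scaffold.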
All four groups are present in both summer and winter. FlavA is significantly more abundant in
the summer than FlavH or FlavI (T-test, p < 0.002) and conversely, FlavI is significantly more abundant
than FlavA or FlavH in the winter (T-test, p < 0.002). FlavA and FlavG are composed of sequences
primarily derived from the summer; FlavA is almost exclusively present at this time, while
FlavG has a greater presence during the winter. FlavI is made up of primarily winter sequences (abundances
in the summer, FlavA > FlavG > FlavH > FlavI).
It is unknown the degree to which Flavobacteriaceae share overlapping ecological niches or the
extent to which species specialize in substrate degradation or distinct niche conditions. Using the GoMA
metagenomic data, we explored the possibility that these organisms may temporally overlap, but are
spatially separated. Our data support both homogeneous mixing across a large spatial scale and spatial
separation, depending on the population. In August, FlavH had near-identical abundance at two different
sites (GoMA12 and GoMA14; 0.213% and 0.215%, respectively; p = 0.0234, 10,000 permutation T-test
using DAAG package in R), separated by 221 km. FlavA was far more abundant at site GoMA12 and
much less abundant at site GoMA13 (1.218% and 0.331%, respectively; p < 0.002, 10,000 permutation
T-test DAAG package). Generally, for each season, the most abundant bin for that season is spatially
heterogeneous, while the less abundant bins tend to be more homogeneous (Table 3.2). There are at least
two possible explanations for this relationship: 1) despite being low abundance in the water column, all
of the phylogenetic bins are present in the GoMA, attached to particles and contributing to overall
community function; or, 2) the lower abundance organisms represent a background planktonic
population, while the abundant organisms vary in abundance at different sites due to substrate
availability. The data presented cannot be used to directly explore either possibility, though they can be
used to look at how organisms competing for limited resources may selectively occupy discrete niches,
while collectively contributing to overall community function. One possible mechanism by which
organisms can alleviate competitive exclusion at the species-level or subspecies-level is through
diversification of genomic content (Martinez et al., 1996).
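The site comparisons above used a 10,000-permutation test from the DAAG package in R. The general idea of a two-sample permutation test on the difference in means can be sketched in Python as follows. This is an illustration of the technique, not the DAAG implementation; the function name, the fixed random seed, and the per-scaffold abundance values are hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Returns the fraction of random relabelings whose difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical per-scaffold abundances (%) of one bin at two sites.
site_one = [1.20, 1.30, 1.10, 1.25]
site_two = [0.30, 0.35, 0.33, 0.31]
p_value = permutation_test(site_one, site_two)
```

A permutation approach makes no assumption of normality, which is convenient when only a handful of abundance values is available per site.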
3.3.3. Phylogenetic Bin Functionality. Functionality is difficult to determine due to the incomplete
nature of draft genomes. Parsing bins for genes found in other Flavobacteriaceae illustrates the patchy
nature of the metagenomic coverage; however, it does reveal some commonality of genes and functions
amongst all four bins. For example, all bins possess genes involved in gliding motility, and also have at
least one of the key genes necessary for anaplerotic carbon fixation, a process that may complement
cellular metabolism in conjunction with proteorhodopsin activity (Gonzalez et al., 2011) (SI Table 3.9).
The FlavA bin does not include an identified proteorhodopsin gene, but it may simply be in a
sequencing gap. The other three have green-light adapted proteorhodopsin (Rusch et al., 2007) (SI
Figure 3.8B). FlavG, -H, and -I lack genes related to cobalamin (vitamin B12) metabolism, while FlavA
possesses an outer membrane receptor and ATP:cob(I)alamin adenosyltransferase (PduO), which converts
vitamin B12 into the coenzyme form (Johnson et al., 2001). Both FlavA and FlavG have genes
necessary for thiamin (vitamin B1) synthesis, while none were found in FlavH and FlavI. FlavA, -H, and
-I share superoxide dismutases that have a copper-zinc (Cu-Zn) coenzymatic active site, while FlavG
lacks an identified superoxide dismutase. FlavA also possesses a second dismutase with a manganese
coenzymatic site.
An issue in environmental studies is quantifying the degree to which genomic and proteomic
variations impact physiology and ecology. The bins are all phylogenetically related, and have many
genes in common with each other and with a marine Flavobacterial pangenome, made up of 7 publicly
available genomes (SI Table 3.10). However, it is known that slight variations in amino acid sequence
can alter specificity and substrates, and conversely, high variation can have limited impacts on function
(Ng and Henikoff, 2006). By concentrating on the coding sequences (CDSs) without high identity
homologs in other Flavobacteriaceae or the GenBank nonredundant (NR) protein database (Benson,
2004), it is possible to highlight genes that may truly impact physiology and ecology. CDSs with
ecologically relevant putative roles in the environmental genomes and without significant similarity to NR
were identified to assess possible roles in niche specialization.
When the CDSs of FlavA were compared to NR, a number of novel genes were identified that
suggest attachment to particles may be important. Three CDSs identified have possible roles in cell
adhesion: a YapH protein (a large protein family that includes adhesins), a fat protein possibly involved
in cell-cell attachment, and a CDS with multiple VCBS domains (PF13517). Further, two CDSs were
identified as containing TSP type-3 repeats, calcium binding proteins commonly found on the outer
membrane (Kvansakul et al., 2004). The nature of these genes may suggest a cell-substrate interaction
for FlavA that may be shared with the other GoMA environmental genomes.
FlavG had three novel CDSs of interest with no similarity in NR: a novel chitinase, an outer
membrane lipoprotein that is part of the RND efflux system, and a calcium-binding protein of the RTX
family. All three of these genes could potentially be cell surface proteins that interact with or degrade
particulate organic matter, as chitinases must act on molecules much larger than can be transported into
the cell (Cohen-Kupiec and Chet, 1998). Though RTX proteins have a wide-range of functions, one of
the more common functions is to act as a surface layer protein, possibly as a peptidase or lipase
(Linhartová et al., 2010). These genes suggest interaction with and degradation of particulate organic
matter.
Both FlavH and -I are smaller bins and have a limited number of novel CDSs. For FlavH, one of
the genes without a similar sequence in NR is a lysine exporter (LysE) that has been shown to
specifically remove excess lysine from the cellular environment (Vrljic et al., 2003). FlavI genes
without a similar sequence in NR include a Fe2+ siderophore transporter, suggesting diversity in iron
transport systems, a gene of unknown function related to a cartilage oligomeric matrix protein, found in
humans and some bacteria, and a glycoprotein that contains extracellular and calcium-binding domains
(Paulsson and Heinegård, 1981).
Of particular interest were both peptidases and glycoside hydrolases (GHs). These gene groups
have a large diversity of form and function in regards to degrading peptides and large carbohydrates,
respectively, and can play key roles in differentiating metabolic potential for organisms and microbial
communities.
Peptidases. Extrapolating from the estimated degree of completeness, the number of
peptidases identified in each bin was within the range commonly seen for Flavobacterial genomes
(67-149 peptidases) (Table 3.3). The function of peptidases ranges from the internal cycling of proteins
within the cell to interactions with dissolved and particulate organic matter in the cytoplasm, at the cell
exterior and in the extracellular environment. Of the identified peptidases, most (81-91%) were either
predicted to be cytoplasmic or embedded in the cytoplasmic membrane (39-44%) or lacked a
discernible final destination (36-47%). Those without a determined destination were further examined
for evidence of signal peptides (33%) or lipoprotein signals (15%), which can be used to infer a potential
to interact with substrates outside of the cell. The remaining peptidases could be assigned as either
periplasmic (2-13%) or interacting with the surrounding environment, as either outer membrane-
associated or extracellular (4-13%).
All of the bins, except for FlavI, have at least one peptidase without similarity in the NR
database (Table 3). These five peptidases were either assigned an outer membrane-associated
or extracellular destination (60%) or, if assigned an unknown destination, had a lipoprotein signal,
suggesting all of the unique peptidases were interacting with proteins in the environment. All of these
peptidases belong to the clan MA (CL0126) in the Pfam database. This clan consists of 52 families of
peptidases, all of which share a two Zn-dependent co-catalytic site within the motif HEXXH. One of the
peptidases from FlavA could be assigned more specifically to family Peptidase M28 (PF04389), which
preferentially releases basic amino acids from the N-terminal end of peptides (Chevrier et al., 1996).
Unique glycoside hydrolases (GH). The number of identified GHs for the phylogenetic bins was similar
to the numbers found in other marine Flavobacterial genomes (18-58 GHs) (Table 3). Each bin, except
FlavH, had at least 1 unique GH when compared to the NR. Three of the unique GHs were derived from
FlavA. The first unique GH contained a domain assigned to GH Family 88 (PF07470), annotated as a
rhamnogalacturonide degradation protein, indicating it may degrade pectin polysaccharides (Cantarel et
al., 2009). The second was assigned to the GH Family 18 (PF00704), which has a number of possible
functions attributed to it, such as chitinases, acetylglucoaminidases, and xylanase inhibitors (Cantarel et
al., 2009). The third was annotated as a short-chain dehydrogenase/reductase and was assigned as a GH
with a bacterial neuraminidase repeat (BNR). FlavI also contained a unique CDS with putative BNR
function. Neuraminidases specifically degrade neuraminic acids, commonly found in glycoproteins on cell
surfaces (Gaskell et al., 1995). Activity of the putative neuraminidases would require the flavobacteria
populations to be attached to the exterior of cells expressing such glycoproteins, further supporting the
putative role of flavobacteria attached to particles. FlavG contained a single unique GH that was
annotated as an α-L-fucosidase, which will cleave hexose deoxy sugars on cell surfaces (Cantarel et al.,
2009).
Collectively, these indicate (at the highly specific gene assignment level) types of genomic
variation that may allow different populations to occupy overlapping temporal and spatial niches and
perform non-overlapping community functions. It is possible any one of these genes may confer the
necessary genomic variation for niche stabilization. However, since these data are analogous to “draft
genomes”, it is not possible to know for certain all of the genes that make the phylogenetic bins distinct.
Nor is it possible to assign functions to putative CDSs with limited similarity to known genes, further
limiting our predictive capability. However, these unique functions do highlight possible ecological
niches that the populations may inhabit.
3.3.4. Comparative genomics. Insight into how genomic diversification occurs can further be explored
through examining syntenic genomic regions for evidence of gene addition or deletion. There were only
a few syntenic regions between all bins that could be utilized for comparative analysis. Two regions
were identified based on nucleotide identity and synteny (Figure 3.3.2; Figure 3.3). These regions
further underscore the high degree of genomic variation that commonly occurs between related
organisms, and offer insights into how the phylogenetic relationships may impact genomic content.
Homologous region 1 (HR1) is 15 genes and ~21kbp in length (Figure 3.3.2). Several of the
genes appear to have an operon-like nature and pertain to amino acid biosynthesis and other essential
cellular functions. Interspersed between the syntenic regions of HR1 reside a number of across-bin
variations. The largest variation is the inclusion of genes related to arginine biosynthesis in FlavA, -H,
and -I. Arginine biosynthesis genes are orthologous and syntenic in both FlavA and -H, however, both
of these bins contain an insertion (relative to FlavI) annotated as pyrroline-5-carboxylate reductase. When
arginine biosynthesis genes are compared to homologs in the genomes of the Flavobacteriaceae used to
construct the pangenome, it can be seen that the whole cycle is syntenic for most of the organisms, with
the exception of Kordia algicida OT-1, which, like FlavI, lacks pyrroline-5-carboxylate reductase. However,
unlike the GoMA phylogenetic bins, the surrounding genomic architecture varies quite markedly,
including Flavobacteria sp. MS024-2A. This suggests that the arginine biosynthesis operon is conserved
in members of the Flavobacteriaceae, though its relative location in each genome can vary and that the
lack of the operon in FlavG is not indicative of a deletion, but may be the result of the rearrangement of
the entire operon.
Homologous region 2 (HR2) is far more variable in size, with 17 genes anchoring the syntenic
region, and a number of gene differences in FlavA, -G, and -H that increase the length of the region
from 31 kb in FlavI to 64 kb in FlavA (Figure 3.3). The conserved portion of HR2 indicates a role in cell
division. The variable region appears to be amenable to gene gain/loss and many of the genes present in
each bin confer potential functions that would be beneficial to the proposed marine Flavobacteriaceae
lifestyle, such as glycoside hydrolases and TonB-receptors. Interestingly, the genes within the variable
region have limited overlap in putative function between the bins. This may indicate that this type of
variable region is prone to gene gain/loss, potentially as a mechanism for incorporating horizontally
transferred functions into the genome, and like other genomic islands (Tuanyok et al., 2008), this region
is adjacent to a tRNA, specifically tRNA-Arg. The horizontally transferred genes may confer metabolic
specialization and diversification, similar to those genomic islands seen in Prochlorococcus (Kettler et
al., 2007).
3.3.5. Population variation. In addition to evaluating differences in genomic content, we analyzed gene
variations within the underlying population structure, using the individual reads that assembled into each
putative CDS. The large assemblies and binning protocol allow for analysis of in vivo variations in
putative CDS of an environmental population. Additionally, Sanger sequencing is not subject to the
amplification biases and higher error rates of 454 sequencing and Illumina sequencing platforms (though
cloning biases do apply). The total number of base pairs sequenced (~2 Gbp) was much less than the
total genomic material in the 200 L surface ocean samples (~2 Tbp, assuming 5×10⁶ cells·ml⁻¹ and 2
Mbp·genome⁻¹); therefore, overlapping sequences in assemblies are likely derived from individuals.
This type of analysis has rarely been performed on metagenomic datasets because of the low coverage of
less abundant organisms in smaller, complex environmental metagenomes (Rusch et al., 2007) and the
lack of coherent phylogenetic groups to pursue concerted effort within a single population. Consequently,
such analyses usually rely on genomes derived from cultured and sequenced closely related strains. As
these problems are addressed such analyses will be even more relevant for understanding variation in the
environment, as the gene variations present within microbial populations may confer an unknown degree
of niche differentiation. Many of the variant differences are synonymous, but the non-synonymous
changes may have significant effects on protein function and activity.
Putative CDSs were identified with coverage similar to that of a draft genome (> 7X coverage)
in all of the phylogenetic bins. The 93 CDSs (1.9% of predicted CDS) with sufficient coverage were
manually assessed for single nucleotide polymorphisms (SNPs) that could be used to differentiate between
different gene variants (Table 3.4). 33 of the CDSs had no SNPs and represented genes for which only
one variant was detectable in the environment. A further 16 CDSs had at least one SNP, but variants
could not be determined due to a number of factors, such as limited coverage across the length of the
gene. For the remaining 44 CDS, 101 different gene variants were generated. One example of a CDS
with high coverage is phospho-N-acetylmuramoyl-pentapeptide transferase in FlavG (Figure 3.4). Three
distinct variant tracks that occurred along a 449-bp length of the gene (1,215-bp) were identified through
SNP analysis. Variant-1 was the shortest region reconstructed and was composed of 2 summer clones.
Variant-2 and Variant-3 are composed of 4 clones. All of the clones in Variant-3 were winter clones,
while Variant-2 had clones from both seasons. Across the region covered by the three variants there
were 47 synonymous and 13 nonsynonymous SNPs in at least one sequence. In total, 37 variants were
generated that were composed of only summer reads, 23 variants with only winter reads, and 42 variants
with reads from both seasons. A majority of the CDSs analyzed in this way (64%) had at least one
variant with reads from both seasons. These data indicate that the underlying population structure of the
studied Flavobacteriaceae has a complex nature, with gene variants persistent in both seasonal time
points competing with seasonal specific variants. How these variations influence genomic function
cannot be parsed from the current information, but the variants can be used to assess in situ selective
pressure within the population.
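The seasonal classification of variants described above (summer-only, winter-only, or both) can be sketched as follows. The function name and input format are hypothetical; the example loosely mirrors the three FlavG variants, and the summer/winter split of Variant-2 is an assumption, since only its total of 4 clones is reported.

```python
from collections import Counter


def classify_variant(read_seasons):
    """Classify a variant by the seasons of its supporting reads."""
    seasons = set(read_seasons)
    if not seasons or not seasons <= {"summer", "winter"}:
        raise ValueError("expected a non-empty list of 'summer'/'winter' labels")
    if len(seasons) == 2:
        return "both"
    return seasons.pop() + "-only"


variants = {
    "Variant-1": ["summer", "summer"],
    "Variant-2": ["summer", "summer", "winter", "winter"],  # hypothetical split
    "Variant-3": ["winter", "winter", "winter", "winter"],
}
tally = Counter(classify_variant(reads) for reads in variants.values())
```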
The presence of multiple closely related populations within each phylogenetic bin allows for a
comparison of selection pressures using dN/dS values. dN/dS is the ratio of the number of non-
synonymous substitutions per non-synonymous site to the number of synonymous substitutions per
synonymous site. Differences can be used as an indicator of selective pressure acting on a protein-
coding gene. Further, dS can be used to approximate the time since divergence, if there is the
assumption of a constant mutation rate across all genes (Rocha et al., 2006). Konstantinidis et al.
examined an environmental metagenome to determine dN/dS values for different depths in the water
column (Konstantinidis et al., 2009), and Simmons et al. were able to use the well characterized
microbial communities from the Iron Mountain acid mine drainage system to study selection in
individual species in that environment (Simmons et al., 2008). Still, little is known about selection in
environmental genomes, as numerous genomes are required to decipher meaning.
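To make the distinction between synonymous and non-synonymous changes concrete, the sketch below counts differing codons between two in-frame, codon-aligned variant sequences and classifies each as synonymous or non-synonymous. It is a simplified illustration with hypothetical names and toy sequences: it yields raw codon counts only, not the per-site dN and dS estimates produced by codeml, and it uses the standard amino acid assignments, which Codon Table 11 shares.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Map each codon (TTT, TTC, TTA, ...) to its amino acid or stop symbol.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# CTT -> CTC keeps leucine (synonymous); AAA -> AGA changes lysine to arginine.
differences = count_codon_differences("ATGCTTAAA", "ATGCTCAGA")
```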
Using our data, dN/dS values were calculated for each variant pair to determine the selective
pressure on each gene (in the cases with CDS that had more than two variants, pairwise comparisons
were made against the variant with the most underlying sequences) (Figure 3.5). All but one CDS appear
to be under purifying selection (dN/dS < 1). The single CDS under positive selection was annotated as a
hypothetical protein, 114 bp in length, with no significant similarity in the NR, suggesting it may just be
an artifact of the annotation process. Several CDSs from FlavG and -H have a dS value >2.0, a cutoff
generally used to classify genes that are saturated in synonymous mutations, such that the estimate of dS
is inaccurate (Rocha et al., 2006). The sharp distinction between these CDSs and most of the other
variants (dS < 0.9) may suggest that CDS with dS > 2.0 have a disparate gene history compared with
the rest of the genome, potentially the result of horizontal gene transfer and recombination. Many of the rest
of the CDS have dS values < 0.3, indicating a short time since divergence and low dN/dS value (< 0.4)
supporting removal of nonsynonymous mutations from the population. Building our understanding of
how selection functions in the environment
3.4. Conclusion. The analysis of the Flavobacterial component of the GoMA metagenome has allowed
us to begin utilizing comparative genomic techniques on populations at and below the 1% abundance
level. This study has been used to understand how community and population diversity are separated
both spatially and temporally in the planktonic environment. Using comparative genomics, it is possible
to identify potentially key differences in the gene content of these understudied Flavobacterial
populations, such as unique iron transporters (FlavI) and specialization in cell attachment (FlavA) and
polymer exploitation (FlavG). Further, we were able to see similar gene content over large segments of
syntenic regions for all four bins, including the identification of the complete urea cycle, which may
impact the GoMA on a large scale, and identified a genomic site that may play host to the acquisition of
new functions. We were able to start examining the selection pressures on specific genes and could see
evidence of recombination between populations. The transition of metagenomics from a hypothesis-
generating technique towards a technique that analyzes putative functions of organisms and pursues
greater understanding of gene flow and evolution will be important in converting metagenomics into a
hypothesis-testing technique.
3.5. Acknowledgements. This work was supported by the National Science Foundation Microbial
Sequencing Grant 0412119. The authors gratefully acknowledge NOAA ecosystem process division
scientists Jon Hare and Jerry Prezario for ship time on NOAA Fisheries R/V Delaware II (Cruise No.
DE 06-02) and R/V Albatross IV (Cruise No. AL 06-07). We thank Dr. Shannon Williamson for
collecting samples. We thank Robert Friedman and Yu-Hui Rogers for technical and scientific support
in the sequencing efforts. We thank Matt Lewis and Dr. Aaron Halpern, who processed the GoMA
samples. We would also like to thank Dr. Jason Sylvan for reviewing the manuscript before submission
and offering constructive suggestions.
3.6. Units and Abbreviations
Gulf of Maine = GoMA
coding sequence = CDS
conserved single-copy gene = CSCG
nonredundant = NR
thrombospondin = TSP
repeats-in-toxin = RTX
glycoside hydrolases = GH
bacterial neuraminidase repeat = BNR
3.7. Sequence data
Raw reads can be found in the NCBI GenBank Trace Archive (TA) under ID Nos. 2307942905-2310786347.
Scaffolds of putative Flavobacteria will be deposited after review and acceptance.
3.8. References
Aziz R, Bartels D, Best A, Dejongh M, Disz T, Edwards R et al (2008). The RAST Server: Rapid
Annotations using Subsystems Technology. BMC Genomics 9: 75.
Baker B, Tyson G, Webb R, Flanagan J, Hugenholtz P, Allen E et al (2006). Lineages of Acidophilic
Archaea Revealed by Community Genomic Analysis. Science 314: 1933-1935.
Benson D (2004). GenBank. Nucleic Acids Res 33: D34-D38.
Cantarel B, Coutinho P, Rancurel C, Bernard T, Lombard V, Henrissat B (2009). The Carbohydrate-
Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 37: D233-
D238.
Chevrier B, D'Orchymont H, Schalk C, Tarnus C, Moras D (1996). The structure of the Aeromonas
proteolytica aminopeptidase complexed with a hydroxamate inhibitor. Eur J Biochem 237: 393-398.
Cohen-Kupiec R, Chet I (1998). The molecular biology of chitin digestion. Curr Opin Biotechnol 9:
270-277.
Coleman ML, Chisholm SW (2010). Ecosystem-specific selection pressures revealed through
comparative population genomics. Proc Natl Acad Sci USA 107: 18634-18639.
Coleman ML, Sullivan M, Martiny A, Steglich C, Barry K, DeLong E et al (2006). Genomic islands and
the ecology and evolution of Prochlorococcus. Science 311: 1768-1770.
Cottrell M, Kirchman D (2000). Natural assemblages of marine proteobacteria and members of the
Cytophaga-Flavobacteria cluster consuming low- and high-molecular-weight dissolved organic matter.
Appl Environ Microbiol 66: 1692-1697.
Darling AE, Mau B, Perna NT (2010). progressiveMauve: Multiple Genome Alignment with Gene
Gain, Loss and Rearrangement. PLoS ONE 5: 1-17.
DeLong E, Preston C, Mincer T, Rich V, Hallam S, Frigaard N et al (2006). Community genomics
among stratified microbial assemblages in the ocean's interior. Science 311: 496-503.
Engel P, Martinson V, Moran N (2012). Functional diversity within the simple gut microbiota of the
honey bee. Proc Natl Acad Sci USA 109: 11002-11007.
Finn R, Clements J, Eddy S (2011). HMMER web server: interactive sequence similarity searching.
Nucleic Acids Res 39: W29-W37.
Fuhrman J, Azam F (1982). Thymidine incorporation as a measure of heterotrophic bacterioplankton
production in marine surface waters: Evaluation and field results. Mar Biol 66: 109-120.
Fuhrman J, Hewson I, Schwalbach M, Steele J, Brown M, Naeem S (2006). Annually reoccurring
bacterial communities are predictable from ocean conditions. Proc Natl Acad Sci USA 103: 13104-
13109.
Gaskell A, Crennell S, Taylor G (1995). The three domains of a bacterial sialidase: a β-propeller, an
immunoglobulin module and a galactose-binding jelly-roll. Structure 3: 1197-1205.
Gonzalez J, Pinhassi J, Fernandez-Gomez B, Coll-Llado M, Gonzalez-Velazquez M, Puigbo P et al
(2011). Genomics of the Proteorhodopsin-Containing Marine Flavobacterium Dokdonia sp. Strain
MED134. Appl Environ Microbiol 77: 8676-8686.
Grossart H, Levold F, Allgaier M, Simon M, Brinkhoff T (2005). Marine diatom species harbour distinct
bacterial communities. Environ Microbiol 7: 860-873.
Guindon S, Lethiec F, Duroux P, Gascuel O (2005). PHYML Online--a web server for fast maximum
likelihood-based phylogenetic inference. Nucleic Acids Res 33: W557-W559.
Heidelberg J, Heidelberg K, Colwell R (2002). Bacteria of the γ-Subclass Proteobacteria Associated
with Zooplankton in Chesapeake Bay. Applied and Environmental Microbiology 68: 5498-5507.
Hess M, Sczyrba A, Egan R, Kim T, Chokhawala H, Schroth G et al (2011). Metagenomic Discovery of
Biomass-Degrading Genes and Genomes from Cow Rumen. Science 331: 463-467.
Hewson I, Steele J, Capone D, Fuhrman J (2006). Temporal and spatial scales of variation in
bacterioplankton assemblages of oligotrophic surface waters. Mar Ecol Prog Ser 311: 67-77.
Iglewicz B, Hoaglin D (1993). Volume 16: How to Detect and Handle Outliers. In: Mykytka EF (ed).
The ASQC Basic References in Quality Control: Statistical Techniques.
Iverson V, Morris R, Frazar C, Berthiaume C, Morales R, Armbrust E (2012). Untangling Genomes
from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota. Science 335: 587-590.
Johnson CL, Pechonick E, Park SD, Havemann GD, Leal NA, Bobik TA (2001). Functional genomic,
biochemical, and genetic characterization of the Salmonella pduO gene, an ATP:cob(I)alamin
adenosyltransferase gene. Journal of Bacteriology 183: 1577-1584.
Juncker A, Willenbrock H, von Heijne G, Nielsen H, Brunak S, Krogh A (2003). Prediction of
lipoprotein signal peptides in Gram-negative bacteria. Protein Sci 12: 1652-1662.
Kettler G, Martiny A, Huang K, Zucker J, Coleman M, Rodrigue S et al (2007). Patterns and
Implications of Gene Gain and Loss in the Evolution of Prochlorococcus. PLoS Genetics 3: e231.
Kirchman D (2002). The ecology of Cytophaga-Flavobacteria in aquatic environments. FEMS Microbiol
Ecol 39: 91-100.
Konstantinidis K, Braff J, Karl DM, DeLong E (2009). Comparative metagenomic analysis of a
microbial community from 4000 m at Station ALOHA in the North Pacific Subtropical Gyre. Appl
Environ Microbiol 75: 5345-5355.
Kvansakul M, Adams J, Hohenester E (2004). Structure of a thrombospondin C-terminal fragment
reveals a novel calcium core in the type 3 repeats. EMBO J 23: 1223-1233.
Linhartová I, Bumba L, Mašín J, Basler M, Osička R, Kamanová J et al (2010). RTX proteins: a highly
diverse family secreted by a common mechanism. FEMS Microbiol Rev 34: 1076-1112.
Markowitz V, Korzeniewski F, Palaniappan K, Szeto E, Werner G, Padki A et al (2006). The integrated
microbial genomes (IMG) system. Nucleic Acids Res 34: D344-D348.
Martinez J, Smith D, Steward G, Azam F (1996). Variability in ectohydrolytic enzyme activities of
pelagic marine bacteria and its significance for substrate processing in the sea. Aquat Micro Ecol 10:
223-230.
Ng P, Henikoff S (2006). Predicting the Effects of Amino Acid Substitutions on Protein Function. Annu
Rev Genom Human Genet 7: 61-80.
Paulsson M, Heinegård D (1981). Purification and structural characterization of a cartilage matrix
protein. Biochem J 197: 367.
Peterson T, Brunak S, von Heijne G, Nielsen H (2011). SignalP 4.0: discriminating signal peptides from
transmembrane regions. Nat Methods 8: 785-786.
Rawlings N, Barrett A, Bateman A (2012). MEROPS: the database of proteolytic enzymes, their
substrates and inhibitors. Nucleic Acids Res 40: D343-D350.
Rocha E, Smith J, Hurst L, Holden M, Cooper J, Smith N et al (2006). Comparisons of dN/dS are time
dependent for closely related bacterial genomes. J Theor Biol 239: 226-235.
Rusch D, Halpern AL, Sutton G, Heidelberg K, Williamson S (2007). The Sorcerer II Global Ocean
Sampling expedition: Northwest Atlantic through eastern tropical Pacific. PLoS Biology 5: 398-431.
Simmons S, Dibartolo G, Denef V, Goltsman D, Thelen M, Banfield J (2008). Population Genomic
Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage
Formation. PLoS Biology 6: e177.
Suyama M, Torrents D, Bork P (2006). PAL2NAL: robust conversion of protein sequence alignments
into the corresponding codon alignments. Nucleic Acids Res 34: W609-W612.
Teeling H, Fuchs B, Becher D, Klockow C, Gardebrecht A, Bennke C et al (2012). Substrate-Controlled
Succession of Marine Bacterioplankton Populations Induced by a Phytoplankton Bloom. Science 336:
608-611.
Teeling H, Meyerdierks A, Bauer M, Amann R, Glöckner F (2004). Application of tetranucleotide
frequencies for the assignment of genomic fragments. Environ Microbiol Rep 6: 938-947.
Tenaillon O, Skurnik D, Picard B, Denamur E (2010). The population genetics of commensal
Escherichia coli. Nat Rev Microbiol 8: 207-217.
Thompson J, Higgins D, Gibson T (1994). CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting, position-specific gap penalties and weight
matrix choice. Nucleic Acids Res 22: 4673-4680.
Townsend DW, Thomas AC, Mayer LM, Thomas MA, Quinlan JA (2004). Oceanography of the
Northwest Atlantic Continental Shelf. In: Robinson AR and Brink KH (eds). The Sea. Harvard
University Press: Cambridge, MA. pp 119-168.
Tripp HJ, Bench SR, Turk KA, Foster RA, Desany BA, Niazi F et al (2010). Metabolic streamlining in
an open-ocean nitrogen-fixing cyanobacterium. Nature 464: 90-94.
Tuanyok A, Leadem BR, Auerbach RK, Beckstrom-Sternberg SM, Beckstrom-Sternberg JS, Mayo M et
al (2008). Genomic islands from five strains of Burkholderia pseudomallei. BMC Genomics 9: 1-14.
Tully B, Nelson W, Heidelberg J (2012). Metagenomic analysis of a complex marine planktonic
thaumarchaeal community from the Gulf of Maine. Environ Microbiol 14: 254-267.
Tyson G, Chapman J, Hugenholtz P, Allen E, Ram R, Richardson P et al (2004). Community structure
and metabolism through reconstruction of microbial genomes from the environment. Nature 428: 37-43.
Venter JC, Remington K, Heidelberg J, Halpern AL, Rusch D, Eisen J et al (2004). Environmental
genome shotgun sequencing of the Sargasso Sea. Science 304: 66-74.
Vrljic M, Sahm H, Eggeling L (2003). A new type of transporter with a new type of cellular function: L-
lysine export from Corynebacterium glutamicum. Mol Microbiol 22: 815-826.
Woyke T, Xie G, Copeland A, González J, Han C, Kiss H et al (2009). Assembling the marine
metagenome, one cell at a time. PLoS ONE 4: 1-10.
Yang Z (2007). PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol Biol Evol 24: 1586-
1591.
Yu N, Wagner J, Laird M, Melli G, Rey S, Lo R et al (2010). PSORTb 3.0: Improved subcellular
localization prediction with refined localization subcategories and predictive capabilities for all
prokaryotes. Bioinformatics 26: 1608-1615.
SI Materials and Methods
16S rRNA assignments. The unassembled metagenome was searched with BLASTN for 16S rRNA
gene homologs against a database of rRNA sequences (Altschul et al., 1997). Reads containing at least
250 bp 16S rRNA sequences were subjected to the Ribosomal Database Project (RDP) classifier (Cole
et al., 2009; Wang et al., 2007).
Metagenomic binning. Single-copy, non-16S SSU rRNA phylogenetic markers from Flavobacterium
psychrophilum JIP02, as identified in ref. (Santos and Ochman, 2004), were searched with BLASTN against all
GOM scaffolds. All scaffolds were then grouped using an oligonucleotide frequency calculator. In brief,
all scaffolds >3,000 bp in length had tri-, tetra-, penta-, and hexanucleotide frequencies determined.
Sequences were clustered using a hierarchical clustering method. Scaffolds with phylogenetic markers
were used to identify all neighboring scaffolds with a Pearson's correlation ≥0.90; scaffolds grouped in
this way were considered phylogenetically similar bins. Bins with spans at least as long as previous
Flavobacteriales genomes (>1.9 Mbp) (n=3) were re-assembled using the Celera assembler (see below)
and all scaffolds >5,000 bp were grouped using the oligonucleotide frequency calculator. A branching
group of scaffolds was initially identified as a sub-bin if it contained >10,000 reads. The
reads comprising these scaffolds were re-assembled with the Celera assembler. If the estimated
percent duplication (see below) was high, the sub-bin was further sub-divided until the duplication
estimate was <10%. Putative phylogeny was determined by identifying the 16S SSU rRNA gene. If a
sub-bin had a 16S rRNA identity not within the Bacteroidetes (Flavobacteriales), it was excluded from
further analysis.
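The correlation step of the binning procedure can be sketched in Python as follows. This is a simplified illustration, not the oligonucleotide frequency calculator used in the study: it computes tetranucleotide frequencies only (the text also uses tri-, penta-, and hexanucleotides), omits the hierarchical clustering, and compares two sequences directly. The function names are hypothetical; the 0.90 cutoff is the one stated above.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of every possible k-mer over A/C/G/T, in a fixed order."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts.values()]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def same_bin(marker_scaffold: str, candidate: str, cutoff: float = 0.90) -> bool:
    """True if the candidate correlates with the marker-bearing scaffold at or above the cutoff."""
    r = pearson(kmer_frequencies(marker_scaffold), kmer_frequencies(candidate))
    return r >= cutoff
```

In this sketch a scaffold carrying a phylogenetic marker acts as the reference, and any scaffold whose frequency profile correlates with it at ≥0.90 would be grouped with it.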
Annotation and bin refinement. Phylogenetic sub-bins (hereafter referred to as bins) were annotated
using the RAST annotation server (Aziz et al., 2008). The accuracy of the scaffolds within each bin was
further refined using a process similar to ref. (Tully et al., 2012). In brief, each putative coding sequence
(CDS) was assigned a phylogenetic identity based on its top BLASTP hit to the GenBank NR database. If the
percent of CDS on a scaffold with a top hit to the phylum Bacteroidetes was <60%, the scaffold was
removed from the bin. True duplicated scaffolds were identified by a BLASTP of all bin CDS against
the CDS of Flavobacteriales sp. ALC-1. If the smaller of the two scaffolds had ≥85% similar genes to a
longer scaffold, the shorter scaffold was removed from the bin. All duplicated scaffolds were confirmed
by progressiveMauve alignments.
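The 60% phylum filter described above can be sketched in Python. This is a hedged illustration only: it assumes the top BLASTP hit of each CDS has already been resolved to a phylum name, and the function name and input layout are hypothetical rather than taken from the original pipeline.

```python
from collections import defaultdict


def scaffolds_to_keep(cds_hits, target_phylum="Bacteroidetes", min_fraction=0.60):
    """Return scaffold IDs on which at least min_fraction of CDS have a top hit to target_phylum.

    cds_hits: iterable of (scaffold_id, phylum_of_top_hit) pairs, one pair per CDS.
    """
    total = defaultdict(int)
    on_target = defaultdict(int)
    for scaffold, phylum in cds_hits:
        total[scaffold] += 1
        if phylum == target_phylum:
            on_target[scaffold] += 1
    # Scaffolds below the cutoff are dropped, mirroring the <60% removal rule.
    return {s for s in total if on_target[s] / total[s] >= min_fraction}


example_hits = (
    [("scfA", "Bacteroidetes")] * 4 + [("scfA", "Proteobacteria")]
    + [("scfB", "Bacteroidetes")] + [("scfB", "Proteobacteria")] * 2
)
kept = scaffolds_to_keep(example_hits)
```

Here the hypothetical scaffold scfA (4 of 5 CDS on target) is retained and scfB (1 of 3) is removed.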
Estimation of bin completeness and duplication. Estimation of bin completeness and duplication was
adapted from ref. (Engel et al., 2012). Seven marine-related Flavobacteriales genomes (SI Table S5)
were identified to develop a pangenome related to the bins. Using HMMER V3.0 (Finn et al., 2011)
(E-value ≤ 1E-5), all CDS from each of the seven genomes within the pangenome were searched against the
TIGRFAM database V12.0 (Haft, 2003). For each TIGRFAM match common to all genomes, the
minimum shared number of occurrences was determined, and together these matches define the core genome.
The core genome consisted of 665 unique TIGRFAM models, and 799 total occurrences. Percent
completeness for each bin was determined using the equation:
\[
\text{est. \% complete} = \frac{\text{No. of identified core genes}}{\text{No. of expected core genes}} \times 100
\]
The same pangenome was used to determine the minimum number of single copy genes via BLASTP,
and 199 conserved single copy genes (CSCGs) were identified. Percent duplication within a bin was
determined via a BLASTP search of the CSCGs against all putative CDS and calculated using the
equation:
\[
\text{est. \% duplication} = \frac{\text{No. of duplicate genes}}{\text{No. of CSCGs}} \times 100
\]
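As a worked illustration of the two equations, the following Python sketch computes both estimates. The gene counts passed in (668 identified core genes, 4 duplicated CSCGs) are hypothetical inputs chosen for the example, and using the 799 total occurrences as the expected core count is an assumption: the text does not state whether the 665 unique models or the 799 occurrences form the denominator.

```python
def percent_complete(identified_core: int, expected_core: int) -> float:
    """Estimated percent completeness of a bin."""
    return identified_core / expected_core * 100


def percent_duplication(duplicate_genes: int, n_cscg: int) -> float:
    """Estimated percent duplication of a bin."""
    return duplicate_genes / n_cscg * 100


EXPECTED_CORE = 799  # assumption: total core TIGRFAM occurrences, not the 665 unique models
N_CSCG = 199         # conserved single copy genes reported in the text

# Hypothetical counts, used only to show the arithmetic.
completeness = percent_complete(668, EXPECTED_CORE)
duplication = percent_duplication(4, N_CSCG)
print(f"{completeness:.2f}% complete, {duplication:.2f}% duplication")
```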
Celera Assembler Settings. The following settings were used in the specification file to assemble reads
identified in each bin and, subsequently, each sub-bin:
utgErrorRate=0.18
ovlErrorRate=0.20
cnsErrorRate=0.20
cgwErrorRate=0.20
merSize=14
doExtendClearRanges=2
doOverlapTrimming=0
doResolveSurrogates=1
doFragmentCorrection=0
utgBubblePopping=1
utgGenomeSize=
merylMemory = 4000
merylThreads = 2
overlapper = ovl
ovlMemory = 4GB --hashload 0.7 --hashstrings 60000
ovlThreads = 2
ovlHashBlockSize = 180000
ovlRefBlockSize = 4000000
unitigger=bog
cnsConcurrency=6
fakeUIDs = 1
astatLowBound = -20
Geneious Assembly and SNP determination Settings. Using Geneious V5.6.2, all reads used to
construct scaffolds were trimmed based on quality using an Error Probability Limit set at 0.015 from the
5’ and 3’ ends of each read. Geneious describes Error Probability Limit as, “Trim bases up until the
point where trimming further bases will improve the error rate by less than the limit”. The reads for each
individual scaffold were then assembled against the scaffold. Geneious settings were the defaults for
Sensitivity = Medium Sensitivity and Fine Tuning = Maximum. Default settings were as follows:
Allow Gaps Max. per read = 15%
Word length = 14
Ignore words repeated more than 10 times
Max. gap = 50
Index Word Length = 12
Max. ambiguity = 4
Geneious was used to determine possible locations for SNPs. The SNPs identified using this feature
were further refined using the method described in Methods and Materials. The settings used were:
Minimum coverage: 2
Minimum Variant Frequency: 0.2
Analyze effect of polymorphisms on translations: Bacterial
Only find SNPs
Don’t find variations in annotation types: Editing History Deletion
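The effect of the two numeric SNP settings can be illustrated with a short Python sketch. This is not Geneious code and does not reproduce its internal algorithm; it assumes the simplest reading of the settings, namely that a candidate site needs at least the minimum coverage and that the variant must be supported by at least the minimum fraction of the reads covering it. The function name is hypothetical.

```python
def passes_snp_filter(coverage: int, variant_reads: int,
                      min_coverage: int = 2, min_frequency: float = 0.2) -> bool:
    """Apply the coverage and variant-frequency cutoffs to one candidate site."""
    if coverage < min_coverage:
        return False
    return variant_reads / coverage >= min_frequency


# Hypothetical candidate sites: (read coverage, reads supporting the variant)
candidates = [(10, 2), (10, 1), (1, 1), (4, 2)]
retained = [c for c in candidates if passes_snp_filter(*c)]
```

Under these assumptions a site covered by a single read is rejected regardless of its variant frequency, and a variant seen in one of ten reads is rejected for falling below the 0.2 frequency cutoff.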
PHYML, Maximum likelihood phylogenetic tree Settings. PHYML trees were built using the JTT
model and computed using 1,000 bootstraps. The version of PHYML used was part of Geneious
V5.6.2 and had the default settings of:
Transition/Transversion ratio: 4 – Estimated
Proportion of invariable sites: 0 – Fixed
Number of substitution rate categories: 1
Gamma distribution parameter: 0 – Estimated
Topology search: NNI
References.
Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W et al (1997). Gapped BLAST and PSI-
BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-402.
Aziz R, Bartels D, Best A, Dejongh M, Disz T, Edwards R et al (2008). The RAST Server: Rapid
Annotations using Subsystems Technology. BMC Genomics 9: 75.
Cole J, Wang Q, Cardenas E, Fish J, Chai B, Farris R et al (2009). The Ribosomal Database Project:
improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141-D145.
Engel P, Martinson V, Moran N (2012). Functional diversity within the simple gut microbiota of the
honey bee. Proc Natl Acad Sci USA 109: 11002-11007.
Finn R, Clements J, Eddy S (2011). HMMER web server: interactive sequence similarity searching.
Nucleic Acids Res 39: W29-W37.
Haft D (2003). The TIGRFAMs database of protein families. Nucleic Acids Res 31: 371-373.
Santos S, Ochman H (2004). Identification and phylogenetic sorting of bacterial lineages with
universally conserved genes and proteins. Environ Microbiol 6: 754-759.
Tully B, Nelson W, Heidelberg J (2012). Metagenomic analysis of a complex marine planktonic
thaumarchaeal community from the Gulf of Maine. Environ Microbiol 14: 254-267.
Wang Q, Garrity G, Tiedje J, Cole J (2007). Naive Bayesian classifier for rapid assignment of rRNA
sequence into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261-5267.
Table 3.1. Summary of the assembly data
Bin Name  Total Size (bp)  No. of scaffolds  Perc. Complete  Perc. Duplication  Approx. Genome Size (bp)
FlavA     2,220,739        19                83.60           0.00               2,656,386
FlavG     703,361          4                 36.80           2.01               1,872,889
FlavH     1,453,303        15                57.20           7.04               2,361,871
FlavI     799,420          10                37.92           0.00               2,108,175
Table 3.2. Data pertaining to the seasonality and spatial heterogeneity of sequences from
each bin.
        Perc. of reads in assemblies by season*   Perc. abundance of reads by season and sampling location#
Bin     Winter               Summer               Winter   Summer   GOM03    GOM04    GOM06    GOM12    GOM13    GOM14
FlavA   6.52                 93.47                0.044    0.631    0.054    0.039    0        1.218    0.331    0.678
FlavG   20.27                79.72                0.041    0.162    0.047    0.038    0.019    0.175    0.155    0.133
FlavH   40.74                59.25                0.099    0.144    0.112    0.093    0.099    0.213    0.108    0.215
FlavI   60.65                39.34                0.137    0.088    0.112    0.148    0.159    0.123    0.071    0.061
TOTAL   --                   --                   0.321    1.025    0.325    0.318    0.277    1.729    0.665    1.087
* Percentage of the total number of sequences that compose all assemblies within each bin by season.
# Percent abundance of sequences identified in each bin from corresponding libraries (No. of sequences from bin
÷ total sequences from library [or season]).
Table 3.3. The number of peptidases and glycoside hydrolases identified within each
phylogenetic bin.
Bin     Total       Cytoplasmic-  Periplasmic  Cell Exterior  Unknown     Unique       Total Glycoside  Unique Glycoside
Name    Peptidases  Related       Peptidases   Peptidases*    Peptidases  Peptidases#  hydrolases       hydrolases#
                    Peptidases
FlavA   57          25            1            4              27          2            33               3
FlavG   23          9             3            1              10          1            7                1
FlavH   52          23            3            7              19          2            9                0
FlavI   23          9             3            1              10          0            14               1
Table 3.4. Breakdown of variant data for each phylogenetic bin.
Bin     No. of CDS with  No. of CDS    No. of CDS with    No. of winter   No. of summer   No. of variants composed
Name    >7X coverage     without SNPs  multiple variants  only variants   only variants   of both seasons
FlavA   43               24            13                 1               22              4
FlavG   24               5             15                 4               12              20
FlavH   11               1             9                  5               2               14
FlavI   11               3             7                  13              0               4
Table 3.5. Site summary data
Site Name  Date Sampled  Latitude       Longitude      Temp. (°C)  Salinity (ppt)  Number of sequences  Insert Size Range (kb)
GoMA03     25-Jan-06     42° 46' 9" N   68° 40' 8" W   6.4         33.0            453,805              3-5 / 2-3
GoMA04     27-Jan-06     44° 07' 5" N   67° 58' 3" W   5.1         32.2            957,737              4-6
GoMA06     30-Jan-06     41° 28' 7" N   69° 6' 0" W    6.2         32.6            10,040               4-6
GoMA12     25-Aug-06     41° 08' 6" N   66° 53' 3" W   17.4        32.0            470,591              6-8
GoMA13     28-Aug-06     43° 23' 3" N   67° 41' 9" W   16.6        32.0            925,793              6-8 / 8-10
GoMA14     29-Aug-06     42° 21' 6" N   69° 23' 8" W   19.6        31.3            9,728                6-8
Site location
Table 3.6. Data pertaining to assembly efficiency
Bin     N50      No. of reads  Annotated     No. of      No. of Glycoside  No. of unique peptidase  No. of unique peptidase  No. of unique GH
Name             in bin        Putative CDS  peptidases  Hydrolases        (bin-to-bin)             (including pangenome)    (bin-to-bin)
FlavA   133,660  9,910         2,168         57          33                24                       5                        28
FlavG   129,191  3,452         747           22          7                 7                        3                        4
FlavH   174,151  5,832         1,283         51          8                 22                       2                        3
FlavI   93,775   3,434         772           22          13                4                        0                        13
Table 3.7. Data pertaining to the seasonality and spatial heterogeneity of sequences from
each scaffold within each bin.
                    By season       Percent (%) Abundance of Reads by Season and Sampling Location
Bin    Scaffold ID  Winter  Summer  Winter  Summer  GOM03  GOM04  GOM06  GOM12  GOM13  GOM14
FlavA  scf6364      8.129   91.87   0.003   0.035   0.004  0.002  0.000  0.066  0.020  0.000
       scf6365      7.967   92.03   0.006   0.070   0.006  0.006  0.000  0.130  0.040  0.082
       scf6366      5.248   94.75   0.000   0.017   0.000  0.001  0.000  0.032  0.010  0.010
       scf6368      5.71    94.28   0.001   0.027   0.001  0.001  0.000  0.055  0.014  0.000
       scf6369      6.539   93.46   0.005   0.074   0.006  0.004  0.000  0.145  0.038  0.061
       scf6370      4.198   95.8    0.001   0.044   0.002  0.001  0.000  0.086  0.023  0.071
       scf6371      6.55    93.44   0.001   0.027   0.002  0.001  0.000  0.054  0.013  0.020
       scf6372      10.26   89.73   0.000   0.008   0.000  0.001  0.000  0.017  0.004  0.020
       scf6374      7.07    92.92   0.000   0.012   0.001  0.000  0.000  0.025  0.006  0.000
       scf6375      7.465   92.53   0.001   0.023   0.001  0.001  0.000  0.041  0.014  0.020
       scf6376      4.427   95.57   0.001   0.030   0.001  0.001  0.000  0.054  0.017  0.082
       scf6377      6.032   93.96   0.002   0.035   0.001  0.002  0.000  0.074  0.014  0.061
       scf6378      4.907   95.09   0.000   0.008   0.001  0.000  0.000  0.015  0.004  0.000
       scf6380      6.125   93.87   0.001   0.020   0.002  0.000  0.000  0.037  0.011  0.000
       scf6381      6.673   93.32   0.001   0.023   0.002  0.001  0.000  0.048  0.010  0.092
       scf6382      8.518   91.48   0.003   0.033   0.004  0.002  0.000  0.066  0.017  0.061
       scf6383      7.257   92.74   0.001   0.022   0.002  0.001  0.000  0.042  0.012  0.041
       scf6384      6.076   93.92   0.006   0.105   0.009  0.005  0.000  0.209  0.053  0.051
       scf6385      3.502   96.49   0.000   0.007   0.000  0.000  0.000  0.016  0.003  0.000
FlavG  scf5919      22.29   77.7    0.028   0.099   0.031  0.027  0.019  0.101  0.099  0.051
       scf5920      17.02   82.97   0.005   0.025   0.007  0.004  0.000  0.030  0.022  0.061
       scf5921      16.74   83.25   0.007   0.037   0.008  0.007  0.000  0.043  0.034  0.020
       scf5923*     58.53   41.46   0.019   0.013   0.015  0.020  0.059  0.028  0.006  0.000
FlavH  scf1523      43.53   56.46   0.006   0.009   0.005  0.007  0.000  0.013  0.006  0.020
       scf1524      43.03   56.96   0.014   0.019   0.012  0.015  0.019  0.028  0.014  0.000
       scf1525      39.43   60.56   0.003   0.005   0.005  0.003  0.000  0.007  0.004  0.000
       scf1526      35.91   64.08   0.001   0.002   0.001  0.001  0.019  0.001  0.002  0.010
       scf1527      50.06   49.93   0.005   0.005   0.006  0.004  0.000  0.006  0.004  0.000
       scf1528      56.87   43.12   0.002   0.001   0.003  0.001  0.000  0.001  0.002  0.000
       scf1530      30.33   69.66   0.004   0.009   0.004  0.003  0.000  0.014  0.006  0.020
       scf1531      40.11   59.88   0.022   0.033   0.025  0.021  0.019  0.053  0.022  0.082
       scf1532      36.7    63.29   0.002   0.004   0.002  0.002  0.000  0.007  0.002  0.000
       scf1534*     97.01   2.982   0.060   0.001   0.087  0.047  0.000  0.000  0.002  0.000
       scf1535*     96.49   3.509   0.019   0.000   0.031  0.014  0.000  0.000  0.000  0.000
       scf1538      39.22   60.77   0.020   0.031   0.025  0.018  0.019  0.045  0.024  0.061
       scf1539      40.07   59.92   0.011   0.017   0.015  0.009  0.019  0.025  0.013  0.000
       scf1541*     94.86   5.135   0.060   0.003   0.092  0.045  0.019  0.002  0.003  0.000
       scf1542      45.49   54.5    0.003   0.004   0.004  0.003  0.000  0.007  0.002  0.020
FlavI  scf6040      19.74   80.25   0.010   0.042   0.011  0.009  0.039  0.046  0.040  0.061
       scf6043      75.79   24.2    0.009   0.002   0.006  0.010  0.000  0.004  0.002  0.000
       scf6044      78.67   21.32   0.006   0.001   0.001  0.008  0.039  0.003  0.001  0.000
       scf6045      58.46   41.53   0.007   0.005   0.009  0.007  0.000  0.008  0.004  0.000
       scf6046      50.89   49.1    0.013   0.013   0.011  0.014  0.000  0.026  0.006  0.000
       scf6047      98.35   1.648   0.025   0.000   0.034  0.021  0.000  0.000  0.000  0.000
       scf6049      48.52   51.47   0.009   0.010   0.008  0.010  0.000  0.020  0.005  0.000
       scf6050      59.36   40.63   0.004   0.003   0.003  0.004  0.019  0.005  0.001  0.000
       scf6053      86.75   13.24   0.023   0.003   0.012  0.028  0.019  0.003  0.003  0.000
       scf6054      84.63   15.36   0.025   0.004   0.012  0.031  0.039  0.003  0.005  0.000
Table 3.8. Functions of interest.
Gene/Function of Interest  FlavA        FlavG      FlavH  FlavI
Gliding motility           GldFGHJDCBI  GldJBCGFI  GldJB  GldJCB, GldJ
Anaplerotic                +            +          +      +
Vitamin B12                +            -          -      -
Vitamin B1                 +            -          -      +
Proteorhodopsin            -            +          +      +
Superoxide dismutase       Mn & Cu-Zn   Cu-Zn      Cu-Zn  -
Table 3.9. The seven genomes used in the pangenome.
Organisms in pangenome Reference
Maribacter sp. HTCC2170 Oh HM, Kang I, Yang SJ, Jang Y, Vergin KL, et al (2011)
Kordia algicida OT-1 Lee HS, Kang SG, Kwon KK, Lee JH, Kim SJ (2011)
Gramella forsetii KT0803 Bauer M, Kube M, Teeling H, Richter M, Lombardot T, et al (2006)
Flavobacteriales sp. ALC-1 Publicly available. GOLD ID in IMG Database Gi01424
Flavobacteria MS024-2A Woyke T, Xie G, Copeland A, Gonzalez J, Han C, et al (2009)
Robiginitalea biformata HTCC2501 Oh HM, Giovannoni SJ, Lee K, Ferriera S, Johnson J, Cho JC (2009)
Dokdonia donghaensis MED134 Gonzalez JM, Pinhassi J, Fernandez-Gomez B, Coll-Llado M, Gonzalez-Velazquez M, et al (2011)
Table 3.10. The genes identified in each bin.
Name/Bin Scaffold ID No. of Tracks Gene Length Shortest Variant Length dN dS dN/dS No. summer reads No. winter reads
FlavH
Translation elongations Factor p 1531 3 567 501
Track1 2 1
Track2 0.005 0.1616 0.0307 0 2
Track3 0.01 0.0699 0.143 2 0
SSU ribosomal protein S4p 1538 3 606 603
Track1 3 1
Track2 0.0063 0.1245 0.0502 1 1
Track3 0.0021 0.1187 0.0177 2 2
Protein-L-isoaspartate O-methyltransferase 1534 2 642 642
Track1 0.0001 0.0523 0.001 0 3
Track2 0 2
Hypothetical-1 1541 2 1062 1005
Track1 0 0.0039 0.001 1 2
Track2 0 5
Hypothetical-2 1538 2 426 426
Track1 0 0.0076 0.001 1 1
Track2 3 0
Acetate permease 1538 2 471 471
Track1 0.0056 0.0328 0.1705 3 1
Track2 1 1
Phosphoribosylaminoimidazole-succinocarboxamide synthase 1531 2 936 711
Track1 0.0087 0.0587 0.1484 1 1
Track2 1 1
Aminotransferase, class III 1538 3 1425 981
Track1 0.0089 0.0467 0.1845 1 1
Track2 2 1
Track3 0.0128 0.0363 0.3511 1 1
Sulfatase modifying factor 1 precursor 1531 2 1065 954
Track1 0.0225 0.4897 0.0459 0 3
Track2 2 1
FlavI
Hypothetical-1 6044 3 117 117
Track1 2 2
Track2 0.0694 0.4709 0.1473 0 3
Track3 0.202 1.9411 0.1041 0 2
Oxidoreductase 6040 2 1386 1386
Track1 0.0017 0.0161 0.1056 3 0
Track2 3 2
Hypothetical-2 6044 3 168 168
Track1 1 2
Track2 0.0219 0.2984 0.0735 0 2
Track3 0.0431 4.1031 0.0105 0 2
Hypothetical-3 6054 3 408 408
Track1 0.0177 0.0576 0.3067 0 4
Track2 0 2
GCN5-related N-acetyltransferase 6044 2 489 489
Track1 0.025 0.2145 0.1164 0 3
Track2 0 3
Thiol peroxidase 6043 3 450 285
Track1 0.0177 0.094 0.1883 0 2
Track2 0 3
Track3 0.0478 1.6802 0.0284 1 1
Cell division protein FtsH 6054 3 1947 936
Track1 0 3
Track2 1.2932 10.6917 0.121 0 2
Track3 0.004 0.3968 0.0101 0 2
FlavA
Methionyl-tRNA formyltransferase 6365 2 945 945
Track1 0.0033 0 99 2 0
Track2 0 1
Hypothetical-1 6382 2 114 114
Track1 0.2055 0.0687 2.9913 3 0
Track2 2 1
Hypothetical-2 6376 3 162 162
Track1 0.0836 0.0985 0.8493 4 0
Track2 3 0
Track3 0.0095 0.0001 99 4 0
Transcriptional Regulator 6376 2 1083 1083
Track1 0.0037 0 99 2 0
Track2 2 1
Twin-arginine translocation protein 6382 2 195 195
Track1 0.0162 0.0082 0.5035 4 0
Track2 3 0
1-acyl-sn-glycerol-3-phosphate acyltransferase 6382 2 615 615
Track1 0.0022 0.038 0.0566 2 0
Track2 3 0
Membrane protein 6384 2 561 561
Track1 0.0021 0.0123 0.1727 3 0
Track2 3 0
Hypothetical-3 6376 2 867 843
Track1 0.0162 0.1574 0.1028 2 0
Track2 4 0
Hypothetical-4 6376 2 123 123
Track1 0 0.0335 0.001 3 1
Track2 2 0
Inosose isomerase 6384 2 1053 987
Track1 0.0014 0.0324 0.0431 3 0
Track2 4 0
Phosphoglycerate kinase 6382 2 1188 879
Track1 0.0016 0.0251 0.0647 4 0
Track2 1 2
Iron-sulfur Cluster Assembly ATPase protein 6384 2 747 747
Track1 0 0.0041 0.001 3 0
Track2 3 0
Gluconolactonase 6382 2 1011 1011
Track1 0.0013 0.0045 0.2869 4 0
Track2 5 0
FlavG
NG,NG-dimethylarginine dimethylaminohydrolase 5919 3 909 723
Track1 0.0032 0.0126 0.0252 2 1
Track2 0 0.0142 0.001 2 0
Track3 2 1
Hypothetical-1 5920 2 1032 579
Track1 0.0022 0.0704 0.0309 2 0
Track2 3 0
LSU ribosomal protein L17p 5919 2 618 573
Track1 0.0042 0.0397 0.1054 1 1
Track2 3 1
GTP-binding protein EngB 5923 2 603 603
Track1 0.077 2.3909 0.0322 0 3
Track2 4 0
UDP-N-acetylmuramoylalanine--D-glutamate ligase 5923 3 570 570
Track1 0.0062 0.2617 0.0237 1 2
Track2 0.0128 0.2843 0.045 0 3
Track3 0 3
Phospho-N-acetylmuramoyl-pentapeptide transferase 5923 3 1215 696
Track1 0.0648 2.6013 0.0249 3 1
Track2 1 3
Track3 0.063 2.4727 0.0255 2 1
Hypothetical-2 5923 2 495 495
Track1 0.0024 0.0334 0.0711 2 1
Track2 2 0
Phosphoribosylaminoimidazole carboxylase catalytic subunit 5919 2 477 477
Track1 0.0078 0.1474 0.0531 1 1
Track2 4 0
WD40-like Beta Propeller 5919 2 1752 452
Track1 0.015 0.2455 0.061 3 0
Track2 4 0
Alpha,alpha-trehalose-phosphate synthase 5919 3 1665 1137
Track1 0.001 0.0245 0.0414 3 0
Track2 4 2
Track3 0.049 0.5309 0.0924 0 2
Translation initiation factor 1 5919 2 216 216
Track1 0.017 0.7127 0.0238 1 1
Track2 3 1
Hypothetical-3 5919 4 1164 531
Track1 0.0002 0.2111 0.001 2 0
Track2 0.0002 0.2434 0.001 1 1
Track3 0 0 0.1903 3 0
Track4 2 1
Citrate Synthase 5919 2 1284 840
Track1 0.0029 0.0914 0.0321 1 1
Track2 2 2
LSU ribosomal protein L36p 5919 2 117 117
Track1 0.0006 0.6391 0.001 1 1
Track2 4 1
Proteorhodopsin 5919 2 696 696
Track1 0 0.0406 0.001 1 1
Track2 2 0
Table 3.11. Values used as cutoffs for the various BLAST analyses.
BLAST Description                                                            Program Used  E-value cutoff  AA %ID cutoff  % Alignment Length Cutoff
F. psychrophilum JIP02 phylogenetic markers against GOM scaffolds            BLASTN        1×10^-3         n/a            n/a
Putative CDS against the GenBank NR for rough phylogenetic assignment        BLASTP        1×10^-10        n/a            n/a
Putative CDS against Flavobacteria sp. MS024-2A                              BLASTP        1×10^-3         n/a            n/a
Determining CSCGs for pangenome                                              BLASTP        n/a             80%            80%
Pangenome CSCGs against putative CDS                                         BLASTP        n/a             80%            80%
Putative peptidases against other bins and pangenome                         BLASTP        1×10^-5         35%            80%
Putative peptidases against the GenBank NR                                   BLASTP        1×10^-5         35%            80%
Putative CDS of 1 bin against all other bins to determine bin functionality  BLASTP        n/a             40%            80%
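A minimal Python sketch of how cutoffs like those in Table 3.11 could be applied to BLAST results is given below. It assumes the hits are in the standard 12-column BLAST+ tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the alignment length cutoff is measured relative to query length; the table does not specify either point, and the function name, default values (taken from the peptidase rows), and example hits are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, query_lengths,
                max_evalue=1e-5, min_pident=35.0, min_aln_frac=0.80):
    """Return (query, subject) pairs that pass all three cutoffs."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        pident, length, evalue = float(row[2]), int(row[3]), float(row[10])
        aln_frac = length / query_lengths[qseqid]  # assumption: relative to query length
        if evalue <= max_evalue and pident >= min_pident and aln_frac >= min_aln_frac:
            kept.append((qseqid, sseqid))
    return kept


# Hypothetical hits: pepA passes, pepB fails on identity, pepC fails on alignment length.
rows = [
    ["pepA", "ref1", "42.0", "180", "100", "2", "1", "180", "5", "184", "1e-30", "150"],
    ["pepB", "ref2", "30.0", "190", "130", "3", "1", "190", "1", "190", "1e-20", "90"],
    ["pepC", "ref3", "50.0", "60", "30", "0", "1", "60", "10", "69", "1e-8", "70"],
]
example = "\n".join("\t".join(r) for r in rows)
hits = filter_hits(example, {"pepA": 200, "pepB": 200, "pepC": 200})
```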
Figure 3.1. Maximum likelihood tree generated using PHYML of DNA primase.
Figure 3.2. Gene synteny map of a ~21 kbp homologous region identified using the
progressiveMauve aligner.
[Figure image not reproduced. Genome tracks labeled in the figure: FlavA, FlavG, FlavH, FlavI, Dokdonia donghaensis MED134, Gramella forsetii KT0803, Robiginitalea biformata HTCC2501, Maribacter sp. HTCC2170, Kordia algicida OT-1, Flavobacteriales bacterium ALC-1, and Flavobacteria MS024-2A. Individual gene labels cannot be placed without the image.]
92
Figure 3.3. Gene synteny map of a ~64 kbp homologous region identified using the progressiveMauve aligner.
[Figure not reproduced: annotated gene tracks for bins FlavH, FlavA, FlavG, and FlavI.]
Figure 3.4. Diversity of the underlying reads for putative CDS annotated as phospho-N-acetylmuramoyl-pentapeptide transferase.
[Figure not reproduced: three sets of read alignments, each labeled Variant 1, Variant 2, and Variant 3.]
Figure 3.5. dN/dS ratio plotted against dS.
[Figure not reproduced: panels A, B, and C; x-axis dS, y-axis dN/dS; series FlavA, FlavG, FlavH, and FlavI.]
Figure 3.6. MODIS satellite data showing the surface chlorophyll-a levels
[Figure not reproduced: Summer and Winter map panels with axes spanning -72 to -64 and 40 to 46; color scale Chl a from 0.10 to 10.00 mg m⁻³; map labels 03, 04, 06, 12, 13, and 14.]
Figure 3.7. Number of identified 16S rRNA gene fragments from the Gulf of Maine metagenome from each sampling season for several phylum level groups in the Bacteria and Archaea, and the class level groups within the Proteobacteria
[Figure not reproduced: sequence counts for Winter 2006 and Summer 2006 in three panels (Proteobacteria, Other Bacteria, Archaea). Groups shown: Alpha-, Beta-, Delta-, and Gamma-Proteobacteria; Actinobacteria, Bacteroidetes, Chlamydiae, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Firmicutes, Planctomycetes, Verrucomicrobia, and Unclassified; Euryarchaeota and Thaumarchaeota.]
Figure 3.8. Maximum likelihood tree generated using PHYML (bootstrap: 1,000) of: A) DNA helicase (RecG), B) DnaG-FusA, C) DnaG-DnaE-RecG, D) DnaG-RpoC, and E) proteorhodopsin.
[Figure not reproduced: tree panels A to E with bootstrap support values at nodes and a scale bar of 0.2; bins FlavG, FlavH, and FlavI are placed among reference and uncultured environmental sequences.]
Chapter Four
Microbial communities associated with ferromanganese nodules and the surrounding
sediments
*As published in:
Tully, BJ and JF Heidelberg. Microbial communities associated with ferromanganese nodules
and the surrounding sediments. Frontiers Micro Vol. 4: 1-10 (2013)
4.1. Introduction
Ferromanganese/polymetallic nodules form at the sediment-water column interface in
deep-sea environments (4,000-6,000 m). Generally small in size (1-5 cm) and formed as concentric
laminated structures, they are primarily composed of manganese (Mn), iron (Fe), and a large
number of other metals, including copper, nickel, zinc, and titanium; however, composition
varies by nodule and oceanic province. Despite their small size, the global estimate for metal
content in ferromanganese (FeMn) nodules is 2 × 10¹⁴ kg each of Fe and Mn (Somayajulu,
2000). Recently, an increase in the value of rare earth metals has stimulated an interest in mining
these resources. The removal of FeMn nodules from the seafloor could have unknown
ramifications on the environment, since the processes governing nodule formation and
maintenance and the role nodules play in supporting the adjacent biosphere are poorly understood.
The formation process for FeMn nodules has been a scientific unknown since their
discovery in the 1870s (Murray, 1891). Emphasis had been placed on abiotic processes, with
formation times on the order of a few mm per 10⁶ yr based on radiometric data (Kerr, 1984). But
new quantification techniques have reduced the estimates of formation time to a few mm per 10³ yr
(Boltenkov, 2012). As our understanding of the role microorganisms play in geochemical
processes has increased, research has started to shift towards determining if biotic processes play
a role in nodule formation. Much of the recent evidence of a microbial component to nodule
geochemistry revolves around visual inspection of nodules using scanning electron microscopy.
These studies have identified different exolithic and endolithic morphotypes of microorganisms
(Lysyuk, 2008). As in most deep-sea environments, little is known about the physiologies and
metabolisms of microorganisms associated with FeMn nodules or the impact these microbial
processes may have on global ocean metal chemistry.
Previous work on FeMn nodules comes from samples collected from the Clarion-
Clipperton Zone in the eastern North Pacific (Wang et al., 2012). An equally large FeMn nodule
province exists within the central South Pacific Gyre (SPG), where FeMn nodules can occupy as
much as 70% of the exposed surface sediment. The SPG has the lowest primary productivity and
sedimentation rates of the major ocean gyres and, as a direct result, has extremely oligotrophic,
recalcitrant underlying sediment (D'Hondt et al., 2009). Microbial activity in these sediments
may be stimulated by the presence of FeMn nodules, which can abiotically degrade refractory
organic compounds into labile, low molecular weight organic compounds suitable for microbial
respiration (Sunda and Kieber, 1994). Nodules also act as concentrators of metallic dications
(Ni²⁺, Cu²⁺, Zn²⁺) (Dick et al., 2008) and anionic forms of phosphorus, vanadium,
molybdenum, and tungsten (Koschinsky and Halbach, 1995), many of which are important
co-factors for key biochemical processes. Thus, FeMn nodules may play an important role in the
degradation of buried organic compounds and the global carbon cycle.
Previous attempts to isolate genetic material from FeMn nodules have been unsuccessful,
with the exception of a single published 16S rRNA gene sequence (Wang and Müller, 2009).
We sampled nodules and sediments from three different sites in the SPG to gain a greater
understanding of the biogenic controls associated with nodule formation and cycling. The V4
hypervariable region of the 16S rRNA gene was amplified from DNA extracted from different
layers within each nodule and the surrounding sediment and sequenced using 454 sequencing.
Deep sequencing the microbial community allowed us to determine the dominant 16S rRNA
gene sequences and compare how the community structure varied between nodules from
different sites and by the source layer from within the nodules. We found that the most abundant
group of organisms could be assigned to the Thaumarchaea, but that this group was extremely
diverse in the nodule/sediment ecosystem. We found that the microbial communities associated
with the nodule were distinct from the communities present in the sediment and the nodule
communities varied based on sampling site. Further, the microbial communities were
significantly different between nodule layers.
4.2. Materials and Methods
4.2.1. Sample collection
Sediment and FeMn nodules were collected as part of the Deep Sea Drilling Program (DSDP)
cruise 595/596 to the SPG (Dec 2006 – Jan 2007 aboard the R/V Roger Revelle). Nodules and
surface sediments were collected from SPG2, 9, and 10, with an additional sediment sample
collected from SPG3 (Table 4.1). The largest nodule collected measured 6.5 cm in diameter.
Nodules and corresponding sediments were aseptically sampled from a multicore on the catwalk,
as samples were brought onboard. Sediments from 0-5 cm were sampled from the same cores
from which the nodules were obtained and were stored at -80°C. Nodules were rinsed gently with 0.2 µm
filtered and autoclaved ambient bottom water to remove sediment adhering to the surface. A
flame-sterilized hammer and chisel were used to aseptically section the nodules based on visual
changes in strata (delimited as either outer layer, inner layer, or core, where applicable). Further
subsamples were generated and stored in 1.5 mL cryovials at -80°C.
Table 4.1. Sampling site locations and depths.
Site Location Meters below sea level
SPG2 26° 03.0896' S, 156° 53.6472' W 5,127
SPG3 27° 56.539' S, 148° 35.388' W 4,852
SPG9 38° 03.6904' S, 133° 05.5072' W 5,697
SPG10 39° 18.617' S, 139° 48.036' W 5,283
4.2.2. DNA extraction
Extraction of DNA from nodules proceeded using a modified phenol-chloroform extraction
method. Approximately 0.5 mL of sample was resuspended in 675 µL of 2% CTAB
(cetyltrimethylammonium bromide) lysis buffer [100 mM Tris, 100 mM EDTA, 250 mM
Na₂PO₄, 1.5 M NaCl, brought to 40 mL and adjusted to pH 8.0; addition of 2% CTAB; diluted to
50 mL total volume and autoclaved] and vortexed thoroughly for 30 seconds. To each slurry, 20
µL of proteinase K (800 units·mL⁻¹) was added and incubated horizontally for 30 minutes at
50°C. To each sample, 150 µL of 10% SDS was added and incubated further for 120 min at 65°C,
followed by the addition of 600 µL of PCI (phenol:chloroform:isoamyl alcohol, 25:24:1) and
incubation for 20 min at 65°C. Samples were then centrifuged for 10 min at 10,000 × g. The
upper layer was transferred to a new tube (care was taken to avoid transferring material from the
interface or below), 0.7 volumes of isopropanol was added, and the mixture was incubated for 60 min at room
temperature. Samples were again centrifuged at 10,000 × g for 15 min, 0.5 mL of cold 70%
ethanol was added, and the samples were centrifuged for an additional 5 min. Following removal of the
supernatant, the pellet was left to air dry in a fume hood for 15-30 min (as necessary) and
resuspended in 30 µL of sterile, DNase-free H₂O. Samples were then quantified (3 µL) using the
Qubit 1.0 fluorometer and the Qubit dsDNA HS Assay Kit (Life Technologies).
Due to low yield using the described phenol-chloroform method, extraction of DNA from
sediment samples was performed using the MoBio PowerLyzer PowerSoil DNA kit following
the manufacturer’s protocol, and quantified as above.
All samples with > 0.1 ng/µL final DNA concentration were cleaned and concentrated
using the ZYMO Clean + Concentrator 5 (6:1 DNA Binding Buffer, as per suggested protocol)
and samples were resuspended in 20 µL of sterile, DNase-free H₂O.
4.2.3. Sample amplification and pyrosequencing
Samples were amplified using PCR, targeting a portion of the 16S rRNA gene. All
amplifications were performed using the FastStart High Fidelity PCR System (Roche). Initial
amplification was performed in triplicate using a forward fusion primer, (5’-
CGTATCGCCTCCCTCGCGCCATCAGxxxxxxxxxx(x)GTGYCAGCMGCCGCGGTA-3’)
designed to incorporate the Roche 454 Titanium Adapter A sequence (bold), a multiplexing
identifier (MID) sequence, the U515 forward primer (underlined) (Wang and Qian, 2009), and a
U1048 reverse primer (5’-CGRCRCCATGYANCWC-3’) modified from (Huber et al., 2007).
Each 50 µL PCR reaction was composed of: forward primer (0.8 µM), reverse primer (0.8 µM),
0.2 mM dNTPs, 1X buffer #2, 15 µg of BSA (bovine serum albumin), PVP
(polyvinylpyrrolidone, 0.5%), and 2.5 U Taq, with H₂O up to 50 µL. If the template DNA had
been cleaned and concentrated, 7 µL·reaction⁻¹ was added; otherwise, 10 µL·reaction⁻¹ of template
DNA was added. Amplification proceeded on a thermocycler with the following heating steps:
95°C for 4 min; 35 cycles of 95°C for 30 sec, 56°C for 30 sec, and 72°C for 1 min; 72°C for 5 min; hold at 4°C.
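Both primers contain IUPAC degenerate bases (Y, M, R, N, W), so each represents a small pool of oligonucleotides. The following sketch, which is illustrative and was not part of the original workflow, counts the number of distinct sequences each primer encodes and builds a regular expression that could be used to locate a primer in a read. The function names `degeneracy` and `primer_regex` are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (5' to 3', without adapter or MID).
U515F = "GTGYCAGCMGCCGCGGTA"
U1048R = "CGRCRCCATGYANCWC"


def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate sequences encoded by the primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a regex in which each degenerate base becomes a character class."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


print(degeneracy(U515F), degeneracy(U1048R))
print(bool(primer_regex(U515F).fullmatch("GTGCCAGCAGCCGCGGTA")))
```

The reverse primer would normally be searched for as its reverse complement in a forward-oriented read; that step is omitted here for brevity.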
Initial PCR products were pooled and the PCR product (~550 bp) was gel excised using
the Qiagen Gel Extraction Kit (Qiagen) following the manufacturer’s protocol. Excised DNA
products were amplified in duplicate to generate sufficient material for pyrosequencing. The
same forward primer was used, but the reverse primer (U1048R) had the 454 Roche Titanium
Adapter B sequence (5’-CTATGCGCCTTGCCAGCCCGCTCAG-3’) added to the 5’ end.
The second round of PCR amplification proceeded as above, with the following exceptions: [1]
each primer was added at 0.6 µM final concentration; [2] no BSA or PVP was used; and, [3] 5 ng
of template DNA was added. The same settings were used for the thermocycler, except that
amplification was only performed for 30 cycles.
PCR products were pooled and cleaned using the AMPure Bead XP (Agencourt) kit,
following the manufacturer’s protocol. Samples were quantified using PicoGreen and visualized
on an Agilent Bioanalyzer with the High Sensitivity chip (Agilent). The various 16S rRNA gene
amplicons were pooled following the recommended procedure in the Amplicon Library Preparation
Method Manual (Roche, GS FLX Titanium Series, Oct. 2009). Pyrosequencing was performed
by EnGenCore (University of South Carolina, Columbia, SC) utilizing Titanium FLX chemistry.
Four of the samples had technical replicates performed to determine how well the
procedure reproduced results for identical samples. While the absolute value of the number of
sequences generated was different for each replicate, the relative abundance of the OTUs
remained virtually the same (data not shown).
4.2.4. Data analysis
Trimming, cleaning, and clustering of the 16S rRNA gene amplicon sequences generated via
pyrosequencing were performed using mothur (V1.28) following the Schloss laboratory’s
standard operating procedure (SOP) (available at www.mothur.org) (Schloss et al., 2011;
Schloss and Westcott, 2011; Schloss et al., 2009). In brief, a combination of the programs
trim.flows (set to 350 flows for FLX data), shhh.flows, and trim.seqs was used to identify high
quality sequences and trim any remaining adapter and primer sequences, using the
recommended settings (allowing for 1 difference in the barcode, 2 differences in the primer
sequence, a maximum homopolymer length of 8 nucleotides, and a minimum length of 200 bp).
Sequences remaining after these initial steps were aligned to a reference file generated using
previously aligned SILVA 16S and 18S rRNA gene sequences (V111) from Bacteria, Archaea
and Eukarya. The program screen.seqs was used to restrict the area of the sequences analyzed to
an area immediately surrounding and including the V4 hypervariable region (13,861-23,959 bp of
the aligned sequence). Following a series of steps to collapse related sequences into more
manageable numbers, groups were removed that did not have at least 1000 sequences remaining.
The groups with >1000 sequences were processed using UCHIME to detect putative chimeric
sequences by comparing all sequences to the most abundant sequences in the dataset (Edgar et
al., 2011). Putative taxonomic assignments were derived using the classify.seqs program and
filtered to remove sequences with any taxonomic assignment to Chloroplast or Mitochondria.
The programs dist.seqs and cluster (set to average neighbor) were used as described in the SOP.
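The length and homopolymer criteria applied during trimming can be illustrated with a short, self-contained check. This sketch is not mothur's implementation; the function names are hypothetical, and only the two thresholds stated above (maximum homopolymer of 8 nucleotides, minimum length of 200 bp) are taken from the text.

```python
from itertools import groupby


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base (0 for an empty read)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq.upper())), default=0)


def keep_read(seq: str, max_homopolymer: int = 8, min_length: int = 200) -> bool:
    """Return True if the read meets both the length and homopolymer criteria."""
    return len(seq) >= min_length and longest_homopolymer(seq) <= max_homopolymer


# A 200 bp read with no repeats passes; appending a run of nine A's fails it.
good_read = "ACGT" * 50
bad_read = good_read + "A" * 9
print(keep_read(good_read), keep_read(bad_read))
```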
The mothur tool was also used to analyze the processed sequences to determine α- and
β-diversity measures and putative phylogenetic assignment. For this downstream analysis, OTU
(operational taxonomic unit) calls were made at the 99% identity level (“label=0.01”); where
applicable, all groups were randomly subsampled to 900 sequences (“size=900”); and for any
process where multiple iterations were required, 1000 iterations were used (“iters=1000”).
α-diversity was determined using the summary.single command, which estimated Good’s coverage
(Good, 1953) for the samples. Several calculators were used to determine β-diversity, including
ThetaYC (θ_YC), Jaccard dissimilarity, and Bray-Curtis dissimilarity. Statistical significance of
these β-diversity measures was computed using parsimony, weighted Unifrac (Lozupone and
Knight, 2005), and unweighted Unifrac (Lozupone and Knight, 2005). Non-metric
multidimensional scaling (NMDS) ordination plots were constructed and statistical significance
was determined using an analysis of molecular variance (AMOVA) test (Excoffier et al., 1992).
OTUs were putatively classified using the classify.otu program.
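As a minimal sketch of two of the steps described above, the following code subsamples a group to a fixed number of sequences and computes Good's coverage as 1 - (number of singleton OTUs / number of sequences). It is an independent illustration and not mothur's code; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional


def goods_coverage(otu_counts: Iterable[int]) -> float:
    """Good's coverage: 1 - (OTUs observed exactly once / total sequences)."""
    counts = list(otu_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no sequences in sample")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


def subsample(otu_labels: List[str], size: int = 900,
              seed: Optional[int] = None) -> Counter:
    """Randomly draw `size` sequences without replacement and tally them by OTU."""
    if len(otu_labels) < size:
        raise ValueError("group has fewer sequences than the subsample size")
    rng = random.Random(seed)
    return Counter(rng.sample(otu_labels, size))


# Toy example: one OTU label per sequence (made-up data for illustration only).
labels = ["Otu1"] * 700 + ["Otu2"] * 250 + ["Otu3"] * 48 + ["Otu4", "Otu5"]
sampled = subsample(labels, size=900, seed=1)
print(goods_coverage(sampled.values()))
```

Subsampling every group to the same depth before computing coverage and richness keeps the estimates comparable between samples with very different sequencing effort.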
4.2.5. Phylogenetic tree construction
Representative sequences for each of the first 30 OTUs determined to putatively belong to the
Thaumarchaea MG-1 group were generated in mothur. Representatives were aligned with
CLUSTAL W (Thompson et al., 1994) to environmental sequences (Durbin and Teske, 2010)
and sequenced members of the Phylum Thaumarchaea (Hallam and Konstantinidis, 2006;
Hatzenpichler et al., 2008; Walker et al., 2010) and trimmed to a region (289 bp) that included
the V4 hypervariable region of the 16S rRNA gene. Maximum likelihood trees were constructed
with 1,000 bootstraps using the Kimura (K80) (Kimura, 1980) evolution model within the
software package Geneious (Settings: Transitions/Transversion – Estimated = 4; Proportion of
invariable sites – Fixed = 0; Number of substitution rate categories = 1; Gamma distribution
parameter – Estimated = 0).
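For readers unfamiliar with the Kimura (K80) model, the pairwise distance it implies can be written as d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The sketch below computes this distance for two aligned sequences. It illustrates the substitution model only; it is not the maximum likelihood tree search performed in Geneious, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K80 correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


# One transition (A<->G) in ten sites.
print(k80_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```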
4.3. Results and Discussion
Analysis of 16S rRNA gene sequences generated allowed us to examine the community diversity
and OTU abundances of the FeMn nodule-associated and sediment microorganisms from four
sites in the SPG. Analyzing the total diversity of the samples (α-diversity), how the community
diversity compared between samples (β-diversity), and the overall community composition
functioned as a corollary for establishing putative roles and functions of the microbes associated
with FeMn nodules.
4.3.1. Alpha-diversity
α-diversity is a general measure of species diversity (OTU richness) used to contrast
different samples/sites in ecological studies. Results for this study returned a total of 1,270 OTUs
across the 20 samples. Based on the observed OTU richness, it is apparent that the sediment
samples collected from 0-5 cm near the collected nodules tended to have a higher number of
OTUs (Table 4.2), though not exclusively. Samples collected from different layers within each
nodule demonstrated a wide range in the number of observed OTUs, but these numbers do not
coincide with a specific source layer. The OTU richness of the inward layers (denoted as inner and
core) for nodules collected at sites SPG2 and SPG10 is higher, while richness at SPG9 is more even
between the outer and inner layers (Table 4.2). In general, use of Good’s Coverage
Estimate suggests that our depth of sequencing, and subsequent subsampling, covers
approximately 84-98% of the OTU diversity within the samples, with the lowest coverage
obtained from the sediment samples (Table 4.2).
Table 4.2. Sequence effort and diversity measures for each sample.
Sampling Site | Sample Name | Sample Type | No. of sequences | No. of sequences after processing | Good's Coverage Estimate | Observed Richness
SPG2 | 5B-1 | Nodule - Inner | 8550 | 7144 | 0.895 | 176.0
SPG2 | 1A-1 | Nodule - Outer | 28941 | 25893 | 0.912 | 134.1
SPG2 | 1B-1 | Nodule - Outer | 4127 | 3698 | 0.925 | 115.2
SPG2 | 3 | Nodule - Outer | 1710 | 1372 | 0.938 | 109.7
SPG9 | 7 | Nodule - Core | 1271 | 928 | 0.898 | 192.2
SPG9 | 6 | Nodule - Inner | 1493 | 1296 | 0.937 | 128.7
SPG9 | 1B-1 | Nodule - Outer | 6749 | 4895 | 0.874 | 210.6
SPG9 | 2 | Nodule - Outer | 1773 | 1464 | 0.891 | 181.7
SPG10 | 7A | Nodule - Inner | 23767 | 18435 | 0.863 | 217.5
SPG10 | 11 | Nodule - Inner | 1305 | 1137 | 0.900 | 174.8
SPG10 | 14 | Nodule - Inner | 4273 | 3653 | 0.914 | 138.9
SPG10 | 17 | Nodule - Outer | 1916 | 1725 | 0.958 | 103.9
SPG10 | 18B | Nodule - Outer | 2801 | 2227 | 0.907 | 184.7
SPG10 | 24 | Nodule - Outer | 2898 | 2528 | 0.915 | 131.5
SPG10 | 25 | Nodule - Outer | 1469 | 1282 | 0.976 | 78.5
SPG10 | 26 | Nodule - Outer | 1162 | 1009 | 0.920 | 135.5
SPG2 | 1-5 | Sediment | 8658 | 7254 | 0.843 | 252.6
SPG3 | 1-8 | Sediment | 9860 | 8380 | 0.854 | 243.3
SPG9 | 1-5 | Sediment | 5597 | 4667 | 0.858 | 218.5
SPG10 | 1-1 | Sediment | 4064 | 3423 | 0.865 | 216.8
The data suggest that the sediments are more diverse than the nodules. It is generally
assumed that increases in diversity are linked to an increase in potential energy sources (e.g.
increased microbial diversity in estuaries and coastal environments compared to the open ocean
(Zinger et al., 2012)). The higher diversity in the sediment relative to the nodule environment
may suggest that the surface sediment, despite having low total organic carbon (D'Hondt et al.,
2009), is capable of sustaining more microbial activity than the nodules. Alternatively, the
nodule environment introduces a number of cellular stressors due to the presence of increased
metal concentrations that may hinder the growth of a more diverse microbial population.
The data also suggest that the outer layers of the nodules are less diverse than the inner
layers. The implication may be that the inner layers are more capable of promoting microbial
diversity and growth. This may be counterintuitive if it is assumed that the most influential
metabolic processes were linked to metal oxidation. The outer layers are the site of active nodule
growth and the only region directly in contact with the surrounding organic material (OM) of the
sediment, thus the major sites of metal oxidation and access to extant OM. The increased
diversity in the inner layers may be the result of metabolic activity linked to metal oxide
reduction (or some type of cycling between reduction and oxidation of the metal species).
Alternatively, microorganisms in the communities of the inner layers may be entombed, such
that the increased diversity is an artifact of multiple series of organisms colonizing the active
outer layers and becoming trapped.
4.3.2. Beta-diversity
β-diversity is a measure used to compare the communities of different samples/sites.
Community variation calculations were performed using three different methods that were
computed using: [1] only the differences in the OTUs present (Jaccard dissimilarity); [2] the
absolute abundance of the common OTUs (Bray-Curtis dissimilarity); and, [3] the relative
abundance of the common OTUs (θ_YC). In general, the calculations of statistical significance were in
agreement for all possible sample groupings (see Appendix Table 4.4), but only the results of the
θ_YC calculations will be discussed, as this analysis is more robust in demarcating differences
between communities.
Using hierarchical clustering to visualize the differences in communities between the
samples and sites, it becomes apparent that SPG9 and SPG10 cluster together and away from
SPG2, and that the sediment communities cluster away from the nodule communities, except for
an inner layer sample from SPG2 (Figure 4.1). To increase the resolution of these variations and
test the statistical significance, the samples were plotted using NMDS in three dimensions and
tested for significance using the AMOVA test (Figure 4.2). Multiple designations of the samples
were used to tease apart which factors contributed the most to community variation. Samples were
classified by nodule site or sample source (layer or sediment), individually (e.g., all sediment
samples are labeled “sediment” and sample site is ignored), or combined, with various iterations
to test significance (see Appendix Table 4.5). Assigning samples by sample site and source
allowed for the most accurate interpretation of the data. Many of the broad interpretations made
from only labeling by sample site or source were supported for the different iterations of the
data.
Each sediment community was significantly different from each of the other sediment
communities (p < 0.0009). The sediment communities were also significantly distinct from the
communities of SPG2 (p < 0.0009), SPG9 (p < 0.0009) and the SPG10 inner layers (p < 0.0009),
though the SPG10 outer layers were not significantly distinct. This overlap between the SPG10
outer and inner layer communities may be the result of the inaccurate nature of the subsampling
process and the difficulty of parsing subsamples that may overlap layers. SPG9 and SPG10
communities were not significantly different from each other, but SPG2 communities were
significantly distinct from both SPG9 and SPG10 (p_SPG2:SPG9, p_SPG2:SPG10 < 0.0009). For both
SPG2 and SPG9 the communities on the exterior of the nodule (“outer”) were significantly
different from the communities of the inward portions of the respective nodules
(p_SPG2Inner:SPG2Outer < 0.0009); for SPG9 this includes an inner layer and a core sample, both
distinct from each other and the outer layer (p_SPG9Core:SPG9Inner, p_SPG9Inner:SPG9Outer,
p_SPG9Core:SPG9Outer < 0.0009). The different layers within SPG10 were not distinct from each other.
SPG9 and SPG10 do not have significantly distinct communities, despite a distance of
~600 km. This is in contrast to SPG2, which is significantly different and located at least ~2100
km from SPG9 and SPG10. This difference appears to be related to specific OTUs that are part
of the SPG2 community and not part of the SPG9 and SPG10 communities (Figure 4.3). SPG9
and SPG10 are in a slightly different regime within the SPG, with higher average surface
chlorophyll concentrations compared to SPG2 (D'Hondt et al., 2009). However, if the apparent
similarity between the nodules of SPG9 and SPG10 (and difference of nodules from SPG2) were
a result of the physical/biological parameters of the overlying water column and OM inputs, it
might be assumed that the sediments from SPG9 and SPG10 would also have similar communities,
whereas the data indicate these communities are distinct. Possibly the distinct communities are the
result of differences in age of the nodules, the surrounding sediment environment, or the seeding
populations. There are a number of OTUs that are present in all three of the nodule communities,
and it may be that these members play a role in nodule formation/maintenance, while the
differences represent possible flexibility in the recruitment of microbial populations.
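The inter-site distances quoted above can be checked against the coordinates in Table 4.1 with a great-circle calculation. The sketch below uses the haversine formula on a spherical Earth; the mean radius of 6,371 km and the function names are assumptions made for illustration and do not come from the original analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def dm_to_decimal(degrees: float, minutes: float, hemisphere: str) -> float:
    """Convert degrees and decimal minutes to signed decimal degrees."""
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Nodule sites from Table 4.1 as (latitude, longitude).
SITES = {
    "SPG2": (dm_to_decimal(26, 3.0896, "S"), dm_to_decimal(156, 53.6472, "W")),
    "SPG9": (dm_to_decimal(38, 3.6904, "S"), dm_to_decimal(133, 5.5072, "W")),
    "SPG10": (dm_to_decimal(39, 18.617, "S"), dm_to_decimal(139, 48.036, "W")),
}

for first, second in [("SPG9", "SPG10"), ("SPG2", "SPG10"), ("SPG2", "SPG9")]:
    distance = haversine_km(*SITES[first], *SITES[second])
    print(f"{first} to {second}: {distance:.0f} km")
```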
For the nodules from SPG2 and SPG9 it is possible to differentiate between the inner
layers of the nodule and the outer layers. These results suggest that the
interior conditions of the nodule may be selecting for a particular community composition that is
different from the community composition of the exterior samples. Many of the community
members are present in both the interior and exterior samples, and it is the level of abundance
that changes. An increase in abundance may be linked to changes in the activity or role played by
these OTUs, such that their metabolisms are favored in the interior conditions compared to other
OTUs that decrease in abundance.
4.3.3. Community Composition
The most abundant OTU signatures were examined to determine if the presence/absence
and putative taxonomic assignments revealed information about nodule-microbe interactions. For
16S rRNA gene surveys, much of the emphasis is on the most abundant OTUs as a proxy for the
most abundant organisms present in the samples. In general, this type of abundance data agrees
with the more active members of the community, but there have been examples where the
opposite is true (Campbell et al., 2011). The top 30 most abundant OTUs (in total sequences
assigned to the OTU) were assessed for the role they play in differentiating between different
samples and were assigned putative taxonomic groups. Taxonomic assignments were used to
predict how the microbial communities might be functioning based on the metabolisms of related
organisms. The top 30 OTUs cover 43-80% (mean: 61%) of the total number of sequences
assigned to OTUs for each sample (Figure 4.3). Many of the OTUs ranked 31-50 each contain < 5%
of the total abundance, with the rest of OTUs containing < 1% of the total abundance for each
sample.
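The coverage figures above (the share of each sample's sequences that falls in the top-ranked OTUs) can be illustrated with a short sketch. The sample names and counts below are hypothetical toy values, not data from this study, and the function name `top_n_coverage` is introduced here only for illustration; OTUs are ranked by total sequences across all samples, as described in the text.

```python
from collections import Counter

def top_n_coverage(otu_counts, n=30):
    """Return, per sample, the fraction of sequences assigned to the n OTUs
    that are most abundant across all samples combined."""
    # Rank OTUs by the total number of sequences assigned across samples.
    totals = Counter()
    for sample_counts in otu_counts.values():
        totals.update(sample_counts)
    top = {otu for otu, _ in totals.most_common(n)}

    coverage = {}
    for sample, counts in otu_counts.items():
        total = sum(counts.values())
        covered = sum(c for otu, c in counts.items() if otu in top)
        coverage[sample] = covered / total if total else 0.0
    return coverage

# Hypothetical toy counts (assumption: not taken from the thesis data).
toy_counts = {
    "nodule_A": {"OTU1": 60, "OTU2": 25, "OTU3": 10, "OTU4": 5},
    "sediment_B": {"OTU2": 40, "OTU3": 30, "OTU5": 30},
}
coverage = top_n_coverage(toy_counts, n=2)
for sample, fraction in coverage.items():
    print(f"{sample}: {fraction:.0%} of sequences in the top 2 OTUs")
```

With real data, `n=30` and a per-sample OTU count table exported from the clustering step would take the place of the toy dictionary.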
Twenty-two of the top 30 most abundant OTUs were present in both the sediment and
nodule samples (Figure 4.3). The sediment samples have three OTUs found only in these
samples (OTU13, 28, and 30). Based on the SILVA taxonomy and assignment by mothur, these
OTUs were putatively assigned to the MG-1 group within the Phylum Thaumarchaea (Figure
4.3A; Table 4.3). There were five OTUs not found in the sediment. Three of these OTUs were
only associated with SPG2 (OTU1, 6, and 26), while the other two (OTU17 and 23) were present
in all of the nodule samples. Of the OTUs only found associated with SPG2, OTU1 could be
assigned to the Genus Colwellia, and OTU6 and 26 were assigned to the Family
Flavobacteriaceae. For the other nodule-associated OTUs, both OTU17 and 23 were assigned to
the Class α-Proteobacteria. Interestingly, 15 of the 30 most abundant OTUs were assigned to the
MG-1 Thaumarchaea and 6 to the Family Sinobacteraceae in the Class γ-Proteobacteria.
Table 4.3. Putative phylogenetic assignments using the SILVA taxonomy, to the highest
discernible phylogenetic level
OTU No.  No. of total sequences assigned to OTU  Putative Phylogenetic Assignment
1        13900                                   γ-Proteobacteria: Colwellia
2        7833                                    Thaumarchaea: MG-1
3        6326                                    Thaumarchaea: MG-1
4        3042                                    Thaumarchaea: MG-1
5        2970                                    Thaumarchaea: MG-1
6        2934                                    Bacteroidetes: Flavobacteriaceae
7        2278                                    γ-Proteobacteria: Sinobacteraceae
8        2123                                    Thaumarchaea: MG-1
9        1850                                    Thaumarchaea: MG-1
10       1699                                    γ-Proteobacteria: Sinobacteraceae
11       1594                                    α-Proteobacteria: Rhodospirillaceae
12       1412                                    γ-Proteobacteria: "endosymbiont"
13       1378                                    Thaumarchaea: MG-1
14       1259                                    γ-Proteobacteria: Sinobacteraceae
15       1234                                    Thaumarchaea: MG-1
16       1230                                    γ-Proteobacteria: Sinobacteraceae
17       1230                                    α-Proteobacteria
18       1230                                    γ-Proteobacteria: Sinobacteraceae
19       1079                                    Thaumarchaea: MG-1
20       1044                                    Thaumarchaea: MG-1
21       923                                     α-Proteobacteria
22       918                                     Thaumarchaea: MG-1
23       881                                     α-Proteobacteria
24       774                                     γ-Proteobacteria: Alteromonadales: NB-1d
25       774                                     Thaumarchaea: MG-1
26       750                                     Bacteroidetes: Flavobacteriaceae
27       655                                     Thaumarchaea: MG-1
28       624                                     Thaumarchaea: MG-1
29       624                                     γ-Proteobacteria: Sinobacteraceae
30       582                                     Thaumarchaea: MG-1
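As a quick check on the counts quoted in the text (15 OTUs assigned to MG-1 and 6 to the Sinobacteraceae), the assignments in Table 4.3 can be tallied as in the sketch below. The OTU numbers are transcribed from the table; the class labels assume the reading of the garbled class symbols as γ- and α-Proteobacteria, and the variable names are introduced only for illustration.

```python
# OTU numbers per putative assignment, transcribed from Table 4.3.
# Assumption: the garbled class symbols are read as gamma/alpha-Proteobacteria.
groups = {
    "Thaumarchaea: MG-1": [2, 3, 4, 5, 8, 9, 13, 15, 19, 20, 22, 25, 27, 28, 30],
    "gamma-Proteobacteria: Sinobacteraceae": [7, 10, 14, 16, 18, 29],
    "alpha-Proteobacteria (class only)": [17, 21, 23],
    "Bacteroidetes: Flavobacteriaceae": [6, 26],
    "gamma-Proteobacteria: Colwellia": [1],
    "alpha-Proteobacteria: Rhodospirillaceae": [11],
    "gamma-Proteobacteria: endosymbiont": [12],
    "gamma-Proteobacteria: Alteromonadales: NB-1d": [24],
}

# Number of top-30 OTUs per assignment, largest group first.
tally = {name: len(otus) for name, otus in groups.items()}
for name, count in sorted(tally.items(), key=lambda item: -item[1]):
    print(f"{count:2d}  {name}")

# Every OTU from 1 to 30 should appear exactly once.
all_otus = sorted(otu for otus in groups.values() for otu in otus)
```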
The OTUs associated only with SPG2 were assigned to phylogenetic groups that are
known to specialize in the degradation of high molecular weight compounds. While such a
function would be common in sediments with higher loads of OM, the SPG sediments are the
most carbon-poor marine sediments sampled to date (D'Hondt et al., 2009). Organisms specialized
in OM degradation would need to effectively process recalcitrant material. Of all the OTUs, OTU1
received the most specific taxonomic assignment, putatively a member of the Genus Colwellia, and
was the most abundant OTU in its respective sample (> 50% abundance) (Figure 4.3B). All members
of the Genus Colwellia are psychrophiles, and certain members have been shown to form biofilms,
to possess coenzyme F420 (which may play a role in aromatic compound degradation), and to have
the potential for denitrification (Methé et al., 2005). Aromatic compounds are generally more
recalcitrant than other compounds, so if OTU1 also possesses such a genetic potential, it may be
a viable metabolism for effective growth in the SPG. Interestingly, denitrification is an anaerobic
metabolism, but the SPG has been shown to have measurable O₂ throughout the sediment
column (D'Hondt et al., 2009). The other two OTUs unique to SPG2 were assigned to the
Flavobacteriaceae, a group known to be associated with marine snow particles and to be key players
in the microbial loop of the surface ocean (Cottrell and Kirchman, 2000; Grossart et al., 2005;
Kirchman, 2002). Potentially, what makes the SPG2 nodule communities distinct is the
members with a role in OM degradation, and not members with putative roles in metal
chemistry. This may indicate that the SPG2 community is undergoing different processes
biologically, and potentially chemically, than the nodules from the other sites.
Less clearly defined were the OTUs found only in the nodule communities and present in all of
them (Figure 4.3B-D). Both OTU17 and 23 could only be assigned to the level of Class within the
α-Proteobacteria. The α-Proteobacteria is a very large and diverse phylogenetic group. A number
of the organisms within the α-Proteobacteria partake in metal biogeochemistry, including
organisms in the Genus Magnetospirillum, which can form magnetosomes composed of
magnetite (Fe(II,III) oxide) (Matsunaga et al., 1991). OTU11 was assigned to the Family
Rhodospirillaceae, of which the Genus Magnetospirillum is a member.
Most surprising of the putative phylogenetic assignments was the breadth of diversity
recovered from the Thaumarchaea MG-1 group. Half of the top 30 most abundant OTUs were
assigned to this group, including the three OTUs exclusive to the sediment communities. The
Thaumarchaea are one of the most abundant groups of organisms on the planet and, currently,
all members are believed to be capable of ammonia oxidation (the first step in nitrification) and
carbon fixation (Hatzenpichler, 2012; Tully et al., 2012). While there is no known direct link
between this group of organisms and metal chemistry, their presence in the SPG may have a
consequential impact. The SPG sediment environment is believed to be extremely carbon depleted, but
with relatively high concentrations of reduced nitrogen compounds (D'Hondt et al., 2009). If the
Thaumarchaea from this study are active and autotrophic, they may function as a source of
reduced carbon, acting as a primary producer for microbial communities.
Using representative sequences for each of the 15 OTUs assigned to the Thaumarchaea, a
phylogenetic tree was constructed that included sequences from Thaumarchaea genomes and from
SPG site 12 sediment, where a targeted 16S rRNA gene survey of Thaumarchaea
MG-1 diversity was performed (Durbin and Teske, 2010) (Figure 4.4). Based on the phylogenetic
groups assigned in Durbin and Teske (2010), OTU13, 28, and 30 fall within the MG-1 α group. The
MG-1 α group was seen to increase with sediment depth and could not be found in the bottom water
samples. Most of the sequences fell within the MG-1 " group, which was found in the Durbin and
Teske (2010) samples at the sediment-water column interface. The phylogenetic data for the
Thaumarchaea from the nodules and sediment from SPG2, 3, 9, and 10 do not reveal any new
distinct groups and cover many of the surface sediment-related groups previously identified.
4.4. Conclusion
The results of the study reveal novel information regarding the types of microorganisms
associated with FeMn nodules and present a starting point for further research into the role
biology plays in their formation and maintenance. The study clearly shows that the FeMn
nodule-associated microbial community is significantly different from the surrounding sediment
communities, suggesting that the microbes in the nodules may play different metabolic roles than
those of the sediment (and are not just “hitchhikers” from the surrounding environment). This
idea is further supported by the underlying similarity between FeMn nodule communities (with
some exceptions), especially for SPG9 and SPG10, despite the distance between the sites.
Furthermore, the communities associated with the inner portions of a nodule are distinct from its
outer portions and surrounding sediment, implying a possible selective pressure, such that the
dominant physiologies of the inner nodule are different from those of the outer nodule. This may
be the result of a shift from metal oxidation and OM degradation on the exterior to metal oxide
reduction and autotrophy in the interior, or potentially some form of complex cycling of metal
species and organic material. The presence of sequences related to predominantly OM-degrading
organisms for SPG2 suggests an increased role for heterotrophic metabolism in these samples,
while the presence of Thaumarchaea in all samples may highlight a possible source for a food
web supported by ammonia oxidation and carbon fixation in the energy-limited SPG
environment.
The lack of an abundant 16S rRNA gene sequence strongly linked to a phylogenetic
group with known metal metabolisms may have implications for the role biology plays in nodule
chemistry. It is possible that the microorganisms associated with FeMn nodules do not
play an active role in nodule formation through key metabolic functions (e.g., the use of metal
oxides as terminal electron acceptors in an electron transport chain), but the biochemical
reactions associated with microorganisms may still be important. Oxidation/reduction of the
metal species may be ancillary biochemical mechanisms related to biofilm
formation/maintenance, cellular detoxification of reduced metals, or metabolic waste
sequestration. Alternatively, the nodule-associated organisms may be utilizing the OM degraded
by the FeMn nodule to sustain growth, while nodule formation is truly abiotic. Further study of the
genomic potential of the microbial community may reveal metal metabolisms in unexpected
lineages or biological mechanisms linked to nodule chemistry.
4.5. Acknowledgements
We would like to thank Greg Horn and Dr. Katrina Edwards for providing access to the
ferromanganese nodules collected on DSDP 595/596, Drs. Victoria Orphan and Benjamin
Harrison for access to the surface sediments associated with the nodules, and the science staff
and crew of the R/V Roger Revelle. We would also like to thank Johanna Holm for helping make
the manuscript ready for publication. This work was supported by the Center for Dark Biosphere
Investigations (C-DEBI) Graduate Research Fellowship (OCE-0939654 to Benjamin Tully).
4.6. References
Boltenkov B (2012). Mechanisms of formation of deep-sea ferromanganese nodules:
Mathematical modeling and experimental results. Geochem Int 50: 125-132.
Campbell B, Yu L, Heidelberg J, Kirchman D (2011). Activity of abundant and rare bacteria in a
coastal ocean. Proc Natl Acad Sci USA 108: 12776-12781.
Cottrell M, Kirchman D (2000). Natural assemblages of marine proteobacteria and members of
the Cytophaga-Flavobacteria cluster consuming low- and high-molecular-weight dissolved
organic matter. Appl Environ Microbiol 66: 1692-1697.
D'Hondt S, Spivack AJ, Pockalny R, Ferdelman T, Fischer JP, Kallmeyer J et al (2009).
Subseafloor sedimentary life in the South Pacific Gyre. Proc Natl Acad Sci USA 106: 11651-
11656.
Dick GJ, Podell S, Johnson HA, Rivera-Espinoza Y, Bernier-Latmani R, McCarthy JK et al
(2008). Genomic insights into Mn(II) oxidation by the marine alphaproteobacterium
Aurantimonas sp strain SI85-9A1. Appl Environ Microbiol 74: 2646-2658.
Durbin A, Teske A (2010). Sediment-associated microdiversity within the Marine Group I
Crenarchaeota. Environ Microbiol Rep 2: 693-703.
Edgar R, Haas B, Clement J, Quince C, Knight R (2011). UCHIME improves sensitivity and
speed of chimera detection. Bioinformatics 27: 2192-2200.
Excoffier L, Smouse P, Quattro J (1992). Analysis of molecular variance inferred from metric
distances among DNA haplotypes: application to human mitochondrial DNA restriction data.
Genetics 131: 479-491.
Good I (1953). The Population Frequencies of Species and the Estimation of Population
Parameters. Biometrika 40: 237-264.
Grossart H, Levold F, Allgaier M, Simon M, Brinkhoff T (2005). Marine diatom species harbour
distinct bacterial communities. Environ Microbiol 7: 860-873.
Hallam SJ, Konstantinidis KT (2006). Genomic analysis of the uncultivated marine
crenarchaeote Cenarchaeum symbiosum. Proc Natl Acad Sci USA.
Hatzenpichler R (2012). Diversity, physiology and niche differentiation of ammonia-oxidizing
archaea. Appl Environ Microbiol 78: 7501-7510.
Hatzenpichler R, Lebedeva EV, Spieck E, Stoecker K, Richter A, Daims H et al (2008). A
moderately thermophilic ammonia-oxidizing crenarchaeote from a hot spring. Proc Natl Acad
Sci USA 105: 2134-2139.
Huber J, Mark Welch D, Morrison H, Huse S, Neal P, Butterfield D et al (2007). Microbial
population structures in the deep marine biosphere. Science 318: 97-100.
Kerr R (1984). Manganese nodules grow by rain from above: the rain of plant and animal
remains falling into the deep sea not only provides metals to nodules but also determines nodule
growth rates and composition. Science 223: 576-577.
Kimura M (1980). A simple method for estimating evolutionary rates of base substitutions
through comparative studies of nucleotide sequences. J Mol Evol 16: 111-120.
Kirchman D (2002). The ecology of Cytophaga-Flavobacteria in aquatic environments. FEMS
Microbiol Ecol 39: 91-100.
Koschinsky A, Halbach P (1995). Sequential leaching of marine ferromanganese precipitates:
Genetic implications. Geochim Cosmochim Ac 59: 5113-5132.
Lozupone C, Knight R (2005). Unifrac: a New Phylogenetic Method for Comparing Microbial
Communities. Appl Environ Microbiol 71: 8225-8235.
Lysyuk G (2008). Biomineral nanostructures of manganese oxides in oceanic ferromanganese
nodules. Geol Ore Deposits 50: 647-649.
Matsunaga T, Sakaguchi T, Fumihiko T (1991). Magnetite formation by magnetic bacterium.
Appl Environ Microbiol 35: 651-655.
Methé BA, Nelson KE, Deming JW, Momen B, Melamud E, Zhang X et al (2005). The
psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H
through genomic and proteomic analyses. Proc Natl Acad Sci USA 102: 10913-10918.
Murray J. (1891). H. M. S. Stationery Office: London.
Schloss P, Gevers D, Westcott S (2011). Reducing the Effects of PCR Amplification and
Sequencing Artifacts on 16S rRNA-Based Studies. PLoS ONE 6: e27310.
Schloss P, Westcott S (2011). Assessing and improving methods used in OTU-based approaches
for 16S rRNA gene sequence analysis. Applied and Environmental Microbiology: 1-8.
Schloss P, Westcott S, Ryabin T, Hall J, Hartmann M, Hollister E et al (2009). Introducing
mothur: open-source, platform-independent, community-supported software for describing and
comparing microbial communities. Applied and Environmental Microbiology 75: 7537.
Somayajulu B (2000). Growth rates of oceanic manganese nodules: Implications to their genesis,
palaeo-earth environment and resource potential. Curr Sci India 78: 300-308.
Sunda W, Kieber D (1994). Oxidation of humic substances by manganese oxides yields low-
molecular-weight organic substrates. Nature 367: 62-64.
Thompson J, Higgins D, Gibson T (1994). CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
Tully B, Nelson W, Heidelberg J (2012). Metagenomic analysis of a complex marine planktonic
thaumarchaeal community from the Gulf of Maine. Environ Microbiol 14: 254-267.
Walker C, De la Torre J, Klotz M, Urakawa H, Pinel N, Arp D et al (2010). Nitrosopumilus
maritimus genome reveals unique mechanisms for nitrification and autotrophy in globally
distributed marine crenarchaea. Proc Natl Acad Sci USA 107: 8818.
Wang X, Müller W (2009). Marine biominerals: perspectives and challenges for polymetallic
nodules and crusts. Trends in biotechnology 27: 375-383.
Wang X, Schloßmacher U, Wang S, Schröder H, Wiens M, Batel R et al (2012). From
nanoparticles via microtemplates and milliparticles to deep-sea nodules: biogenically driven
mineral formation. Adv Mat Res 6: 97-115.
Wang Y, Qian P (2009). Conservative Fragments in Bacterial 16S rRNA Genes and Primer
Design for 16S Ribosomal DNA Amplicons in Metagenomic Studies. PLoS ONE 4: e7401.
Zinger L, Gobet A, Pommier T (2012). Two decades of describing the unseen majority of
aquatic microbial diversity. Mol Ecol 21: 1878-1896.
Table 4.4. Appendix Table 1.
ThetaYC Jaccard Bray-Curtis ThetaYC Jaccard Bray-Curtis ThetaYC Jaccard Bray-Curtis
Sample Comparisons Set-up 1 - - - + + + - - -
SPG10inner-SPG2inner - - - + + + - - -
SPG10outer-SPG2inner - - - + + + - - -
SPG10inner-SPG9core - - - + + + - - -
SPG10outer-SPG9core - + + + + + - - -
SPG2inner-SPG9core - - - - + + - - -
SPG2outer-SPG9core - - - + + + - - -
SPG10inner-SPG9outer - - - + + + - - -
SPG10outer-SPG9outer - - - + + + - - -
SPG2inner-SPG9outer - - - + + + - - -
SPG9core-SPG9outer - - - + + + - - -
SPG9core-sediment - - - + + + - - -
SPG10inner-SPG10outer - - - + + + - - -
SPG2inner-SPG2outer - - - + + + - - -
SPG2outer-SPG9outer - - - + + + - - -
SPG9outer-sediment - + + + + + - - -
SPG2inner-sediment - + + + + + - - -
SPG10inner-SPG2outer + - - + + + - - -
SPG10inner-sediment + + + - + - - - -
SPG2outer-sediment + - - + + + - - -
SPG10outer-SPG2outer + - - + + + - - -
SPG10outer-sediment
Sample Comparisons Set-up 2
outer-sediment + - - + + + - - -
core-inner - - - + + + - - -
core-outer - - + + + + + - +
core-sediment - - - + + + - - -
inner-outer - + + + + + - - -
inner-sediment + + + + + + + + +
Sample Comparisons Set-up 3
SPG10-SPG2 + + - + + + - + -
SPG10-sediment + - - + + + - - -
SPG10-SPG9 - + + + + + - - -
SPG2-SPG9 + + + + + + + - +
SPG2-sediment + + + - + + - - -
SPG9-sediment + + + + + + - - -
Sample Comparisons Set-up 4
SPG10inner-SPG2inner -
SPG10outer-SPG2inner -
SPG10inner-SPG9core -
SPG10outer-SPG9core -
SPG2.3sediment-SPG9core -
SPG2inner-SPG9core -
SPG2outer-SPG9core -
SPG9.10sediment-SPG9core -
SPG10inner-SPG9outer -
SPG10outer-SPG9outer -
SPG2inner-SPG9outer -
SPG9core-SPG9outer -
SPG2.3sediment-SPG9.10sediment -
SPG2inner-SPG9.10sediment -
SPG9.10sediment-SPG9outer -
SPG10inner-SPG10outer -
SPG2.3sediment-SPG2inner -
SPG2.3sediment-SPG9outer -
SPG2.3sediment-SPG2outer -
SPG10inner-SPG2.3sediment -
SPG10inner-SPG9.10sediment -
SPG2outer-SPG9.10sediment -
SPG2inner-SPG2outer -
SPG2outer-SPG9outer -
SPG10outer-SPG9.10sediment -
SPG10outer-SPG2.3sediment -
SPG10inner-SPG2outer -
SPG10outer-SPG2outer +
Sample Comparisons Set-up 5
SPG10inner-SPG10sediment -
SPG10outer-SPG10sediment -
SPG10inner-SPG2inner -
SPG10outer-SPG2inner -
SPG10sediment-SPG2inner -
SPG10sediment-SPG2outer -
SPG10inner-SPG2sediment -
SPG10outer-SPG2sediment -
Parsimony UniFrac Unweighted UniFrac Weighted
Appendix Table 1. Beta-Diversity Significance Tests for Various Beta-Diversity Calculators and Sample Designations (+ = statistical significance)
SPG10sediment-SPG2sediment -
SPG2inner-SPG2sediment -
SPG2outer-SPG2sediment -
SPG10inner-SPG3sediment -
SPG10outer-SPG3sediment -
SPG10sediment-SPG3sediment -
SPG2inner-SPG3sediment -
SPG2outer-SPG3sediment -
SPG2sediment-SPG3sediment -
SPG10inner-SPG9core -
SPG10outer-SPG9core -
SPG10sediment-SPG9core -
SPG2inner-SPG9core -
SPG2outer-SPG9core -
SPG2sediment-SPG9core -
SPG3sediment-SPG9core -
SPG10inner-SPG9outer -
SPG10outer-SPG9outer -
SPG10sediment-SPG9outer -
SPG2inner-SPG9outer -
SPG2sediment-SPG9outer -
SPG3sediment-SPG9outer -
SPG9core-SPG9outer -
SPG10inner-SPG9sediment -
SPG10outer-SPG9sediment -
SPG10sediment-SPG9sediment -
SPG2inner-SPG9sediment -
SPG2outer-SPG9sediment -
SPG2sediment-SPG9sediment -
SPG3sediment-SPG9sediment -
SPG9core-SPG9sediment -
SPG9outer-SPG9sediment -
SPG10inner-SPG10outer -
SPG2outer-SPG9outer -
SPG2inner-SPG2outer -
SPG10inner-SPG2outer -
SPG10outer-SPG2outer +
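For readers unfamiliar with the three calculators named in the column headers of Appendix Table 1, the sketch below computes each as a dissimilarity between two samples from their OTU count vectors. It is a minimal illustration using the standard textbook forms (Bray-Curtis, incidence-based Jaccard, and the Yue and Clayton theta); it is not the code used in the thesis, the function names are introduced here, and the two count vectors are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity from two vectors of OTU counts."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / (sum(a) + sum(b))

def jaccard(a, b):
    """Incidence-based Jaccard dissimilarity (presence/absence of OTUs)."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, y in enumerate(b) if y > 0}
    return 1.0 - len(present_a & present_b) / len(present_a | present_b)

def theta_yc(a, b):
    """Yue and Clayton theta dissimilarity from relative abundances."""
    p = [x / sum(a) for x in a]
    q = [y / sum(b) for y in b]
    pq = sum(x * y for x, y in zip(p, q))
    denominator = sum(x * x for x in p) + sum(y * y for y in q) - pq
    return 1.0 - pq / denominator

# Hypothetical counts for three OTUs in two samples (not thesis data).
sample_1 = [10, 0, 5]
sample_2 = [5, 5, 0]
print(bray_curtis(sample_1, sample_2))
print(jaccard(sample_1, sample_2))
print(theta_yc(sample_1, sample_2))
```

The significance tests in the table operate on matrices of such pairwise values, so the choice of calculator determines whether rare OTUs (Jaccard) or dominant OTUs (Bray-Curtis, theta) drive the result.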
Table 4.5. Appendix Table 2.
Appendix Table 2. AMOVA Results (+ = statistical significance)

Considering Site and Source Layer for Nodules and grouping all sediment as one group
Experiment-wise error rate: 0.05
Pair-wise error rate (Bonferroni): 0.00238
All groups: SPG10inner-SPG10outer-SPG2inner-SPG2outer-SPG9core-SPG9outer-sediment +
SPG10inner-SPG10outer -
SPG10inner-SPG2inner -
SPG10inner-SPG2outer +
SPG10inner-SPG9core +
SPG10inner-SPG9outer -
SPG10inner-sediment -
SPG10outer-SPG2inner -
SPG10outer-SPG2outer +
SPG10outer-SPG9core +
SPG10outer-SPG9outer -
SPG10outer-sediment -
SPG2inner-SPG2outer +
SPG2inner-SPG9core -
SPG2inner-SPG9outer -
SPG2inner-sediment -
SPG2outer-SPG9core +
SPG2outer-SPG9outer +
SPG2outer-sediment -
SPG9core-SPG9outer +
SPG9core-sediment -
SPG9outer-sediment -

Considering only source type
Experiment-wise error rate: 0.05
Pair-wise error rate (Bonferroni): 0.00833333
All groups: core-inner-outer-sediment +
core-inner -
core-outer -
core-sediment -
inner-outer -
inner-sediment -
outer-sediment +

Considering only Sample site and grouping all sediment as one group
Experiment-wise error rate: 0.05
Pair-wise error rate (Bonferroni): 0.00833333
All groups: SPG10-SPG2-SPG9-sediment +
SPG10-SPG2 +
SPG10-SPG9 -
SPG10-sediment +
SPG2-SPG9 +
SPG2-sediment +
SPG9-sediment +

Considering each type of sample and source layer collectively.
Experiment-wise error rate: 0.05
Pair-wise error rate (Bonferroni): 0.00090909
All groups: SPG10inner-SPG10outer-SPG10sediment-SPG2inner-SPG2outer-SPG2sediment-SPG3sediment-SPG9core-SPG9inner-SPG9outer-SPG9sediment +
SPG10inner-SPG10outer -
SPG10inner-SPG10sediment +
SPG10inner-SPG2inner +
SPG10inner-SPG2outer +
SPG10inner-SPG2sediment +
SPG10inner-SPG3sediment +
SPG10inner-SPG9core +
SPG10inner-SPG9inner -
SPG10inner-SPG9outer -
SPG10inner-SPG9sediment -
SPG10outer-SPG10sediment -
SPG10outer-SPG2inner -
SPG10outer-SPG2outer +
SPG10outer-SPG2sediment -
SPG10outer-SPG3sediment -
SPG10outer-SPG9core -
SPG10outer-SPG9inner -
SPG10outer-SPG9outer -
SPG10outer-SPG9sediment -
SPG10sediment-SPG2inner +
SPG10sediment-SPG2outer +
SPG10sediment-SPG2sediment +
SPG10sediment-SPG3sediment +
SPG10sediment-SPG9core +
SPG10sediment-SPG9inner +
SPG10sediment-SPG9outer +
SPG10sediment-SPG9sediment +
SPG2inner-SPG2outer +
SPG2inner-SPG2sediment +
SPG2inner-SPG3sediment +
SPG2inner-SPG9core +
SPG2inner-SPG9inner +
SPG2inner-SPG9outer +
SPG2inner-SPG9sediment +
SPG2outer-SPG2sediment +
SPG2outer-SPG3sediment +
SPG2outer-SPG9core +
SPG2outer-SPG9inner +
SPG2outer-SPG9outer +
SPG2outer-SPG9sediment +
SPG2sediment-SPG3sediment +
SPG2sediment-SPG9core +
SPG2sediment-SPG9inner +
SPG2sediment-SPG9outer +
SPG2sediment-SPG9sediment +
SPG3sediment-SPG9core +
SPG3sediment-SPG9inner +
SPG3sediment-SPG9outer +
SPG3sediment-SPG9sediment +
SPG9core-SPG9inner +
SPG9core-SPG9outer +
SPG9core-SPG9sediment +
SPG9inner-SPG9outer +
SPG9inner-SPG9sediment +
SPG9outer-SPG9sediment +
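The pair-wise error rates listed in Appendix Table 2 are consistent with a Bonferroni correction in which the experiment-wise rate (0.05) is divided by the number of pairwise comparisons among the groups in each design. The sketch below reproduces that arithmetic; the function name is introduced here for illustration, and the group counts (7, 4, 4, and 11) are read off the group lists in the table.

```python
def bonferroni_pairwise_alpha(n_groups, experimentwise_alpha=0.05):
    """Return the number of pairwise comparisons among n_groups and the
    Bonferroni-corrected error rate for each comparison."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return n_pairs, experimentwise_alpha / n_pairs

# Group counts per design, read off the group lists in Appendix Table 2.
designs = {
    "site and source layer, sediment grouped": 7,
    "source type only": 4,
    "sample site only, sediment grouped": 4,
    "every sample type and source layer": 11,
}
for label, n_groups in designs.items():
    n_pairs, alpha = bonferroni_pairwise_alpha(n_groups)
    print(f"{label}: {n_pairs} comparisons, pair-wise rate {alpha:.8f}")
```

For example, 7 groups give 21 comparisons and 0.05 / 21 ≈ 0.00238, while 11 groups give 55 comparisons and 0.05 / 55 ≈ 0.00090909.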
Figure 4.1. Dendrogram representing the community similarity of the 20 different sediment
and nodule samples.
Figure 4.2. Three-dimensional plot of the 20 different sediment and nodule samples.
Figure 4.3. Rank abundance curves for the 30 most abundant OTUs (by total sequences
assigned) for the (a) sediment samples and the nodules from (b) SPG2, (c) SPG9, and (d)
SPG10.
Figure 4.4. Maximum likelihood phylogenetic tree constructed using 16S rRNA gene
sequences (289 bp) covering the V4 hypervariable region.
Chapter Five
Metagenomic analysis of the microbial communities associated with South Pacific Gyre
sediment and ferromanganese nodules
5.1. Introduction
5.1.1. Background
Much of the critical information regarding ferromanganese (FeMn) nodules and their
potential geochemical roles was detailed in the Introduction of Chapter 4. In brief, FeMn nodules
are small metal oxide concretions composed predominantly of manganese (Mn), iron (Fe), and a
number of other metals. The metals contained within FeMn nodules are important both biologically,
as trace element cofactors and metabolic substrates, and economically, in the form of valuable
transition metals that naturally occur at low concentrations in seawater. The degree to which
biology may promote and maintain FeMn nodules in the environment is unknown, and recent research
has implicated a possible interaction between nodules and Bacteria and Archaea. The biochemical
processes governing these interactions are poorly understood. FeMn nodules provide a number
of relevant metabolic resources to the microbial communities through abiotic processes,
including 1) the concentration of divalent metallic cations (Dick et al., 2008b) and anionic
transition elements (Koschinsky and Halbach, 1995) and 2) their ability to act as high-energy
electron acceptors. FeMn nodules are highly abundant at the water-sediment interface of the
South Pacific Gyre (SPG), covering up to 70% of the exposed sediment surface.
The SPG is the most oligotrophic of all the central gyres in the global oceans. Low
surface water primary production (≤0.14 mg chl-a·m⁻³) directly translates to low sedimentation
rates (<0.111 cm·kyr⁻¹) (D'Hondt et al., 2009), thin sediment coverage of the basalt seafloor (<3
– 130 m) (Edwards et al., 2005), and low total organic carbon from recalcitrant sources.
Measurements from pore water chemistry have shown that sediment in the SPG is oxygenated
from the overlying water through to the underlying basalt seafloor (D'Hondt et al., 2013). At all
previously studied marine sediment sites, oxygen and other terminal electron acceptors are
sequentially reduced via microbial metabolisms utilizing the abundant organic carbon sources as
electron donors (Biddle et al., 2008). In the SPG, the removal of terminal electron acceptors in
the upper meter of the sediment is minimal and is virtually negligible in the remaining portion of
the sediment column, indicating that microbial metabolism and activity are minimal in most of the
sediment (D'Hondt et al., 2009). The low predicted activity is reflected in microbial cell
abundance, which is below ~10⁶ cells·cm⁻³ in sediments at the water-sediment interface (D'Hondt
et al., 2013). This is in comparison to organic-rich sediments, which can have >10⁸ cells·cm⁻³.
Cell abundance rapidly decreases to below the detection limit (~10³ cells·cm⁻³) as depth increases
in the sediment column (D'Hondt et al., 2013).
In this extremely oligotrophic environment, microbes in the SPG sediment are actively
competing for energy sources. Several potential metabolisms are associated with the
presence of FeMn nodules and may have a substantial impact on microbial activity. First,
oxidation of reduced metal species (Mn²⁺ and Fe²⁺) may represent a key source of electrons in an
energy-starved environment. The abiotic oxidation of Fe²⁺ to Fe³⁺ occurs spontaneously in oxic
environments, and it is relatively difficult for microorganisms to harness the energy derived from
this process (McAllister et al., 2011). Mn²⁺ is more stable in oxic marine environments and can
persist for much longer (Tebo, 1991). The abiotic oxidation of Mn²⁺ to Mn⁴⁺ is still
thermodynamically favored, and the biological energy derived from this reaction is relatively
small (Dick et al., 2008b). Second, it has been shown in other oxic marine habitats that
microbial metabolisms can reduce the oxygen concentrations in the environment to hypoxic (<2
mg O₂·L⁻¹) or anoxic levels at the µm scale (Shanks and Reeder, 1993). In these
microenvironments, Fe and Mn oxides present in the nodules may act as electron acceptors and
mediate the release of reduced metal species, spurring further microbial activity (White et al.,
2013).
5.1.2. Previous results
The results of the 16S rRNA gene survey described in Chapter 4 offer a number of
avenues through which to explore microbial functional potential. First and foremost, the most
perplexing question surrounding the taxonomic data is the lack of a previously identified
metal-oxidizing organism. One possible explanation is the inherent limitation of using 16S rRNA
hypervariable regions to assign robust taxonomy, as the small size of the amplified region makes
assignment difficult. There was no evidence of any ζ-Proteobacteria, suggesting a lack of
neutrophilic Fe-oxidizing organisms (McAllister et al., 2011). However, both the α- and
γ-Proteobacteria have previously been shown to have members that oxidize Mn (Tebo et al., 2005).
The genomic mechanisms by which biological oxidation of Mn and Fe proceeds are not fully
understood. Extracellular multicopper oxidases (MCOs) and peroxidases have been previously
shown to play key roles in Mn oxidation, though the suspected mechanisms share little in
common, including sequence homology, suggesting multiple convergent pathways have
established the ability to oxidize Mn (Anderson et al., 2009; Dick et al., 2008b; G. J. Brouwers,
2000). The processes of acidophilic and anaerobic Fe oxidation have been previously
documented (Bonnefoy and Holmes, 2011), but the process by which neutrophilic,
microaerophilic Fe oxidation proceeds is not fully understood. Genomic evidence suggests the
molybdopterin oxidoreductases may play a role (Singer et al., 2011). Identifying these genes in
the environment may offer insight into a possible mechanism associated with nodule accretion.
The presence of γ-Proteobacteria related to the order Alteromonadales could indicate the
presence of organisms related to the genus Shewanella, known to be capable of using Fe³⁺ and
Mn⁴⁺ oxides as terminal electron acceptors (Coursolle and Gralnick, 2012). A biological
mechanism for metal oxide reduction has been characterized in Shewanella oneidensis MR-1 and
other organisms and utilizes outer membrane (OM) and extracellular-bound multiheme c-type
cytochromes (c-Cyts) (Shi et al., 2012). The presence of extracellular-bound multiheme c-Cyts
has been documented in other Fe³⁺-reducing organisms, such as Geobacter sulfurreducens,
where the complete suite of required genes remains unknown (Shi et al., 2007), and in
Fe²⁺-oxidizing organisms, such as Sideroxydans lithotrophicus ES-1 (Liu et al., 2012). Identification
and characterization of multiheme c-Cyts may suggest potential anaerobic metabolic pathways.
The most prevalent signal in the 16S rRNA gene data is the abundance and diversity of
sequences related to the MG-1 group within the Phylum Thaumarchaea. To date, all known
members of the MG-1 group are believed to be capable of aerobic ammonia oxidation (the first
step in nitrification) and carbon fixation (Walker et al., 2010). The presence of these organisms
associated with the FeMn nodules and surrounding sediments, and the observed increase in
nitrate concentrations in surface sediments (D'Hondt et al., 2009), suggests an active role in the
N cycle. Little is known about the N cycle in central gyre sediments. There is no current
understanding of the biology associated with other major N cycle processes, such as nitrogen
fixation, denitrification, or anammox. Assessing the potential for these processes may shed light
on overall microbial community interactions and constrain our understanding of the N cycle in
oligotrophic marine sediments.
5.1.3. Current research
In conjunction with the FeMn nodule and sediment samples previously assessed for
microbial community diversity in Chapter 4, extracted DNA was amplified using a linear
amplification methodology and sequenced in order to generate a metagenomic dataset from
which to assess potential community function. Metagenomes have previously been used to
assess overall community functional potential, by annotating and taxonomically identifying
putative coding sequences (CDSs), and to generate environmental genomes of novel organisms,
through binning and assembly protocols (Iverson et al., 2012). Metagenomes were generated
from SPG site 10 from an exterior nodule sample and from sediment collected at 0-5 cm depth
near the nodule. The metagenomic datasets were assessed for the microbial community
composition, assembled using a de Bruijn graph based metagenomic assembler, and analyzed for
potential metabolic functions utilizing both binned environmental genomes and whole
community functional annotation. The results show a diverse microbial community with
members spanning many of the major phylogenetic groups in the Bacteria and the
Thaumarchaea in the Archaea. Multiple putative CDSs were identified with homology and
similarity to potentially metal-reactive genes in the environment, including MCOs, peroxidases,
and molybdopterin oxidoreductases. Evidence from the examination of putative CDSs with
similarity to key genes in the C and N cycles and annotations in phylogenetic bins suggests that
anoxic conditions may occur in the SPG surface sediments in local environments providing an
avenue for anaerobic metabolisms.
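The assembly step mentioned above relies on a de Bruijn graph, in which each k-mer in the reads becomes an edge between its (k-1)-mer prefix and its (k-1)-mer suffix, and contigs correspond to paths through the graph. The sketch below shows only that graph-construction idea on two hypothetical reads; it is not the assembler used in this work, it ignores reverse complements, error correction, and coverage-based partitioning, and the function name and reads are introduced for illustration.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and every k-mer in a
    read adds one edge from its prefix to its suffix (repeats are kept, so
    the number of parallel edges reflects how often a k-mer was observed)."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

# Two hypothetical overlapping reads (not real sequence data).
reads = ["ATGGC", "TGGCG"]
graph = de_bruijn_graph(reads, k=3)
for node in sorted(graph):
    print(node, "->", graph[node])
```

In this toy case the edges AT→TG→GG→GC→CG spell out the merged sequence ATGGCG, which is how overlapping reads are joined into a contig.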
5.2. Materials and Methods
5.2.1. Sample Collection
Sediment and FeMn nodules were collected as part of the Deep Sea Drilling Program
(DSDP) cruise 595/596 to the SPG (Dec 2006 – Jan 2007 aboard the R/V Roger Revelle).
Nodules and surface sediments were collected from SPG site 10 (39° 18.617’ S, 139° 48.036’
W). The nodule and corresponding sediment were aseptically sampled from a multicore on the
catwalk, as samples were brought onboard. Sediment from 0-5 cm was sampled from the same
core from which the nodule was obtained and stored at -80°C. The FeMn nodule was rinsed gently with
0.2 µm filtered and autoclaved ambient bottom water to remove sediment adhering to the
surface. A flame-sterilized hammer and chisel were used to aseptically section the nodule based
on visual changes in strata. Further subsamples were generated and stored in 1.5 mL cryovials
at -80°C.
5.2.2. DNA extraction (Note: Identical to Chapter 4 extraction protocol)
Extraction of DNA from the nodule proceeded using a modified phenol-chloroform
extraction method. Approximately 0.5 mL of sample was resuspended in 675 µL of 2% CTAB
(cetyltrimethylammonium bromide) lysis buffer [100 mM Tris, 100 mM EDTA, 250 mM
Na₂PO₄, 1.5 M NaCl, brought to 40 mL and adjusted to pH 8.0; addition of 2% CTAB; diluted to
50 mL total volume and autoclaved] and was vortexed thoroughly for 30 seconds. To each slurry,
20 µL of proteinase K (800 units·mL⁻¹) was added and incubated horizontally for 30 minutes at
50°C. To each sample, 150 µL of 10% SDS was added and incubated further for 120 min at 65°C,
followed by the addition of 600 µL of PCI (phenol:chloroform:isoamyl alcohol, 25:24:1) and
incubation for 20 min at 65°C. Samples were then centrifuged for 10 min at 10,000 × g. The
upper layer was transferred to a new tube (care was taken to avoid transferring material from the
interface or below), 0.7 volumes of isopropanol was added, and incubated for 60 min at room
temperature. Samples were again centrifuged at 10,000 × g for 15 min, 0.5 mL of cold 70%
ethanol was added, and then centrifuged for an additional 5 min. Following removal of the
supernatant, the pellet was left to air dry in a fume hood for 15-30 min (as necessary) and
resuspended in 30 µL of sterile, DNase-free H₂O. Samples were then quantified (3 µL) using the
Qubit 1.0 fluorometer and the Qubit dsDNA HS Assay Kit (Life Technologies).
Due to low yield using the described phenol-chloroform method, extraction of DNA from
sediment samples was performed using the MoBio PowerLyzer PowerSoil DNA kit following
the manufacturer’s protocol, and quantified as above.
5.2.3. DNA Amplification, Sequencing, and Quality Control
Amplification of extracted DNA was performed according to the Nugen® Ovation
Ultralow Library System protocol with slight modifications. In brief, 10 µL of template DNA
(5.94 ng sediment DNA and 13.0 ng nodule DNA) were sheared using the Covaris® Focused-
ultrasonicator in 130 µL microTUBEs. Sonicator settings were set for 64 sec, at Intensity 3, Duty
Cycle 5%, and Bursts/cycle 200. Further amplification steps strictly adhered to the Nugen®
Ovation Ultralow Library System protocol. Amplification success was determined using a 16S
rRNA gene PCR amplification step using universal primers designed to target both Bacteria and
Archaea (Chapter 4). Amplified genomic DNA was sequenced at the University of California
Davis Genome Facility using the Illumina MiSeq 2×260 library chemistry.
Generated sequences were processed to remove low quality sequences. Sequences were
trimmed of the Illumina library adaptors using the program CutAdapt (V1.1) (Martin, 2011). For
the forward (61bp) and reverse (58bp) adaptors, settings were set such that a minimum overlap
of 12 base pairs (-O 12) with an expected maximum error rate of 0.50% (-e 0.005) was required
to remove the adaptor sequence. Any sequence without the appropriate adaptor was excluded
from further analysis. The remaining sequences were further trimmed based on quality score
using Btrim using a sliding window of 20bp (-w 20) that required an average quality score of 24
based on the Sanger quality scoring scale within that window (-a 24 -S) and an approximate
insert length of 105bp (-l 105) (Kong, 2011). Two different sets of files were generated for
paired-end sequences (PE), where both sequences successfully passed trimming, and for
sequences where only a single sequence (SE) passed trimming. PE sequences were used for
all subsequent assembly attempts.
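As an illustration of the sliding-window criterion described above, the following Python sketch trims a read at the first 20 bp window whose mean quality falls below 24. It is not Btrim's implementation: the function names, the rule of cutting at the start of the first failing window, and the treatment of the 105 bp value as a minimum retained length are assumptions made for this example. Quality strings are assumed to use Sanger (Phred+33) encoding.

```python
def phred33_scores(quality_string):
    """Decode a Sanger (Phred+33) quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_by_window(sequence, quality_string, window=20, min_avg=24, min_len=105):
    """Cut a read at the first window whose mean quality is below min_avg.

    Hypothetical helper for illustration only. Reads shorter than one window
    are never cut, but are still subject to the min_len check.
    Returns (sequence, quality) after trimming, or None if the read is too short.
    """
    scores = phred33_scores(quality_string)
    keep = len(sequence)
    for start in range(0, max(len(scores) - window + 1, 0)):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_avg:
            keep = start  # discard everything from the failing window onward
            break
    if keep < min_len:
        return None
    return sequence[:keep], quality_string[:keep]
```

A read pair would be kept as PE only when both mates return a trimmed sequence; if only one mate survives, it would be written to the SE file.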
5.2.4. Reconstruction and Taxonomy of Full-Length 16S rRNA Genes
Putative 16S rRNA gene fragments were detected using an RNA hidden Markov model
(HMM). To increase the fidelity of the search, the program FLASH (Parameters: default; Min
overlap: 10; Max overlap: 70; Max mismatch density: 0.25) was used to test PE sequences for
overlapping sequences that could be used to form a longer single sequence (Magoc and Salzberg,
2011). All combined sequences, remaining PE sequences, and SE sequences were searched using
the Meta-RNA program (-m ssu -e 10⁻¹⁰) to detect putative 16S rRNA gene fragments (Huang et
al., 2009).
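The idea of joining overlapping paired-end reads into a single longer sequence can be sketched as follows. This is a simplified stand-in and not the FLASH algorithm itself: the function names are hypothetical, quality scores are ignored, and the choice of the overlap with the lowest mismatch density (ties going to the longer overlap) is an assumption. The default values mirror the overlap and mismatch-density parameters quoted above.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_overlap=70,
               max_mismatch_density=0.25):
    """Merge read1 with the reverse complement of read2 if their ends overlap.

    Returns the merged sequence, or None when no overlap between min_overlap
    and max_overlap has a mismatch density at or below the threshold.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch_density, -overlap_length); smaller is better
    longest = min(max_overlap, len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = read1[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        density = mismatches / overlap
        if density <= max_mismatch_density:
            candidate = (density, -overlap)
            if best is None or candidate < best:
                best = candidate
    if best is None:
        return None
    overlap = -best[1]
    # Keep read1 in full and append the non-overlapping part of the mate.
    return read1 + mate[overlap:]
```

Pairs for which the function returns None would remain as separate PE reads for the rRNA search.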
Identified putative 16S rRNA gene fragments were then processed using EMIRGE to
reconstruct full-length sequences (Miller et al., 2011). The SILVA SSU reference database
(V111) (Quast et al., 2013) clustered at 98% sequence similarity was used as the search database.
The script emirge_amplicon.py was used over 40 iterations to generate full-length sequences
(nodule: -l 381; sediment: -l 509). The insert size and standard deviation parameters were set to a
specific value (-i 200 -s 50) as required by the program, despite the sequences no longer being in
paired-end format. Multiple attempts using different insert size and standard deviation values had
little-to-no effect on the final number of 16S rRNA genes generated. Two attempts were made
using a eukaryote and combined prokaryote-eukaryote SILVA SSU database to test for the
presence of 18S rRNA gene fragments. A single 18S rRNA sequence was generated.
Full-length 16S rRNA gene sequences were aligned to the SILVA database of all three
domains in the program mothur (Schloss et al., 2009). Implemented within mothur, sequences
were screened for possible chimeric sequences using Chimera Slayer (Haas et al., 2011) and the
SILVA GOLD (V111) reference database. Putative taxonomic assignments were assigned using
1,000 iterations (iters=1000) within the classify.seqs command and the three-domain SILVA
database. The bootstrap cutoff value for taxonomic assignment was set at 80 (cutoff=80).
Full-length 16S rRNA gene sequences were aligned to the SILVA 16S reference database
(V111) and added to the phylogenetic tree in the ARB package (Pruesse et al., 2012) using the
“ARB Quick Parsimony” function to “hang” sequences on the pre-computed tree. From the
nodule sample, a single 16S rRNA gene sequence clustered with Escherichia coli. It was
classified as a contaminating sequence and removed from further analysis. At least one nearest
neighbor was chosen for all other sequences and if the sequence grouped in a clade with a
sequenced isolate, the 16S sequence of the isolate(s) was included. Sequences were aligned using
CLUSTALW (Thompson et al., 1994) and manually trimmed. Sequences that were too short
were removed from further re-alignments and were not included in the final phylogenetic tree
(Sediment, n = 10; Nodule, n = 1). All phylogenetic trees were constructed using maximum
likelihood within the program PHYML (Guindon et al., 2010), the HKY85 substitution model
(Hasegawa et al., 1985), and 1,000 bootstraps. An archaeal phylogenetic tree was constructed for
archaeal sequences from both the sediment and nodule samples. The representative 16S rRNA
sequences from Durbin and Teske (Durbin and Teske, 2010) were added to the tree. A single
bacterial tree was constructed for the nodule sequences, while three trees were constructed for
the sediment sequences to decrease the number of branches on individual trees.
5.2.7. Sequence Assembly
All sequences that passed quality control and remained in a PE format were subjected to
assembly using the IDBA-UD assembler, a de Bruijn graph assembler, to generate contigs, a
contiguous set of overlapping DNA sequences (Peng et al., 2012). The assembler was run using
the default parameters, except for the use of the “--pre_correction” flag, recommended for true
metagenomic sequences. Several other de Bruijn assemblers were tried (Velvet (Zerbino and
Birney, 2008) and Meta-Velvet (Namiki et al., 2012)), but results from IDBA-UD were superior
in the overall length of assembled data, the number of long contigs (>2,000 bp), and N50 value.
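The three comparison metrics named above (total assembled length, number of contigs longer than 2,000 bp, and N50) can be computed from a list of contig lengths. The sketch below is illustrative only; the function names are hypothetical and no assembler output from this study is reproduced.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the contig length L such that contigs of length >= L together
    contain at least half of all assembled bases. Returns 0 for no contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


def assembly_summary(contig_lengths, long_cutoff=2000):
    """Summarize an assembly by total length, long-contig count, and N50."""
    return {
        "total_bp": sum(contig_lengths),
        "n_long": sum(1 for length in contig_lengths if length > long_cutoff),
        "n50": n50(contig_lengths),
    }
```

Running `assembly_summary` on the contig lengths from each assembler gives a like-for-like basis for the comparison.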
5.2.8. MG-RAST Annotations and Sequence Analysis
Assembled contigs were submitted to the MG-RAST analysis pipeline (Meyer et al.,
2008). In brief, MG-RAST is an automated pipeline that can determine taxonomic and metabolic
assignments for predicted CDSs. CDSs are determined using the FragGeneScan program (Rho et
al., 2010) and metabolic assignments are made using similarity searches against a number of
publicly available databases, including NCBI nr, KEGG, SEED, SILVA, UniProt, etc., and a
final assignment is based on the collective annotation of the MG-RAST non-redundant database,
known as M5NR (Wilke et al., 2012).
Several categories of genes were of interest for determining the possible mechanism of
metal oxidation in the system. The collective group of genes of interest (GOIs) was obtained
using a number of methods. (1) MG-RAST annotations were searched based on the putative
function assigned to the putative CDSs. A list of key words was searched against the total set of
putative CDSs. The list included the words “multicopper”, to target putative multicopper
oxidases, “heme cytochrome”, for which only multiheme cytochromes with ≥4 heme groups
were of interest, and “animal heme” and “animal haem”, to target homologs of animal heme
peroxidases. Putative CDSs identified using the MG-RAST term search were assigned a putative
cellular localization using the PSORTb server (Yu et al., 2010).
(2) Searches were performed of the putative CDSs from both the nodule and sediment
using HMMs. To utilize HMM searches, HMM models were constructed using the hmmbuild
function in HMMER (V 3.0) (Finn et al., 2011). Prior to building the HMM model, sequences
were collected from various sources (IMG, NCBI nr, UniProtKB, etc.) and aligned using
CLUSTALW. Homologs were collected for the genes involved in a number of processes of
interest. (a) Thirty genes from Shewanella species and various diverse organisms involved with
electron transfer during metal oxide reduction in Shewanella oneidensis MR-1 (MtrABC).
Sequences were collected using the IMG Orthologous neighborhood tool (Markowitz et al.,
2006). (b) Genes belonging to the fungal laccases (a multicopper oxidase family gene) shown to
oxidize Mn²⁺ to Mn³⁺ from UniProt (E.C. No. 1.10.3.2). (c) Molybdopterin oxidoreductase
(MOB) genes predicted to be involved with neutrophilic Fe²⁺ oxidation from Mariprofundus
ferrooxydans PV-1 and 5 other genes based on sequence similarity from known Fe²⁺-oxidizing
organisms. (d) Conserved genes involved with the generation of magnetosomes that include
MamFDCEHMOA. Genes were identified in Magnetospirillum gryphiswaldense MSR-1 and
several (5-6) homologs were identified either in syntenic operons using the IMG Orthologous
neighborhood tool or in nr using BLASTP with similarity above 40% AAID.
Once the HMM model was built, it was searched against the total putative CDSs for both
the sediment and nodule using the hmmsearch function in HMMER3 (E-value = 1×10⁻⁵).
(3) The KEGG Map output from MG-RAST was used to visually determine the
presence/absence of genes involved in metabolic pathways related to C and N cycling. Identified
pathways for C cycling included the Calvin cycle, the 3-hydroxypropionate/4-hydroxybutyrate
cycle, and C1 metabolisms. For N cycling, nitrification, assimilatory nitrate/nitrite reduction, and
denitrification were identified. For the identified pathways, genes for the entire metabolic
pathway (e.g., denitrification) or marker genes (e.g., RuBisCo for the Calvin cycle) were
downloaded based on EC Number from the KEGG database (Kanehisa et al., 2012). Multiple
sequences were downloaded from several organisms belonging to different taxonomic groups.
Genes were searched against the putative CDSs from both the sediment and nodule using
BLASTP (E-value = 1×10⁻¹⁰) (Altschul et al., 1997). A putative taxonomic assignment was made
based on similarity to sequences in nr using BLASTP (E-value = 1×10⁻³⁰). Putative function and
taxonomy were assigned if >65% of the contig sequence was covered at >45% AAID. The
lowest common phylogenetic level was assigned to the SPG putative CDS based on manual
inspection.
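The coverage and identity thresholds used to accept a BLASTP hit can be expressed as a simple filter. The sketch below assumes hits are available in the standard 12-column tabular BLAST output (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and that coverage is measured as the aligned span of the query divided by its length; both the function names and this definition of coverage are assumptions for illustration, not a record of how the filtering was actually scripted.

```python
def parse_tabular_line(line):
    """Parse one line of 12-column tabular BLAST output into a dict."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qseqid": fields[0],
        "sseqid": fields[1],
        "pident": float(fields[2]),
        "qstart": int(fields[6]),
        "qend": int(fields[7]),
        "evalue": float(fields[10]),
    }


def passes_thresholds(hit, query_length, max_evalue=1e-10,
                      min_coverage=0.65, min_identity=45.0):
    """Apply the E-value, >65% coverage, and >45% identity criteria to a hit."""
    span = abs(hit["qend"] - hit["qstart"]) + 1
    coverage = span / query_length
    return (hit["evalue"] <= max_evalue
            and coverage > min_coverage
            and hit["pident"] > min_identity)
```

For the taxonomic assignment step, the same filter would be called with `max_evalue=1e-30`.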
5.2.9. Phylogenomic Trees
Phylogenomic trees were constructed for the genes identified as homologs to Fe²⁺-related
MOBs and fungal laccases (MCOs). In general, the genes used to generate the HMM models
were aligned to the identified putative CDSs (MOB E-value = 1×10⁻²⁰; fungal laccase E-value =
1×10⁻¹⁵) from the SPG samples using CLUSTALW. After alignment, sequences that were too
short were removed from further analysis. Sequences were added with known functions in order
to assign putative functions to unique clades generated from the SPG CDSs. MOB homologs
included genes with the following functions: nitrate reductase (NAR), dimethyl sulfide reductase
(DMSR), polysulfide reductase (PSR), formate dehydrogenase (FDH), alternative complex III
(ACIII) and tetrathionate reductase (TTR). Fungal laccase homologs included MCOs with
putative Mn²⁺ oxidation functionality from Bacillus sp. SG-1 (MnxG), Pseudomonas putida
MnB1 (CumA), Aurantimonas sp. SI85-9A1 (MoxA1 and MoxA2) (Dick et al., 2008b),
Pedomicrobium sp. ACM 3067 (MoxA), and MCOs with known copper resistance function
(Tebo et al., 2005). Maximum likelihood phylogenomic trees were constructed using PHYML,
the JTT substitution model (Jones et al., 1992), and 1,000 bootstraps.
5.2.10. Taxonomic Binning
Sequences were binned into putative taxonomic groups using two different methods. (1)
For the sediment sample, which had a larger N50 compared to the nodule sample, assembled
contigs >3,000 bp in length were binned into putative taxonomic groups using hierarchical
clustering of tetranucleotide frequencies, as in Chapter 4, with a Pearson correlation cutoff of
0.85 (this value was set manually based on observed clustering patterns of the sequences and
increases the possibility of chimeric sequence bins). Due to the smaller nature of contigs
generated from the nodule sample, clustering was performed on all sequences >2,500 bp in
length, with a cutoff of 0.85. Putative phylogenetic bins were annotated using the RAST
annotation pipeline (Aziz et al., 2008). (2) Unassembled PE sequences were processed using
IDBA-Hybrid, an assembler that uses an initial alignment to a reference genome to generate
“seed” contigs, which are then extended through similarity searches to unrecruited sequences
(Peng et al., 2012). The assembler was run using the default parameters, except that the
similarity overlap identity was set to 60% (--similar 0.60) to capture the maximum number of
sequences in the assembly. The collective set of assembled contigs was considered to represent a
single taxonomic bin. Reference sequences were chosen based on the number of recruited
putative CDSs to isolate genomes in the MG-RAST output.
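The first binning method rests on comparing tetranucleotide composition between contigs. The sketch below shows one way to compute a 256-element tetranucleotide frequency vector and the Pearson correlation between two contigs, using the 0.85 cutoff quoted above. It is a simplified illustration with hypothetical function names: it counts tetramers on a single strand, applies no normalization beyond relative frequency, and shows only the pairwise comparison, not the hierarchical clustering itself. The exact procedure used in Chapter 4 may differ.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Return the relative frequency of each tetranucleotide in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def same_bin(seq_a, seq_b, cutoff=0.85):
    """True if two contigs correlate at or above the cutoff."""
    return pearson(tetra_freq(seq_a), tetra_freq(seq_b)) >= cutoff
```

In practice the pairwise correlations would feed a hierarchical clustering step, with the tree cut at the chosen correlation threshold to define bins.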
5.3. Results and Discussion
5.3.1. Sequencing and Assembly Results
For all categories, the SPG sediment sample produced a superior number of high quality
sequences and assembled contigs compared to the SPG nodule sample (Table 1). These
variations could be due to differences in the extraction methodology used or the quality/quantity
of the sample DNA recovered. The nodule samples likely have higher residual metal
concentrations, which are known to bind DNA molecules and inhibit PCR amplification
techniques (Stein et al., 2001). Results from the 16S rRNA study (Chapter 4) indicate that the
nodule microbial community is less diverse than the sediment, but the less successful assembly
output may indicate a microbial community with complex genomic nature that is difficult to
resolve using short read sequences and/or with de Bruijn graph assemblers.
5.3.2. Full-length 16S rRNA Gene Analysis
5.3.2.1. EMIRGE Output Statistics
The 16S rRNA gene has been used extensively as a phylogenetic marker to measure
microbial diversity and abundances (Sogin et al., 2006). This metagenome is not subject to the
same biases as 16S rRNA amplicon methodologies, like those used in Chapter 4, and analysis
was performed to determine the rRNA signature from within the dataset. From a total of
35,465,474 and 22,101,498 sequences, the program rna_hmm3.py identified 12,603 (0.036%)
and 17,531 (0.048%) putative rRNA fragments for the sediment and nodule samples,
respectively. This percentage of 16S rRNA sequences from the metagenome is similar to values
obtained from metagenomes in other environments (Fan et al., 2012).
Furthermore, unlike rRNA amplicon methodologies that target and amplify a specific
region of the target gene, for which the length of the region is determined by the available
sequence technology, the rRNA signature in the metagenome contains full-length 16S rRNA
sequence, if it can be assembled correctly. The program EMIRGE is designed to assemble
metagenomic rRNA fragments into full-length sequences utilizing an alignment method to an
rRNA database. EMIRGE was able to align at least a single sequence to the 16S rRNA database for
61.90% and 46.87% of the putative rRNA fragments generating 227 and 46 full-length 16S/18S
rRNA sequences from the sediment and nodule samples, respectively. These values are in
agreement with the species richness seen in the 16S rRNA amplicon results (Chapter 4), where
the sediment sample had higher richness compared to the nodule samples.
The EMIRGE output includes a prediction of the relative abundance for each full-length
rRNA sequence generated based on the sequence length of the reference and the number of
sequences aligned to the reference. The most abundant sequence from both the sediment and
nodule was estimated to be <7.0% (6.6% and 6.8%, respectively) of the microbial community,
with many of the sequences estimated to be about 1-2% of the total microbial community, or
lower (Table 2). These values do not agree with the 16S rRNA data in Chapter 4 and may be an
artifact of how the EMIRGE program processes the relative abundance of the Meta-RNA pre-
screened PE and SE sequences. Conversely, these values may represent more realistic values of
relative abundance for the environmental communities, as the random amplification applied to
generate the metagenomic libraries does not have the same biases as targeted PCR amplification
(e.g., primer annealing biases, uneven amplification, etc.). If the microbial community is more
even in species abundance, as the EMIRGE results suggest, this may explain the limited number
of large contigs generated from the sediment and nodule samples, as IDBA-UD relies on
variations in k-mer abundance to separate putative microbial genomes.
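The notion of a relative abundance derived from reference length and the number of aligned sequences can be illustrated with a length-normalized read count. This is a deliberately simplified sketch and not EMIRGE's own probabilistic estimate: the function name and the formula (reads per base, rescaled to sum to 1) are assumptions chosen only to show why both quantities matter.

```python
def length_normalized_abundance(aligned_reads, reference_lengths):
    """Estimate relative abundance as reads per reference base, scaled to sum to 1.

    aligned_reads: dict mapping reference name -> number of aligned reads
    reference_lengths: dict mapping reference name -> reference length in bp
    """
    per_base = {
        name: aligned_reads[name] / reference_lengths[name]
        for name in aligned_reads
    }
    total = sum(per_base.values())
    if total == 0:
        return {name: 0.0 for name in per_base}
    return {name: value / total for name, value in per_base.items()}
```

Dividing by reference length prevents a longer reconstructed sequence from appearing more abundant simply because it offers more positions for reads to align.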
5.3.2.2. Archaeal Phylogenetic Diversity
EMIRGE generated 10 and 5 full-length rRNA sequences for the sediment and nodule
samples, respectively, all of which were assigned to the marine group-1 clade (MG-1) within the
Phylum Thaumarchaeota using the ARB phylogenetic tree (Figure 5.1). Taxonomic assignments
of the full-length sequences using mothur and the SILVA GOLD database assigned 9 and 5
rRNA sequences to the MG-1 for the sediment and nodule, respectively. These values are lower
than the diversity seen in the 16S rRNA amplicon results (Chapter 4) and are likely the result of
EMIRGE using a dissimilarity value of 4% to align reads to the 16S rRNA database, and the
clustering for the 16S rRNA amplicon study was performed at 1% dissimilarity. The higher
dissimilarity value was used to ensure that complete full-length rRNA sequences were generated,
as a lower threshold may have resulted in the breaking of the alignments, and to generate a
manageable number of sequences, that could be processed in the available time and on the
available hardware. As seen in the diversity of archaeal sequences for the 16S rRNA amplicon
results, the EMIRGE full-length 16S sequences fall into many of the previously described
subgroups within the MG-1 group (Massana et al., 2000) and the same subgroups as represented
by the 15 most abundant MG-1 operational taxonomic units (OTUs) (Figure 5.1). This indicates
that while much of the diversity of the 16S rRNA amplicon results has been compressed into
fewer sequences, these full-length sequences still represent all of the inherent diversity of the
most abundant OTUs.
5.3.2.3. Bacterial Phylogenetic Diversity
EMIRGE generated 216 and 41 full-length rRNA sequences for the sediment and nodule
samples, respectively, that could be assigned within the domain Bacteria, and taxonomic
assignments of the full-length sequences using mothur and the SILVA GOLD database assigned
214 and 41 bacterial sequences for the sediment and nodule, respectively. The shift from
abundant archaeal sequences in the 16S rRNA amplicon data to a putatively bacterial dominated
community in the full-length rRNA data allows for more detailed analysis of the bacterial
community. In practical terms, the relatively low abundance for most of the full-length rRNA
sequences predicted during the EMIRGE reconstruction step may account for the discrepancy
between the number of bacterial sequences reconstructed and number of bacterial OTUs
identified in the rRNA amplicon data. Detailed analysis was only performed on the most
abundantly recovered sequences from the amplicon data and sequences that were not analyzed in
detail were below ~5% of the predicted relative abundance. If the relative abundances
determined by EMIRGE for the metagenomes are accurate, despite the diversity of a particular
bacterial group (potentially containing >3 full-length rRNA sequences), in actuality these
sequences may only represent a small portion of the total community (<1%) with distinct
phylogenetic diversity.
Many of the bacterial sequences fall into novel divisions or clades within phylogenetic
groups. Sediment sequences were assigned to the candidate divisions WS3, TA06, OP3 and
BRC1 (Figure 5.2A & B). These groups represent truly novel bacterial diversity and the lack of
representative genomes makes it difficult to predict putative functions or assign putative
taxonomies to phylogenetic bins. For some of the sediment sequences that group within
established Orders and Classes, many of the sequences fall within novel clades within these
groups, such as the Chloroflexi-SAR202, Acidobacter-RB25, -PAUC26f, -DA052, -KF-JG30-18,
and Actinobacteria-OSC155 groups, further confounding the possible community functions these
sequences may represent. Both the nodule and sediment bacterial communities have sequences
that phylogenetically group with α-, β-, γ-, and δ-Proteobacteria, but the nodule sample contains
the only sequences related to the ε-Proteobacteria (Figure 5.2C & 3). The Proteobacteria have a
number of identified organisms with metabolisms related to Mn²⁺ oxidation and metal oxide
reduction (Anderson et al., 2009), such that increased resolution offered by full-length rRNA
sequences may support further analyses in regards to identifying organisms involved in nodule
formation and/or maintenance.
A single 18S rRNA sequence from the sediment EMIRGE output was assigned to the
genus Puccinia within the fungi (EMIRGE Normalize Abundance = 0.18%). P. hordei and P.
pelargonii-zonalis are the closest relatives and are both spore forming land plant pathogens. P.
hordei specifically causes the disease Barley Rust. This sequence may represent Puccinia spores
that were deposited from terrestrial inputs, as Puccinia spores have been shown to travel great
distances via wind dispersal delivery mechanisms (Aylor, 1987). However, it cannot be ruled out
that this sequence represents an endemic fungal population in the SPG sediment community, a
property of marine sediments that appears more common than previously thought (Edgcomb et
al., 2010; Orsi et al., 2013).
5.3.3. Functional Assignments of Putative Coding Sequences
5.3.3.1. Putative Metal Biochemistry CDSs
The biological mechanisms by which microorganisms perform Mn²⁺ and aerobic
neutrophilic Fe²⁺ oxidation are poorly understood (Liu et al., 2012; Singer et al., 2011; Tebo et
al., 2005). Both oxidation processes occur spontaneously in aerobic systems, though Mn²⁺ is
stable in aerobic environments for much longer than Fe²⁺ and generates less overall reducing
power. The oxidation of Mn²⁺ has been linked with a microbial necessity for reduced energy
sources in energy-depleted environments, such as aged colonies on Petri plates (Cahyani et al.,
2009), and could act as an alternative energy source for microorganisms in the energy-starved
SPG sediments. For Mn²⁺ oxidation, both MCOs and animal heme peroxidases have been
identified as families of proteins that may play key roles in oxidation events (Anderson et al.,
2009; Brouwers et al., 1999; Dick et al., 2008b). However, identified genes are varied in
structure and lack homology, making classification of related genes through common shared
domains impossible. The mechanism for Fe³⁺ oxide reduction has been identified in Shewanella
species and involves the protein MtrA, a periplasmic decaheme cytochrome from the DmsE
protein family anchored to the outer membrane (OM) that has been demonstrated to transfer
electrons from the cell interior to extracellular-bound cytochromes (Coursolle and Gralnick, 2012). Multiheme
cytochromes with ≥4 heme groups continually appear as significant electron conduits for
transferring reducing power across a membrane. Further, a homolog of MtrA has been identified
in the neutrophilic Fe²⁺-oxidizing Sideroxydans lithotrophicus ES-1 with the demonstrated ability
to shuttle electrons in a similar fashion to MtrA (Liu et al., 2012).
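One common heuristic for estimating the number of heme groups in a c-type cytochrome is to count CXXCH heme-binding motifs in the protein sequence. The sketch below applies that heuristic with a threshold of four motifs. It is offered only as an illustration: the text does not state how heme counts were determined for the SPG CDSs, and the function names and the use of the CXXCH motif are assumptions.

```python
import re

# CXXCH: cysteine, any two residues, cysteine, histidine.
# The lookahead allows overlapping motifs to be counted.
_HEME_MOTIF = re.compile(r"(?=C..CH)")


def count_heme_motifs(protein_seq):
    """Count CXXCH motifs in an amino acid sequence (case-insensitive)."""
    return len(_HEME_MOTIF.findall(protein_seq.upper()))


def is_multiheme(protein_seq, min_hemes=4):
    """True if the sequence carries at least min_hemes CXXCH motifs."""
    return count_heme_motifs(protein_seq) >= min_hemes
```

Because the assembled CDSs are often fragments, a motif count on a partial sequence is a lower bound on the number of heme-binding sites in the full-length protein.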
The putative CDSs generated using the MG-RAST analysis pipeline were searched for
sequences with possible functional roles related to MCOs, animal heme peroxidases, and
multiheme cytochromes (Table 3). For all of the categories, the sediment sample had an order of
magnitude more genes assigned to these functional categories. An attempt was made to identify
the putative localization for each of the possibly metal-reactive identified CDSs. Because the process
of Mn²⁺ and Fe²⁺ oxidation results in the precipitation of insoluble oxides that would cause cell
lysis should the metabolic waste accumulate within the cell, the localization of MCOs, heme
peroxidases, and multiheme cytochromes in the OM, or their export as extracellular enzymes, may be
indicative of a role in metal oxidation.
The PSORTb server utilizes an HMM search database of various localization signals (e.g.,
transmembrane helices, signal peptides, etc.) to estimate the final localization of a putative
protein sequence. Only two putative CDSs from the nodule and sediment had external cell
localization and none of the identified CDSs had a predicted localization in the OM. The putative CDS from
the nodule was an MCO with a cytoplasmic localization signal, but contained a secondary signal
as an external spore coat protein. The spore-related MCO MnxG has previously been identified
on Bacillus sp. SG-1 spores and possesses Mn²⁺ oxidation capabilities (Dick et al., 2008a). The
putative CDS from the sediment with predicted extracellular localization belonged to the animal
heme peroxidase protein superfamily (Pfam No. PF03098). An OM bound animal heme
peroxidase with hemolysin-like Ca²⁺ binding regions that belongs to the RTX protein family has
been isolated and identified as the major Mn²⁺ oxidizing enzyme in the α-Proteobacterium
Aurantimonas sp. SI85-9A1 (Anderson et al., 2009). Manganese peroxidases (MnPs) have been
identified previously in fungi (Wong, 2009), but their role was not understood in prokaryotic
Mn²⁺ oxidation events until detected in Aurantimonas sp. SI85-9A1. The extracellular
animal heme peroxidase identified in the SPG sediment sample also possesses Ca²⁺ binding
regions and is related to the RTX protein family. The combination of the conserved domains
between the SPG sediment CDS and the Aurantimonas sp. SI85-9A1 MnP, the putative animal
heme peroxidase function, and the putative extracellular localization may indicate that this
environmental CDS is an MnP in the SPG sediment environment.
For the other putative CDSs with possible roles in metal chemistry identified using MG-RAST,
a majority (57%) could not be assigned a localization prediction, likely due to the small
size of the assembled fragments, limiting identifiable markers, and the incomplete nature of the
PSORTb database (Table 3). A small number of putative CDSs were identified as having
predicted localization in the periplasmic space (16%) and the remaining sequences were
predicted as either cytoplasmic or cytoplasmic membrane in nature. It is a possibility that the
periplasmic sequences still play a role in metal chemistry in a number of ways. First, periplasmic
cytochromes may play a role in shuttling electrons across the cytoplasmic membrane to unknown
OM bound/embedded metal reactive enzymes, similar to the role of MtrA in Shewanella species.
Second, for genes like MtrA, the PSORTb database assigns the localization as periplasmic,
though evidence shows that MtrA is embedded in the OM, further increasing the possibility that
a periplasmic assigned putative CDS may in fact be part of the OM mosaic. Third, the SPG putative
CDSs may lack the known signals for localization and thus cannot be identified by the PSORTb
database. Lastly, the folded protein may be transferred across the OM by a twin-arginine
translocation (TAT) secretion pathway, and the short putative CDSs may lack the
twin-arginine signal motif (Voulhoux et al., 2001).
5.3.3.2. Phylogenomics of Putative Metal Biochemistry CDS
MG-RAST only utilizes sequence similarity searches to determine the putative function
of putative CDSs. As such, other forms of detecting homology, such as HMMs based on
sequence alignments of homologous proteins, offer an opportunity to detect more divergent
forms of function and homology. HMMs were constructed for homologs of three sets of GOIs:
(1) MtrABC, (2) MOBs and (3) fungal laccases. The MtrABC proteins form a protein complex
associated with the OM of metal oxide reducing organisms. MtrA is a decaheme c-Cyt inserted
within the ring-like structure of MtrB, which is embedded in the OM (Shi et al., 2012). The
linear form of MtrA has been shown to accept electrons from cytoplasmic membrane c-Cyts and
transfer these electrons directly to MtrC, an extracellular c-Cyt, which can then deposit electrons
on external electron acceptors (Shi et al., 2012).
MOB proteins represent a conserved family of proteins that do not always contain
molybdenum as co-factor (Yanyushin et al., 2005). The functions of MOBs are varied and
include NARs, DMSRs, PSRs, FDHs, ACIII and TTRs. The functional groups of the various
MOB proteins are orthologous. The comparison of the putative CDSs with MOBs of known
function could allow for some degree of prediction of putative function. A MOB has been
identified in M. ferrooxydans PV-1 as a gene with putative function involved with Fe²⁺ oxidation
(Singer et al., 2011). Clustering of putative SPG MOBs with the M. ferrooxydans PV-1 homolog
could indicate environmental CDS with putative roles in metal oxidation chemistry.
Laccases are a group of MCOs found in fungi, but have also been identified in bacteria
(Givaudan et al., 1993), that can oxidize Mn²⁺ to Mn³⁺ and Mn⁴⁺ (Anderson et al., 2009). For
organisms that generate the Mn³⁺ intermediate, it is used to cleave covalent bonds within lignin
and humic acid compounds (Wong, 2009). Microorganisms in the SPG sediment environment
may utilize laccase-like enzymes to actively degrade recalcitrant carbon compounds or possibly
to generate energy through the complete oxidation of Mn
2+
to Mn
4+
. Further, Mn
3+
can
abiotically disproportionate in to Mn
2+
and Mn
4+
(in the form of MnO
2
) if the Mn
3+
intermediate
is not used immediately (Tebo et al., 2005). This could occur if the Mn
3+
was stored or
transported using a Mn
3+
-siderophore complex.
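The abiotic disproportionation described above can be written as a balanced reaction. The water and proton terms are supplied here only to balance mass and charge, under the assumption that MnO₂ is the Mn⁴⁺ product as stated in the text; they are not taken from the cited work.

```latex
2\,\mathrm{Mn^{3+}} + 2\,\mathrm{H_2O} \;\longrightarrow\; \mathrm{Mn^{2+}} + \mathrm{MnO_2} + 4\,\mathrm{H^{+}}
```

Two Mn³⁺ ions carry a total charge of 6+, matching the 2+ of Mn²⁺ plus the 4+ of the four protons on the product side.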
Several putative CDSs were identified using the MtrABC HMMs. Most of these
sequences were from the sediment environment (MtrA homologs = 23; MtrB homologs = 6;
MtrC homologs = 1). Only two MtrA homologs were identified in the nodule sample. The
presence of MtrABC homologs in the SPG environment may suggest that external electron
acceptors are being used instead of the readily available O₂. This could be an indication that in
local environments the use of alternative electron acceptors may be required for microbial
metabolisms. Interestingly, MtrA has a homolog, MtoA, in the genome of the Fe²⁺-oxidizing
β-Proteobacterium S. lithotrophicus ES-1, which has experimentally been shown to transfer
electrons in a similar manner to MtrA (Liu et al., 2012). This may indicate a possible function of
MtoA in transferring electrons into the cell after the oxidation of reduced metal substrates in the
environment. The role these homologs play in the SPG sediment environment cannot be
completely parsed with the current dataset, but may offer future avenues of research.
The putative CDSs identified via the HMM search as possible MOBs cluster in a number
of the known MOB functional groups. After trimming shorter sequences, 1 and 28 putative
MOBs remained from the nodule and sediment datasets, respectively. Sequences from the SPG
metagenome clustered with a number of clades with known functional activities, including two
different forms of formate dehydrogenase (FdnG and FdhA) and a divergent pair of sequences
that branch deeply with the PSRs (Figure 5.4). The FdhA clade includes the single sequence
aligned from the nodule samples. Two distinct phylogenetic clusters occur in relation to the
proteins with putative Fe²⁺-oxidizing function. Three sequences from the SPG cluster with the
MOB identified in the M. ferrooxydans PV-1 genome as potentially important in Fe oxidation
and with three homologs from Geobacter species (G. sp. M18, G. uraniireducens Rf4 and G.
metallireducens GS-15). Nine sequences cluster with S. lithotrophicus ES-1, Leptothrix
ochracea, and Gallionella capsiferriformans, all three of which have metal-related biochemistries,
including Fe²⁺ oxidation. The function of the MOB in M. ferrooxydans PV-1 has yet to be
established, but a syntenic operon containing the MOB and other proteins with putative Fe
oxidation function has been identified in M. ferrooxydans PV-1, all three Geobacter species,
and S. lithotrophicus ES-1. The dual nature of these phylogenetic clusters may indicate that these
two clades perform different functions related to metal biochemistries, but the presence of the
SPG sequences related to M. ferrooxydans PV-1 may support a role for Fe²⁺ oxidation within the
sediment community. Lastly, two SPG sequences form a single clade without known orthologs,
for which the putative function is not predictable.
Using a laccase HMM search model, 40 and 3 putative CDSs from the sediment and
nodule, respectively, were identified as possible laccase homologs. After trimming, 30 and 2
putative CDS from the sediment and nodule, respectively, were used to generate the
phylogenetic tree (Figure 5.5). Ten of the sediment sequences clustered with MCOs with
identified functions related to copper resistance, but three sediment sequences clustered with
MCOs with putative Mn²⁺ oxidation function. The MCOs in A. sp. SI85-9A1 (MoxA1 and
MoxA2) and Pedomicrobium sp. ACM 3067 (Mox), once thought to be the major
component of Mn²⁺ oxidation in these organisms, have been identified as having a less dominant
role in Mn oxidation (Anderson et al., 2009). These enzymes still have some Mn oxidizing
ability, and evidence suggests this oxidation may occur in a laccase-like fashion. Interestingly,
basal to the putative Mn-oxidizing MCOs is a large group of sequences (n = 10) without
identifiable function and limited similarity in the nr database (<45% AAID). Only half of these
sequences had been identified previously by MG-RAST. Based on PSORTb localization
predictions, only two sequences could be assigned putative localization within the periplasmic
space, and the rest were unidentifiable. The MCO with a localization prediction of spore coat
(see above) was included in the final tree and this sequence had limited homology to the Bacillus
sp. SG-1 MnxG Mn-oxidizing MCO.
Collectively, these methodologies have identified a number of putative CDSs from the
nodule and sediment environment with possible Mn and Fe oxidation capabilities. Homologs
exist in the SPG environment for all of the known biochemical pathways associated with Mn and
Fe oxidation. The identification of both a putative extracellular MCO related to MnxG and
animal heme peroxidase related to the functional MnP in A. sp. SI85-9A1 offers promising
targets for future identification of Mn active proteins. Further, there is convincing evidence of
MtrA homologs, MOBs, and MCOs with putative Fe and Mn metabolic functions present in the
environment, including a novel clade of MCOs basal and deeply branching to the known Mn
reactive MCOs. The number of possible metal reactive putative enzymes in the SPG nodule-
sediment environment could suggest that metal related metabolisms are an important biological
function within the microbial community, but further research is required to completely
understand the full mechanisms involved.
5.3.3.3. Identification of Putative Magnetosome Genes
The mechanism for the generation of the FeMn nodules in the SPG environment remains
undefined; one further possible mechanism for the accumulation of Fe oxides in deep marine
sediments is the biomineralization of magnetite (Fe₃O₄) and greigite (Fe₃S₄) in the
magnetosomes of magnetotactic bacteria (Abreu et al., 2011). Magnetotactic bacteria utilize
magnetosomes to orient the cell within a magnetic field. The genes involved in magnetosome
formation (mam) have been studied in several magnetotactic bacteria, revealing that the key
genes reside in several conserved operons in a genomic island (Grunberg et al., 2001; Matsunaga
et al., 2005).
An HMM search was performed utilizing the most conserved mam genes identified in
both comparative proteomic and genomic studies from two different operons (MamFD from the
mamGFDC operon and MamCEKMOABS from the mamAB-like operon) (Jogler and Schuler,
2009). Homologs for all of the genes in the mamAB-like operon, except for MamS, were
identified, while the genes from the mamGFDC operon were not. Not all the genes in the
mamAB operon are conserved in all of the magnetosome-forming bacteria. Previously, it has
been shown that the mamAB operon is the only operon required for magnetosome formation
(Lohsse et al., 2011). The lack of a MamS homolog may suggest that this gene is not conserved
in all mamAB operons, or possibly that these genes do not represent putative magnetosome
formation in the environment. The biomineralization of Fe oxides in the SPG sediment
environment may play an as yet unknown role in the formation and maintenance of FeMn nodules.
5.3.3.4. N & C Cycling
Little is known about the metabolisms related to the cycling of C and N compounds in the
SPG sediment environment. The SPG sediment is extremely oligotrophic and the presence of O₂
throughout the sediment column suggests that the limited availability of reduced carbon
compounds limits microbial respiration. The upper cm of the sediment does have a measurable
decrease in O₂ and an increase in NO₃⁻ concentrations; these profiles are linked to the highest
microbial cell counts in the sediment column (D'Hondt et al., 2011). The assumption could be
made that the NO₃⁻ peak is related to the complete nitrification of NH₃ to NO₃⁻, and this assumption
could be supported by the presence of the MG-1 Thaumarchaea, which are known to participate
in the first step of NH₃ oxidation and fix CO₂, but how this relates to the cycling of C and N in
the SPG environment remains unclear. As such, the MG-RAST KEGG map output was used to
identify putative metabolic pathways of interest related to C and N cycling.
Several pathways of interest were identified with regard to C cycling in the SPG sediment.
Sequences representing these metabolic pathways were selected from the KEGG Ontology
database and searched against the putative CDSs. There was indication of carbon fixation
through both the Calvin-Benson cycle and the 3-hydroxypropionate/4-hydroxybutyrate cycle.
Further, several genes related to C1 metabolisms were identified. The key genes of the
3-hydroxypropionate/4-hydroxybutyrate cycle were found and were all related to the MG-1
Thaumarchaea, suggesting that these organisms may be fixing C in energy-starved sediments. Form I
(cbbL) ribulose-bisphosphate carboxylase (RuBisCo) could be identified in both the nodule and
sediment samples. Three different putative RuBisCo CDSs were identified in the sediment.
Phylogenetically these sequences were related to the Nitrosomonadales within the
β-Proteobacteria, an Order that contains both the Genera Nitrosospira and Nitrosomonas. Both
genera are lithoautotrophic and may be actively fixing C. Five different putative RuBisCo CDSs
were identified in the nodule sample. One could be identified as belonging to the Genus
Nitrosospira, while the other four could only be assigned loose taxonomies, such as
“environmental bacterial sequence” or generally within the Phylum Proteobacteria. There had
been no evidence of β-Proteobacteria in the most abundant operational taxonomic units of the
16S rRNA gene community analysis.
The presence of genes related to C1 metabolisms is an interesting feature of the aerobic
SPG sediments that may represent metabolic pathways from which energy can be supplemented
for cellular growth and maintenance. Generally, C1 compounds, such as formate and methanol,
are considered to be products of anaerobic metabolisms, either through the oxidation of CH₄ or
via fermentation reactions. Yet, homologs of CO dehydrogenase, formate dehydrogenase, and
methanol dehydrogenase could be identified in both the nodule and sediment microbial
communities. For both the nodule and sediment, the aerobic carbon monoxide dehydrogenase
small subunit (CoxS) is more diverse in the dataset than the corresponding medium subunit
(CoxM), with a 9:1 ratio in the nodule and an ~4:1 ratio in the sediment. The cox subunits in the
nodule correspond to the !- and "-Proteobacteria, the Actinobacteria and the Bacteroidetes,
while in the sediment the subunits correspond to the same groups and include the Firmicutes,
Chloroflexi, Planctomycetes, and the Archaea. The presence and diversity of cox-related putative
CDSs may indicate a relative importance of CO as an electron donor in the SPG sediment
environment. The large number of CoxS and the diversity of cox-related subunits may indicate
that this technique is not accurate enough to resolve the true function of the dehydrogenase
subunits.
Further, dissimilatory formate and methanol dehydrogenases were identified in the
metagenomic dataset. The presence of the FDHs was identified previously when examining the
phylogenomic relationships of the MOBs. The dichotomous nature evident in the phylogenomic
tree is apparent in the sediment dataset with one “seed” FDH having similarity to a small number
of (n = 5) putative CDSs with putative taxonomic assignments to the Aquificae and the !-
Proteobacteria, while the other “seed” FDH recruited predominantly !-Proteobacteria (n = 20
CDS). The nodule contained only 12 putative CDSs with similarity to the second “seed” FDH
and recruited "-, "-, and #-Proteobacteria. The dataset contained 2 and 1 putative CDSs with
similarity to dissimilatory methanol dehydrogenases for the sediment and nodule, respectively.
The mechanism for the generation of C1 compounds in this aerobic environment is unknown, but
one possible aerobic input into these sources could be the action of C-P lyases commonly found
in the genomes of MG-1 Thaumarchaea. C-P lyases produce methane, and in turn the methane
could be oxidized as an energy source. Further examination of the dataset along a more extended
biochemical pathway than is provided by the KEGG map output of MG-RAST may provide
clues as to the mechanism for the generation of these compounds.
Three key aspects of the N cycle were identified and analyzed in the same fashion as the
C cycle. These pathways included (1) aerobic ammonia oxidation (nitrification), (2) assimilatory
nitrate/nitrite reduction, and (3) denitrification (Figure 6). Aerobic ammonia oxidation is
performed in a series of metabolic steps that oxidize NH₃ to NH₂OH (hydroxylamine), which is
then oxidized to NO₂⁻ (Vajrala et al., 2013). The first step is performed by ammonia
monooxygenase, of which the A subunit (AmoA) is the catalytic subunit. The underlying
sequence coverage of putative CDSs with similarity to archaeal and bacterial AmoA
sequences had a ratio (archaeal : bacterial) of ~4:1 in the sediment sample and ~10:1 in the
nodule sample. The greater abundance of assembled archaeal subunits may indicate that the
contribution to NH₃ oxidation is higher from the Archaea than from the Bacteria, though
presence is not indicative of function. Additionally, the true number of both sequence types is likely
masked during the assembly process, as a 5% dissimilarity cutoff is used in the IDBA-UD
assembler. The archaeal sequences have putative taxonomic identification to the MG-1 group
and the bacterial sequences to the Nitrosomonadales, further supporting the role of both groups
as lithoautotrophs in the SPG sediment environment. Putative CDSs from the SPG metagenomic
dataset also included a number of assimilatory nitrate/nitrite reductases, such as NarB, SoxC,
and NrfA. The products of these genes are capable of generating reduced N compounds that
could be used for cell growth.
A limited number of putative CDSs from the sediment sample had similarity to genes for
denitrification, an anaerobic metabolic process that converts NO₃⁻ to N₂ gas through the
reduction of several intermediates (NO₂⁻ → NO → N₂O) (Figure 6). Putative CDSs were
specifically identified with similarity to genes that could potentially reduce NO₃⁻ to NO₂⁻
(NarGH), NO₂⁻ to NO (NirK), and NO to N₂O (NorB). There were no putative CDSs with
similarity to NosZ, the gene that converts N₂O to N₂. The putative CDSs with similarity to NirK
and NorB all had putative taxonomic identification belonging to the Proteobacteria, while NarGH
included the Firmicutes, Chloroflexi, and Candidate Division OP1. The presence of another
metabolic process linked with anaerobic conditions (in addition to the generation of C1 compounds
and the MtrABC homologs) may indicate that microbial activity on the small scale in the sediment
environment is capable of generating anaerobic conditions, within which the reduction of
alternative electron acceptors is a necessity. Similar conditions are observed in the microniches
of marine snow particles, which possess high, localized heterotrophic microbial activity (Shanks
and Reeder, 1993). The reduction of Mn and Fe oxides is more energetically favorable than that of
nitrate/nitrite, such that anaerobic conditions will likely result in the liberation of soluble
reduced metal species. The absence of putative CDSs with similarity to NosZ may suggest that the
anaerobic conditions are limited and the complete reduction of N₂O to N₂ is unfavorable. There
are other environments where the denitrification process does not go to completion (Wrage et al.,
2001). The release of N₂O gas from oligotrophic sediment environments could have potential
implications for atmospheric heating, as N₂O is a potent greenhouse gas.
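The reasoning above, that a missing NosZ would leave N₂O as the end product, can be sketched as a simple ordered lookup. The step list and marker names follow the text; the function names and the rule that every listed marker must be present for a step to count are assumptions of this sketch, and, as noted for AmoA, the presence of a homolog is not evidence of activity.

```python
# Ordered denitrification steps with the marker proteins named in the text.
DENITRIFICATION_STEPS = [
    ("NO3- -> NO2-", {"NarG", "NarH"}),
    ("NO2- -> NO", {"NirK"}),
    ("NO -> N2O", {"NorB"}),
    ("N2O -> N2", {"NosZ"}),
]


def pathway_coverage(detected, steps=DENITRIFICATION_STEPS):
    """Return (step, covered) pairs; a step is covered if all its markers were detected."""
    return [(step, markers <= detected) for step, markers in steps]


def terminal_product(detected, steps=DENITRIFICATION_STEPS):
    """Walk the steps in order and return the last product that can be formed."""
    product = "NO3-"
    for step, markers in steps:
        if not markers <= detected:
            break
        product = step.split(" -> ")[1]
    return product


# Homologs reported for the sediment sample in the text (NosZ not found).
sediment_hits = {"NarG", "NarH", "NirK", "NorB"}
for step, covered in pathway_coverage(sediment_hits):
    print(step, "covered" if covered else "missing")
print("Predicted end product:", terminal_product(sediment_hits))
```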
The first glimpse into the potential C- and N-related metabolic activities of the SPG
sediment/nodule environment has revealed a number of potentially ecologically important
pathways. Firstly, the lithoautotrophic nature of the NH₃-oxidizing archaea (AOA) and bacteria
(AOB) may provide a vital source of fixed C to the energy-starved sediment environment. C
fixation through this process could account for the higher microbial abundance levels in the
upper portion of the sediment column. Secondly, there is some indication from both C and N
pathways that there is the potential for anaerobic conditions to occur in the aerobic sediment
environment. Understanding the genomic extent of these processes will provide some clues as to
how microorganisms are interacting with the environment and could offer insight into a
marine habitat that may be a potential source of N₂O gas.
5.3.4. Putative Phylogenetic Bins
5.3.4.1. Unsupervised Hierarchical Clustering
Ideally, metagenomic data could be used to reconstruct complete or near-complete
microbial genomes from the environment. In reality, such reconstruction efforts are difficult to
achieve due to a number of complicating factors, such as random sequencing results and the
inability to resolve similar genomic sequences using assembler algorithms. One possible
methodology to overcome such limitations is to use supervised methods that require the genome
of a sequenced organism as a reference to which environmental sequences can be aligned and
then extended through similarity. These types of methodologies require highly accurate
knowledge of the closest relatives of the organisms in the environment and a corresponding
genome. Methodologies that use innate properties of the DNA sequences, such as nucleotide
frequencies, can recruit like sequences in an unsupervised manner. These methods require the
length of the sequence to be fairly long (>1,000 bp), as the evolutionary signal present in
nucleotide frequency analyses is confounded for short sequences (Teeling et al., 2004).
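As a minimal sketch of the kind of nucleotide-frequency signature described above, the function below counts overlapping 4-mers in a contig and returns them as a 256-element vector of relative frequencies. It uses raw frequencies only; published tetranucleotide methods often apply further normalization, and the function name and the choice to skip windows containing ambiguous bases are assumptions of this example rather than the protocol used in this study.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return relative frequencies of overlapping 4-mers in a DNA sequence.

    Windows containing characters other than A, C, G or T are skipped.
    """
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[word] / total for word in TETRAMERS]


freqs = tetranucleotide_frequencies("ACGTACGTTTGACCA")
print(len(freqs), round(sum(freqs), 6))
```

A sequence of length L yields only L - 3 windows spread over 256 possible words, which is one way to see why short fragments give noisy signatures and why a minimum length is imposed.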
An unsupervised binning protocol using hierarchical clustering of tetranucleotide
frequencies was applied to the large sequence assemblies from the sediment and nodule
metagenomes (Chapter 4). Unfortunately, the limited number of long sequences from the nodule
dataset prevented the recovery of any substantial phylogenetic bins, with the largest containing
only 55 kbp. Several (n = 14) large phylogenetic bins were identified in the sediment dataset
(Table 5.4). The two largest bins were >300 kbp in length and the smallest was ~62 kbp
(Mean = 143,162 bp; Median = 107,378 bp). The phylogenetic bins were annotated using the RAST
annotation pipeline, which generated 1,759 putative CDSs. Estimates of the nearest neighbor were
generated by RAST and results indicate the presence of 5 α-Proteobacteria related to
Alphaproteobacteria sp. BAL199, M. gryphiswaldense MSR-1, M. magneticum AMB-1,
Rhodospirillum centenum SW, and Azospirillum sp. B510. Further, a δ-Proteobacterium related to
Geobacter metallireducens GS-15 and a thaumarchaeon related to Nitrosopumilus maritimus SCM1
were also identified. A number of the metabolic genes of interest mentioned throughout this
manuscript were identified in the putative phylogenetic bins, including MCOs, FDHs, and nitrate
reductase in three different α-Proteobacteria bins. A single ~84 kbp bin, with an unidentifiable
taxonomy, contained annotated putative CDSs related to aerobic carbon monoxide oxidation,
flagellar biosynthesis, gliding motility, and an FDH. Motility genes were identified in two other
phylogenetic bins for both flagellar-based movement and putative twitch-related motility. The
presence of motility genes in the SPG sediment is interesting, in that in other sediment
metagenomes motility-related gene assignments have been minimal, leading to the hypothesis
that motility is a trait that is selected against in the sediment environment (Biddle et al., 2008).
The phylogenetic bin with the assignment to M. gryphiswaldense MSR-1 contains genes
annotated for anaerobic dimethyl sulfoxide (DMSO) metabolism. Putative CDSs in multiple bins
were assigned potential functions in thiamine (vitamin B1) biosynthesis and catabolism and a
single putative CDS was identified as an Fe³⁺ siderophore transporter.
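The hierarchical clustering step of an unsupervised binning protocol can be sketched as follows. The linkage rule, distance measure and cutoff used in this study are not stated in this section, so average linkage, Euclidean distance and the `max_distance` parameter are assumptions of this example, and the two-dimensional toy vectors merely stand in for 256-dimensional tetranucleotide frequency vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_bins(vectors, max_distance):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds max_distance. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def cluster_distance(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(euclidean(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_distance:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical composition vectors for four contigs from two genomes.
toy_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(average_linkage_bins(toy_vectors, max_distance=1.0))
```

This naive implementation recomputes all pairwise distances at every merge, so it is suitable only for small numbers of contigs; it is meant to show the logic rather than to scale.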
Collectively, the evidence in these phylogenetic bins reasserts some of the findings
presented in the rest of the manuscript. A fourth anaerobic metabolism was identified and
assigned to the α-Proteobacteria. Genes of interest were identified and linked together in a
common phylogenetic genomic unit. Further, a number of new genes of interest were identified,
including putative CDSs related to motility. Using these phylogenetic bins to direct further
targeted gene analysis may yield increased understanding of the microbial community.
5.3.4.2. Supervised Phylogenetic Binning
Supervised phylogenetic binning requires detailed knowledge of the organisms present in
the environment. The putative taxonomic identities assigned to the unsupervised phylogenetic
bins were supported during the MG-RAST analysis. MG-RAST identified the same reference
genomes as the most recruited genomes for the putative CDSs. In MG-RAST, for both the
sediment and nodule datasets, N. maritimus SCM1 was the most highly recruited reference
genome. N. maritimus SCM1 recruited ~1.6X and ~10X more sequences than the second most
recruited reference genome in the sediment and nodule, respectively. This contrasts with
the number of phylogenetic bins assigned as putatively thaumarchaeal in the unsupervised
binning method and could be the result of the high degree of similarity seen in MG-1 genomes. This
high similarity would result in many breaks during sequence assembly and generate many small
fragments of assembled DNA that would not be included in the unsupervised binning
methodology. Genomes of all of the α-Proteobacteria listed above were used as reference
genomes (except M. gryphiswaldense MSR-1), as well as N. maritimus SCM1 and ‘Candidatus
Nitrosopumilus sp. AR2’. Recovery in the number of sequences attributed to each genome was
higher than with the unsupervised method (Mean = 624,205 bp; Median = 606,511 bp); however, the
information gained from the RAST annotations of the IDBA-Hybrid contigs was extremely
limited, as many of the putative CDS assignments were to conserved cellular functions and not
genes with metabolic potential (Table 5.5). Interestingly, the G+C%
content of the putative α-Proteobacteria is much higher than that of the putative MG-1
Thaumarchaea. This feature may be useful in future assembly attempts.
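One way the G+C% difference noted above might be used is as a simple per-contig statistic computed before assembly or binning. The sketch below is an illustration only: the function name is hypothetical, ambiguous bases are excluded from the denominator by assumption, and the example sequences are invented.

```python
def gc_percent(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    canonical = sum(seq.count(base) for base in "ACGT")
    if canonical == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / canonical


# Invented example contigs; real inputs would be assembled sequences.
contigs = {"contig_a": "GGCGCCGGATCC", "contig_b": "ATTATAAGCTTA"}
for name, seq in contigs.items():
    print(name, round(gc_percent(seq), 1))
```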
The information generated from the supervised binning protocol was limited in nature,
but did offer some insight into the number of sequences that could be aligned and assembled
using a reference genome and adds support to the idea that MG-1 organisms are the most
abundant compared to other microorganisms in the system, in agreement with the 16S rRNA
amplicon data.
5.4. Concluding Remarks
Analysis of the metagenomic samples collected from the sediment and FeMn nodule SPG
environments has revealed new insights into a complex microbial community of an extreme,
energy-limited environment. A number of putative CDSs were identified in the environment that
could play a biotic role in FeMn nodule formation and/or maintenance, including MCOs,
peroxidases, c-Cyts, and MOBs. The diversity contained within these four putative metal
reactive groups is large and requires more detailed sampling and analysis to pinpoint which, if
any, of these putative genes has activity in the environment. Further, analysis of the genes related
to C and N metabolic pathways revealed the presence of putative AOA and AOB constituents of
the community that could be responsible for some degree of C fixation and the introduction of
reduced C sources into the energy limited SPG sediments. And lastly, there is evidence to
suggest the presence of multiple types of anaerobic metabolisms in the aerobic SPG sediments.
The microscale drawdown of O₂ concentrations by microbial populations has been identified in
other environments, but the degree to which these metabolisms play a role in the SPG sediments
is relevant to the understanding of how the microbial systems of central ocean gyres function.
Specifically for the SPG, the possibility that the anaerobic portion of the microbial community
acts as a source of N₂O should be explored for the other central gyre sediment communities, and
research should be performed to constrain the impact of the amount of N₂O generated in these
environments.
The direction for further research based on current results is two-fold. First, in the
immediate future, increased effort should be made to assemble larger genomic fragments from
the available dataset. The initial steps using IDBA-UD and Velvet yielded results with
more practical application than were available in recent years, but newly available assembler
packages, such as SEAStAR, have successfully untangled near-complete genomes from
environmental metagenomic datasets (Iverson et al., 2012) and could be applied to this dataset.
Generating more complete genomic fragments is important for understanding how the various
functions discussed in this manuscript are co-located within microbial genomes. The information
gained from putative taxonomic assignments for some of these genes is limited due to the extent
of horizontal gene transfer in microorganisms. Definitively linking the putative CDSs with
possible metal-related functions with other functions, such as siderophore biosynthesis and metal
transporters, could lead to a more robust understanding of which organisms are participating in
these pathways (if any) and what aspects of FeMn nodule formation may be mediated
biologically. Second, for long-term directions, these data will need to be corroborated with the
large number of samples collected from the SPG on the DSDP cruise 595/596 and International
Ocean Drilling Program (IODP) Expedition 329, including geochemical measurements, activity
assessments, culturing techniques, and further metagenomic analyses. By linking these results
with other data sets, we may be capable of determining the active metabolic roles that are
shaping the SPG sediment environment, and in turn oligotrophic marine sediments.
5.5. References
Abreu F, Cantao ME, Nicolas MF, Barcellos FG, Morillo V, Almeida LG et al (2011). Common
ancestry of iron oxide- and iron-sulfide-based biomineralization in magnetotactic bacteria.
ISMEJ 5: 1634-1640.
Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W et al (1997). Gapped BLAST
and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:
3389-402.
Anderson C, Johnson H, Caputo N, Davis R, Torpey J, Tebo B (2009). Mn(II) Oxidation Is
Catalyzed by Heme Peroxidases in "Aurantimonas manganoxydans" Strain SI85-9A1 and
Erythrobacter sp. Strain SD-21. Applied and Environmental Microbiology 75: 4130-4138.
Aylor D (1987). Deposition gradients of urediniospores of Puccinia recondita near a source.
Phytopathology 77: 1442-1448.
Aziz R, Bartels D, Best A, Dejongh M, Disz T, Edwards R et al (2008). The RAST Server:
Rapid Annotations using Subsystems Technology. BMC Genomics 9: 75.
Biddle JF, Fitz-Gibbon S, Schuster SC, Brenchley JE, House CH (2008). Metagenomic
signatures of the Peru Margin subseafloor biosphere show a genetically distinct environment.
Proceedings of the National Academy of Sciences of the United States of America 105: 10583-
10588.
Bonnefoy V, Holmes D (2011). Genomic insights into microbial iron oxidation and iron uptake
strategies in extremely acidic environments. Environmental Microbiology 14: 1597-1611.
Brouwers G, de Vrind JP, Corstjens PL, Cornelis P, Baysse C, de Vrind-de Jong EW (1999).
cumA, a gene encoding a multicopper oxidase, is involved in Mn2+ oxidation in Pseudomonas
putida GB-1. Applied and Environmental Microbiology 65: 1762-1768.
Cahyani V, Murase J, Ishibashi E, Asakawa S, Kimura M (2009). Phylogenetic positions of
Mn²⁺-oxidizing bacteria and fungi isolated from Mn nodules in rice field subsoils. Biol and Fert
Soils 45: 337-346.
Coursolle D, Gralnick J (2012). Reconstruction of Extracellular Respiratory Pathways for
Iron(III) Reduction in Shewanella Oneidensis Strain MR-1. Front Microbiol 3: 1-11.
D'Hondt S, Abrams L, Anderson R, Dorrance J, Durbin A, Ellett L et al (2011). KNOX-02RR:
drilling site survey - life in subseafloor sediments of the South Pacific Gyre. Proceedings of the
IODP 329: 1-82.
D'Hondt S, Inagaki F, Alvarez Zarikian C, Party IES (2013). IODP Expedition 329: Life and
Habitability Beneath the Seafloor of the South Pacific Gyre. Scientific Drilling: 4-10.
D'Hondt S, Spivack AJ, Pockalny R, Ferdelman T, Fischer JP, Kallmeyer J et al (2009).
Subseafloor sedimentary life in the South Pacific Gyre. Proc Natl Acad Sci USA 106: 11651-
11656.
Dick G, Torpey J, Beveridge T, Tebo B (2008a). Direct identification of a bacterial
manganese(II) oxidase, the multicopper oxidase MnxG, from spores of several different marine
Bacillus species. Appl Environ Microbiol 74: 1527-1534.
Dick GJ, Podell S, Johnson HA, Rivera-Espinoza Y, Bernier-Latmani R, McCarthy JK et al
(2008b). Genomic insights into Mn(II) oxidation by the marine alphaproteobacterium
Aurantimonas sp strain SI85-9A1. Appl Environ Microbiol 74: 2646-2658.
Durbin A, Teske A (2010). Sediment-associated microdiversity within the Marine Group I
Crenarchaeota. Environ Microbiol Rep 2: 693-703.
Edgcomb V, Beaudoin D, Gast R, Biddle J, Teske A (2010). Marine subsurface eukaryotes: the
fungal majority. Environ Microbiol 13: 172-183.
Edwards K, Bach W, McCollom T (2005). Geomicrobiology in oceanography: microbe-mineral
interactions at and below the seafloor. TRENDS in Microbiology 13: 449-456.
Fan L, McElroy K, Thomas T (2012). Reconstruction of Ribosomal RNA Genes from
Metagenomic Data. PLoS ONE 7: e39948.
Finn R, Clements J, Eddy S (2011). HMMER web server: interactive sequence similarity
searching. Nucleic Acids Res 39: W29-W37.
G. J. Brouwers EVPL (2000). Bacterial Mn²⁺ Oxidizing Systems and Multicopper Oxidases: An
Overview of Mechanisms and Functions. Geomicrobiology Journal 17: 1-24.
Givaudan A, Effosse A, Faure D, Potier P, Bouillant M-L, Bally R (1993). Polyphenol oxidase in
Azospirillum lipoferum isolated from rice rhizosphere: Evidence for laccase activity in non-
motile strains of Azospirillum lipoferum. FEMS Microbiol Lett 108: 205-210.
Grunberg K, Wawer C, Tebo BM, Schuler D (2001). A large gene cluster encoding several
magnetosome proteins is conserved in different species of magnetotactic bacteria. Appl Environ
Microbiol 67: 4573-4582.
Guindon S, Dufayard J, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010). New Algorithms
and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of
PhyML 3.0. Sys Biol 59: 307-321.
Haas B, Gevers D, Earl A, Feldgarden M, Ward D, Giannoukos G et al (2011). Chimeric 16S
rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons.
Genome Res 21: 494-504.
Hasegawa M, Kishino H, Yano T (1985). Dating of human-ape splitting by a molecular clock of
mitochondrial DNA. Journal Mol Evol 22: 160-174.
Huang Y, Li W, Gilna P (2009). Identification of ribosomal RNA genes in metagenomic
fragments. Bioinformatics 25: 1338-1340.
Iverson V, Morris R, Frazar C, Berthiaume C, Morales R, Armbrust E (2012). Untangling
Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota. Science
335: 587-590.
Jogler C, Schuler D (2009). Genomics, Genetics, and Cell biology of Magnetosome Formation.
Annu Rev Microbiol 63: 501-521.
Jones D, Taylor W, Thornton J (1992). The rapid generation of mutation data matrices from
protein sequences. Comput Applic Biosci 8: 275-282.
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012). KEGG for integration and
interpretation of large-scale molecular datasets. Nucl Acids Res 40: D109-D114.
Kong Y (2011). Btrim: A fast, lightweight adapter and quality trimming program for next-
generation sequencing technologies. Genomics 98.
Koschinsky A, Halbach P (1995). Sequential leaching of marine ferromanganese precipitates:
Genetic implications. Geochim Cosmochim Ac 59: 5113-5132.
Liu J, Wang Z, Belchik S, Edwards M, Liu C, Kennedy D et al (2012). Identification and
Characterization of MtoA: A Decaheme c-Type Cytochrome of the Neutrophilic Fe(II)-
Oxidizing Bacterium Sideroxydans lithotrophicus ES-1. Front Microbiol 3.
Lohsse A, Ullrich S, Katzmann E, Borg S, Wanner G, Richter M et al (2011). Functional
Analysis of the Magnetosome Island in Magnetospirillum gryphiswaldense: The mamAB Operon
Is Sufficient for Magnetite Biomineralization. PLoS ONE 6: e25561.
Magoc T, Salzberg S (2011). FLASH: fast length adjustment of short reads to improve genome
assemblies. Bioinformatics 27: 2957-2963.
Markowitz V, Korzeniewski F, Palaniappan K, Szeto E, Werner G, Padki A et al (2006). The
integrated microbial genomes (IMG) system. Nucleic Acids Res 34: D344-D348.
Martin M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads.
EMBnet 17: 10-12.
Massana R, DeLong E, Pedros-Alio C (2000). A few cosmopolitan phylotypes dominate
planktonic archaeal assemblages in widely different oceanic provinces. Appl Environ Microbiol
66: 1777-1787.
Matsunaga T, Okamura Y, Fukuda Y, Wahyudi A, Murase Y, Takeyama H (2005). Complete
Genome Sequence of the Facultative Anaerobic Magnetotactic Bacterium Magnetospirillum sp.
strain AMB-1. DNA Research 12: 157-166.
McAllister S, Davis R, McBeth J, Tebo B, Emerson D, Moyer C (2011). Biodiversity and
Emerging Biogeography of the Neutrophilic Iron-Oxidizing Zetaproteobacteria. Appl Environ
Microbiol 77: 5445-5457.
Meyer F, Paarmann D, D'Souza M, Olson R, Glass E (2008). The metagenomics RAST server - a
public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC
Bioinf 9.
Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011). EMIRGE: reconstruction of
full-length ribosomal genes from microbial community short read sequencing data. Genome Biol
12: R44.
Namiki T, Hachiya T, Tanaka H, Sakakibara Y (2012). MetaVelvet: An extension of Velvet
assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res 40:
e155.
Orsi W, Biddle J, Edgcomb V (2013). Deep Sequencing of Subseafloor Eukaryotic rRNA
Reveals Active Fungi across Marine Subsurface Provinces. PLoS ONE 8: e56335.
Peng Y, Leung H, Yiu S, Chin F (2012). IDBA-UD: a de novo assembler for single-cell and
metagenomic sequencing data with highly uneven depth. Bioinformatics 28: 1420-1428.
Pruesse E, Peplies J, Glockner F (2012). SINA: accurate high-throughput multiple sequence
alignment of ribosomal RNA genes. Bioinformatics 28: 1823-1829.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P et al (2013). The SILVA ribosomal
RNA gene database project: improved data processing and web-based tools. Nucl Acids Res 41:
D590-D596.
Rho M, Tang H, Ye Y (2010). FragGeneScan: Predicting Genes in Short and Error-prone Reads.
Nucl Acids Res: 1-12.
Schloss P, Westcott S, Ryabin T, Hall J, Hartmann M, Hollister E et al (2009). Introducing
mothur: open-source, platform-independent, community-supported software for describing and
comparing microbial communities. Appl Environ Microbiol 75: 7537-7541.
Shanks A, Reeder M (1993). Reducing microzones and sulfide production in marine snow. Mar
Ecol Prog Ser 96: 43-47.
Shi L, Rosso K, Clarke T, Richardson D, Zachara J, Fredrickson J (2012). Molecular
Underpinnings of Fe(III) Oxide Reduction by Shewanella oneidensis MR-1. Front Microbiol 3:
1-10.
Shi L, Squier T, Zachara J, Fredrickson J (2007). Respiration of metal (hydr)oxides by
Shewanella and Geobacter: a key role for multihaem c-type cytochromes. Mol Micro 65: 12-20.
Singer E, Emerson D, Webb E, Barco R, Kuenen J, Nelson W et al (2011). Mariprofundus
ferrooxydans PV-1 the First Genome of a Marine Fe(II) Oxidizing Zetaproteobacterium. PLoS
ONE 6: e25386.
Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR et al (2006). Microbial
diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci USA 103:
12115-12120.
Stein LY, La Duc MT, Grundl TJ, Nealson K (2001). Bacterial and archaeal populations
associated with freshwater ferromanganous micronodules and sediments. Environ Microbiol 3:
10-18.
Tebo B (1991). Manganese(II) oxidation in the suboxic zone of the Black Sea. Deep Sea Res
Part I: Oceano Res 38: 883-905.
Tebo B, Johnson H, McCarthy J, Templeton A (2005). Geomicrobiology of manganese(II)
oxidation. Trends Microbiol 13: 421-428.
Teeling H, Waldmann J, Lombardot T, Bauer M, Glockner F (2004). TETRA: a web-service and
a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA
sequences. BMC Bioinformatics 5: 163.
Thompson J, Higgins D, Gibson T (1994). CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.
Vajrala N, Martens-Habbena W, Sayavedra-Soto L, Schauer A, Bottomley P, Stahl D et al
(2013). Hydroxylamine as an intermediate in ammonia oxidation by globally abundant marine
archaea. Proc Natl Acad Sci USA 110: 1006-1011.
Voulhoux R, Ball G, Ize B, Vasil M, Lazdunski A, Wu L et al (2001). Involvement of the twin-
arginine translocation system in protein secretion via the type II pathway. EMBO J 20: 6735-
6741.
Walker C, De la Torre J, Klotz M, Urakawa H, Pinel N, Arp D et al (2010). Nitrosopumilus
maritimus genome reveals unique mechanisms for nitrification and autotrophy in globally
distributed marine crenarchaea. Proc Natl Acad Sci USA 107: 8818-8823.
White G, Shi Z, Shi L, Wang Z, Dohnalkova A, Marshall M et al (2013). Rapid electron
exchange between surface-exposed bacterial cytochromes and Fe(III) minerals. Proc Natl Acad
Sci USA 110: 6346-6351.
Wilke A, Harrison T, Wilkening J, Field D, Glass E, Kyrpides N et al (2012). The M5nr: a novel
non-redundant database containing protein sequences and annotations from multiple sources and
associated tools. BMC Bioinf 13.
Wong D (2009). Structure and Action Mechanism of Ligninolytic Enzymes. Appl Biochem
Biotechnol 157: 174-209.
Wrage N, Velthof G, van Beusichem M, Oenema O (2001). Role of nitrifier denitrification in the
production of nitrous oxide. Soil Biol Biochem 33: 1723-1732.
Yanyushin MF, del Rosario MC, Brune DC, Blankenship RE (2005). New class of bacterial
membrane oxidoreductases. Biochemistry 44: 10037-10045.
Yu N, Wagner J, Laird M, Melli G, Rey S, Lo R et al (2010). PSORTb 3.0: Improved protein
subcellular localization prediction with refined localization subcategories and predictive
capabilities for all prokaryotes. Bioinformatics 26: 1608-1615.
Zerbino D, Birney E (2008). Velvet: Algorithms for de novo short read assembly using de Bruijn
graphs. Genome Res 18: 821-829.
Table 5.1. Results of Illumina Sequencing and IDBA-UD Metagenomic Assembly.
Metric                                            Sediment    Nodule
No. Raw Reads (10^6)                              62.69       72.33
Total Raw Sequence Length (Gbp)                   16.3        18.8
No. Trimmed Paired-End Reads (10^6)               48.03       46.57
Total Trimmed Paired-End Sequence Length (Gbp)    9.7         7.5
No. Assembled Contigs                             634,487     156,740
Contig Mean Length (bp)                           502         363
Contig Max Length (bp)                            38,305      32,113
Contig N50 (bp)                                   497         375
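Table 5.1 summarizes each assembly with a contig N50. For readers unfamiliar with the statistic, the sketch below computes it under the usual definition: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembled bases. The function name n50 and the toy contig lengths are illustrative assumptions. This is not the code used in the thesis, and assemblers may apply slightly different conventions (for example, a minimum contig length cutoff).

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of an iterable of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        # The first contig that pushes the running total to half of the
        # assembled bases defines the N50.
        if running_total >= half_total:
            return length


# Hypothetical toy assembly (not data from Table 5.1): 32 bp in 8 contigs.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))  # 8, because 8 + 8 = 16 bp is half of 32 bp
```

Because the statistic is weighted by length, many short contigs can pull the N50 below the arithmetic mean, so the two values are best read together rather than compared directly.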
Table 5.2. Top 10 full-length 16S rRNA EMIRGE Results.
Sample      Taxonomic Assignment*               EMIRGE Predicted Normalized Perc. Abundance
Nodule      Nitrosococcus                       6.77
Nodule      [illegible]-Proteobacteria          5.18
Nodule      Thaumarchaea - MG1                  4.32
Nodule      Sinobacteraceae - MBG               4.23
Nodule      Rhizobiales                         3.72
Nodule      Thaumarchaea - MG1                  3.44
Nodule      Thaumarchaea - MG1                  3.37
Nodule      Rhodospirillaceae                   2.92
Nodule      Sinobacteraceae - MBG               2.76
Nodule      Nitrospina                          2.75
Sediment    Thaumarchaea - MG1                  6.56
Sediment    Rhodospirillaceae                   4.56
Sediment    Thaumarchaea - MG1                  4.30
Sediment    Rhodospirillaceae - Pelagibius      4.10
Sediment    Thaumarchaea - MG1                  2.36
Sediment    Deferribacteres                     2.07
Sediment    Sinobacteraceae - MBG               1.89
Sediment    Rhodobacteraceae                    1.59
Sediment    Sinobacteraceae - MBG               1.50
Sediment    Thaumarchaea - MG1                  1.41
*Taxonomies based on mothur assignments
Table 5.3. Predicted Localization of Putative CDS with Possible Metal-related
Biochemistry.
                          Multicopper Oxidase     Peroxidase              c-type Cytochromes
Predicted Localization    Sediment    Nodule      Sediment    Nodule      Sediment    Nodule
Unknown                   142         23          14          3           30          2
Cytoplasmic               49          5           9           -           4           -
Cytoplasmic Membrane      8           -           -           -           24          2
Periplasmic               53          1           -           -           4           -
Extracellular             -           1*          1           -           -           -
* Represents putative CDS counted a second time due to dual localization prediction
Table 5.4. Results of Unsupervised Binning on Sediment Contigs.
Bin Name    Bin Total Sequence (bp)    No. of Contigs    No. of putative CDS    G+C%    MG-RAST Predicted Neighbor
711X        317,580                    70                285                    67.4    Alphaproteobacterium BAL199
658X        309,274                    71                297                    67.5    Magnetospirillum gryphiswaldense MSR-1
692X        232,013                    46                195                    66.3    Azospirillum sp. B510
726X        200,474                    42                175                    66.1    Magnetospirillum magneticum AMB-1
764X        112,306                    25                150                    33.7    Nitrosopumilus maritimus SCM1
808X        111,298                    15                106                    60.9    Geobacter metallireducens GS-15
752X        107,378                    19                101                    64.2    Magnetospirillum magneticum AMB-1
845X        107,098                    27                103                    65.4    Rhodospirillum centenum SW
807X        91,354                     12                89                     61.0    No neighbor
821X        84,809                     18                68                     64.2    No neighbor
760X        62,918                     13                61                     64.0    No neighbor
830X        62,773                     8                 65                     61.5    No neighbor
776X        61,833                     7                 64                     61.6    No neighbor
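The G+C% column in Table 5.4 is a base-composition summary for each bin. The sketch below shows one plausible way to obtain such a value, pooling counts across all contigs in a bin so that longer contigs contribute proportionally more. The function name gc_percent, the toy sequences, and the choice to exclude ambiguous bases (such as N) from the denominator are assumptions made for illustration; the thesis does not state how its values were calculated.

```python
def gc_percent(contigs):
    """Return pooled G+C content (%) over an iterable of DNA strings."""
    gc = 0
    counted = 0
    for seq in contigs:
        seq = seq.upper()
        g_and_c = seq.count("G") + seq.count("C")
        a_and_t = seq.count("A") + seq.count("T")
        gc += g_and_c
        # Assumption: ambiguous bases such as N are left out of the denominator.
        counted += g_and_c + a_and_t
    if counted == 0:
        raise ValueError("no unambiguous bases to count")
    return 100.0 * gc / counted


# Hypothetical two-contig bin (not sequences from the thesis).
toy_bin = ["GGCCN", "ATGC"]
print(round(gc_percent(toy_bin), 1))  # 75.0 (6 G/C out of 8 counted bases)
```

Pooling the counts, rather than averaging per-contig percentages, keeps a handful of very short contigs from skewing the bin-level figure.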
Table 5.5. Results of Supervised Binning of South Pacific Gyre Contigs.
Sample      Reference Genome                     Bin Total Sequence (bp)    No. of Contigs    No. of putative CDS    G+C%
Sediment    Alphaproteobacterium BAL199          589,563                    924               414                    66.1
Sediment    Azospirillum sp. B510                759,551                    1,200             503                    67.3
Sediment    Magnetospirillum magneticum AMB-1    604,828                    917               406                    66.7
Sediment    Rhodospirillum centenum SW           608,195                    917               384                    67.8
Sediment    Nitrosopumilus maritimus SCM1        686,912                    1,123             565                    34.6
Sediment    Ca. Nitrosopumilus AR2               772,490                    1,268             701                    34.1
Nodule      Nitrosopumilus maritimus SCM1        442,952                    881               333                    34.7
Nodule      Ca. Nitrosopumilus AR2               529,154                    1,048             405                    34.1
Figure 5.1. Maximum likelihood phylogenetic tree of the sediment and nodule archaeal 16S
rRNA sequences generated by EMIRGE.
[Tree not reproduced in this transcript. Reference sequences: Nitrososphaera gargensis, Cenarchaeum symbiosum A, Nitrosopumilus maritimus SCM1, Ca. Nitrosopumilus sp. NM25. Labeled clades: η (Eta), υ (Upsilon), ζ (Zeta), θ (Theta), α (Alpha), 1.1b. Scale bar: 0.07. Node values omitted.]
Figure 5.2. A. Maximum likelihood phylogenetic tree of the sediment bacterial 16S rRNA
sequences generated by EMIRGE.
[Tree not reproduced in this transcript. Reference sequences: Blastopirellula marina, Planctomyces maris, Planctomyces brasiliensis, Caldithrix abyssi, Persicobacter psychrovividus, Cytophaga sp., Lentisphaera araneosa, Pelagicoccus albus, Fusobacter BS1-0-74, Rubricoccus marinus. Labeled groups: Gemmatimonadetes, Planctomycetes, Deferribacteres, Ca. WS3, Ca. TA06, Flavobacteriaceae, Chlorobi, Verrucomicrobia, Ca. OP3. Node values omitted.]
Figure 5.2. B. Maximum likelihood phylogenetic tree of the sediment bacterial 16S rRNA
sequences generated by EMIRGE.
[Tree not reproduced in this transcript. Reference sequence: Nitrospira moscoviensis. Labeled groups: Acidobacteria, Acidobacter - RB25, Acidobacter - PAUC26f, Acidobacter - DA052, Acidobacter - KF-JG30-18, Chloroflexi - SAR202, Nitrospirae, BRC1, Actinobacteria - OSC155, δ-Proteobacteria. Node values omitted.]
Figure 5.2. C. Maximum likelihood phylogenetic tree of the sediment bacterial 16S rRNA
sequences generated by EMIRGE.
[Tree not reproduced in this transcript. Reference sequences: Sneathiella chinensis, Pelagibius litoralis, Magnetospirillum magneticum AMB-1, Defluviicoccus vanus, Pedomicrobium sp. ACM 3067, Aurantimonas sp. SI85-9A1, Sulfitobacter sp. NAS-14.1, Rhodobacteraceae bacterium KAUST-100406-0324, Rhodobacter sp. SW2, Spirillum volutans, Thiobacillus thioparus, Thiobacillus denitrificans, Nitrosomonas europaea, Leptothrix discophora SS-1, Marinicella litoralis, Nitrosococcus oceani, Pseudomonas putida GB-1, Haliangium ochraceum. Labeled groups: α-Proteobacteria, β-Proteobacteria, γ-Proteobacteria, δ-Proteobacteria. Node values omitted.]
Figure 5.3. Maximum likelihood phylogenetic tree of the nodule bacterial 16S rRNA
sequences generated by EMIRGE.
[Tree not reproduced in this transcript. Reference sequences: Magnetospirillum magneticum AMB-1, Cohaesibacter sp. DQHS-21, Pedomicrobium sp. ACM 3067, Aurantimonas sp. SI85-9A1, Sulfitobacter sp. NAS-14.1, Rhodobacter sp. SW2, Leptothrix discophora SS-1, Nitrosomonas europaea, Pseudomonas putida GB-1, Nitrosococcus oceani, Bacillus sp. SG-1, Nitrospira moscoviensis, Arcobacter marinus, Arcobacter nitrofigilis, Arcobacter cryaerophilus, Arcobacter butzleri, Rubricoccus marinus, Gilvibacter sediminis. Labeled groups: α-Proteobacteria, β-Proteobacteria, γ-Proteobacteria, δ-Proteobacteria, ε-Proteobacteria, Bacteroidetes, Actinobacteria, Nitrospirae, Chloroflexi - SAR202. Node values omitted.]
Figure 5.4. Maximum likelihood phylogenomic tree of putative CDS with homology
determined by HMMER3 to molybdopterin oxidoreductases (MOBs) from Fe2+ oxidizing
organisms.
[Tree not reproduced in this transcript. Labeled clades: FDH - G, FDH - A, PSR, NAR, DMS, ACIII, TTR, Putative Fe2+ MOB (two clades), Not characterized MOB. Abbreviated reference taxa: Bord par, Shew one, Mari ELB17, Rose 217, Phae gal, Mari pro, Geo M18, uri, met, Salm ent, Geob lov, Esch col, Baci art, Rhod mar, Myxo xan, Chlo aur, Lept orc, Side lit, Gall cap, Salm typ, Vibr par, Hyph MC1. Scale bar: 0.2. Node values omitted.]
Figure 5.5. Maximum likelihood phylogenomic tree of putative CDS with homology
determined by HMMER3 to fungal laccases (MCOs).
[Tree not reproduced in this transcript. Labeled clades and sequences: MoxA1, MoxA2, Mox, Xanth euv, Fungal Laccases, CumA, Cu Resistance, MnxG, CueO, Q70KY3, Q12542, P17489, Putative Mn MCO. Scale bar: 0.2. Node values omitted.]
Figure 5.6. Schematic of possible Nitrogen cycling mechanism identified for putative CDSs
within the South Pacific Gyre metagenome samples.
[Schematic not reproduced in this transcript. Recoverable labels: oxic and suboxic zones; PON, NH4+, and N2; a nitrification series (NH2OH, NO2-, NO3-) and a denitrification series (NO3-, NO2-, NO, N2O, N2); genes amoA, narG, narH, nirK, norB, and nosZ; an X mark; taxonomic groups AOA, AOB, Bacteria, OP1, Bacillaceae, Chloroflexi, Proteobacteria, α-Proteobacteria, β-Proteobacteria, and γ-Proteobacteria. The per-gene counts shown for each group cannot be reliably reassigned from the extracted text.]
Abstract
The formation and maintenance of deep-sea ferromanganese/polymetallic nodules remain a mystery 140 years after their discovery. The wealth of rare metals concentrated in these nodules has spurred global interest in their mining potential. The prevailing theory of abiotic formation has been called into question, and the role of microbial metabolisms in nodule development is now an area of active research. To understand the community structure of microbes associated with nodules and their surrounding sediment, we performed targeted sequencing of the V4 hypervariable region of the 16S rRNA gene from three nodules collected from the central South Pacific. The results show that the microbial communities of the nodules are significantly distinct from those of the surrounding sediments, and that the interiors of the nodules harbor communities different from those of the exteriors. This suggests not only differences in potential metabolisms between the nodule and sediment communities, but also differences in the dominant metabolisms of the interior and exterior communities. We identified several operational taxonomic units (OTUs) unique to each of the nodule and sediment environments. The identified OTUs were assigned putative taxonomic identifications, including two OTUs found only in association with the nodules, which were assigned to the Alpha-Proteobacteria. Finally, we explored the diversity of the most frequently assigned taxonomic group, the Thaumarchaea MG-1, which revealed novel OTUs compared to previous research from the region and suggests a potential role for ammonia-oxidizing archaea as a source of fixed carbon in the environment.

Oligotrophic ocean gyres compose ~50% of the marine environment. The South Pacific Gyre (SPG) is the most oligotrophic of all the central gyres. The oligotrophic surface waters correlate directly with deep-sea sediment that has a low organic carbon content. Low microbial biomass (<10⁶ cells·cm⁻³) and aerobic sediment conditions support the hypothesis that microbial activity is minimal in the SPG sediments due to energy limitation. Like the sediment of other central ocean gyres, the SPG has an abundance of ferromanganese (FeMn) nodules, which cover up to 70% of the exposed sediment surface. Limited research has been performed on the microbial community composition associated with FeMn nodules and oligotrophic sediments, and no research has been performed on potential community function. To understand potential community function, a metagenomic dataset was generated from DNA extracted from a nodule and a sediment sample from the SPG water-sediment interface. This research constitutes the first dataset to examine the microbial community function of both FeMn nodules and oligotrophic sediments. Analysis was performed to understand the interaction between microorganisms and FeMn nodules, as Fe- and Mn-related metabolic reactions could provide an alternate energy source in this energy-limited environment. The results reveal several putative coding sequences with homology to known and putative metal-reactive enzymes, including multicopper oxidases, peroxidases, multiheme c-type cytochromes, and molybdopterin oxidoreductases. This evidence suggests that the microbial community may have the metabolic potential to interact with FeMn nodules. Further analysis was performed to better understand the carbon and nitrogen cycles within the SPG environment. Microbial community gene content reveals the presence of lithoautotrophic metabolisms from organisms related to the MG-1 Thaumarchaea and to the Nitrosomonadales within the β-Proteobacteria. Both groups of organisms are capable of aerobic ammonia oxidation and carbon fixation. Lastly, genomic evidence reveals several anaerobic metabolisms, including denitrification. These results suggest that SPG sediments may have anaerobic microniches, a trait common in other aerobic environments. The identified denitrification pathway lacks the gene necessary to convert nitrous oxide to dinitrogen gas, suggesting that the SPG sediment could be a source of nitrous oxide, a potent greenhouse gas. Collectively, this research provides a number of possible avenues for further study of the microbial community associated with FeMn nodules and oligotrophic sediments.
Asset Metadata
Creator: Tully, Benjamin J. (author)
Core Title: Using sequencing techniques to explore the microbial communities associated with ferromanganese nodules and sediment from the South Pacific gyre
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Marine and Environmental Biology
Publication Date: 08/06/2013
Defense Date: 06/05/2013
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: amplicon, Flavobacteria, Gulf of Maine, metagenomics, microbial ecology, nodules, OAI-PMH Harvest, Thaumarchaea
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Heidelberg, John F. (committee chair), Berelson, William M. (committee member), Nealson, Kenneth H. (committee member)
Creator Email: tully.bj@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-317866
Unique identifier: UC11293561
Identifier: etd-TullyBenja-1968.pdf (filename), usctheses-c3-317866 (legacy record id)
Legacy Identifier: etd-TullyBenja-1968.pdf
Dmrecord: 317866
Document Type: Dissertation
Rights: Tully, Benjamin J.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA