PATTERNS OF MOLECULAR MICROBIAL
ACTIVITY ACROSS TIME AND BIOMES
by
Rohan Sachdeva
University of Southern California
Faculty of the USC Graduate School
Department of Biological Sciences
Marine Environmental Biology
Marine Biology and Biological Oceanography
In Partial Fulfillment of the Requirements for the Degree:
Doctor of Philosophy
Dissertation Committee:
Dr. John Heidelberg (Chair)
Dr. Jed Fuhrman
Dr. Andre Ouellette
May 2018
There are many creatures in water, in soil, and in fruit. Indeed, there are
many creatures that are so minute that their existence can only be
inferred.
Mahabharata (Shanti, 15.25 - 26)
800 - 400 BC
ACKNOWLEDGEMENTS
My whole life is built on the Marine Environmental Biology department at the
University of Southern California. I have been continually inspired and motivated by
the creativity and dedication of the members of the Marine Environmental Biology
department.
I thank John Heidelberg, my PhD advisor, for his never-ending support and belief
in me throughout graduate school. John’s true curiosity and excitement for
understanding biology pushed me to make my PhD a scientific and creative adventure.
His mellow but serious and determined nature has taught me a great deal about how
to do science effectively and maintain grace under pressure. John’s persistent trust in
me has always been confidence-inspiring, and I consider us to be good friends.
I also thank Jed Fuhrman for initially taking me into his lab as an undergraduate
researcher when I was a pre-med student. He introduced me not only to marine biology
and microbial ecology, but to research as a whole. I will be forever indebted to Jed.
Jed’s unrelenting spirit and ability to synergize his technical expertise with the “big
picture” have been continually exemplary. I also thank my dissertation committee as a
whole (John Heidelberg, Jed Fuhrman, and Andre Ouellette) for their time and insightful
comments.
I thank my friends and family for being supportive throughout graduate school,
especially my mother, Rekha Sachdeva, who has been unwavering in her support, a true
bedrock. Her hardworking and “no-nonsense” attitude is something I try to make
a part of myself every day. I also thank Megan Hall, whose intellect and implacable
spirit have been a model I have tried to emulate. That someone as inspiring as Megan has
so ardently believed in me has been extremely motivating.
Many people contributed to all the work presented here and were instrumental
in my development as a scientist. I thank David Needham, Michael Morando, Ben Tully,
Johanna Holm, Elizabeth Teel, Jacob Cram, Josh Steele, Mike Beman, Cheryl Chow,
Mahira Kakajiwala, Anand Patel, Sheila O’Brien, Barb Campbell, Sharon Grim, Laura
Gómez-Consarnau, Sergio Sañudo-Wilhelmy, Steve Finkel, Ken Nealson, Dave Caron,
Alma Parada, Sarah Hu, Karla Heidelberg, Bill Nelson, Troy Gunderson, Roberta
Marinelli, Vikki Campbell, Willie Haskell, Vickie Trinh, Elaina Graham, Victor
Hernando-Morales, R/V Yellowfin crew, Jeremy Holt, Cathy Garcia, Erin Fichot, Linda
Bazillian, Kellie Spafford, Lauren Czarnecki Oudin, and the Wrigley Institute for
Environmental Studies.
TABLE OF CONTENTS
Chapter 1 - PhyLigo: Automated Genome Decontamination Using Iterative
Phylogenetic Refinement
Chapter 2 - Rare microbes dominate community activity
Chapter 3 - Multi-year temporal dynamics of highly resolved microbial communities in
an oxygen minimum zone
Chapter 1
PhyLigo: Automated Genome Decontamination Using Iterative Phylogenetic
Refinement
Introduction
One of the prime directives of modern genome reconstruction efforts is the rapid
recovery of reference quality genomes from both simple and complex sources. The
increasing availability of genomes has enabled the direct probing of genomic content
from otherwise inaccessible organisms and entities in a myriad of environments (Tyson
et al. 2004; Venter et al. 2004; Hess et al. 2011; Tully et al. 2014; Raveh-Sadka et al.
2015). This novel accessibility of genomic content has revealed the dynamics of poorly
understood evolutionary and ecological processes (Shapiro et al. 2012; Kashtan et al.
2014). Organismal communities range in compositional complexity, from simple
communities dominated by a single taxon, e.g., in isolate genome sequencing, to those
composed of innumerable taxa, e.g., in metagenomics. Even in the simplest case, i.e., an
isolate, non-target genomes may be present as contaminants. In any of these cases,
sequences are rarely assembled into entirely finished whole genomes, and in many
instances individual genomes need to be separated from other “contaminating” genomes.
In the simplest case, genome reconstruction is accomplished by sequence
matching to previously sequenced genomes and removing conflicting sequences from
genome “bins”. This has the downside of capturing only those sequences that have a
similar reference match in a database, and it may also over-merge genomes that are only
somewhat similar. In the increasingly common case, reference genomes are not
available and genome bins are created reference-free using compositional metrics
(Teeling et al. 2004), e.g., oligonucleotide or kmer frequencies. For example, partially
assembled genome fragments in a mixed community with high compositional similarity
can be grouped together into a single genome bin. Recently, the availability of
high-throughput genomic sequencing for many samples has enabled coverage-based binning.
The coverage of genome fragments across multiple samples is treated like a
compositional metric and used to aid in recovering genome fragments from the same
genome (Strous et al. 2012; Albertsen et al. 2013; Alneberg et al. 2013; Imelfort et al.
2014; Kang et al. 2015; Wu et al. 2016; Graham et al. 2017). The advantage of binning
with these methods is that genome fragments do not have to overlap; instead, they
are grouped together by genome-wide features. A high level of sensitivity can be
achieved, but specificity suffers because many known genome-wide features, e.g., %G+C
content, cannot resolve all genomes. That is to say, although composition-based binning
can recreate genomes, the resulting bins often suffer from chimerism, leading to
contamination.
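To make the notion of a compositional metric concrete, the following minimal Python sketch computes a normalized tetranucleotide (4-mer) frequency profile for a fragment and compares two profiles. It is an illustration only, not the feature extraction of any particular binning tool: the helper names `kmer_profile` and `profile_distance`, the decision to skip k-mers containing ambiguous bases, and the use of Euclidean distance are assumptions, and real tools often also merge reverse-complement k-mers.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized frequencies of all 4**k k-mers, in a fixed lexicographic order.

    Hypothetical helper: k-mers containing non-ACGT characters (e.g., N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    # A fragment with no valid k-mers gets an all-zero profile.
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between profiles; fragments of one genome should be close."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


p = kmer_profile("ACGTACGTAC")
q = kmer_profile("AAAAAAAAAA")
print(len(p), round(p[27], 3))  # 256 0.286  ("ACGT" is index 27, seen 2 of 7 times)
print(profile_distance(p, q) > 0)  # True
```

Fixing the k-mer order is what allows profiles from different fragments to be compared position by position and fed into clustering.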
Downstream, genome contamination can be qualitatively determined by ensuring that
all sequences in a bin have the same taxonomic match. Again, this approach is limited by
its reliance on sufficiently similar database matches. For example, ProDeGe
depends on BLAST matches to reference databases to determine contamination and
perform automated refinement (Tennessen et al. 2016). Quantitatively, contamination can be
estimated from the extent of duplication of single-copy genes (SCGs), i.e., genes identified
as single copy in most members of a phylogenetic lineage. The a priori assumption of these
methods is that SCGs are strictly single copy in all members of the lineage. This is not
always the case. For example, relying on SCGs that are single copy in only 90% (Rinke et
al. 2013) to 95% (Parks et al. 2015) of known members of Bacteria may misclassify novel genomes as
contaminated. More fundamentally, in lineages where the distribution of SCGs has not been
determined, contamination cannot be estimated. For example, the distribution of SCGs
is poorly understood in eukaryotes, so contamination estimates cannot be computed
effectively. We propose a new phylogeny-based framework for quantifying genome
contamination and provide an application for automated genome
decontamination.
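The SCG-duplication logic described above can be sketched as follows. This is a deliberately simplified Python illustration, not CheckM's algorithm: every marker is treated as its own set (CheckM additionally groups collocated markers), and the function name `scg_estimates` and its inputs are hypothetical. It shows why a genuine duplicate in a novel genome is indistinguishable from contamination under this approach.

```python
from collections import Counter


def scg_estimates(hits: list[str], markers: set[str]) -> tuple[float, float]:
    """Simplified (completeness %, contamination/redundancy %) for one genome bin.

    hits: names of marker genes found in the bin, one entry per copy.
    markers: genes assumed to be single copy in the bin's lineage.
    Each marker is treated as its own set (no collocated marker sets).
    """
    copies = Counter(h for h in hits if h in markers)  # non-marker hits are ignored
    present = sum(1 for m in markers if copies[m] > 0)
    # Every copy beyond the first counts as contamination, even when it is a
    # genuine paralog in a novel genome.
    extra = sum(c - 1 for c in copies.values())
    return 100.0 * present / len(markers), 100.0 * extra / len(markers)


markers = {"a", "b", "c", "d"}
print(scg_estimates(["a", "a", "b", "c", "z"], markers))  # (75.0, 25.0)
```

The second copy of marker `a` raises the redundancy estimate to 25% regardless of whether it came from another genome, which is the limitation that motivates a phylogeny-based estimate.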
PhyLigo Pipeline
A typical genome reconstruction pipeline generally flows from DNA extraction, library
preparation, sequencing, assembly, binning, bin selection, and bin refinement to
functional and structural annotation (Wooley et al. 2010; Murat Eren et al. 2015). Bin
refinement typically involves human-guided removal of assemblies from bins that have
incongruous compositional features or coverage. Notably, all steps of a typical genome
reconstruction pipeline have become fully automatable using robotic liquid handlers
(Riemann et al. 2007; Baym et al. 2015) and bioinformatics toolkits (Treangen et al.
2013), except for bin refinement, which typically requires time-consuming,
human-guided manual curation.
The typical input for PhyLigo is a set of partially complete genomes produced by
binning assemblies from metagenomic, genomic, or long-read sequencing.
First, sequences from each genome bin are searched for 43 phylogenetically informative
marker genes (Parks et al. 2015). These marker genes are involved in core cellular
functions, e.g., ribosomal proteins and RNA polymerase, and are thought to be relatively
recalcitrant to horizontal gene transfer (HGT); therefore, they have highly similar
evolutionary histories (Darling et al. 2014). Because each gene has essentially the same
evolutionary history, each gene can be placed on the same tree (Stark et al. 2010).
Because different genes cannot be aligned to each other, each gene is instead aligned to a
reference set of markers; the alignments are then concatenated and placed on a reference
phylogenetic tree. The PhyLigo reference tree consists of the concatenated alignment of
the 43 phylogenetic marker genes from 18,387 genomes, spanning all three domains of
life. After all fragments containing markers have been placed on the PhyLigo tree, the
relative phylogenetic distances between placements are determined and clustered into
phylogenetic mass islands (Matsen et al. 2010). Fragments from the same genome should
cluster into a single mass island. In cases of genome contamination, we define the
dominant genome island as the island with the most fragments. The less abundant islands
are considered contaminating, and we define a new estimate of contamination, phylogenetic
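dissonance (PD) as:

$$\mathrm{PD} = \frac{\sum_{c \in C} c - \max(C)}{\sum_{c \in C} c} \times 100\%$$

where C is the collection of fragment counts per mass island, so that max(C) is the count
in the dominant island. Unlike single-copy-gene-based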
estimates of contamination, PD does not rely on an a priori understanding of gene
distributions. Unlike taxonomy-based approaches, the phylogenetic approach of
PhyLigo can also work when sequences have low similarity to the reference tree, because
calculations rely on the relative location of sequences on the phylogenetic tree. Sequences
from the same bin may not be fully phylogenetically resolvable, but they should still
cluster together on the same part of the tree. PD assumes that the phylogenetic markers are
vertically inherited and not subject to horizontal gene transfer. Because the reference tree
spans all three domains of life, PD can be applied to Bacteria, Archaea, Eukarya, and possibly viruses.
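The PD calculation can be illustrated with a short Python sketch. It assumes, as described above, that PD is the percentage of marker-bearing fragments falling outside the dominant (most abundant) mass island. The grouping step is a hypothetical stand-in: rather than the mass-island procedure of Matsen et al. (2010), it links fragments whose pairwise phylogenetic distances fall below an arbitrary threshold (single-linkage via union-find). The function names, the distance matrix, and the threshold are assumptions for illustration.

```python
from collections import Counter


def mass_islands(dist: list[list[float]], threshold: float) -> list[int]:
    """Assign each placed fragment to an island (hypothetical single-linkage stand-in).

    dist[i][j] is the phylogenetic distance between the placements of fragments
    i and j; fragments connected by a chain of distances <= threshold share an island.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the island representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def phylogenetic_dissonance(labels: list[int]) -> float:
    """PD (%): share of fragments outside the most abundant (dominant) island."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - max(counts.values())) / total


# Toy example: fragments 0-2 sit close together, fragments 3-4 elsewhere on the tree.
dist = [
    [0.00, 0.01, 0.02, 1.00, 1.00],
    [0.01, 0.00, 0.01, 1.00, 1.00],
    [0.02, 0.01, 0.00, 1.00, 1.00],
    [1.00, 1.00, 1.00, 0.00, 0.03],
    [1.00, 1.00, 1.00, 0.03, 0.00],
]
islands = mass_islands(dist, threshold=0.05)
print(phylogenetic_dissonance(islands))  # 40.0
```

Two of five marker-bearing fragments fall outside the dominant island, so PD is 40%; a bin whose fragments form a single island has PD = 0%.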
After PD calculation, PhyLigo represents sequences within a genome bin as a
hierarchical clustering of oligonucleotide frequencies and, if available, coverage within
a sample or across samples (Fig. 1). Starting at the root of the hierarchical clustering of
a bin, clusters are split when PD exceeds 0% or a user-defined threshold between 0 and
100%. This process is iterated until the tips of the clustering are reached or PD is less
than or equal to the desired threshold. A heuristic is then applied to retain only those bins
with significant genomic content, defined as bins with >50% (user-definable) of the
content of the largest refined bin. Because phylogenetic placement is combined with
compositional/coverage clustering, not every sequence within a bin needs a
phylogenetic marker. Sequences without markers that cluster with sequences with
phylogenetic markers are assumed to have a similar, if not identical, phylogeny
(Pride et al. 2003). Consequently, bins can be refined when as few as two sequences
possess phylogenetic markers.
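The iterative splitting can be sketched as a top-down walk over the clustering tree of a bin. In this minimal Python illustration, the `Node` structure, the toy tree, and the use of sequence length as the measure of “content” are assumptions. Fragments without markers simply follow the cluster they sit in, and PD is recomputed at each node from the marker-bearing fragments beneath it.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical node of a bin's composition/coverage clustering tree."""
    name: str = ""
    children: list["Node"] = field(default_factory=list)
    length: int = 0               # sequence length (tips only)
    island: Optional[int] = None  # mass-island label if the fragment carries markers


def tips(node: Node) -> list[Node]:
    """All fragments (tips) below a node."""
    if not node.children:
        return [node]
    return [tip for child in node.children for tip in tips(child)]


def pd_percent(fragments: list[Node]) -> float:
    """PD (%) from marker-bearing fragments only; unmarked fragments are ignored."""
    counts = Counter(f.island for f in fragments if f.island is not None)
    total = sum(counts.values())
    return 0.0 if total == 0 else 100.0 * (total - max(counts.values())) / total


def refine(node: Node, threshold: float = 0.0) -> list[list[Node]]:
    """Split top-down until PD <= threshold or a tip of the clustering is reached."""
    members = tips(node)
    if not node.children or pd_percent(members) <= threshold:
        return [members]
    return [b for child in node.children for b in refine(child, threshold)]


def keep_significant(bins: list[list[Node]], min_fraction: float = 0.5) -> list[list[Node]]:
    """Keep bins holding > min_fraction of the sequence content of the largest bin."""
    sizes = [sum(f.length for f in b) for b in bins]
    largest = max(sizes, default=0)
    return [b for b, s in zip(bins, sizes) if s > min_fraction * largest]


# Toy bin: two compositional clusters whose markers land in different islands.
root = Node(children=[
    Node(children=[Node("a1", length=1000, island=0), Node("a2", length=800)]),
    Node(children=[Node("b1", length=600, island=1), Node("b2", length=50)]),
])
refined = keep_significant(refine(root, threshold=0.0))
print([[f.name for f in b] for b in refined])  # [['a1', 'a2']]
```

The root has PD = 50% and is split; both children reach PD = 0%, and the smaller child (650 bp versus 1,800 bp) is discarded by the 50% content heuristic. The unmarked fragment `a2` is retained only because it clusters with `a1`.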
Results
Artificial in silico metagenomes
PhyLigo was tested on an in silico set of 3 metagenomes with exponentially varying
coverage, composed of 72 genomes from 3 domains of life: Bacteria (n = 41), Archaea
(n = 23), and Eukarya (n = 8). These metagenomes captured a wide range of diversity,
including fungi (Supplementary Table 1). The original reference genomes were sampled
randomly to approximate the sampling of SCGs on assemblies in a typical metagenomic
sequence library. This is especially important because PhyLigo uses concatenated
alignments and pplacer to infer the phylogenetic relationships between sequences. More
SCGs, i.e., phylogenetic markers, on a sequence allow more precise placement on the
PhyLigo tree, yet the vast majority of assemblies in a metagenomic study carry few
single-copy genes, which limits the performance of phylogenetic placement. To simulate
novel sequences, none of the genomes in the in silico dataset are present in the PhyLigo
tree. The reference sequences were further shattered into smaller, randomly sized
fragments to represent typical assembly efficiencies in metagenomic assemblies. These
sampled and fragmented portions of the references were considered the reference bins
for performance evaluations. Binning performance with BinSanity (Graham et al. 2017),
CONCOCT (Alneberg et al. 2014), MaxBin (Wu et al. 2016), and MetaBAT (Kang et al.
2015) before and after PhyLigo refinement was calculated using cluster performance
evaluation metrics: precision, recall, the harmonic mean of precision and recall
(Rosenberg and Hirschberg 2007), and the Adjusted Rand index (ARI) (Hubert and Arabie
1985). For these metrics, precision measures the extent to which a single bin is composed
of multiple reference bins (Graham et al. 2017), essentially a measure of contamination.
Recall quantifies the extent to which each reference bin was fully recreated, i.e.,
recovered in one cluster or bin. The harmonic mean summarizes precision and recall in a
single value that is pulled toward the lower of the two. ARI measures how well each
binning tool recreates the original input reference structure. Each metric has a maximum
value of 1, indicating perfect agreement.
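For readers who want to reproduce these metrics, the following Python sketch computes precision, recall, their harmonic mean, and ARI from per-fragment bin and reference labels. It assumes unweighted fragment counts and defines precision as the fraction of fragments that belong to the dominant reference genome of their bin, and recall as the fraction that belong to the dominant bin of their reference genome; Graham et al. (2017) may weight fragments (e.g., by length) or define these slightly differently. The ARI follows the standard Hubert and Arabie (1985) formula.

```python
from collections import Counter
from math import comb


def precision_recall_f(bins: list, refs: list) -> tuple[float, float, float]:
    """Precision, recall, and their harmonic mean from per-fragment labels.

    bins[i] is the bin assigned to fragment i; refs[i] is its true reference genome.
    Fragments are counted equally (an assumption; length weighting is also common).
    """
    n = len(bins)
    table = Counter(zip(bins, refs))  # contingency table: (bin, ref) -> count
    best_ref = Counter()              # largest single-reference count per bin
    best_bin = Counter()              # largest single-bin count per reference
    for (b, r), c in table.items():
        best_ref[b] = max(best_ref[b], c)
        best_bin[r] = max(best_bin[r], c)
    precision = sum(best_ref.values()) / n  # 1.0 means no bin mixes genomes
    recall = sum(best_bin.values()) / n     # 1.0 means no genome is split across bins
    return precision, recall, 2 * precision * recall / (precision + recall)


def adjusted_rand_index(bins: list, refs: list) -> float:
    """ARI between the binning and the reference partition (requires >= 2 fragments)."""
    n = len(bins)
    index = sum(comb(c, 2) for c in Counter(zip(bins, refs)).values())
    sum_a = sum(comb(c, 2) for c in Counter(bins).values())
    sum_b = sum(comb(c, 2) for c in Counter(refs).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # only when both are the same trivial partition
        return 1.0
    return (index - expected) / (max_index - expected)


refs = ["x", "x", "y", "y"]
print(precision_recall_f([0, 0, 1, 1], refs), adjusted_rand_index([0, 0, 1, 1], refs))
# (1.0, 1.0, 1.0) 1.0
print(precision_recall_f([0, 0, 0, 1], refs), adjusted_rand_index([0, 0, 0, 1], refs))
# (0.75, 0.75, 0.75) 0.0
```

In the second case one fragment of genome `y` contaminates bin 0 and genome `y` is split across two bins, which lowers precision and recall to 0.75 and drops ARI to chance level.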
PhyLigo refinement of each automated binning strategy increased precision for all
tested binning tools (Fig. 2ab): BinSanity (0.92 to 0.99), CONCOCT (0.91 to 0.98),
MaxBin (0.82 to 0.97), and MetaBAT (0.97 to 0.99) (Fig. 2c).
Recall values were largely unaffected by PhyLigo refinement, except
for a substantial MaxBin increase (0.87 to 0.94). Harmonic mean was also improved for
all binning tools, mostly driven by increases in precision. ARI was improved for all
binning strategies except MetaBAT, which was slightly reduced from 0.97 to
0.96. BinSanity and CONCOCT were moderately increased, from 0.94 to 0.99 and 0.94
to 0.97, respectively. MaxBin was the most improved, from the lowest ARI of 0.69 to
0.92. To verify that PhyLigo is effective on all 3 domains of life, and that its overall
performance is not skewed by its performance on any single domain, we
evaluated precision, recall, and ARI for each domain. Each bin was assigned the
taxonomic domain that unambiguously comprises most of the sequences in the bin.
PhyLigo improved precision for each taxonomic domain and binning strategy, except
BinSanity, where precision could not be improved beyond 1. In all cases, PhyLigo
improved the precision values of all bins to >0.95, even when the initial precision was as
low as 0.6 (Supplementary Fig. 1). Recall was most improved for MaxBin, similarly
across all 3 domains. Bacterial recall was slightly reduced for all other binning tools
(by 0.0008 - 0.003). Archaeal recall was similarly slightly reduced (by 0.004 - 0.01),
except for CONCOCT, where it instead improved by 0.001. Eukaryotic recall was
improved in all cases, except when it was already 1, e.g., BinSanity. Harmonic
mean was unanimously increased, primarily driven by an increase in precision.
ARI for bacterial bins was substantially improved for BinSanity (0.86 to 0.99),
CONCOCT (0.91 to 0.98), and MaxBin (0.76 to 0.95). MetaBAT was only slightly
improved, from 0.94 to 0.99. Archaeal binning showed varying degrees of
improvement for BinSanity (0.87 to 0.99), CONCOCT (0.65 to 0.94), MaxBin (0.47 to
0.96), and MetaBAT (0.94 to 0.99). Eukaryotic bins were only moderately improved for
CONCOCT (0.96 to 1) and greatly improved for MaxBin (0.26 to 0.44), whereas
BinSanity could not be further improved from an ARI of 1.
Complex natural deep ocean metagenomes
We sequenced 3 metagenomes collected at a depth of 890 m from the San Pedro Ocean
Time-series (SPOT) station in the Eastern North Pacific, which lies within an oxygen
minimum zone (OMZ). OMZs are regions of the ocean where oxygen levels range from
hypoxic to completely anoxic, and they are implicated in the loss of fixed nitrogen through
denitrification and anammox (Gilly et al. 2013). These 3 metagenomes represent a typical
environmental metagenomic study with a high level of complexity and novel sequences
not represented in existing databases. Before sequencing library construction, each
metagenome was amended with 4 non-source, phylogenetically diverse, “exotic”
genomes (Satinsky et al. 2014) with a range of %G+C content (34.5 - 59.9%) at different
abundances (Supplementary Table 2). These additional genomes were also absent from
the PhyLigo tree. They serve as a reference benchmark to evaluate precision, recall, the
harmonic mean of precision and recall, and ARI while accounting for potentially
complicating and detrimental effects of a natural sample. For example, the many genomes
in a typical sample may not be distinguishable by PhyLigo’s hierarchical clustering
because of high similarity in oligonucleotide frequency and sample coverage. PhyLigo
refinement of the amended genomes improved precision, recall, harmonic mean, and ARI
for all binning tools (Fig. 2A). Precision had a minimum of 0.85 (BinSanity) and was
improved to 0.99-1.0 in all cases. Recall was near 0 in all cases (0.03-0.09) and was
greatly improved to 0.17-0.45. Harmonic mean was strongly influenced by recall and
followed a similar pattern: it ranged from 0.06 - 0.17 and
improved to 0.29 - 0.62. ARI was near 0 prior to refinement (0 - 0.01) and was only
marginally improved for BinSanity, MaxBin, and MetaBAT (0.04 - 0.16), but was greatly
improved for CONCOCT (0.85).
Sequence novelty is particularly important in evaluating PhyLigo, as it depends
on determining mass islands from placement on the PhyLigo reference tree. We
tested PhyLigo refinement on all of the sequences in the SPOT metagenomes to
determine its performance in a natural setting. Because the true clustering solution
is not known a priori, we used an independent tool, CheckM (Parks et al. 2015), to
evaluate the performance of PhyLigo. Briefly, CheckM uses the overrepresentation of
known SCGs within a lineage to estimate the level of contamination/redundancy (CR) in
a genome bin. The recovery of the lineage-specific genes is also used to estimate genome
completion. Initial binning resulted in CR values of 19.8 ± 2.8 - 43.4 ± 7.8% across all
binning tools, with the lowest CR from MaxBin (19.8 ± 2.8%) (Fig. 2B). PD estimates
for each binning tool were similar to CR, except that MetaBAT produced the lowest PD
values (18.8 ± 2.6%; Supplementary Fig. 1). In each case, PhyLigo refinement to 0% PD
reduced CR for all binning tools by at least 50%, to a final CR of 4.2 ± 0.8 - 11.3 ± 3.3%.
CheckM completion estimates were reduced from 44.5 ± 4.4 - 62.5 ± 5.3% to similar
levels, regardless of binning tool, of 20.9 ± 1.6 - 25.9 ± 4.1%.
Human-guided manual curation and infant gut metagenomes
Finally, PhyLigo was tested on a daily longitudinal metagenomic sampling of an infant
gut microbiome that was assembled and binned by Sharon et al. 2013 using human-guided
clustering based on emergent self-organizing maps (ESOMs; Dick et al. 2009). Notably,
these bins have been extensively manually curated using a
combination of features based on coverage, single-copy gene distribution, and
taxonomy. The resulting bins have become a benchmarking platform to compare
automated binning tools against time-consuming, human-guided manual curation.
Prior to refinement, manual curation produced the lowest levels of CR (0.75 ±
0.69%, mean ± SEM), and BinSanity produced similarly low levels of CR (0.76 ±
0.52%). PD was lower than CR for all approaches except manual curation. Refinement of
each binning strategy through the PhyLigo pipeline resulted in lower CR estimates (Fig. 4).
Regardless of binning approach, PhyLigo refinement reduced CR: from 2.61 ± 1.99 to
0.04 ± 0.04% (BinSanity), 25.6 ± 24.4 to 3.4 ± 3.4% (CONCOCT), 11.2 ± 5.4 to 0.68
± 0.46% (MaxBin), and 12.9 ± 11.4 to 3.16 ± 3.11% (MetaBAT). Binning strategies with
the highest initial CR also showed the largest reductions in CR. Similarly, the decrease
in completion corresponded to the initial completion levels and ranged from 13 to
25%. Refinement of MaxBin bins reached contamination levels lower than manual
curation. PhyLigo refinement of BinSanity produced bins with lower CR, near
0% (0.04 ± 0.04%), than human-guided manual curation. PhyLigo was also able to further
decontaminate manually curated bins to the same near-0% CR (0.04 ± 0.04%) as
PhyLigo-refined BinSanity bins.
Discussion
Phylogenetics is a powerful framework that can not only recapitulate the evolutionary
history of organisms, but can be practically utilized for modern genome applications.
We describe a new framework for estimating contamination of genome assembly
projects based entirely on phylogenetics. We developed a phylogeny-based metric of
genome contamination, phylogenetic dissonance (PD), and combined it with
oligonucleotide- and coverage-based clustering to iteratively reduce genome
contamination in an automated way (Fig. 1).
One of the major attributes of PhyLigo is that contamination estimates are based
on PD. The primary advantage is that PD relies only on the phylogenetic robustness of the
marker genes used to estimate contamination. Accordingly, these genes are limited to
those thought to be transferred exclusively by vertical descent rather than HGT, which
implies that all genes used share the same evolutionary history. One alternative method
for estimating genome contamination qualitatively relies on taxonomic purity within a bin.
The major issue with taxonomic methods is that there may not be sufficiently similar
matches to assign taxonomies. Another alternative, duplication of genes thought to be
single copy within a lineage, has also been used to quantitatively estimate contamination.
This relies on a priori knowledge of SCG distributions within a lineage to obtain a true
estimate of contamination, rather than only the level of duplication or redundancy of
SCGs within a genome bin. Further, SCG-based estimates are limited to lineages with
previously identified bona fide SCGs. Specifically, this limits their use in eukaryotes and
novel lineages, where the distributions of SCGs have not been thoroughly delineated.
Still, PhyLigo has a mode that can use the distribution of SCGs within a bin to
decontaminate genomes to a user-set CR threshold.
Almost every step of the contemporary genomic pipeline for recovery of novel
genomes is automated except decontamination. A typical genome assembly
pipeline, either from a single genome or a complex sample, requires some level of
manual curation to ensure that non-source genomes do not contaminate the target
genome. Here we present a wholly automated method for genome bin decontamination
using PD combined with hierarchical clustering of the oligonucleotide frequencies and
coverage of sequences within a bin, and we show that this combination can
automatically decontaminate genome bins.
PhyLigo was tested on in silico metagenomes, complex ocean metagenomes with
verifiable amended genomes, and longitudinally sampled metagenomes from the infant
gut after automated binning with BinSanity, CONCOCT, MaxBin, and MetaBAT. The in
silico metagenomes provide a constrained system to assess the performance of PhyLigo
using well-established metrics of cluster integrity. Similarly, the SPOT ocean
metagenomes were amended to test the effectiveness of PhyLigo in a practical setting
using standard cluster metrics. The infant gut metagenomes were previously and
independently manually curated by Sharon et al. 2013 and provide a basis to compare
human-guided curation with PhyLigo automated refinement. Indeed, PhyLigo
refinement reduced contamination of bins in all test cases, through either an increase in
precision or a decrease in CheckM-measured CR. Binning strategies with higher initial
CR benefited from the largest reductions in CR. Notably, eukaryotic contamination
estimates were also decreased using PhyLigo (Supplementary Fig. 1), as PD is applicable
to all three domains of life, and possibly viruses. Most notably, the combination of
PhyLigo and BinSanity generated bins with ~0% CR, lower than human-guided manual
binning. This wholly automatic approach circumvents manual binning, a time-consuming
step previously considered essential for finishing genome reconstruction
(Sangwan et al. 2016) (Fig. 4). When manually curated bins were used as input,
PhyLigo further decontaminated the bins to near 0% CR. Recall, i.e., how completely
each original genome was recovered in a single bin, either increased or, when starting at a high value, remained virtually
unchanged, indicating that PhyLigo was able to remove small improperly clustered bins.
ARI, a measurement of how well the final genomes match the input genomes, was
similarly increased or unchanged, indicating that PhyLigo aided in recreating the
original sample community.
Although PhyLigo decontaminated bins (increased precision or decreased CR) in all
tests, it lowered CheckM genome completion estimates. Where CheckM completion
estimates were reduced, the decreases in CR estimates were much greater.
CheckM genome completion is a measurement for a whole bin and provides a composite
estimate of the possibly many genomes in a bin. PhyLigo was run to refine bins to 0% PD
to provide the most purified bins possible, with the notion that one of the major goals of
genome assembly is to provide a true representation of the genomic content of a source
genome. Nevertheless, PhyLigo can easily be run with a higher PD threshold (anywhere
from 0 to 100%) to maximize genome completion. Genome completion reductions are
primarily the result of improper binning by the original binning strategy, e.g., BinSanity,
CONCOCT, MaxBin, or MetaBAT. To salvage contaminated bins and reduce
contamination, PhyLigo must split these bins at the cost of genome completion estimates.
In ideal cases, genome completion reductions are the result of PhyLigo salvaging
incorrect inputs. In other cases, reductions are the result of PhyLigo errors, stemming
from two primary causes: improper phylogenetic placement of a sequence, or a mismatch
between phylogeny and kmer/coverage cluster representation. The first error type stems
from the reliance of PhyLigo on placing non-homologous genes on the same
phylogenetic tree. These genes are assumed to have the same evolutionary history and
the same general tree topology. Although this is broadly true when considering all taxa,
there can be phylogenetic disagreements between genes for individual taxa. There also may be
disagreements at the fine-scale tips of a phylogenetic tree (Hug et al. 2016) that cause
errors in phylogenetic placement. The second error type results from discordance between
the phylogeny of a sequence and its kmer and coverage cluster representation. PhyLigo
relies on the assumption that compositional and coverage clustering broadly aligns with
the phylogeny of an organism (Pride et al. 2003). Oligonucleotide clustering of a sequence
is a local representation of the global pattern of oligonucleotide frequencies. Similarly, the
phylogeny of the PhyLigo markers is a global representation of phylogeny derived from a
small, local portion of the genome. Areas of the genome under different evolutionary
processes than the PhyLigo markers will exhibit discrepancies between phylogenetic and
oligonucleotide clustering. For example, laterally transferred regions of a genome may
have a different underlying composition that does not coincide with the phylogeny of
HGT-resistant genes (Monier et al. 2007) and will cause bin splitting. Nevertheless,
despite decreased completion estimates, precision and contamination were largely
improved. Ultimately, the goal of genome assembly is to provide an uncontaminated
representation of the target genome, not artificially high levels of genome recovery.
As genome sequence volumes increase and sequencing becomes more democratized, the
need for automated, fast pipelines will necessarily follow suit. Currently, most genome
reconstruction pipelines rely on tedious, time-consuming, human-guided manual
curation for decontamination. We define a new method for estimating genome
contamination, PD, that relies on phylogenetics and can be applied to all domains of life.
We extend this concept to automated genome decontamination that can also
be applied to any organism. PhyLigo can produce genomes with lower contamination
than human-guided curation and improve upon human-guided bins. There will always
be a need for some level of human intervention in genome reconstruction, but we expect
that tools like PhyLigo will vastly increase the efficiency of genome finishing.
Methods
SPOT OMZ Sample Collection and DNA Extraction. Seawater samples (20 L)
were collected on three dates in 2013 (January 16, June 19, September 18) in acid-rinsed
cubitainers at the SPOT station (33° 33’N, 118° 24’W) from a depth of 890 m
aboard the R/V Yellowfin. Samples were serially filtered through a nylon mesh (80 µm),
an Acrodisc (Pall) glass fiber filter (~1 µm), and finally a Sterivex-GP (Millipore)
polyethersulfone filter (0.22 µm). Acrodisc and Sterivex filters were preserved with 250
µl and 500 µl of RNAlater, respectively, and subsequently incubated at ambient
temperature for 2 min. Samples were immediately frozen in liquid nitrogen (LN2) and
stored at -80˚C.
The picoplankton fraction was selected for sequencing, and DNA was extracted from
each Sterivex cartridge using a modified AllPrep DNA/RNA (Qiagen) kit. RNAlater was
expelled from each Sterivex using a syringe. Because high salt concentrations can lower
DNA yields with the AllPrep DNA/RNA kit, we used an Amicon Ultra 3k to desalt the
RNAlater while maximizing nucleic acid concentrations. Briefly, 500 µl RNAlater
(Ambion) was transferred to an Amicon Ultra 3k 0.5 ml spin concentrator and centrifuged
at 14,000 g for 30 min at 4˚C. This was repeated with any remaining unconcentrated
RNAlater. To desalt, 400 µl of 4˚C nuclease-free water was added directly to the
concentrate in the concentrator, which was centrifuged at 14,000 g for 1 hr at 4˚C. Then,
2 ml of 65˚C RLT+ lysis buffer and 20 µl beta-mercaptoethanol were added to the
desalted and concentrated RNAlater. The Sterivex was sealed with luer lock caps,
vortexed for 5 s, flipped, and vortexed again for 5 s. The Sterivex cartridge was then
rotated horizontally for 15 min at 65˚C.
The lysate was removed, and the Sterivex filter was lysed again with 1ml RLT+ and 10 µl
beta-mercaptoethanol. The lysates were combined and DNA was extracted following the
manufacturer’s protocol and finally eluted with buffer AE.
SPOT OMZ Metagenome Preparation. DNA concentrations were determined for
each sample using a Qubit High Sensitivity DNA assay (Invitrogen), and samples were
diluted to 10 ng. Samples were spiked with DNA from 4 exotic genomes from the American
Type Culture Collection (ATCC) in 2-fold increasing concentrations, at ~1% of the total DNA
concentration of each sample (Supplementary Table 2). DNA from each spiked sample was
sheared using a Covaris S2 (130 µl) with the following parameters: duty cycle, 10%;
intensity, 5; cycles/burst, 200; time, 60 s. Dual-indexed libraries were prepared using a
modified NEBNext Ultra II DNA kit (New England Biolabs) without size selection. The
protocol was modified to control for chimeric PCR amplicons and to reduce
overamplification biases. End-repaired, adaptor-ligated fragments were amplified with
SYBR Green I (Invitrogen) added at 0.1X from a 10,000X stock. Amplification was
monitored on a CFX96 real-time PCR machine (BioRad) and stopped in the exponential
phase to avoid overamplification. The reaction was held at 98˚C for 30 s, followed by 6
cycles of 98˚C for 10 s and 65˚C for 5 min, and a final extension of 30 min. Extension
times were increased to reduce chimeric amplicons. Libraries were subsequently
sequenced (2 × 250 bp, paired-end) on an Illumina HiSeq 2500.
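The spike-in design can be made concrete with a little arithmetic. The sketch below assumes that the ~1% applies to the combined mass of all 4 exotic genomes, that the input is the 10 ng mentioned above, and that "2-fold increasing concentrations" means a 1:2:4:8 series; all three are assumptions, and the actual amounts are those in Supplementary Table 2. `spike_in_masses` is a hypothetical helper.

```python
def spike_in_masses(sample_ng=10.0, spike_fraction=0.01, n_genomes=4, fold=2):
    """Split a total spike-in mass across genomes in a fold-increasing series.

    Hypothetical helper: assumes spike_fraction applies to the combined spike.
    """
    weights = [fold ** i for i in range(n_genomes)]  # 1, 2, 4, 8 for the defaults
    total_spike_ng = sample_ng * spike_fraction       # 0.1 ng for 10 ng at 1%
    return [total_spike_ng * w / sum(weights) for w in weights]


masses = spike_in_masses()
for i, ng in enumerate(masses, start=1):
    print(f"exotic genome {i}: {ng * 1000:.2f} pg")
```

Under these assumptions, the least abundant spike is only about 7 pg, which illustrates why low-level contamination by a spiked genome would be hard to detect without an explicit benchmark.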
SPOT OMZ Sequence Processing and Metagenome Assembly. Each library was
adapter- and quality-trimmed using BBDuk (Bushnell 2016) v37.02 (bbduk.sh minlen=1
qtrim=rl trimq=25 ktrim=r k=25 mink=11 ref=truseq.fa.gz) to maximize regions with
Q>25. Each sample was assembled using MEGAHIT (Li et al. 2016) v1.0.6 in paired-end
mode (megahit --merge-level 20,0.99 --k-min 21 --k-max 255 --k-step 6). The resulting
assemblies were combined, and assemblies with >99% average nucleotide identity (ANI)
were deduplicated using BBMap (Bushnell 2016) v37.02 (dedupe.sh minidentity=99).
This dereplication approach absorbs smaller assemblies into assemblies of the same size
or larger with >99% ANI. The combined and deduplicated assembly comprises
1.73 × 10^6 assemblies with a total length of 1.2 Gbp.
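The trimming goal stated above ("maximize regions with Q>25") can be illustrated with a Phred/Mott-style maximum-scoring-region search. This is a sketch of the general idea, not BBDuk's code: it assumes each base scores the difference between the error probability at `trimq` and its own error probability, and keeps the contiguous region with the largest total score. `trim_region` and `phred33` are hypothetical helpers, and Phred+33 quality encoding is assumed.

```python
def trim_region(quals, trimq=25):
    """Return (start, end) of the best-scoring read region, Mott-style.

    Each base scores limit - p_err, where p_err = 10**(-Q/10) and limit is the
    error probability at trimq; the maximum-sum contiguous region is kept.
    """
    limit = 10 ** (-trimq / 10)
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += limit - 10 ** (-q / 10)
        if run_sum <= 0:
            # A non-positive running sum cannot help; restart after this base
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


def phred33(qual_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


seq = "ACGTACGTAC"
qual = "##IIIIII##"  # '#' = Q2, 'I' = Q40 in Phred+33
start, end = trim_region(phred33(qual), trimq=25)
print(seq[start:end])
```

Bases above the threshold add a small positive score and bases below it subtract, so a single poor base inside a long high-quality stretch does not necessarily split the read, whereas low-quality ends are removed.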
In silico metagenome construction. The in silico metagenomes were prepared
from 72 genomes from Bacteria (n = 41), Archaea (n = 23), and Eukarya (n = 8) to
capture a wide range of diversity. Only genomes with fewer than 43 of the phylogenetic
markers were selected, so none of them are represented in the PhyLigo reference tree.
Sequences from each genome project were retrieved, and plasmids were removed.
Segments representing 30-90% of each complete genome were randomly selected to
approximate the completion levels of genomes recovered from natural metagenomes.
These genome fractions were then split into randomly sized fragments of 5-50 kbp to
approximate metagenomic assemblies. Reads were simulated for 3 artificial samples,
each with a different exponential coverage distribution across assemblies, using BBMap
v37.02 (randomreads.sh seed=-1 length=150 metagenome=t). To give every fragment from
the same source genome the same coverage, the fragments from each genome were
joined with 10,000 Ns before read simulation. The joined fragments were finally re-split
for binning.
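A minimal sketch of the fragment simulation described above, assuming each retained 30-90% portion is a single contiguous segment and that a leftover tail shorter than 5 kbp is discarded (the text specifies neither). All helper names are hypothetical, and read simulation itself (randomreads.sh) is not reproduced.

```python
import random

SPACER = "N" * 10_000  # run of Ns used to join fragments from one genome


def sample_segment(genome, rng, min_frac=0.3, max_frac=0.9):
    """Pick one contiguous segment covering 30-90% of the genome (contiguity assumed)."""
    seg_len = int(len(genome) * rng.uniform(min_frac, max_frac))
    start = rng.randint(0, len(genome) - seg_len)
    return genome[start:start + seg_len]


def split_fragments(segment, rng, min_len=5_000, max_len=50_000):
    """Cut a segment into random 5-50 kbp pseudo-contigs; a tail < min_len is dropped."""
    frags, pos = [], 0
    while len(segment) - pos >= min_len:
        size = min(rng.randint(min_len, max_len), len(segment) - pos)
        frags.append(segment[pos:pos + size])
        pos += size
    return frags


def join_fragments(frags):
    """One joined sequence per genome, so a simulator assigns one coverage per genome."""
    return SPACER.join(frags)


def resplit(joined):
    """Recover the original fragments for binning."""
    return joined.split(SPACER)


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(200_000))
segment = sample_segment(genome, rng)
frags = split_fragments(segment, rng)
joined = join_fragments(frags)
```

Splitting on exactly the spacer that was used for joining restores the fragments unchanged, which is why the N runs must be longer than any read and absent from the genomes themselves.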
Binning artificial and natural metagenomes. The in silico metagenomes, the SPOT
metagenomes amended with known genomes, and the infant gut metagenomes were binned
using a combination of binning tools to evaluate the performance of PhyLigo in typical
use cases. Infant gut assemblies and human-guided, manually curated bins were
downloaded from ggKbase (http://ggkbase.berkeley.edu/carrol). Sequences from the in
silico metagenomes, SPOT assemblies, and infant gut assemblies were filtered to retain
only sequences >5 kbp. Reads from each sample were mapped to the respective assemblies
using Bowtie2 (Langmead and Salzberg 2012) (bowtie2 --mm) to obtain abundance
distributions for coverage-based binning. BinSanity (Graham et al. 2017) v0.1.4,
CONCOCT (Alneberg et al. 2014) via Anvi’o (Murat Eren et al. 2015) v2.0.2, MaxBin
(Wu et al. 2016) v2.2.1, and MetaBAT (Kang et al. 2015) v0.32.4 were all run with
default settings using the coverage information. The resulting bins were then run through
the PhyLigo v0.1 pipeline with default settings. Only resulting bins whose size in bp was
>50% of the largest refined bin were used for performance evaluations. Artificial
communities (the in silico metagenomes and the genomes amended to the SPOT samples)
were evaluated by measuring precision, recall, their harmonic mean, and the adjusted
Rand index (ARI) using Python (Van Rossum 2007) v2.7.12 and scikit-learn (Pedregosa et
al. 2011) v0.19.dev0 (homogeneity_completeness_v_measure and adjusted_rand_score).
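The evaluation can be sketched with the two scikit-learn functions named above. The sketch assumes that homogeneity, completeness, and V-measure stand for precision, recall, and their harmonic mean; that every contig counts equally (no base-pair weighting); and that the >50% size filter is applied to bins before scoring. The helper names and the toy contig-to-genome and contig-to-bin assignments are hypothetical.

```python
from collections import defaultdict

from sklearn.metrics import adjusted_rand_score, homogeneity_completeness_v_measure


def bin_sizes_bp(contig_bin, contig_len):
    """Total base pairs per bin."""
    sizes = defaultdict(int)
    for contig, b in contig_bin.items():
        sizes[b] += contig_len[contig]
    return dict(sizes)


def keep_large_bins(sizes, frac=0.5):
    """Keep bins larger than frac x the largest bin (in bp)."""
    largest = max(sizes.values())
    return {b for b, size in sizes.items() if size > frac * largest}


def evaluate(contig_genome, contig_bin, kept):
    """Score kept bins against the known source genome of each contig."""
    contigs = sorted(c for c, b in contig_bin.items() if b in kept)
    truth = [contig_genome[c] for c in contigs]
    pred = [contig_bin[c] for c in contigs]
    h, comp, v = homogeneity_completeness_v_measure(truth, pred)
    return {"precision": h, "recall": comp, "harmonic_mean": v,
            "ARI": adjusted_rand_score(truth, pred)}


# Toy example: c5 forms a small bin that fails the size filter
contig_genome = {"c1": "gA", "c2": "gA", "c3": "gB", "c4": "gB", "c5": "gB"}
contig_len = {"c1": 20_000, "c2": 15_000, "c3": 30_000, "c4": 10_000, "c5": 8_000}
contig_bin = {"c1": "bin1", "c2": "bin1", "c3": "bin2", "c4": "bin2", "c5": "bin3"}

kept = keep_large_bins(bin_sizes_bp(contig_bin, contig_len))
scores = evaluate(contig_genome, contig_bin, kept)
print(kept, scores)
```

Homogeneity penalizes bins that mix genomes (contamination) and completeness penalizes genomes spread across bins, so the pair maps naturally onto precision and recall for binning.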
Reference tree construction. All genome projects from GenBank are downloaded monthly
to assemble a trusted phylogenetic reference tree of bacteria, archaea, and eukaryotes.
The founding reference tree is based on 60,946 genome projects from GenBank release 216
(NCBI Resource Coordinators 2016). Coding sequences for each genome were searched
(HMMER (Eddy 2011) v3.1b2; hmmsearch --cut_tc --notextw) for 43 trusted,
phylogenetically informative marker genes from CheckM (Parks et al. 2015) (phylo.hmm).
These marker genes have parsimonious phylogenies and are thought to be relatively
resistant to horizontal gene transfer. For genomes with multiple copies of a single marker
gene, only the CDS with the lowest i-evalue and highest i-bitscore was retained. Regions
matching each hidden Markov model (HMM) were retrieved, yielding ~50,000 sequences
per gene, and were aligned using MAFFT (Katoh and Standley 2013) (v7.305b; mafft
--auto). Only marker genes from genomes with the full complement of 43 genes were
retained, resulting in a final set of 18,387 genomes comprising 17,287 bacteria, 316
archaea, and 793 eukaryotes. An HMM was built for each alignment using hmmbuild
(default parameters). The alignments were concatenated, resulting in a final alignment
length of 44,449 positions. The reference tree, with 18,387 leaves, was constructed from
this alignment using FastTree with the Whelan and Goldman 2001 (WAG) model,
optimization under the CAT approximation, and branch lengths rescaled to optimize the
Gamma20 likelihood (FastTree (Price et al. 2010) v2.1.8; FastTreeMP -wag -gamma).
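The marker-selection rules above can be sketched as follows. This assumes hits have already been parsed from hmmsearch domain tables into simple records, and it interprets "lowest i-evalue and highest i-bitscore" as ranking by i-evalue with i-bitscore as the tie-breaker. `Hit`, `best_hit_per_gene`, `genomes_with_all_markers`, and `concatenate` are hypothetical names, not PhyLigo functions.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    genome: str
    gene: str        # marker HMM name
    cds: str         # CDS identifier
    i_evalue: float
    i_bitscore: float


def best_hit_per_gene(hits):
    """Keep one CDS per (genome, marker): lowest i-evalue, then highest i-bitscore."""
    best = {}
    for h in hits:
        key = (h.genome, h.gene)
        rank = (h.i_evalue, -h.i_bitscore)
        if key not in best or rank < (best[key].i_evalue, -best[key].i_bitscore):
            best[key] = h
    return best


def genomes_with_all_markers(best, markers):
    """Genomes that have a retained hit for every marker."""
    found = defaultdict(set)
    for genome, gene in best:
        found[genome].add(gene)
    return {g for g, genes in found.items() if set(markers) <= genes}


def concatenate(alignments, markers, genomes):
    """alignments[gene][genome] -> aligned string; join in a fixed marker order."""
    return {g: "".join(alignments[m][g] for m in markers) for g in sorted(genomes)}


markers = ["m1", "m2"]
hits = [
    Hit("gA", "m1", "cds1", 1e-30, 100.0),
    Hit("gA", "m1", "cds2", 1e-10, 80.0),   # second copy, weaker hit
    Hit("gA", "m2", "cds3", 1e-20, 90.0),
    Hit("gB", "m1", "cds4", 1e-25, 95.0),   # gB lacks m2
]
best = best_hit_per_gene(hits)
complete = genomes_with_all_markers(best, markers)
alignments = {"m1": {"gA": "MK-V"}, "m2": {"gA": "LLA"}}
concat = concatenate(alignments, markers, complete)
```

Concatenating in one fixed marker order is what keeps columns homologous across genomes in the final supermatrix.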
Phylogenetic placement and clustering. The first step of PhyLigo is placement of the
user’s input assemblies from each bin onto the PhyLigo reference tree. PhyLigo uses GNU
Parallel (Tange 2011) to execute jobs in parallel and increase run speed. Each input
assembly from the in silico metagenomes and SPOT metagenomes was translated in all six
frames (EMBOSS (Rice et al. 2000) v6.6.0.0; transeq --frame 6 --filter) to avoid the
under-calling of genes that can occur with ORF-based gene finders. Each frame was
searched (HMMER v3.1b2; hmmsearch --cut_tc --notextw) for the 43 CheckM
phylogenetically informative marker genes. For each gene, only the region with the lowest
i-evalue and highest i-bitscore was retained. Matching regions containing in-frame stop
codons, as well as those of ≤30 amino acids, were removed. The identified genes were
aligned to the PhyLigo reference gene alignments using hmmalign, and the resulting
alignments were concatenated into a single alignment for each assembly, with missing
genes filled with gaps. These alignments were placed onto the PhyLigo reference tree
using pplacer (Matsen et al. 2010) v1.1.alpha18-2-gcb55169 (pplacer --mrca-class). Mass
islands, groupings of phylogenetically similar assemblies that cluster on similar parts of
the phylogenetic tree, were determined using guppy v1.1.alpha18-2-gcb55169 (Matsen et
al. 2010) (guppy islands).
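The query-side processing above (six-frame translation, filtering of marker hits, and gap-filled concatenation) can be sketched in plain Python. The sketch assumes the standard genetic code and treats an aligned marker region as a plain string; it does not call EMBOSS, HMMER, or hmmalign, and all function names are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, indexed in TCAG order ('*' = stop)
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Three forward and three reverse-complement translations."""
    rc = reverse_complement(seq)
    return [translate(s[f:]) for s in (seq, rc) for f in range(3)]


def keep_region(aa, min_len=31):
    """Drop hit regions with in-frame stops or of 30 amino acids or fewer."""
    return "*" not in aa and len(aa) >= min_len


def concat_with_gaps(aligned, marker_lengths):
    """Concatenate aligned markers in order, filling absent markers with gaps."""
    return "".join(aligned.get(m, "-" * n) for m, n in marker_lengths.items())


for frame in six_frames("ATGGCCAAACGCTTAGGTTAA"):
    print(frame)
```

Gap-filling to each marker's full reference length keeps every query alignment the same width as the reference supermatrix, which placement onto a fixed tree requires.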
Hierarchical cluster representation of bins. After placement of the assemblies
from the user’s input bins, PhyLigo requires a hierarchical cluster representation of
the similarity between sequences within each bin. For each bin, sample reads were mapped
with Bowtie2 (v2.2.9; bowtie2 --mm). Sequence start and stop locations were converted
to Simplified Annotation Format (SAF) for subsequent coverage determination with
featureCounts (Liao et al. 2014) in paired-end mode with multimapping read support
(featureCounts v1.5.0-p3; featureCounts -p -O -M). Tetranucleotide frequencies for each
assembly were calculated and normalized using a zero-order Markov model to reduce
biases from background genome compositional patterns (Siranosian et al. 2015). The
tetranucleotide frequencies and sample coverage distributions were combined into a
single matrix for each bin. Sequences were then clustered by centroid linkage, using
Pearson’s correlation between their combined tetranucleotide and coverage profiles as
the similarity measure (Biopython; Cock et al. 2009).
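The following sketch illustrates zero-order Markov normalization of tetranucleotide frequencies (observed 4-mer counts divided by the counts expected from base composition) followed by centroid clustering on Pearson correlation. It swaps SciPy's `linkage` in for the Biopython clustering cited in the text. Because centroid linkage assumes Euclidean distances, each feature row is z-scored first, which makes the squared Euclidean distance equal to 2m(1 - r) for m features. The text does not say how tetranucleotide and coverage features are scaled relative to each other, so here they are simply concatenated (an assumption). Helper names are hypothetical.

```python
from itertools import product

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {k: i for i, k in enumerate(KMERS)}


def tnf_markov0(seq):
    """Observed/expected 4-mer ratios; expectation from base composition only."""
    seq = seq.upper()
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])
        if idx is not None:  # skip words containing N or other symbols
            counts[idx] += 1
    acgt = np.array([seq.count(b) for b in "ACGT"], dtype=float)
    freqs = dict(zip("ACGT", acgt / max(acgt.sum(), 1.0)))
    expected = np.array([counts.sum() * np.prod([freqs[b] for b in k]) for k in KMERS])
    return np.divide(counts, expected, out=np.zeros_like(counts), where=expected > 0)


def standardize_rows(x):
    """Z-score each row so squared Euclidean distance = 2*m*(1 - Pearson r)."""
    centered = x - x.mean(axis=1, keepdims=True)
    sd = centered.std(axis=1, keepdims=True)
    return np.divide(centered, sd, out=np.zeros_like(centered), where=sd > 0)


def cluster_bin(seqs, coverage):
    """Combine TNF and per-sample coverage, then build a centroid-linkage tree."""
    features = np.hstack([np.vstack([tnf_markov0(s) for s in seqs]), coverage])
    return features, linkage(standardize_rows(features), method="centroid")


rng = np.random.default_rng(1)
seqs = ["".join(rng.choice(list("ACGT"), size=5_000)) for _ in range(4)]
coverage = rng.random((4, 3)) * 50  # hypothetical coverage in 3 samples
features, tree = cluster_bin(seqs, coverage)
labels = fcluster(tree, t=2, criterion="maxclust")
```

Dividing by the composition-based expectation removes signal that is explained by GC content alone, so the remaining 4-mer deviations reflect higher-order sequence signatures.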
Network representation of bins. Bins from the artificial in silico metagenomes
created with BinSanity, CONCOCT, MaxBin, and MetaBAT were represented as an
undirected network using Gephi (Bastian et al. 2009) v0.9.1, with nodes colored by their
source mock genome. Network layouts were computed with the Fruchterman-Reingold
(Fruchterman and Reingold 1991) algorithm, run sequentially at speeds of 1000, 100, 10,
and finally 1.
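For readers unfamiliar with the layout step, here is a minimal NumPy sketch of the Fruchterman-Reingold force-directed algorithm: all node pairs repel with force k²/d, linked nodes attract with force d²/k, and each move is capped by a step size. It is not Gephi's implementation. Mapping Gephi's speeds of 1000, 100, 10, and 1 onto decreasing step sizes (speed × 10⁻⁴) is only an illustrative assumption, and the edge list in the example is hypothetical.

```python
import numpy as np


def fruchterman_reingold(n, edges, iterations=50, step=0.05, pos=None, seed=0):
    """Force-directed layout: all pairs repel (k^2/d), linked pairs attract (d^2/k)."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n, 2)) if pos is None else pos.copy()
    k = np.sqrt(1.0 / n)  # ideal pairwise distance in a unit-area frame
    for _ in range(iterations):
        delta = pos[:, None, :] - pos[None, :, :]  # delta[i, j] = pos[i] - pos[j]
        dist = np.maximum(np.linalg.norm(delta, axis=-1), 1e-9)
        # Repulsion along delta/d with magnitude k^2/d; self-pairs add zero
        disp = (delta * (k ** 2 / dist ** 2)[..., None]).sum(axis=1)
        for i, j in edges:
            pull = delta[i, j] * dist[i, j] / k  # (delta / d) * d^2 / k
            disp[i] -= pull
            disp[j] += pull
        length = np.maximum(np.linalg.norm(disp, axis=1), 1e-9)
        # Move each node along its net force, capped at `step`
        pos += disp / length[:, None] * np.minimum(length, step)[:, None]
    return pos


edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]  # hypothetical: two triangles
pos = None
for speed in (1000, 100, 10, 1):  # coarse-to-fine schedule, loosely mirroring the text
    pos = fruchterman_reingold(6, edges, step=speed * 1e-4, pos=pos)
```

Running large steps first lets the layout untangle quickly, and the smaller later steps let it settle without oscillating, which is the intuition behind decreasing the speed between runs.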
References
Albertsen, M., P. Hugenholtz, A. Skarshewski, K. L. Nielsen, G. W. Tyson, and P. H. Nielsen.
2013. Genome sequences of rare, uncultured bacteria obtained by differential coverage
binning of multiple metagenomes. Nat. Biotechnol. 31: 533–538.
Alneberg, J., B. S. Bjarnason, I. de Bruijn, and others. 2013. CONCOCT: Clustering cONtigs
on COverage and ComposiTion. arXiv [q-bio.GN].
Alneberg, J., B. S. Bjarnason, I. de Bruijn, and others. 2014. Binning metagenomic contigs by
coverage and composition. Nat. Methods 11: 1144–1146.
Bastian, M., S. Heymann, and M. Jacomy. 2009. Gephi: an open source software for
exploring and manipulating networks. ICWSM 8: 361–362.
Baym, M., S. Kryazhimskiy, T. D. Lieberman, H. Chung, M. M. Desai, and R. Kishony. 2015.
Inexpensive Multiplexed Library Preparation for Megabase-Sized Genomes.
Bushnell, B. 2016. BBMap short read aligner. University of California, Berkeley, California.
URL http://sourceforge.net/projects/bbmap.
Cock, P. J. A., T. Antao, J. T. Chang, and others. 2009. Biopython: freely available Python
tools for computational molecular biology and bioinformatics. Bioinformatics 25: 1422–
1423.
Darling, A. E., G. Jospin, E. Lowe, F. A. Matsen 4th, H. M. Bik, and J. A. Eisen. 2014.
PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2: e243.
Dick, G. J., A. F. Andersson, B. J. Baker, S. L. Simmons, B. C. Thomas, A. P. Yelton, and J. F.
Banfield. 2009. Community-wide analysis of microbial genome sequence signatures.
Genome Biol. 10: R85.
Eddy, S. R. 2011. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7: e1002195.
Fruchterman, T. M. J., and E. M. Reingold. 1991. Graph drawing by force-directed
placement. Softw. Pract. Exp. 21: 1129–1164.
Gilly, W. F., J. M. Beman, S. Y. Litvin, and B. H. Robison. 2013. Oceanographic and
biological effects of shoaling of the oxygen minimum zone. Ann. Rev. Mar. Sci. 5: 393–
420.
Graham, E. D., J. F. Heidelberg, and B. J. Tully. 2017. BinSanity: unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 5:
e3035.
Hess, M., A. Sczyrba, R. Egan, and others. 2011. Metagenomic discovery of biomass-
degrading genes and genomes from cow rumen. Science 331: 463–467.
Hubert, L., and P. Arabie. 1985. Comparing partitions. J. Classification 2: 193–218.
Hug, L. A., B. J. Baker, K. Anantharaman, and others. 2016. A new view of the tree of life.
Nat Microbiol 1: 16048.
Imelfort, M., D. Parks, B. J. Woodcroft, P. Dennis, P. Hugenholtz, and G. W. Tyson. 2014.
GroopM: an automated tool for the recovery of population genomes from related
metagenomes. PeerJ 2: e603.
Kang, D. D., J. Froula, R. Egan, and Z. Wang. 2015. MetaBAT, an efficient tool for accurately
reconstructing single genomes from complex microbial communities. PeerJ 3: e1165.
Kashtan, N., S. E. Roggensack, S. Rodrigue, and others. 2014. Single-cell genomics reveals
hundreds of coexisting subpopulations in wild Prochlorococcus. Science 344: 416–420.
Katoh, K., and D. M. Standley. 2013. MAFFT multiple sequence alignment software version
7: improvements in performance and usability. Mol. Biol. Evol. 30: 772–780.
Langmead, B., and S. L. Salzberg. 2012. Fast gapped-read alignment with Bowtie 2. Nat.
Methods 9: 357–359.
Liao, Y., G. K. Smyth, and W. Shi. 2014. featureCounts: an efficient general purpose program
for assigning sequence reads to genomic features. Bioinformatics 30: 923–930.
Li, D., R. Luo, C.-M. Liu, C.-M. Leung, H.-F. Ting, K. Sadakane, H. Yamashita, and T.-W.
Lam. 2016. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by
advanced methodologies and community practices. Methods 102: 3–11.
Matsen, F. A., R. B. Kodner, and E. V. Armbrust. 2010. pplacer: linear time maximum-
likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference
tree. BMC Bioinformatics 11: 538.
Monier, A., J.-M. Claverie, and H. Ogata. 2007. Horizontal gene transfer and nucleotide
compositional anomaly in large DNA viruses. BMC Genomics 8: 456.
Murat Eren, A., Ö. C. Esen, C. Quince, J. H. Vineis, H. G. Morrison, M. L. Sogin, and T. O.
Delmont. 2015. Anvi’o: an advanced analysis and visualization platform for ‘omics data.
PeerJ 3: e1319.
NCBI Resource Coordinators. 2016. Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res. 44: D7–19.
Parks, D. H., M. Imelfort, C. T. Skennerton, P. Hugenholtz, and G. W. Tyson. 2015. CheckM:
assessing the quality of microbial genomes recovered from isolates, single cells, and
metagenomes. Genome Res. 25: 1043–1055.
Pedregosa, F., G. Varoquaux, A. Gramfort, and others. 2011. Scikit-learn: Machine Learning
in Python. J. Mach. Learn. Res. 12: 2825–2830.
Price, M. N., P. S. Dehal, and A. P. Arkin. 2010. FastTree 2--approximately maximum-
likelihood trees for large alignments. PLoS One 5: e9490.
Pride, D. T., R. J. Meinersmann, T. M. Wassenaar, and M. J. Blaser. 2003. Evolutionary
implications of microbial genome tetranucleotide frequency biases. Genome Res. 13:
145–158.
Raveh-Sadka, T., B. C. Thomas, A. Singh, and others. 2015. Gut bacteria are rarely shared by
co-hospitalized premature infants, regardless of necrotizing enterocolitis development.
Elife 4. doi:10.7554/eLife.05477
Rice, P., I. Longden, and A. Bleasby. 2000. EMBOSS: the European Molecular Biology Open
Software Suite. Trends Genet. 16: 276–277.
Riemann, K., M. Adamzik, S. Frauenrath, R. Egensperger, K. W. Schmid, N. H. Brockmeyer,
and W. Siffert. 2007. Comparison of manual and automated nucleic acid extraction from
whole-blood samples. J. Clin. Lab. Anal. 21: 244–248.
Rinke, C., P. Schwientek, A. Sczyrba, and others. 2013. Insights into the phylogeny and
coding potential of microbial dark matter. Nature 499: 431–437.
Rosenberg, A., and J. Hirschberg. 2007. V-Measure: A Conditional Entropy-Based External
Cluster Evaluation Measure. EMNLP-CoNLL. 410–420.
Sangwan, N., F. Xia, and J. A. Gilbert. 2016. Recovering complete and draft population
genomes from metagenome datasets. Microbiome 4: 8.
Satinsky, B. M., B. C. Crump, C. B. Smith, and others. 2014. Microspatial gene expression
patterns in the Amazon River Plume. Proc. Natl. Acad. Sci. U. S. A. 111: 11085–11090.
Shapiro, B. J., J. Friedman, O. X. Cordero, S. P. Preheim, S. C. Timberlake, G. Szabó, M. F.
Polz, and E. J. Alm. 2012. Population genomics of early events in the ecological
differentiation of bacteria. Science 336: 48–51.
Sharon, I., M. J. Morowitz, B. C. Thomas, E. K. Costello, D. A. Relman, and J. F. Banfield.
2013. Time series community genomics analysis reveals rapid shifts in bacterial species,
strains, and phage during infant gut colonization. Genome Res. 23: 111–120.
Siranosian, B., S. Perera, E. Williams, C. Ye, C. de Graffenried, and P. Shank. 2015.
Tetranucleotide usage highlights genomic heterogeneity among mycobacteriophages.
F1000Res. 4: 36.
Stark, M., S. A. Berger, A. Stamatakis, and C. von Mering. 2010. MLTreeMap--accurate
Maximum Likelihood placement of environmental DNA sequences into taxonomic and
functional reference phylogenies. BMC Genomics 11: 461.
Strous, M., B. Kraft, R. Bisdorf, and H. E. Tegetmeyer. 2012. The binning of metagenomic
contigs for microbial physiology of mixed cultures. Front. Microbiol. 3: 410.
Tange, O. 2011. GNU Parallel: the command-line power tool. The USENIX
Magazine 36: 42–47.
Teeling, H., A. Meyerdierks, M. Bauer, R. Amann, and F. O. Glöckner. 2004. Application of
tetranucleotide frequencies for the assignment of genomic fragments. Environ.
Microbiol. 6: 938–947.
Tennessen, K., E. Andersen, S. Clingenpeel, and others. 2016. ProDeGe: a computational
protocol for fully automated decontamination of genomes. ISME J. 10: 269–272.
Treangen, T. J., S. Koren, D. D. Sommer, and others. 2013. MetAMOS: a modular and open
source metagenomic assembly and analysis pipeline. Genome Biol. 14: R2.
Tully, B. J., R. Sachdeva, K. B. Heidelberg, and J. F. Heidelberg. 2014. Comparative
genomics of planktonic Flavobacteriaceae from the Gulf of Maine using metagenomic
data. Microbiome 2: 34.
Tyson, G. W., J. Chapman, P. Hugenholtz, and others. 2004. Community structure and
metabolism through reconstruction of microbial genomes from the environment.
Nature 428: 37–43.
Van Rossum, G. 2007. Python Programming Language. Proc. USENIX Annu. Tech. Conf.
Venter, J. C., K. Remington, J. F. Heidelberg, and others. 2004. Environmental genome
shotgun sequencing of the Sargasso Sea. Science 304: 66–74.
Wooley, J. C., A. Godzik, and I. Friedberg. 2010. A primer on metagenomics. PLoS Comput.
Biol. 6: e1000667.
Wu, Y.-W., B. A. Simmons, and S. W. Singer. 2016. MaxBin 2.0: an automated binning
algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32:
605–607.
Figures
Figure 1. Overview of the PhyLigo steps used to produce high-quality decontaminated
genomes from phylogeny, sequence composition (oligonucleotides), and coverage. a) The
input is a single genome or multiple genomes. In this example, a single genome suspected
of contamination is used. The genome can be sourced from an isolate sequencing,
single-cell, or metagenomic binning project. Here, the genome bin contains sequences
from 2 different genomes. b) The sequences from the genome bin (a) are placed onto the
PhyLigo tree of 18,387 genomes using 43 concatenated universal marker genes. Each
symbol represents a marker gene; for simplicity, the 43 genes are represented by 6
symbols. As is typical in genome reconstruction projects, the input sequences are only
partially complete and do not contain all 43 genes. Without overlapping genes, the
phylogenetic relationships within the contaminated genome bin cannot be determined
directly. PhyLigo therefore uses pplacer to place every input sequence with at least 1
marker gene and to calculate the phylogenetic relationships among all input sequences.
The placements are clustered into mass islands. At this point, phylogenetic dissonance
(PD) is calculated and is 50%. c) Input sequences are hierarchically clustered based on
oligonucleotide (4-mer, 5-mer, etc.) frequencies and/or sample coverage. Note that more
input sequences can be hierarchically clustered than placed, because not all sequences
have marker genes, whereas all sequences can be represented by sequence composition
and/or sample coverage. d) The phylogenetic relationships and the hierarchical clustering
are linked to begin genome decontamination and refinement. e) Input sequences from
Genome 1 are separated into a new genome bin because they form a distinct mass island
on the PhyLigo tree. Although not all sequences from Genome 1 have markers, they are
all contained within the same part of the hierarchical clustering and phylogenetic tree.
f) Genome 2 sequences are not fully contained within the same part of the hierarchical
clustering and phylogenetic tree, but the extra sequence is binned with Genome 2 because
there is no further evidence to split it into a third genome. g) The final output is the
contaminated genome split into 2 properly decontaminated genome bins.
Figure 2. a) Network representation of BinSanity, CONCOCT, MaxBin, and MetaBAT
binning of the in silico metagenomes. Each node is a partial sequence (i.e., an artificial
contig) from a genome in the in silico metagenomes and is colored by its source
genome. Only contaminated bins, defined as bins composed of >25% non-dominant
genomes, are displayed. Clusters with more colors are more
contaminated. b) Network representation of Fig. 2a after PhyLigo refinement to 0% PD.
c) Precision, recall, harmonic mean of precision and recall, and ARI of the in silico bins
before and after PhyLigo refinement.
Figure 3. Performance evaluation of PhyLigo refinement of BinSanity, CONCOCT,
MaxBin, and MetaBAT bins from complex deep-water ocean metagenomes from the
SPOT station. a) Precision, recall, harmonic mean of precision and recall, and ARI of the 4
genomes amended to the natural SPOT samples. b) CheckM CR and genome completion
estimates of all assemblies in the SPOT metagenomes (mean ± SEM). Only bins with
>0% CheckM completion are displayed.
Figure 4. Infant gut metagenomes CheckM % CR and % completion (mean ±
SEM) estimates before and after PhyLigo refinement of BinSanity, CONCOCT, MaxBin,
MetaBAT, and ESOM human-guided manual curation bins. Only bins with >0%
CheckM completion are displayed.
Supplementary figures
Supplementary figure 1. Precision, recall, harmonic mean of precision and recall,
and ARI of in silico metagenomes by domain. Each bin was assigned to the domain
with the highest count representation within the bin.
Supplementary figure 2. SPOT deep ocean metagenomes PhyLigo PD and CheckM
% CR comparison before PhyLigo refinement (mean ± SEM).
Supplementary figure 3. Infant gut metagenomes PhyLigo PD and CheckM % CR
comparison before PhyLigo refinement (mean ± SEM).
Supplementary tables
Domain   Accession   Name   # Phylogenetic markers   NCBI taxonomy
Archaea CP003168.1 Aciduliprofundum sp.
MAR08-339
42 Archaea; Euryarchaeota; unclassified
Euryarchaeota; DHVE2 group; Aciduliprofundum
Archaea FN869859.1 Thermoproteus tenax
Kra 1
41 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Thermoproteales;
Thermoproteaceae; Thermoproteus;
Thermoproteus tenax
Archaea CP010529.1 Haloarcula sp.
CBA1115
42 Archaea; Euryarchaeota; Halobacteria;
Halobacteriales; Haloarculaceae; Haloarcula
Archaea CP007536.1 Nitrososphaera
viennensis EN76
42 Archaea; TACK group; Thaumarchaeota;
Nitrososphaeria; Nitrososphaerales;
Nitrososphaeraceae; Nitrososphaera;
Nitrososphaera viennensis
Archaea HG425166.1 Methanobacterium
sp. MB1
42 Archaea; Euryarchaeota; Methanobacteria;
Methanobacteriales; Methanobacteriaceae;
Methanobacterium
Archaea CP007026.1 Candidatus
Nitrosopelagicus
brevis
41 Archaea; TACK group; Thaumarchaeota;
unclassified Thaumarchaeota; Candidatus
Nitrosopelagicus
Archaea BA000001.2 Pyrococcus horikoshii
OT3
42 Archaea; Euryarchaeota; Thermococci;
Thermococcales; Thermococcaceae; Pyrococcus;
Pyrococcus horikoshii
Archaea CP012850.1 Candidatus
Nitrosocosmicus
oleophilus
42 Archaea; TACK group; Thaumarchaeota;
Nitrososphaeria; Nitrososphaerales;
Nitrososphaeraceae; Candidatus Nitrosocosmicus
Archaea CP001731.1 Sulfolobus islandicus
L.D.8.5
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Sulfolobales; Sulfolobaceae;
Sulfolobus; Sulfolobus islandicus
Archaea CP000968.1 Candidatus
Korarchaeum
cryptofilum OPF8
42 Archaea; TACK group; Candidatus Korarchaeota;
Candidatus Korarchaeum; Candidatus
Korarchaeum cryptofilum
Archaea AL139299.1 Thermoplasma
acidophilum DSM
1728
42 Archaea; Euryarchaeota; Thermoplasmata;
Thermoplasmatales; Thermoplasmataceae;
Thermoplasma; Thermoplasma acidophilum
Archaea CP006577.1 Archaeoglobus
fulgidus DSM 8774
42 Archaea; Euryarchaeota; Archaeoglobi;
Archaeoglobales; Archaeoglobaceae;
Archaeoglobus; Archaeoglobus fulgidus
Archaea CP001014.1 Pyrobaculum
neutrophilum V24Sta
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Thermoproteales;
Thermoproteaceae; Pyrobaculum; Pyrobaculum
neutrophilum
Archaea CP001939.1 Thermosphaera
aggregans DSM
11486
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Desulfurococcales;
Desulfurococcaceae; Thermosphaera;
Thermosphaera aggregans
Archaea CP009528.1 Methanosarcina
barkeri MS
42 Archaea; Euryarchaeota; Methanomicrobia;
Methanosarcinales; Methanosarcinaceae;
Methanosarcina; Methanosarcina barkeri
Archaea CP010514.1 Candidatus
Nanopusillus
acidilobi
33 Archaea; DPANN group; Nanoarchaeota;
Nanoarchaeales; Nanopusillaceae; Candidatus
Nanopusillus
Archaea CP002363.1 Desulfurococcus
mucosus DSM 2162
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Desulfurococcales;
Desulfurococcaceae; Desulfurococcus;
Desulfurococcus mucosus
Archaea BA000002.3 Aeropyrum pernix K1 41 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Desulfurococcales;
Desulfurococcaceae; Aeropyrum; Aeropyrum
pernix
Archaea CP000852.1 Caldivirga
maquilingensis IC-
167
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Thermoproteales;
Thermoproteaceae; Caldivirga; Caldivirga
maquilingensis
Archaea CP002535.1 Acidianus hospitalis
W1
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Sulfolobales; Sulfolobaceae;
Acidianus; Acidianus hospitalis
Archaea DP000238.1 Cenarchaeum
symbiosum A
41 Archaea; TACK group; Thaumarchaeota;
Cenarchaeales; Cenarchaeaceae; Cenarchaeum;
Cenarchaeum symbiosum
Archaea CP013694.1 Sulfolobus
acidocaldarius
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Sulfolobales; Sulfolobaceae;
Sulfolobus
Archaea CP000575.1 Staphylothermus
marinus F1
42 Archaea; TACK group; Crenarchaeota;
Thermoprotei; Desulfurococcales;
Desulfurococcaceae; Staphylothermus;
Staphylothermus marinus
Bacteria CP000686.1 Roseiflexus sp. RS-1 42 Bacteria; Terrabacteria group; Chloroflexi;
Chloroflexia; Chloroflexales; Roseiflexineae;
Roseiflexaceae; Roseiflexus
Bacteria CP002299.1 Frankia inefficax 42 Bacteria; Terrabacteria group; Actinobacteria;
Actinobacteria; Frankiales; Frankiaceae; Frankia
Bacteria CP001968.1 Denitrovibrio
acetiphilus DSM
12809
42 Bacteria; Deferribacteres; Deferribacteres;
Deferribacterales; Deferribacteraceae;
Denitrovibrio; Denitrovibrio acetiphilus
Bacteria CP011966.1 Clostridium
pasteurianum NRRL
B-598
42 Bacteria; Terrabacteria group; Firmicutes;
Clostridia; Clostridiales; Clostridiaceae;
Clostridium; Clostridium beijerinckii
Bacteria CP002032.1 Thermoanaerobacter
mathranii subsp.
mathranii str. A3
29 Bacteria; Terrabacteria group; Firmicutes;
Clostridia; Thermoanaerobacterales;
Thermoanaerobacteraceae; Thermoanaerobacter;
Thermoanaerobacter mathranii;
Thermoanaerobacter mathranii subsp. mathranii
Bacteria CP005074.1 Spiroplasma
taiwanense CT-1
42 Bacteria; Terrabacteria group; Tenericutes;
Mollicutes; Entomoplasmatales;
Spiroplasmataceae; Spiroplasma; Spiroplasma
taiwanense
Bacteria CP000964.1 Klebsiella
pneumoniae 342
40 Bacteria; Proteobacteria; Gammaproteobacteria;
Enterobacterales; Enterobacteriaceae; Klebsiella;
Klebsiella pneumoniae
Bacteria CP000237.1 Neorickettsia
sennetsu str.
Miyayama
42 Bacteria; Proteobacteria; Alphaproteobacteria;
Rickettsiales; Anaplasmataceae; Neorickettsia;
Neorickettsia sennetsu
Bacteria CP006823.1 Helicobacter pylori
oki154
42 Bacteria; Proteobacteria; delta/epsilon
subdivisions; Epsilonproteobacteria;
Campylobacterales; Helicobacteraceae;
Helicobacter; Helicobacter pylori
Bacteria CP012072.1 Actinomyces meyeri 42 Bacteria; Terrabacteria group; Actinobacteria;
Actinobacteria; Actinomycetales;
Actinomycetaceae; Actinomyces
Bacteria CP000084.1 Candidatus
Pelagibacter ubique
HTCC1062
42 Bacteria; Proteobacteria; Alphaproteobacteria;
Pelagibacterales; Pelagibacteraceae; Candidatus
Pelagibacter; Candidatus Pelagibacter ubique
Bacteria CP007754.1 Prochlorococcus sp.
MIT 0801
42 Bacteria; Terrabacteria group;
Cyanobacteria/Melainabacteria group;
Cyanobacteria; Synechococcales; Prochloraceae;
Prochlorococcus
Bacteria BA000022.2 Synechocystis sp.
PCC 6803
42 Bacteria; Terrabacteria group;
Cyanobacteria/Melainabacteria group;
Cyanobacteria; Synechococcales;
Merismopediaceae; Synechocystis
Bacteria JH611156.1 SAR86 cluster
bacterium SAR86A
42 Bacteria; Proteobacteria; Gammaproteobacteria;
unclassified Gammaproteobacteria; SAR86 cluster
Bacteria DS995298.1 Candidatus
Pelagibacter sp.
HTCC7211
42 Bacteria; Proteobacteria; Alphaproteobacteria;
Pelagibacterales; Pelagibacteraceae; Candidatus
Pelagibacter
Bacteria CP015136.1 Luteitalea pratensis 42 Bacteria; Acidobacteria; unclassified
Acidobacteria; Acidobacteria subdivision 6;
Luteitalea
Bacteria CP007757.1 Halomonas
campaniensis
42 Bacteria; Proteobacteria; Gammaproteobacteria;
Oceanospirillales; Halomonadaceae; Halomonas
Bacteria CP007133.1 Escherichia coli
O145:H28 str.
RM12761
42 Bacteria; Proteobacteria; Gammaproteobacteria;
Enterobacterales; Enterobacteriaceae;
Escherichia; Escherichia coli
Bacteria CP007152.1 Marinobacter salarius 41 Bacteria; Proteobacteria; Gammaproteobacteria;
Alteromonadales; Alteromonadaceae;
Marinobacter
Bacteria HE999705.1 Listeria
monocytogenes N53-
1
37 Bacteria; Terrabacteria group; Firmicutes; Bacilli;
Bacillales; Listeriaceae; Listeria; Listeria
monocytogenes
Bacteria HG518322.1 Rhizobium sp.
IRBG74
42 Bacteria; Proteobacteria; Alphaproteobacteria;
Rhizobiales; Rhizobiaceae;
Rhizobium/Agrobacterium group; Rhizobium
Bacteria CP007437.1 Micrococcus luteus 42 Bacteria; Terrabacteria group; Actinobacteria;
Actinobacteria; Micrococcales; Micrococcaceae;
Micrococcus
Bacteria CP002623.1 Roseobacter litoralis
Och 149
42 Bacteria; Proteobacteria; Alphaproteobacteria;
Rhodobacterales; Rhodobacteraceae;
Roseobacter; Roseobacter litoralis
Bacteria CP007013.1 Thermotoga
maritima MSB8
42 Bacteria; Thermotogae; Thermotogae;
Thermotogales; Thermotogaceae; Thermotoga;
Thermotoga maritima
Bacteria CP010054.1 Hymenobacter sp.
DG25B
40 Bacteria; FCB group; Bacteroidetes/Chlorobi
group; Bacteroidetes; Cytophagia; Cytophagales;
Hymenobacteraceae; Hymenobacter
Bacteria CP000108.1 Chlorobium
chlorochromatii
CaD3
42 Bacteria; FCB group; Bacteroidetes/Chlorobi
group; Chlorobi; Chlorobia; Chlorobiales;
Chlorobiaceae; Chlorobium/Pelodictyon group;
Chlorobium; Chlorobium chlorochromatii
Bacteria LN554846.1 Aliivibrio wodanis 42 Bacteria; Proteobacteria; Gammaproteobacteria;
Vibrionales; Vibrionaceae; Aliivibrio
Bacteria CP009681.1 Staphylococcus
aureus subsp. aureus
42 Bacteria; Terrabacteria group; Firmicutes; Bacilli;
Bacillales; Staphylococcaceae; Staphylococcus;
Staphylococcus aureus
Bacteria CP008802.1 Actinotignum schaalii 40 Bacteria; Terrabacteria group; Actinobacteria;
Actinobacteria; Actinomycetales;
Actinomycetaceae; Actinotignum
Bacteria LN831024.1 Pseudomonas
aeruginosa
40 Bacteria; Proteobacteria; Gammaproteobacteria;
Pseudomonadales; Pseudomonadaceae;
Pseudomonas; Pseudomonas aeruginosa group
Bacteria FP565814.1 Salinibacter ruber M8 40 Bacteria; FCB group; Bacteroidetes/Chlorobi
group; Bacteroidetes; Bacteroidetes Order II.
Incertae sedis; Rhodothermaceae; Salinibacter;
Salinibacter ruber
Bacteria CP011270.1 Planctomyces sp. SH-
PL14
42 Bacteria; PVC group; Planctomycetes;
Planctomycetia; Planctomycetales;
Planctomycetaceae; Planctomyces
Bacteria CP002105.1 Acetohalobium
arabaticum DSM
5501
42 Bacteria; Terrabacteria group; Firmicutes;
Clostridia; Halanaerobiales; Halobacteroidaceae;
Acetohalobium; Acetohalobium arabaticum
Bacteria CP013341.1 Nitrosomonas ureae 42 Bacteria; Proteobacteria; Betaproteobacteria;
Nitrosomonadales; Nitrosomonadaceae;
Nitrosomonas
Bacteria CP001103.3 Alteromonas
mediterranea DE
42 Bacteria; Proteobacteria; Gammaproteobacteria;
Alteromonadales; Alteromonadaceae;
Alteromonas; Alteromonas mediterranea
Bacteria CP006740.1 Xylella fastidiosa
MUL0034
42 Bacteria; Proteobacteria; Gammaproteobacteria;
Xanthomonadales; Xanthomonadaceae; Xylella;
Xylella fastidiosa
Bacteria CP001821.1 Xylanimonas
cellulosilytica DSM
15894
42 Bacteria; Terrabacteria group; Actinobacteria;
Actinobacteria; Micrococcales;
Promicromonosporaceae; Xylanimonas;
Xylanimonas cellulosilytica
51
Bacteria CP010409.1 Xanthomonas
sacchari
41 Bacteria; Proteobacteria; Gammaproteobacteria;
Xanthomonadales; Xanthomonadaceae;
Xanthomonas
Bacteria CP006776.1 Streptococcus sp. I-
P16
42 Bacteria; Terrabacteria group; Firmicutes; Bacilli;
Lactobacillales; Streptococcaceae; Streptococcus
Bacteria CP006650.1 Paracoccus
aminophilus JCM
7686
42 Bacteria; Proteobacteria; Alphaproteobacteria;
Rhodobacterales; Rhodobacteraceae; Paracoccus;
Paracoccus aminophilus
Bacteria CP001924.1 Dehalococcoides
mccartyi GT
41 Bacteria; Terrabacteria group; Chloroflexi;
Dehalococcoidia; Dehalococcoidales;
Dehalococcoidaceae; Dehalococcoides;
Dehalococcoides mccartyi
Eukarya CP002497.1 Eremothecium
cymbalariae
DBVPG#7215
42 Eukaryota; Opisthokonta; Fungi; Dikarya;
Ascomycota; saccharomyceta; Saccharomycotina;
Saccharomycetes; Saccharomycetales;
52
Saccharomycetaceae; Eremothecium;
Eremothecium cymbalariae
Eukarya CH408058.1 Saccharomyces
cerevisiae RM11-1a
41 Eukaryota; Opisthokonta; Fungi; Dikarya;
Ascomycota; saccharomyceta; Saccharomycotina;
Saccharomycetes; Saccharomycetales;
Saccharomycetaceae; Saccharomyces;
Saccharomyces cerevisiae
Eukarya AL391737.2 Encephalitozoon
cuniculi GB-M1
42 Eukaryota; Opisthokonta; Fungi; Microsporidia;
Apansporoblastina; Unikaryonidae;
Encephalitozoon; Encephalitozoon cuniculi
Eukarya CP001942.1 Encephalitozoon
intestinalis ATCC
50506
42 Eukaryota; Opisthokonta; Fungi; Microsporidia;
Apansporoblastina; Unikaryonidae;
Encephalitozoon; Encephalitozoon intestinalis
53
Eukarya CP002713.1 Encephalitozoon
hellem ATCC 50504
42 Eukaryota; Opisthokonta; Fungi; Microsporidia;
Apansporoblastina; Unikaryonidae;
Encephalitozoon; Encephalitozoon hellem
Eukarya BDEQ01000001.1 Entamoeba
histolytica
42 Eukaryota; Amoebozoa; Archamoebae;
Entamoebidae; Entamoeba
Eukarya CP016239.1 Plasmodium coatneyi 42 Eukaryota; Alveolata; Apicomplexa;
Aconoidasida; Haemosporida; Plasmodiidae;
Plasmodium
Eukarya CM000429.1 Cryptosporidium
parvum Iowa II
42 Eukaryota; Alveolata; Apicomplexa; Conoidasida;
Coccidia; Eucoccidiorida; Eimeriorina;
Cryptosporidiidae; Cryptosporidium;
Cryptosporidium parvum
Supplementary table 1. All genomes in the in silico metagenomes, with full NCBI taxonomy and the number of CheckM universal markers recovered in each genome.
NCBI Accession | Genome name | G+C (%) | Fold | NCBI Taxonomy
NC_006322 | Bacillus licheniformis DSM 13 = ATCC 14580 | 46.2 | 1 | Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; Bacillus
NC_011593 | Bifidobacterium longum subsp. infantis ATCC 15697 | 59.9 | 4 | Bacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae; Bifidobacterium
NC_003450 | Corynebacterium glutamicum ATCC 13032 | 53.8 | 2 | Bacteria; Actinobacteria; Corynebacteriales; Corynebacteriaceae; Corynebacterium
NC_010321 | Thermoanaerobacter pseudethanolicus ATCC 33223 | 34.5 | 8 | Bacteria; Firmicutes; Clostridia; Thermoanaerobacterales; Thermoanaerobacteraceae; Thermoanaerobacter
Supplementary table 2. Genomes used to amend SPOT metagenomes with expected fold concentration, mean G+C%
content, and NCBI taxonomy.
Chapter 2
Rare microbes dominate community activity
Introduction
Microbial activity is a crucial component of all ecosystems, as microbes have the potential to control any major biochemical process (Prosser et al. 2007). Community structure, i.e. the composition and abundance of organisms, is thought to be linked to overall ecosystem function (Fuhrman 2009). Yet the relationship between microbial community structure and microbial activity, and its connection to ecosystem function, is not fully understood (Naeem and Li 1997; Carney and Matson 2005; Fuhrman et al. 2006; Graham et al. 2016; Nelson et al. 2016). In recent years, the community structure and overall diversity of microbes have been described in all major biomes and in some host-associated systems (Thompson et al. 2017). In virtually all environments studied, a few highly abundant taxa dominate the microbial community. Still, microbial diversity is high, and most richness is concentrated in the less abundant, or rare, fraction of the community. This has been described as the 'long tail' of microbial diversity (Lynch and Neufeld 2015), because most of the diversity lies in the lower ranks of the rank abundance curve. Members of the rare biosphere have been recognized as important drivers of many key ecosystem functions. Rare microbes may control ecosystems as keystone members, i.e. community members on which the whole ecosystem depends. For example, marine diazotrophs are rare but control the input of bioavailable nitrogen via N2 fixation (Sohm et al. 2011). Rare microbes may also be disproportionately active relative to their abundance. For example, the rarest detectable taxon in Lake Cadagno, Switzerland, an oligotrophic lake, was found to contribute >40% and 70% of total ammonium and carbon uptake, respectively (Musat et al. 2008; Jousset et al. 2017). However, these instances do not demonstrate the influence of rare microbes on total community activity. Sequencing of RNA transcripts and of DNA from single marker genes has been used to assess the influence of rare microbes on community activity. These methods have revealed that rare microbes are potentially more active than more abundant ones (Kamke et al. 2010; Campbell et al. 2011; Hunt et al. 2013), but they can over-extrapolate genome-wide function and activity from a single gene (Blazewicz et al. 2013). Specifically, ratios of rRNA transcripts to rRNA gene copies have been shown to be poor indicators of cell-wide activity (Cottrell and Kirchman 2016); for example, some cyanobacteria increase their rRNA content during dormancy (Sukenik et al. 2012).
To better understand the influence of rare microbes on community activity, we used a systems-based approach to examine the molecular activity of high-resolution, whole-community sequence assemblies (Supplementary Figure 1). We de novo assembled and analyzed more than 3 Tbp (3 × 10^12 base pairs) of environmental DNA and RNA shotgun sequences, i.e. metagenomes and metatranscriptomes, from the genomic and transcriptomic reservoir of the global microbiome. Broadly, data are sourced from publicly available and novel shotgun-sequenced environmental, host-associated, and human-engineered communities. Samples encompass the ocean, the Amazon River (Satinsky et al. 2015) and its plume into the ocean (Satinsky et al. 2014a, b), the human gut (Franzosa et al. 2014), permafrost soil layers (Hultman et al. 2015), a thermokarst bog (Hultman et al. 2015), and human-engineered biogas plants (Bremges et al. 2015; Maus et al. 2016). Ocean samples span the water column: the sunlit epipelagic (0–200 m; Gilbert et al. 2010; Shi et al. 2011; Dupont et al. 2015; Sunagawa et al. 2015; Alberti et al. 2017; Sieradzki et al. 2017), the dimly lit mesopelagic (200–1,000 m), the dark bathypelagic (1,000–4,000 m), the benthic zone (near the seafloor), and hydrothermal vent plumes (Baker et al. 2012; Li et al. 2015b) (Supplementary Table 1). Our analysis uses database-independent, de novo, high-resolution contiguous assemblies at 99% average nucleotide identity (ANI) from all domains of life and from viruses. Using these assemblies, we examined the activities of rare and abundant microbes and their functional traits across many disparate environments.
Results/Discussion
Microbial activity is dominated by rare microbes. This is evident when community structure is plotted against community activity for rare sequence assemblies across disparate environments (Figure 1, Supplementary Figures 8–12). Rare microbes are defined as those whose assemblies fall in the "tail" of the DNA rank abundance curve; in this study, assemblies ranked below the 1,000th position (rank > 1,000), corresponding to ~0.005% relative abundance (Supplementary Figure 5). In >90% of paired community structure and activity samples, >90% of total community activity was in the rare fraction (Supplementary Figure 13). The same held at the functional level, i.e. for predicted proteins clustered at 60% identity (Supplementary Figure 11). This pattern is consistent across all sampled environments and lifestyles, e.g. free-living or attached (Figure 2d). Accordingly, abundant microbes contributed far less to community activity than rare microbes. Therefore, rare microbes are not only more active than abundant microbes; they are also likely more involved in total community activity, and potentially in ecosystem function, than abundant microbes.
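The rare/abundant split described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the function name `rare_activity_fraction`, the dictionary inputs (relative DNA and RNA counts keyed by assembly ID, assumed to share the same keys), and the tie handling are assumptions; only the rank cutoff of 1,000 comes from the text.

```python
from typing import Dict


def rare_activity_fraction(dna: Dict[str, float], rna: Dict[str, float],
                           rank_cutoff: int = 1000) -> float:
    """Fraction of total RNA (activity) contributed by rare assemblies.

    Assemblies are ranked by relative DNA abundance (rank 1 = most abundant);
    those ranked beyond `rank_cutoff` are treated as rare. Hypothetical helper
    for illustration; assumes `dna` and `rna` share the same keys.
    """
    ranked = sorted(dna, key=dna.get, reverse=True)
    rare = set(ranked[rank_cutoff:])
    total_rna = sum(rna.values())
    if total_rna == 0:
        return 0.0
    rare_rna = sum(value for key, value in rna.items() if key in rare)
    return rare_rna / total_rna


# Toy example with a cutoff of 2, so assemblies "c" and "d" count as rare.
dna = {"a": 0.50, "b": 0.30, "c": 0.15, "d": 0.05}
rna = {"a": 0.10, "b": 0.10, "c": 0.40, "d": 0.40}
print(rare_activity_fraction(dna, rna, rank_cutoff=2))  # ~0.8
```

With real data the cutoff would stay at its default of 1,000 and the inputs would be the length-adjusted, subsampled relative counts described in the Methods.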
Samples clustered by environment in their DNA, RNA, and specific activity (RNA:DNA; Hunt et al. 2013) distributions (Supplementary Figures 2–4). Our analyses show that microbial communities cluster by environment not only at a fine genomic level, i.e. 99% ANI sequence assemblies, but also by the activity of those components. The rank abundance of samples from all environments, computed from sequence assemblies, follows a highly skewed curve, as has been widely reported for microbial communities using single-marker genes (Fuhrman 2009; Lynch and Neufeld 2015; Jousset et al. 2017) (Supplementary Figure 5). This demonstrates, for the first time, that microbial communities exhibit a highly skewed rank abundance even at a genomic level. Highly skewed rank abundance curves suggest that there are dominant genotypes in microbial populations, despite potentially high recombination rates (Shapiro et al. 2012). RNA was likewise skewed by rank expression, demonstrating that microbial activity is also dominated by a small number of overrepresented assembled sequences (Supplementary Figure 6). Community activity was even more skewed toward a few assemblies than community structure, as indicated by lower Pielou's evenness (J') (Supplementary Figure 7; Mann–Whitney U test; P < 2.2 × 10^-16). Together, the rank abundance and rank activity curves show that a small number of microbes dominate community structure, and that even fewer types dominate community activity.
The extent to which rare microbes contributed to activity varied within and between environments (Figure 1a). This pattern is driven by temperature variation, as total activity of rare microbes decreased with increasing temperature (Figure 2b; Spearman's ρ = -0.42; P = 3.979 × 10^-7). In contrast, activity dominance, i.e. the degree to which community activity was concentrated in a few microbes, increased with increasing temperature (Spearman's ρ = 0.7; P < 2.2 × 10^-16). Because activity dominance was lower at lower temperatures, rare microbial activity not only increased with decreasing temperature but was also more evenly distributed among rare microbes (Figure 2c). Temperature is a first-order determinant of chemical reaction kinetics and, therefore, of biochemical processes (Arrhenius 1889). Higher temperatures increase biochemical reaction rates and thereby raise metabolic and other biological rates (Brown et al. 2004). This positive relationship between temperature and biological rates has been implicated in controlling many ecological (Tilman et al. 1981; Allen et al. 2002; Fuhrman et al. 2008) and evolutionary (Gillooly et al. 2005; Allen et al. 2006) patterns. Our observations show that temperature is also a major mediator of the structure of microbial community activity.
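The temperature dependence invoked above is commonly written as a Boltzmann–Arrhenius factor, rate ∝ exp(−E / (k_B T)), where E is an activation energy, k_B the Boltzmann constant, and T absolute temperature (Brown et al. 2004). The sketch below computes the predicted ratio of rates at two temperatures. The activation energy of 0.65 eV is an assumed, typical value from metabolic theory, not a quantity estimated in this study, and the function name `relative_rate` is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def relative_rate(t1_celsius: float, t2_celsius: float,
                  activation_ev: float = 0.65) -> float:
    """Predicted rate at t2 divided by rate at t1, from exp(-E / (k_B * T)).

    activation_ev = 0.65 eV is an assumed, typical value, not a fitted one.
    """
    t1 = t1_celsius + 273.15  # convert to kelvin
    t2 = t2_celsius + 273.15
    return math.exp((activation_ev / BOLTZMANN_EV_PER_K) * (1.0 / t1 - 1.0 / t2))


# Under these assumptions, warming from 10 to 20 ˚C multiplies rates by ~2.5.
print(round(relative_rate(10.0, 20.0), 2))
```

Under this model, a 10 ˚C difference roughly doubles to triples biochemical rates, which is the kind of effect the temperature relationships above are interpreted against.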
Next, we examined the functional contribution of rare microbes to community activity across all environments. We examined functional traits that were more than two-fold overexpressed in the rare fraction relative to the abundant fraction (Mann–Whitney U test; P < 3.9 × 10^-71). Rare open reading frames (ORFs) were enriched in the activity of functional traits that directly influence total ecosystem function (e.g. energy production and conversion; carbohydrate, coenzyme, inorganic ion, nucleotide, amino acid, and lipid transport and metabolism; Figure 2). Rare ORFs were also enriched for functional traits involved in cell motility, consistent with the observation that rare microbes tend toward chemotactic lifestyles (Lauro et al. 2009; Yooseph et al. 2010; Giovannoni et al. 2014) (Figure 2b). Other processes linked to growth were also overrepresented in rare ORFs, including cell growth and death; cell cycle control, cell division, and chromosome partitioning; cell wall/membrane/envelope biogenesis; and translation (Figures 2a, b). This suggests that rare microbes have higher growth rates, in addition to the higher metabolic activity noted above.
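A minimal sketch of the two-fold enrichment screen, assuming each functional category's expression has already been summed separately within the rare and abundant fractions. The function names and the choice to compare each category's share of its fraction's total expression are assumptions made for illustration; the Mann–Whitney U test reported above is not reproduced here.

```python
import math
from typing import Dict, List


def fold_enrichment(rare: Dict[str, float], abundant: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each category's share of rare-fraction expression to its share
    of abundant-fraction expression (hypothetical helper)."""
    rare_total = sum(rare.values())
    abundant_total = sum(abundant.values())
    folds: Dict[str, float] = {}
    for category in set(rare) | set(abundant):
        rare_share = rare.get(category, 0.0) / rare_total
        abundant_share = abundant.get(category, 0.0) / abundant_total
        if abundant_share > 0:
            folds[category] = rare_share / abundant_share
        else:
            # Expressed only in the rare fraction: unbounded enrichment.
            folds[category] = math.inf if rare_share > 0 else 0.0
    return folds


def enriched(folds: Dict[str, float], min_fold: float = 2.0) -> List[str]:
    """Categories more than `min_fold` times overexpressed in the rare fraction."""
    return sorted(c for c, f in folds.items() if f > min_fold)


rare = {"Energy production": 6.0, "Translation": 2.0, "Cell motility": 2.0}
abundant = {"Energy production": 2.0, "Translation": 6.0, "Cell motility": 0.5}
print(enriched(fold_enrichment(rare, abundant)))  # ['Cell motility', 'Energy production']
```

Comparing shares rather than raw sums keeps the screen from simply reflecting that the rare fraction carries most of the total activity.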
Although higher growth rates should result in higher abundance, rare microbes were enriched in defense mechanisms and xenobiotic degradation, suggesting that they are subject to stronger pressure from viral predation, grazing, host defense, and allelopathy, which may keep their abundances low (Figure 2b). Rare ORFs were also enriched in infectious disease categories, implicating rare microbes in animal and human disease. Finally, many of the genes most highly expressed by rare microbes were directly involved in major biogeochemical processes, such as photosynthesis, N2 fixation, and ammonia oxidation (Supplementary Table 2). For example, a nifH gene (encoding the iron protein component of nitrogenase) from a rare Candidatus Atelocyanobacterium thalassa (unicellular cyanobacterial group A, UCYN-A; Thompson et al. 2012) had the highest annotatable contribution to activity in the epipelagic South Atlantic Ocean (Supplementary Table 2). Other relevant biochemical processes include ammonium, phosphate, and energy-driven carbohydrate ABC transport. The influence of rare microbes in mediating important ecosystem processes highlights their role as keystone members of ecosystems.
Deciphering the contribution of rare microbial activity is central to understanding the influence these organisms have on ecosystem function. We demonstrate both that rare microbes are individually more active than abundant microbes and that total microbial community activity is dominated by them. This pattern is consistent across many disparate environments, ranging from the human gut to human-engineered biogas plants to hydrothermal vents. The contribution of rare microbes to community activity varies across environments and is strongly influenced by environmental temperature, implying that fundamental biological processes, e.g. rate kinetics, control the structure of community activity. Rare microbes were more involved than abundant microbes in important ecosystem processes, e.g. energy transformation and biogeochemical cycling. Rare microbial activity was also enriched in genes related to infectious disease, underscoring the potentially detrimental impacts of rare microbes on animal and human health. Our observations reveal that rare microbes not only dominate community activity but also act as keystone members that disproportionately drive functional traits crucial to ecosystem function.
Methods
Sargasso Sea sample collection and sequencing. Four samples were collected at the end of spring (3/24/2010 and 3/26/2010) and summer (8/20/2010 and 8/22/2010) from the Bermuda Atlantic Time Series (BATS). Seawater (5–20 L) was collected and amended with an equal volume of RNAlater. The RNAlater-amended samples were sequentially filtered through a 0.8 µm glass fiber GF/F (Whatman) filter and finally onto a 0.22 µm Durapore (Millipore) filter. DNA and RNA were extracted as previously described (Campbell et al. 2009). Spring samples were rRNA-subtracted using an Ambion MICROBExpress Bacterial mRNA Enrichment Kit. Illumina metagenomic and metatranscriptomic libraries were prepared using NuGEN Encore NGS Library System I and Ovation RNA-Seq System kits and sequenced paired-end (2 × 100 bp) on an Illumina HiSeq 2000. DNA from 3/24/2010 and 8/20/2010 was also sequenced on the Roche 454 GS FLX+ platform and by circular consensus sequencing on a PacBio RS.
San Pedro Ocean Time-series (SPOT) benthic zone sample collection and extraction
Seawater samples (1–20 L) were collected approximately monthly over ~2.5 years (n = 32) near the ocean floor (890 m) in the Southern California Bight at the SPOT station (33° 33’N, 118° 24’W) aboard the R/V Yellowfin. Samples were sequentially filtered through a nylon mesh (80 µm), an Acrodisc (Pall) glass fiber filter (~1 µm), and finally a Sterivex-GP (Millipore) polyethersulfone filter (0.22 µm). Acrodisc and Sterivex filters were preserved with 250 µl and 500 µl of RNAlater, respectively. Filters were incubated for 2 min at ambient temperature, immediately frozen in liquid nitrogen (LN2), and stored at -80˚C. Total nucleic acids were co-extracted using a modified AllPrep DNA/RNA kit. Cells were lysed, and DNA was extracted and quantified as previously described (Chapter 1). RNA was purified from the DNA purification flow-through following the manufacturer's instructions, with a 30 min on-column DNase step and a final elution in 50 µl of nuclease-free water. RiboGuard RNase Inhibitor (1 µl; Epicentre) was added to the eluted RNA to protect against ribonucleases. RNA quantities were determined using a Qubit RNA High Sensitivity (ThermoFisher) kit.
SPOT benthic zone metagenome and metatranscriptome library construction and sequencing. Metagenomes were constructed using 10 ng of sample DNA amended with ~1% of known genomes, using a modified NEBNext DNA Ultra II kit (New England Biolabs) as previously described (Chapter 1). The added genome controls are exotic to the SPOT station and were added at concentrations increasing progressively in two-fold steps (Chapter 1). RNA (40 ng) was amended with 8 µl of a 10,000X dilution of External RNA Controls Consortium (ERCC; Baker et al. 2005) Spike-In Mix 1 (Ambion). The ERCC mix consists of 92 transcripts ranging from 250 to 2,000 nt in length and spanning a wide dynamic range of concentrations. The transcripts are mostly novel synthesized sequences but also include Bacillus subtilis transcripts. The added genome and ERCC controls are sequencing controls used to verify that our sample preparation and bioinformatic protocols are accurate and reproducible. The amended RNA samples were rRNA-depleted using a Ribo-Zero Bacteria rRNA removal kit. RNA was fragmented using a Covaris S2 targeting 600 bp: sample volume (130 µl), peak incident power (50 W), duty factor (20%), cycles per burst (200), treatment time (60 s), temperature (7˚C). Strand-specific cDNA was generated by random priming, without size selection, using a NEBNext Ultra RNA directional kit. Illumina libraries were constructed from the resulting cDNA using a modified NEBNext DNA Ultra II dual indexing kit. As previously described (Chapter 1), the protocol was modified to use real-time PCR to control for overamplification and longer extension times to control for chimeric amplicons, resulting in 12 cycles of amplification. Metagenomic and metatranscriptomic libraries were sequenced 2 × 250 bp on an Illumina HiSeq 2500.
Sequence and metadata retrieval
Raw metagenomic and metatranscriptomic reads from the Illumina, 454, Sanger, and PacBio sequencing platforms, from 504 sequence libraries covering a number of disparate environments, were downloaded for each publicly available dataset (Supplementary Table 1). Ocean samples encompass the epipelagic, mesopelagic, bathypelagic, benthic zone, and hydrothermal vent plumes, at depths of 0–4,946 m. Epipelagic samples include coastal and open-ocean sites. Other samples come from the Amazon River and its plume into the Atlantic Ocean, active and frozen permafrost, a thermokarst bog, human guts, and biogas plants. Data were retrieved from iMicrobe, NCBI SRA, and ENA (Supplementary Table 1). Metagenome and metatranscriptome sample pairings and sample metadata (e.g. temperature and environment) were determined from sequence database metadata or directly from the source publications. Human gut sample temperatures were not publicly available and were assumed to be 37˚C, the typical human body temperature (Hutchison et al. 2008).
Sequence quality control and assembly
The majority of samples were sequenced using Illumina flow cell-based technologies (Supplementary Table 1). Illumina reads were adapter- and quality-trimmed in one pass to retain the largest regions with Q > 25 (BBMap (Bushnell 2016) v36.19; bbduk.sh qtrim=rl trimq=25 ktrim=r k=25 mink=11 hdist=1 ref=truseq.fa.gz). Reads from longer-read sequencing platforms (Roche 454, Sanger, and PacBio RS) were similarly trimmed to Q > 20 (BBMap v36.19; bbduk.sh qtrim=rl trimq=20). Genome and ERCC controls were removed from each SPOT benthic zone sample prior to assembly and mapping: reads were mapped to the added genome and ERCC controls at 95% identity, and the unmapped reads and their paired-end mates were retained for downstream assembly and mapping (BBMap (Bushnell 2016) v36.19; bbmap.sh idfilter=0.95). Each metagenome and metatranscriptome from each sample was individually assembled de novo using MEGAHIT (Li et al. 2015a), in paired-end mode if the libraries were paired-end sequenced. MEGAHIT was run so that bubbles were merged only if they were at least 99% similar, ensuring that only highly resolved assemblies were generated (MEGAHIT v1.0.6; megahit --merge-level 20,0.99 --k-min 21 --k-max 255 --k-step 6). All assemblies were combined and dereplicated using a semi-global alignment method that merged assemblies that were fully contained in another assembly and >99% similar (BBMap v36.19; dedupe.sh minidentity=0.99). Assemblies <1 kb were discarded. Processing resulted in 12,338,658 assemblies comprising 25.48 Gbp.
Annotation
Open reading frames (ORFs) were predicted using NCBI ORFfinder, and only ORFs > 200 amino acids were retained. The resulting ORFs were searched against the KEGG reference database (DIAMOND (Buchfink et al. 2015); diamond blastp -e 1e-5 --sensitive), and KEGG modules and pathways were retrieved using the KEGG API. Only best hits with E-values < 1 × 10^-10 were assigned. ORFs were also searched against the complete non-redundant NCBI RefSeq Release 83 (NCBI Resource Coordinators 2016) protein database (DIAMOND; diamond blastp --top 5 -e 1e-5), and hits with E-values < 1 × 10^-10 were retained. A best hit was determined by sorting by E-value, bit score, and percent identity. Taxonomy was assigned to each sequence assembly using a lowest common ancestor (LCA) approach: for each ORF, RefSeq hits with E-values < 1 × 10^-10 and bit scores within 5% of the best hit were retained, and the remaining hits were used to assign an LCA taxonomy to each assembly.
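The best-hit and LCA rules described above can be sketched as follows. This is a hypothetical illustration, not the code used in this study: the `Hit` record, the function names, and pooling all retained ORF hits on an assembly before taking the shared lineage prefix are assumptions; the E-value cutoff (1 × 10^-10) and the 5% bit-score window come from the text.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    evalue: float
    bitscore: float
    pident: float
    lineage: str  # semicolon-separated NCBI lineage, root first


def best_hit(hits: List[Hit]) -> Optional[Hit]:
    """Lowest E-value, then highest bit score, then highest percent identity."""
    return min(hits, key=lambda h: (h.evalue, -h.bitscore, -h.pident), default=None)


def lca(hits: List[Hit], max_evalue: float = 1e-10, window: float = 0.05) -> str:
    """Shared lineage prefix of hits with E < max_evalue and a bit score
    within `window` (5%) of the best bit score."""
    good = [h for h in hits if h.evalue < max_evalue]
    if not good:
        return ""
    top = max(h.bitscore for h in good)
    close = [h for h in good if h.bitscore >= (1.0 - window) * top]
    lineages = [[rank.strip() for rank in h.lineage.split(";")] for h in close]
    shared: List[str] = []
    for names in zip(*lineages):  # walk ranks from the root downward
        if len(set(names)) != 1:
            break
        shared.append(names[0])
    return "; ".join(shared)


hits = [
    Hit(1e-50, 200.0, 90.0, "Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; Xylella"),
    Hit(1e-48, 195.0, 88.0, "Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; Xanthomonas"),
    Hit(1e-20, 100.0, 60.0, "Bacteria; Terrabacteria group; Firmicutes"),  # outside the 5% window
]
print(lca(hits))  # Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae
```

Restricting the LCA to hits near the best bit score keeps a single weak, distant hit from collapsing the assignment to a very high rank.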
Metagenome and metatranscriptome mapping and counting
Mapping and counting were performed within each sequencing platform for which both metagenomes and metatranscriptomes were sequenced, i.e. the Roche 454 and Illumina platforms. Small subunit (16S and 18S) and large subunit (5S, 5.8S, 23S, and 28S) rRNAs were removed from each read set prior to mapping reads from each environment to each dereplicated set of assemblies (SortMeRNA (Kopylova et al. 2012) v2.1; sortmerna --paired_out --fastx --ref silva-bac-16s-id90.fasta silva-arc-16s-id95.fasta silva-euk-18s-id95.fasta rfam-5s-database-id98.fasta rfam-5.8s-database-id98.fasta silva-arc-23s-id98.fasta silva-bac-23s-id98.fasta silva-euk-28s-id98.fasta). The resulting rRNA-filtered reads from each library were mapped to the dereplicated set of assemblies, retaining all sites with the highest score, i.e. the best-matching assemblies, at ≥99% identity (BBMap v36.19; bbmap.sh ambiguous=all maxsites=1000000000 maxsites2=1000000000 sssr=1.0 secondary=t minid=0.98 idfilter=0.99).
Mapped counts were determined for each sample, considering all mapped sites with a minimum overlap of 50 bp (featureCounts (Liao et al. 2014) v1.5.3; featureCounts -f -O -M --minOverlap 50 -s 0). ORFs were predicted, and only ORFs with > 200 amino acids were retained (ORFfinder (Jenuth 1999) -s 1 -n T). ORF expression and abundance were counted in the same way, using the previously mapped reads and the ORF start and stop locations (featureCounts v1.5.3; featureCounts -f -O -M --minOverlap 50 -s 0). Strand-specific, i.e. sense and antisense, counts were determined for RNA libraries prepared to retain strand information (Supplementary Table 1) using featureCounts in forward (featureCounts v1.5.3; featureCounts -f -O -M --minOverlap 50 -s 1) and reverse (featureCounts v1.5.3; featureCounts -f -O -M --minOverlap 50 -s 2) strand counting modes. All strand-specific libraries were prepared using methods that produce sequencing reads that are the reverse complement of the transcribed RNA sequence. Accordingly, the reverse counts were considered the sense counts and the forward counts the antisense counts. Next, assemblies matching human sequences in the Centrifuge NCBI nucleotide index (12/06/2016) over >90% of the assembly length were removed (Centrifuge v1.0.3-beta (Kim et al. 2016); default parameters). Assemblies that matched the genome, ERCC, and PhiX 174 controls with an E-value of 0 were also removed (NCBI BLAST+ (Altschul et al. 1990) v2.6.0; blastn -evalue 0).
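The sense/antisense assignment for stranded RNA libraries amounts to swapping the featureCounts forward (-s 1) and reverse (-s 2) counts. A minimal sketch, assuming per-feature count dictionaries; the function name and the `reads_reverse_complement` flag are hypothetical.

```python
from typing import Dict, Tuple


def to_sense_antisense(forward: Dict[str, int], reverse: Dict[str, int],
                       reads_reverse_complement: bool = True) -> Dict[str, Tuple[int, int]]:
    """Return {feature: (sense, antisense)} from featureCounts -s 1 (forward)
    and -s 2 (reverse) counts. When reads are the reverse complement of the
    transcript, reverse-strand counts are sense and forward counts antisense."""
    result: Dict[str, Tuple[int, int]] = {}
    for feature in set(forward) | set(reverse):
        fwd = forward.get(feature, 0)
        rev = reverse.get(feature, 0)
        result[feature] = (rev, fwd) if reads_reverse_complement else (fwd, rev)
    return result


print(to_sense_antisense({"orf1": 3}, {"orf1": 40, "orf2": 5}))
```

Making the library orientation an explicit parameter keeps the swap visible, which matters if libraries prepared with a different protocol are ever mixed in.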
Prior to counting, mapped counts for each assembly or ORF were divided by assembly or ORF length to account for the higher read recruitment of longer assemblies or ORFs. Length-adjusted counts for assemblies and unstranded ORFs were subsampled to the length-adjusted counts of the smallest sample to account for uneven sequencing coverage. Stranded forward and reverse length-adjusted counts were subsampled to the total forward and reverse length-adjusted counts of the smallest sample. Subsampling was done separately for Illumina and 454 sequences to maintain high counts for Illumina samples, because the 454 sequencing effort was much lower than the Illumina sequencing effort. Length-adjusted and subsampled counts for each assembly or ORF were normalized to the total length-adjusted count of each sample to obtain a relative count, e.g. relative DNA abundance or relative RNA expression. Pairings between RNA and DNA counts were determined from sample database metadata or from the sample publications. Some samples had more than one DNA or RNA library sequenced, resulting in multiple pairings per sample; for example, a sample with 2 DNA and 2 RNA sequence libraries has 4 possible DNA and RNA pairs. All data manipulations were done using Python (Van Rossum 2007) and Pandas (McKinney 2010). Specific activity was determined as the ratio of relative RNA to relative DNA counts (Hunt et al. 2013).
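The normalization chain described above (length adjustment, subsampling to the smallest sample, conversion to relative counts, and RNA:DNA specific activity) can be sketched as follows. This is a simplified illustration under stated assumptions: here subsampling is applied to integer read counts before length adjustment, whereas the text applies it to length-adjusted counts, and all function names and toy values are hypothetical. `random.sample` with `counts=` requires Python 3.9 or later.

```python
import random
from typing import Dict


def subsample(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Draw `depth` reads without replacement from integer read counts."""
    rng = random.Random(seed)
    names = list(counts)
    drawn = rng.sample(names, k=depth, counts=[counts[n] for n in names])
    out = {n: 0 for n in names}
    for n in drawn:
        out[n] += 1
    return out


def relative_counts(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Divide by feature length, then normalize so values sum to 1."""
    adjusted = {n: c / lengths[n] for n, c in counts.items()}
    total = sum(adjusted.values())
    return {n: v / total for n, v in adjusted.items()}


def specific_activity(rel_rna: Dict[str, float], rel_dna: Dict[str, float]) -> Dict[str, float]:
    """RNA:DNA ratio for features with non-zero relative DNA abundance."""
    return {n: rel_rna.get(n, 0.0) / d for n, d in rel_dna.items() if d > 0}


lengths = {"a": 2000, "b": 1000, "c": 1000}
dna = subsample({"a": 6000, "b": 3000, "c": 1000}, depth=5000)
rna = subsample({"a": 500, "b": 1500, "c": 3000}, depth=2500)
activity = specific_activity(relative_counts(rna, lengths), relative_counts(dna, lengths))
print(max(activity, key=activity.get))  # the least abundant assembly has the highest RNA:DNA here
```

Because each DNA and RNA library is normalized to relative counts before taking the ratio, specific activity compares shares of each library rather than raw read numbers, which is why pairing the correct DNA and RNA libraries matters.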
SPOT benthic zone metagenomic and metatranscriptomic controls
Genome (DNA) and ERCC (RNA) controls were analyzed using the same mapping and counting protocols used to quantify abundance and expression, performed exactly as for all Illumina sequences from all environments. Trimmed and rRNA-filtered metagenomic and metatranscriptomic reads from the SPOT benthic zone samples were mapped to the genome and ERCC control sequences (BBMap v36.19; bbmap.sh ambiguous=all maxsites=1000000000 maxsites2=1000000000 sssr=1.0 secondary=t minid=0.98 idfilter=0.99). Counts (featureCounts v1.5.3; featureCounts -f -O -M --minOverlap 50 -s 0) were transformed into relative counts by dividing by length and normalizing to the total mappings from each sample. Performance was quantified by plotting measured relative counts against the expected genome and ERCC control concentrations. Both genome and ERCC controls showed high agreement between input quantities and measurements (R^2 > 0.99) using our protocols (Supplementary Figures 14–18).
Functional clusters
Functional clusters were generated by clustering ORFs based on percent identity. ORFs longer than 200 amino acids were grouped into protein clusters with >60% identity (cd-hit (Huang et al. 2010) v4.6; cd-hit -c 0.6 -n 4). Relative counts for clusters were determined by summing the length-adjusted, subsampled, normalized counts of each ORF contained within a cluster.
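Summing ORF-level relative counts into cd-hit clusters can be sketched as below. The parser assumes cd-hit's usual `.clstr` layout, in which a `>Cluster N` line is followed by member lines containing `>ORF_ID...`; the function names and ORF identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_clstr(lines: Iterable[str]) -> Dict[str, str]:
    """Map ORF ID -> cluster ID from cd-hit .clstr lines (format assumed)."""
    membership: Dict[str, str] = {}
    cluster = ""
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            cluster = line.split()[1]
        elif line:
            # Member lines look like: "1\t340aa, >orf2... at 85.00%"
            orf_id = line.split(">", 1)[1].split("...", 1)[0]
            membership[orf_id] = cluster
    return membership


def cluster_counts(orf_counts: Dict[str, float], membership: Dict[str, str]) -> Dict[str, float]:
    """Sum length-adjusted, subsampled, normalized ORF counts within each cluster."""
    totals: Dict[str, float] = defaultdict(float)
    for orf_id, value in orf_counts.items():
        totals[membership[orf_id]] += value
    return dict(totals)


clstr = [">Cluster 0", "0\t350aa, >orf1... *", "1\t340aa, >orf2... at 85.00%",
         ">Cluster 1", "0\t410aa, >orf3... *"]
print(cluster_counts({"orf1": 0.2, "orf2": 0.3, "orf3": 0.5}, parse_clstr(clstr)))
```

Since the ORF counts are already relative, the cluster totals also sum to 1 per sample.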
Diversity metrics and statistics
Diversity metrics were calculated using relative counts for metagenomes and metatranscriptomes. Pielou's evenness (J') and Berger–Parker dominance were calculated using scikit-bio. Linear regressions and nonparametric statistics (Spearman's ρ and the Mann–Whitney U test) were calculated using the R (Team 2000) stats package. Non-metric multidimensional scaling (NMDS) plots were generated using the vegan (Dixon 2003) R package and Bray–Curtis dissimilarities (vegan v2.4-4; metaMDS(distance = "bray")).
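For reference, Pielou's evenness is J' = H' / ln S, where H' is the Shannon index (natural log) and S is the number of features with non-zero counts, and Berger–Parker dominance is the largest relative count, max(p_i). The plain-Python sketch below is illustrative only; the analyses used scikit-bio, whose `pielou_e` and `berger_parker_d` functions compute these indices, and edge-case handling there may differ from this hypothetical version.

```python
import math
from typing import Sequence


def pielou_evenness(counts: Sequence[float]) -> float:
    """J' = H' / ln(S); NaN when fewer than two features are present."""
    present = [c for c in counts if c > 0]
    s = len(present)
    if s < 2:
        return float("nan")
    total = sum(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    return shannon / math.log(s)


def berger_parker(counts: Sequence[float]) -> float:
    """Relative abundance of the single most abundant feature."""
    return max(counts) / sum(counts)


even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
print(pielou_evenness(even), pielou_evenness(skewed))  # ~1.0 and ~0.12
print(berger_parker(even), berger_parker(skewed))      # 0.25 and 0.97
```

Lower J' and higher Berger–Parker dominance both indicate that counts are concentrated in few features, which is the sense in which community activity was "more skewed" than community structure.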
References
Alberti, A., J. Poulain, S. Engelen, and others. 2017. Viral to metazoan marine plankton
nucleotide sequences from the Tara Oceans expedition. Sci Data 4: 170093.
Allen, A. P., J. H. Brown, and J. F. Gillooly. 2002. Global biodiversity, biochemical kinetics,
and the energetic-equivalence rule. Science 297: 1545–1548.
Allen, A. P., J. F. Gillooly, V. M. Savage, and J. H. Brown. 2006. Kinetic effects of
temperature on rates of genetic divergence and speciation. Proc. Natl. Acad. Sci. U. S. A.
103: 9130–9135.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local
alignment search tool. J. Mol. Biol. 215: 403–410.
Arrhenius, S. 1889. Über die Reaktionsgeschwindigkeit bei der Inversion von Rohrzucker
durch Säuren. Zeitschrift für physikalische Chemie 4: 226–248.
Baker, B. J., R. A. Lesniewski, and G. J. Dick. 2012. Genome-enabled transcriptomics reveals
archaeal populations that drive nitrification in a deep-sea hydrothermal plume. ISME J.
6: 2269–2279.
Baker, S. C., S. R. Bauer, R. P. Beyer, and others. 2005. The External RNA Controls
Consortium: a progress report. Nat. Methods 2: 731–734.
Blazewicz, S. J., R. L. Barnard, R. A. Daly, and M. K. Firestone. 2013. Evaluating rRNA as an
indicator of microbial activity in environmental communities: limitations and uses.
ISME J. 7: 2061–2068.
Bremges, A., I. Maus, P. Belmann, and others. 2015. Deeply sequenced metagenome and
metatranscriptome of a biogas-producing microbial community from an agricultural
production-scale biogas plant. Gigascience 4: 33.
Brown, J. H., J. F. Gillooly, A. P. Allen, V. M. Savage, and G. B. West. 2004. Toward a metabolic theory of ecology. Ecology 85: 1771–1789.
Buchfink, B., C. Xie, and D. H. Huson. 2015. Fast and sensitive protein alignment using
DIAMOND. Nat. Methods 12: 59–60.
Bushnell, B. 2016. BBMap short read aligner. University of California, Berkeley, California. URL http://sourceforge.net/projects/bbmap.
Campbell, B. J., L. Yu, J. F. Heidelberg, and D. L. Kirchman. 2011. Activity of abundant and
rare bacteria in a coastal ocean. Proc. Natl. Acad. Sci. U. S. A. 108: 12776–12781.
Campbell, B. J., L. Yu, T. Straza, and D. L. Kirchman. 2009. Temporal changes in bacterial
rRNA and rRNA genes in Delaware (USA) coastal waters. Aquat. Microb. Ecol. 57: 123–
135.
Carney, K. M., and P. A. Matson. 2005. Plant Communities, Soil Microorganisms, and Soil
Carbon Cycling: Does Altering the World Belowground Matter to Ecosystem
Functioning? Ecosystems 8: 928–940.
Cottrell, M. T., and D. L. Kirchman. 2016. Transcriptional Control in Marine Copiotrophic
and Oligotrophic Bacteria with Streamlined Genomes. Appl. Environ. Microbiol. 82:
6010–6018.
Dixon, P. 2003. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14:
927–930.
Dupont, C. L., J. P. McCrow, R. Valas, and others. 2015. Genomes and gene expression
across light and productivity gradients in eastern subtropical Pacific microbial
communities. ISME J. 9: 1076–1092.
Franzosa, E. A., X. C. Morgan, N. Segata, and others. 2014. Relating the metatranscriptome
and metagenome of the human gut. Proc. Natl. Acad. Sci. U. S. A. 111: E2329–38.
Fuhrman, J. A. 2009. Microbial community structure and its functional implications.
Nature 459: 193–199.
Fuhrman, J. A., I. Hewson, M. S. Schwalbach, J. A. Steele, M. V. Brown, and S. Naeem.
2006. Annually reoccurring bacterial communities are predictable from ocean
conditions. Proc. Natl. Acad. Sci. U. S. A. 103: 13104–13109.
Fuhrman, J. A., J. A. Steele, I. Hewson, M. S. Schwalbach, M. V. Brown, J. L. Green, and J.
H. Brown. 2008. A latitudinal diversity gradient in planktonic marine bacteria. Proc.
Natl. Acad. Sci. U. S. A. 105: 7774–7778.
Gilbert, J. A., F. Meyer, L. Schriml, I. R. Joint, M. Mühling, and D. Field. 2010.
Metagenomes and metatranscriptomes from the L4 long-term coastal monitoring
station in the Western English Channel. Stand. Genomic Sci. 3: 183–193.
Gillooly, J. F., A. P. Allen, G. B. West, and J. H. Brown. 2005. The rate of DNA evolution:
effects of body size and temperature on the molecular clock. Proc. Natl. Acad. Sci. U. S.
A. 102: 140–145.
Giovannoni, S. J., J. Cameron Thrash, and B. Temperton. 2014. Implications of streamlining
theory for microbial ecology. ISME J. 8: 1553–1565.
Graham, E. B., J. E. Knelman, A. Schindlbacher, and others. 2016. Microbes as Engines of
Ecosystem Function: When Does Community Structure Enhance Predictions of
Ecosystem Processes? Front. Microbiol. 7: 214.
Huang, Y., B. Niu, Y. Gao, L. Fu, and W. Li. 2010. CD-HIT Suite: a web server for clustering
and comparing biological sequences. Bioinformatics 26: 680–682.
Hultman, J., M. P. Waldrop, R. Mackelprang, and others. 2015. Multi-omics of permafrost,
active layer and thermokarst bog soil microbiomes. Nature 521: 208–212.
Hunt, D. E., Y. Lin, M. J. Church, D. M. Karl, S. G. Tringe, L. K. Izzo, and Z. I. Johnson.
2013. Relationship between abundance and specific activity of bacterioplankton in open
ocean surface waters. Appl. Environ. Microbiol. 79: 177–184.
Hutchison, J. S., R. E. Ward, J. Lacroix, and others. 2008. Hypothermia therapy after
traumatic brain injury in children. N. Engl. J. Med. 358: 2447–2456.
Jenuth, J. P. 1999. The NCBI: publicly available tools and resources on the web. In
Bioinformatics Methods and Protocols. Springer.
Jousset, A., C. Bienhold, A. Chatzinotas, and others. 2017. Where less may be more: how the
rare biosphere pulls ecosystems strings. ISME J. 11: 853–862.
Kamke, J., M. W. Taylor, and S. Schmitt. 2010. Activity profiles for marine sponge-
associated bacteria obtained by 16S rRNA vs 16S rRNA gene comparisons. ISME J. 4:
498–508.
Kim, D., L. Song, F. P. Breitwieser, and S. L. Salzberg. 2016. Centrifuge: rapid and sensitive
classification of metagenomic sequences. Genome Res. 26: 1721–1729.
Kopylova, E., L. Noé, and H. Touzet. 2012. SortMeRNA: fast and accurate filtering of
ribosomal RNAs in metatranscriptomic data. Bioinformatics 28: 3211–3217.
Lauro, F. M., D. McDougald, T. Thomas, and others. 2009. The genomic basis of trophic
strategy in marine bacteria. Proc. Natl. Acad. Sci. U. S. A. 106: 15527–15533.
Liao, Y., G. K. Smyth, and W. Shi. 2014. featureCounts: an efficient general purpose program
for assigning sequence reads to genomic features. Bioinformatics 30: 923–930.
Li, D., C.-M. Liu, R. Luo, K. Sadakane, and T.-W. Lam. 2015a. MEGAHIT: an ultra-fast
single-node solution for large and complex metagenomics assembly via succinct de
Bruijn graph. Bioinformatics 31: 1674–1676.
Li, M., B. J. Baker, K. Anantharaman, S. Jain, J. A. Breier, and G. J. Dick. 2015b. Genomic
and transcriptomic evidence for scavenging of diverse organic compounds by
widespread deep-sea archaea. Nat. Commun. 6: 8933.
Lynch, M. D. J., and J. D. Neufeld. 2015. Ecology and exploration of the rare biosphere. Nat.
Rev. Microbiol. 13: 217–229.
Maus, I., D. E. Koeck, K. G. Cibis, and others. 2016. Unraveling the microbiome of a
thermophilic biogas plant by metagenome and metatranscriptome analysis
complemented by characterization of bacterial and archaeal isolates. Biotechnol.
Biofuels 9: 171.
McKinney, W. 2010. Data structures for statistical computing in Python. Proceedings of the
9th Python in Science Conference.
Musat, N., H. Halm, B. Winterholler, and others. 2008. A single-cell view on the
ecophysiology of anaerobic phototrophic bacteria. Proc. Natl. Acad. Sci. U. S. A. 105:
17861–17866.
Naeem, S., and S. Li. 1997. Biodiversity enhances ecosystem reliability. Nature 390: 507–
509.
NCBI Resource Coordinators. 2016. Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res. 44: D7–19.
Nelson, M. B., A. C. Martiny, and J. B. H. Martiny. 2016. Global biogeography of microbial
nitrogen-cycling traits in soil. Proc. Natl. Acad. Sci. U. S. A. 113: 8033–8040.
Prosser, J. I., B. J. M. Bohannan, T. P. Curtis, and others. 2007. The role of ecological theory
in microbial ecology. Nat. Rev. Microbiol. 5: 384–392.
Satinsky, B. M., B. C. Crump, C. B. Smith, and others. 2014a. Microspatial gene expression
patterns in the Amazon River Plume. Proc. Natl. Acad. Sci. U. S. A. 111: 11085–11090.
Satinsky, B. M., C. S. Fortunato, M. Doherty, and others. 2015. Metagenomic and
metatranscriptomic inventories of the lower Amazon River, May 2011. Microbiome 3:
39.
Satinsky, B. M., B. L. Zielinski, M. Doherty, C. B. Smith, S. Sharma, J. H. Paul, B. C. Crump,
and M. A. Moran. 2014b. The Amazon continuum dataset: quantitative metagenomic
and metatranscriptomic inventories of the Amazon River plume, June 2010.
Microbiome 2: 17.
Shapiro, B. J., J. Friedman, O. X. Cordero, S. P. Preheim, S. C. Timberlake, G. Szabó, M. F.
Polz, and E. J. Alm. 2012. Population genomics of early events in the ecological
differentiation of bacteria. Science 336: 48–51.
Shi, Y., G. W. Tyson, J. M. Eppley, and E. F. DeLong. 2011. Integrated metatranscriptomic
and metagenomic analyses of stratified microbial assemblages in the open ocean. ISME
J. 5: 999–1013.
Sieradzki, E. T., J. Cesar Ignacio-Espinoza, D. M. Needham, E. B. Fichot, and J. A. Fuhrman.
2017. Dynamic marine viral infections and major contribution to photosynthetic
processes shown by regional and seasonal picoplankton metatranscriptomes. bioRxiv
176644. doi:10.1101/176644
Sohm, J. A., E. A. Webb, and D. G. Capone. 2011. Emerging patterns of marine nitrogen
fixation. Nat. Rev. Microbiol. 9: 499–508.
Sukenik, A., R. N. Kaplan-Levy, J. M. Welch, and A. F. Post. 2012. Massive multiplication of
genome and ribosomes in dormant cells (akinetes) of Aphanizomenon ovalisporum
(Cyanobacteria). ISME J. 6: 670–679.
Sunagawa, S., L. P. Coelho, S. Chaffron, and others. 2015. Ocean plankton. Structure and
function of the global ocean microbiome. Science 348: 1261359.
R Core Team. 2000. R language definition. Vienna, Austria: R Foundation for Statistical
Computing.
Thompson, A. W., R. A. Foster, A. Krupke, B. J. Carter, N. Musat, D. Vaulot, M. M. M.
Kuypers, and J. P. Zehr. 2012. Unicellular cyanobacterium symbiotic with a single-celled
eukaryotic alga. Science 337: 1546–1550.
Thompson, L. R., J. G. Sanders, D. McDonald, and others. 2017. A communal catalogue
reveals Earth’s multiscale microbial diversity. Nature 551: 457–463.
Tilman, D., M. Mattson, and S. Langer. 1981. Competition and nutrient kinetics along a
temperature gradient: an experimental test of a mechanistic approach to niche theory.
Limnol. Oceanogr.
Van Rossum, G. 2007. Python Programming Language. Proc. USENIX Annu. Tech. Conf.
Yooseph, S., K. H. Nealson, D. B. Rusch, and others. 2010. Genomic and functional
adaptation in surface ocean planktonic prokaryotes. Nature 468: 60–66.
Figures
Figure 1. Relationships between community structure and community activity of highly
resolved sequence assemblies across disparate environments. Each point represents a
sequence assembly and its relative contribution to community structure and activity.
Environments are biogas plants, freshwater, freshwater plume into the ocean, the
human gut, ocean (epipelagic, benthic zone, hydrothermal vents), permafrost, and
thermokarst bog. Community structure is expressed as relative frequencies of DNA and
community activity as relative frequencies of RNA of samples sequenced with Illumina
platforms. All frequencies were adjusted to sequence assembly length and subsampled
to account for uneven sequencing effort. Points are colored by RefSeq lowest common
ancestor (LCA) taxonomy at the domain rank.
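The length adjustment and subsampling mentioned in this caption can be illustrated with a short Python sketch. It assumes per-assembly read counts are already available, rarefies by drawing reads without replacement to a common depth, and then divides counts by assembly length in kilobases before rescaling to percentages. The function names, the per-kilobase scaling, the fixed random seed, and the example counts are hypothetical and are not taken from the dissertation's pipeline (Supplementary Figure 1).

```python
import random
from collections import Counter


def subsample_counts(counts, depth, seed=0):
    """Rarefy: draw `depth` reads without replacement from per-assembly counts."""
    pool = [name for name, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return dict(Counter(rng.sample(pool, depth)))


def length_adjusted_percent(counts, lengths_bp):
    """Divide counts by assembly length (kb), then rescale so values sum to 100."""
    per_kb = {name: n / (lengths_bp[name] / 1000) for name, n in counts.items()}
    total = sum(per_kb.values())
    return {name: 100 * value / total for name, value in per_kb.items()}


# Hypothetical read counts and assembly lengths for one metatranscriptome.
rna_counts = {"asm_1": 900, "asm_2": 90, "asm_3": 10}
lengths = {"asm_1": 3000, "asm_2": 1500, "asm_3": 500}

rarefied = subsample_counts(rna_counts, depth=500)
rna_pct = length_adjusted_percent(rarefied, lengths)
print(sum(rarefied.values()))  # 500
print(round(sum(rna_pct.values())))  # 100
```

Using the same depth for every sample in a comparison is what makes the resulting percentages comparable across unevenly sequenced samples.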
Figure 2. Patterns of rare high-resolution sequence assembly activity of Illumina
sequenced samples across different environments, lifestyles, and temperatures. a) Box
and whisker plots of rare assembly activity across different environments (Mann–Whitney
U test; P < 2.2 × 10⁻¹⁶). b) Rare microbial activity, expressed as the log10 ratio of
rare:abundant % RNA, as a function of temperature. Linear regression plotted with
Spearman’s ρ = -0.42 and P = 1.701 × 10⁻⁷. c) Activity dominance, expressed as
Berger–Parker dominance of % RNA, as a function of temperature. Linear regression
plotted with Spearman’s ρ = -0.7 and P < 2.2 × 10⁻⁶. d) Box and whisker plots of rare
microbial activity in attached and free, i.e. planktonic, lifestyles. Samples considered to
contain microbes with attached lifestyles were sediment samples, i.e. permafrost and
thermokarst bog, and aquatic (freshwater and marine) samples collected on filters
>0.8 µm. Samples considered to contain microbes with free lifestyles were the human gut
and aquatic (freshwater and marine) samples collected on filters 0.1 - 3.0 µm.
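The temperature panels rest on simple summaries of each sample's % RNA values. The Python sketch below computes a log10 ratio of summed rare to summed abundant activity, Berger–Parker dominance (the share of total activity held by the single most active assembly), and Spearman's ρ from average ranks. It assumes rare and abundant assemblies have already been assigned, for example with the rank cutoff described for Supplementary Figure 5; the function names and example values are hypothetical, and P values for ρ are not computed here.

```python
import math


def rare_to_abundant_log_ratio(rna_pct, rare_ids):
    """log10 of summed % RNA in the rare fraction over summed % RNA in the abundant fraction."""
    rare = sum(v for k, v in rna_pct.items() if k in rare_ids)
    abundant = sum(v for k, v in rna_pct.items() if k not in rare_ids)
    return math.log10(rare / abundant)


def berger_parker(values):
    """Berger-Parker dominance: share of the total held by the single largest value."""
    values = list(values)
    return max(values) / sum(values)


def average_ranks(xs):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical % RNA profile for one sample; "c" and "d" are its rare assemblies.
rna_pct = {"a": 60.0, "b": 30.0, "c": 9.0, "d": 1.0}
print(round(rare_to_abundant_log_ratio(rna_pct, {"c", "d"}), 3))  # -0.954
print(berger_parker(rna_pct.values()))  # 0.6

# Hypothetical per-sample temperatures (°C) and dominance values.
temps = [2.6, 9.7, 15.9, 23.9, 37.0]
dominance = [0.61, 0.48, 0.35, 0.30, 0.12]
print(round(spearman_rho(temps, dominance), 3))  # -1.0
```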
Figure 3. Functional traits of Illumina sequenced rare microbial activity overexpressed
in the rare fraction relative to the abundant fraction. Overexpression of traits in the rare
fraction was determined by selecting ORFs with >2X activity in the rare fraction and a
Mann–Whitney U test P < 1 × 10⁻⁷⁰. a) Box and whisker plots of NCBI Clusters of
Orthologous Groups (COGs) functional categories overexpressed in the rare fraction. c)
Box and whisker plots of KEGG BRITE categories overexpressed in the rare fraction. d)
Box and whisker plots of KEGG BRITE “Energy metabolism” subcategories
overexpressed in the rare fraction. Box and whisker plots are sorted by median and
exclude ORFs where % RNA was 0.
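The selection rule in this caption, a better-than-twofold activity difference combined with a Mann–Whitney U test, can be sketched in Python without external libraries. The sketch assumes each candidate (an ORF or a grouping of ORFs; the unit of comparison is an assumption) comes with a list of % RNA values from the rare fraction and a list from the abundant fraction, uses the ratio of means as the fold change, and computes a two-sided P value from the normal approximation with tie correction. An established implementation such as `scipy.stats.mannwhitneyu` would normally be preferred; the function names and toy values here are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation with tie correction)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def overexpressed_in_rare(candidates, fold=2.0, alpha=1e-70):
    """Keep candidates whose mean rare-fraction % RNA exceeds `fold` times the
    abundant-fraction mean and whose two distributions differ at P < alpha."""
    hits = []
    for name, (rare, abundant) in candidates.items():
        mean_rare = sum(rare) / len(rare)
        mean_abundant = sum(abundant) / len(abundant)
        # A zero abundant-fraction mean is treated as an unbounded fold change.
        if mean_abundant > 0 and mean_rare / mean_abundant <= fold:
            continue
        _, p = mann_whitney_u(rare, abundant)
        if p < alpha:
            hits.append(name)
    return hits


toy = {
    "orf_A": ([5, 6, 7, 8, 9, 10, 11, 12], [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]),
    "orf_B": ([2, 3, 4, 5], [2.5, 3.5, 4.5, 5.5]),
}
# Toy samples this small cannot reach P < 1e-70, so a looser alpha is used here.
print(overexpressed_in_rare(toy, alpha=0.05))  # ['orf_A']
```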
Supplemental Figures
Supplementary Figure 1. Bioinformatic processing pipeline for determining proxies
for abundance, activity, and specific activity.
Supplementary Figure 2. Non-metric multidimensional scaling (NMDS) based on
sample DNA relative frequencies, i.e. abundances of Illumina sequences. Sample
distances are based on Bray-Curtis dissimilarities. Each point is a sample colored by its
source environment.
Supplementary Figure 3. Non-metric multidimensional scaling (NMDS) based on
sample RNA frequencies, i.e. activity of Illumina sequences. Sample distances are based
on Bray-Curtis dissimilarities. Each point is a sample colored by its source environment.
Supplementary Figure 4. Non-metric multidimensional scaling (NMDS) based on
sample specific activities, i.e. RNA:DNA ratios of Illumina sequences. Sample distances
are based on Bray-Curtis dissimilarities. Each point is a sample colored by its source
environment.
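Supplementary Figures 2 to 4 all start from a sample-by-sample Bray-Curtis dissimilarity matrix. The Python sketch below computes specific activity as the per-assembly RNA:DNA ratio and then Bray-Curtis dissimilarities between samples stored as dictionaries mapping assembly to value. Skipping assemblies with no DNA signal, the dictionary layout, and the example values are assumptions; the NMDS step itself could then be run on the resulting matrix with an ordination package such as vegan's `metaMDS` (Dixon 2003).

```python
def specific_activity(rna_pct, dna_pct):
    """RNA:DNA ratio per assembly; assemblies without DNA signal are skipped."""
    return {k: rna_pct.get(k, 0.0) / v for k, v in dna_pct.items() if v > 0}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i); missing keys count as 0."""
    keys = set(a) | set(b)
    numerator = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    denominator = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(samples):
    """Square Bray-Curtis matrix for a dict of sample name -> {assembly: value}."""
    names = sorted(samples)
    return names, [[bray_curtis(samples[i], samples[j]) for j in names] for i in names]


# Hypothetical % DNA and % RNA profiles for two samples.
dna = {"s1": {"a": 50.0, "b": 50.0}, "s2": {"a": 50.0, "c": 50.0}}
rna = {"s1": {"a": 80.0, "b": 20.0}, "s2": {"a": 10.0, "c": 90.0}}

names, d_dna = dissimilarity_matrix(dna)
print(names, d_dna[0][1])  # ['s1', 's2'] 0.5
sa = {s: specific_activity(rna[s], dna[s]) for s in dna}
names, d_sa = dissimilarity_matrix(sa)
```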
Supplementary Figure 5. DNA % abundance as a function of DNA rank abundance
of metagenomic and metatranscriptomic sequence assemblies for Illumina sequenced
samples. Rank abundance curves are plotted by environment. Rare assemblies are
colored blue and abundant assemblies yellow. Assemblies ranked beyond the 1000th
position are considered rare; they lie in the tail of the rank abundance curves and have a
mean abundance of 0.005%.
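The rank-based definition of rarity can be expressed in a few lines of Python. This sketch ranks assemblies by % DNA within one sample, breaks ties alphabetically, and labels everything past the cutoff rank as rare. Whether ranking was done per sample or per environment, and how ties were handled, are assumptions; only the 1000th-rank cutoff comes from the caption, and the example values are hypothetical.

```python
def rank_abundance(pct):
    """Return (rank, assembly, value) tuples, rank 1 = most abundant; ties broken by name."""
    ordered = sorted(pct.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, name, value) for rank, (name, value) in enumerate(ordered, start=1)]


def split_rare_abundant(pct, rank_cutoff=1000):
    """Assemblies ranked beyond `rank_cutoff` are rare; the rest are abundant."""
    curve = rank_abundance(pct)
    abundant = {name for rank, name, _ in curve if rank <= rank_cutoff}
    rare = {name for rank, name, _ in curve if rank > rank_cutoff}
    return abundant, rare


# Hypothetical % DNA values; a cutoff of 2 stands in for the 1000th rank.
dna_pct = {"a": 40.0, "b": 30.0, "c": 20.0, "d": 6.0, "e": 4.0}
abundant, rare = split_rare_abundant(dna_pct, rank_cutoff=2)
print(sorted(abundant), sorted(rare))  # ['a', 'b'] ['c', 'd', 'e']
mean_rare_pct = sum(dna_pct[k] for k in rare) / len(rare)
```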
Supplementary Figure 6. Activity (% RNA) as a function of rank activity of
metagenomic and metatranscriptomic sequence assemblies. Rank activity curves are
plotted by environment.
Supplementary Figure 7. Box and whisker plots of abundance (DNA) and expression
(RNA) evenness across all Illumina sampled sequences (Mann–Whitney U test;
P < 2.2 × 10⁻¹⁶). Evenness is calculated as Pielou’s evenness (J’) of sequence
assemblies and their relative frequencies in metagenomes and metatranscriptomes.
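Pielou's evenness divides Shannon diversity by its maximum possible value for the number of observed assemblies, J' = H' / ln S with H' = -Σ p_i ln p_i. The Python sketch below applies that definition to a list of relative frequencies; treating zero-frequency assemblies as unobserved and using natural logarithms are conventional choices assumed here, and the example frequencies are made up.

```python
import math


def shannon(freqs):
    """Shannon diversity H' using natural logs; zero frequencies are ignored."""
    total = sum(freqs)
    proportions = [f / total for f in freqs if f > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(freqs):
    """Pielou's J' = H' / ln(S), where S is the number of nonzero entries."""
    s = sum(1 for f in freqs if f > 0)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two observed assemblies")
    return shannon(freqs) / math.log(s)


# Hypothetical relative frequencies (%) for one metagenome and one metatranscriptome.
dna_freqs = [30.0, 25.0, 25.0, 20.0]
rna_freqs = [97.0, 1.0, 1.0, 1.0]
print(round(pielou_evenness(dna_freqs), 2), round(pielou_evenness(rna_freqs), 2))  # 0.99 0.12
```

In this toy example the RNA profile is dominated by a single assembly, so its J' is far below that of the DNA profile.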
Supplementary Figure 8. Relationships between relative abundance (% DNA) and
relative activity (% RNA) of all sequence assemblies, regardless of taxonomy, that were
sequenced on the Illumina platform.
Supplementary Figure 9. Relationships between relative mapping frequencies of
abundance (% DNA) and activity (% RNA) of samples sequenced with the Roche 454
platform.
Supplementary Figure 10. Relationships between relative activity and relative
abundance for samples sequenced with methods that retain RNA strand direction.
Strandedness is only retained at the ORF level, so mappings were counted at the ORF
level. Sense transcripts (yellow) correspond directly to mRNA that will be translated into
peptides. Antisense transcripts (blue) are the reverse complement of mRNA and can act
as transcriptional and post-transcriptional regulators (Faghihi and Wahlestedt 2009).
Supplementary Figure 11. Relationships between relative abundance (% DNA) and
activity (% RNA) of functional protein clusters for Illumina sequenced samples.
Functional protein clusters were generated by semi-global clustering of ORFs at 60%
identity. Relative frequencies for each cluster were generated by summing the relative
frequencies of ORFs that formed each cluster.
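Summing ORF frequencies within clusters is a simple aggregation once each ORF carries a cluster label. The Python sketch assumes that a mapping from ORF to cluster identifier has already been produced by the 60% identity clustering step (the tool and its output format are not specified here) and applies the same aggregation to the DNA and RNA profiles; the identifiers and values are hypothetical.

```python
from collections import defaultdict


def cluster_frequencies(orf_pct, orf_to_cluster):
    """Sum per-ORF relative frequencies into their functional protein clusters."""
    totals = defaultdict(float)
    for orf, pct in orf_pct.items():
        totals[orf_to_cluster[orf]] += pct
    return dict(totals)


# Hypothetical ORF-to-cluster assignments from a 60% identity clustering.
orf_to_cluster = {"orf1": "clusterA", "orf2": "clusterA", "orf3": "clusterB"}
dna_pct = {"orf1": 10.0, "orf2": 5.0, "orf3": 85.0}
rna_pct = {"orf1": 40.0, "orf2": 20.0, "orf3": 40.0}

dna_by_cluster = cluster_frequencies(dna_pct, orf_to_cluster)
rna_by_cluster = cluster_frequencies(rna_pct, orf_to_cluster)
print(dna_by_cluster)  # {'clusterA': 15.0, 'clusterB': 85.0}
print(rna_by_cluster)  # {'clusterA': 60.0, 'clusterB': 40.0}
```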
Supplementary Figure 12. Relationships between absolute abundance (DNA sequences
L⁻¹) and absolute activity (RNA sequences L⁻¹) from the Amazon River (Satinsky et al.
2015) and the Amazon River plume (Satinsky et al. 2014a, 2014b). Absolute quantities
were generated by multiplying relative frequencies by measured quantities of DNA and
RNA from Satinsky et al. (2014a, 2014b, 2015). Briefly, DNA and RNA samples were
amended with known quantities of standards at the time of nucleic acid extraction. The
percent recovery of these DNA and RNA additions provides a direct conversion from
relative frequencies to absolute quantities.
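The internal-standard conversion amounts to a single scaling factor per sample. In the Python sketch below, the number of standard copies added divided by the number of standard reads recovered gives the copies represented by each read, which is then applied to every assembly and divided by the volume filtered. The variable names, the per-liter volume term, and the numbers are illustrative assumptions; the exact bookkeeping used for these samples is described in Satinsky et al. (2014a, 2014b, 2015).

```python
def copies_per_liter(read_counts, standard_added, standard_reads, volume_l):
    """Scale read counts to copies per liter with an internal standard.

    standard_added: copies of the standard spiked in before extraction.
    standard_reads: reads in the library that map to the standard.
    volume_l: liters of water filtered for the sample.
    """
    if standard_reads == 0:
        raise ValueError("no standard reads recovered; cannot scale")
    copies_per_read = standard_added / standard_reads
    return {name: n * copies_per_read / volume_l for name, n in read_counts.items()}


# Hypothetical numbers: 1e9 standard copies added, 1e4 recovered as reads, 2 L filtered.
reads = {"asm_1": 50, "asm_2": 4}
print(copies_per_liter(reads, standard_added=1e9, standard_reads=1e4, volume_l=2.0))
# {'asm_1': 2500000.0, 'asm_2': 200000.0}
```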
Supplementary Figure 13. Distribution of rare activity across all Illumina sequenced
samples. Rare activity is expressed as the relative RNA frequencies of assemblies in the
rare fraction.
Supplementary Figure 14. Linear regression of measured metagenomic relative
frequencies against expected relative frequencies of SPOT benthic zone amended
genome controls, plotted with 95% confidence interval and SEM (n = 32; R² = 0.9933).
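The control checks in Supplementary Figures 14 to 17 compare measured relative frequencies of spiked-in controls with their expected values. A minimal Python sketch of an ordinary least-squares fit, its coefficient of determination (R²), and the standard error of the mean for replicate measurements follows; the example values are invented, and the 95% confidence band shown in the figures is not computed here.

```python
import math
import statistics


def least_squares(x, y):
    """Fit y = intercept + slope * x by ordinary least squares; return slope, intercept, R^2."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical expected vs measured relative frequencies (%) for four controls.
expected = [0.1, 0.2, 0.4, 0.8]
measured = [0.11, 0.19, 0.41, 0.79]
slope, intercept, r2 = least_squares(expected, measured)
print(round(slope, 3), round(r2, 4))
print(round(sem([0.10, 0.12, 0.11]), 4))
```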
Supplementary Figure 15. Mean measured and expected relative frequencies from SPOT
benthic zone metagenomes for each amended genome control with SEM (n = 32).
Supplementary Figure 16. Linear regression of measured metatranscriptomic
relative frequencies against expected relative frequencies of SPOT benthic zone ERCC
controls, plotted with 95% confidence interval and SEM (n = 30; R² = 0.9932).
Supplementary Figure 17. Mean measured and expected relative frequencies from SPOT
benthic zone metatranscriptomes for each amended ERCC control with SEM (n = 30).
Supplementary Tables
Source publication | Source database | Database ID (SRA ID, ENA ID, etc.) | Environment | Pair bin | Sample nucleic acid type | Sequencing platform | Poly-A selected | rRNA depleted | Directional RNA library | Temperature (°C) | Lifestyle (free or particle-attached) | Minimum filter, aquatic (µm) | Maximum filter, aquatic (µm) | Depth (m)
Dupont et al., 2015 | iMicrobe | GOS265_454_0p1 | Ocean - epipelagic | 78 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 3
Dupont et al., 2015 | iMicrobe | GOS265_454_0p8 | Ocean - epipelagic | 78 | DNA | 454 | | | | | Attached | 0.8 | 3.0 | 3
Dupont et al., 2015 | iMicrobe | GOS265_454_3p0 | Ocean - epipelagic | 78 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS265_oligo_dT | Ocean - epipelagic | 78 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS265_random_hexamer | Ocean - epipelagic | 78 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS265_sanger_0p1 | Ocean - epipelagic | 78 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 3
Dupont et al., 2015 | iMicrobe | GOS266_454_0p1 | Ocean - epipelagic | 79 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 62
Dupont et al., 2015 | iMicrobe | GOS266_454_0p8 | Ocean - epipelagic | 79 | DNA | 454 | | | | | Attached | 0.8 | 3.0 | 62
Dupont et al., 2015 | iMicrobe | GOS266_454_3p0 | Ocean - epipelagic | 79 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 62
Dupont et al., 2015 | iMicrobe | GOS266_oligo_dT | Ocean - epipelagic | 79 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 62
Dupont et al., 2015 | iMicrobe | GOS266_random_hexamer | Ocean - epipelagic | 79 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 62
Dupont et al., 2015 | iMicrobe | GOS266_sanger_0p1 | Ocean - epipelagic | 79 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 62
Dupont et al., 2015 | iMicrobe | GOS267_454_0p1 | Ocean - epipelagic | 80 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 3
Dupont et al., 2015 | iMicrobe | GOS267_454_0p8 | Ocean - epipelagic | 80 | DNA | 454 | | | | | Attached | 0.8 | 3.0 | 3
Dupont et al., 2015 | iMicrobe | GOS267_454_3p0 | Ocean - epipelagic | 80 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS267_oligo_dT | Ocean - epipelagic | 80 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS267_random_hexamer | Ocean - epipelagic | 80 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS267_sanger_0p1 | Ocean - epipelagic | 80 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 3
Dupont et al., 2015 | iMicrobe | GOS268_454_0p1 | Ocean - epipelagic | 81 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 110
Dupont et al., 2015 | iMicrobe | GOS268_454_0p8 | Ocean - epipelagic | 81 | DNA | 454 | | | | | Attached | 0.8 | 3.0 | 110
Dupont et al., 2015 | iMicrobe | GOS268_454_3p0 | Ocean - epipelagic | 81 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 110
Dupont et al., 2015 | iMicrobe | GOS268_oligo_dT | Ocean - epipelagic | 81 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 110
Dupont et al., 2015 | iMicrobe | GOS268_random_hexamer | Ocean - epipelagic | 81 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 110
Dupont et al., 2015 | iMicrobe | GOS268_sanger_0p1 | Ocean - epipelagic | 81 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 110
Dupont et al., 2015 | iMicrobe | GOS269_454_0p1 | Ocean - epipelagic | 82 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 5
Dupont et al., 2015 | iMicrobe | GOS269_454_0p8 | Ocean - epipelagic | 82 | DNA | 454 | | | | | Attached | 0.8 | 3.0 | 5
Dupont et al., 2015 | iMicrobe | GOS269_454_3p0 | Ocean - epipelagic | 82 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 5
Dupont et al., 2015 | iMicrobe | GOS269_oligo_dT | Ocean - epipelagic | 82 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 5
Dupont et al., 2015 | iMicrobe | GOS269_random_hexamer | Ocean - epipelagic | 82 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 5
Dupont et al., 2015 | iMicrobe | GOS269_sanger_0p1 | Ocean - epipelagic | 82 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 5
Dupont et al., 2015 | iMicrobe | GOS270_454_0p1 | Ocean - epipelagic | 83 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 108
Dupont et al., 2015 | iMicrobe | GOS270_454_0p8 | Ocean - epipelagic | 83 | DNA | 454 | | | | | Free | 0.8 | 3.0 | 108
Dupont et al., 2015 | iMicrobe | GOS270_454_3p0 | Ocean - epipelagic | 83 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 108
Dupont et al., 2015 | iMicrobe | GOS270_oligo_dT | Ocean - epipelagic | 83 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 108
Dupont et al., 2015 | iMicrobe | GOS270_random_hexamer | Ocean - epipelagic | 83 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 108
Dupont et al., 2015 | iMicrobe | GOS270_sanger_0p1 | Ocean - epipelagic | 83 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 108
Dupont et al., 2015 | iMicrobe | GOS271_454_0p1 | Ocean - epipelagic | 84 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 30
Dupont et al., 2015 | iMicrobe | GOS271_454_0p8 | Ocean - epipelagic | 84 | DNA | 454 | | | | | Attached | 0.8 | 3.0 | 30
Dupont et al., 2015 | iMicrobe | GOS271_454_3p0 | Ocean - epipelagic | 84 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 30
Dupont et al., 2015 | iMicrobe | GOS271_oligo_dT_2 | Ocean - epipelagic | 84 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 30
Dupont et al., 2015 | iMicrobe | GOS271_random_hexamer | Ocean - epipelagic | 84 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 30
Dupont et al., 2015 | iMicrobe | GOS271_sanger_0p1 | Ocean - epipelagic | 84 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 30
Dupont et al., 2015 | iMicrobe | GOS272_454_0p1 | Ocean - epipelagic | 85 | DNA | 454 | | | | | Free | 0.1 | 0.8 | 3
Dupont et al., 2015 | iMicrobe | GOS272_454_0p8 | Ocean - epipelagic | 85 | DNA | 454 | | | | | Attached | 0.8 | 3.0 | 3
Dupont et al., 2015 | iMicrobe | GOS272_454_3p0 | Ocean - epipelagic | 85 | DNA | 454 | | | | | Attached | 3.0 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS272_oligo_dT | Ocean - epipelagic | 85 | RNA | 454 | Yes | No | No | | | 0.2 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS272_random_hexamer | Ocean - epipelagic | 85 | RNA | 454 | No | Yes | No | | | 0.2 | 20.0 | 3
Dupont et al., 2015 | iMicrobe | GOS272_sanger_0p1 | Ocean - epipelagic | 85 | DNA | Sanger | | | | | Free | 0.1 | 0.8 | 3
This study | NCBI SRA | SRR1230729 | Ocean - epipelagic | 30 | DNA | Illumina | | | | 18.8 | Free | | | 10
This study | NCBI SRA | SRR1238005 | Ocean - epipelagic | 30 | RNA | Illumina | No | Yes | No | 18.8 | Free | | | 10
This study | NCBI SRA | SRR1211819 | Ocean - epipelagic | 31 | DNA | 454 | | | | 18.9 | Free | | | 10
This study | NCBI SRA | SRR1230754 | Ocean - epipelagic | 31 | DNA | Illumina | | | | 18.9 | Free | | | 10
This study | NCBI SRA | SRR1230757 | Ocean - epipelagic | 31 | RNA | Illumina | No | Yes | No | 18.9 | Free | | | 10
This study | NCBI SRA | SRR1211842 | Ocean - epipelagic | 31 | DNA | PacBio | | | | 18.9 | Free | | | 10
This study | NCBI SRA | SRR1211874 | Ocean - epipelagic | 32 | DNA | 454 | | | | 23.9 | Free | | | 50
This study | NCBI SRA | SRR1230755 | Ocean - epipelagic | 32 | DNA | Illumina | | | | 23.9 | Free | | | 50
This study | NCBI SRA | SRR1230758 | Ocean - epipelagic | 32 | RNA | Illumina | No | No | No | 23.9 | Free | | | 50
This study | NCBI SRA | SRR1211893 | Ocean - epipelagic | 32 | DNA | PacBio | | | | 23.9 | Free | | | 50
This study | NCBI SRA | SRR1230756 | Ocean - epipelagic | 33 | DNA | Illumina | | | | 23.2 | Free | | | 50
This study | NCBI SRA | SRR1230759 | Ocean - epipelagic | 33 | RNA | Illumina | No | No | No | 23.2 | Free | | | 50
Shi et al., 2011 | NCBI SRA | SRR010902 | Ocean - epipelagic | 86 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 125
Shi et al., 2011 | NCBI SRA | SRR010903 | Ocean - epipelagic | 86 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 125
Shi et al., 2011 | NCBI SRA | SRR010904 | Ocean - epipelagic | 86 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 125
Shi et al., 2011 | NCBI SRA | SRR010905 | Ocean - epipelagic | 86 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 125
Shi et al., 2011 | NCBI SRA | SRR010897 | Ocean - epipelagic | 87 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 25
Shi et al., 2011 | NCBI SRA | SRR010898 | Ocean - epipelagic | 87 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 25
Shi et al., 2011 | NCBI SRA | SRR010899 | Ocean - epipelagic | 87 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 25
Shi et al., 2011 | NCBI SRA | SRR010900 | Ocean - epipelagic | 87 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 25
Shi et al., 2011 | NCBI SRA | SRR010901 | Ocean - epipelagic | 87 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 25
Shi et al., 2011 | NCBI SRA | SRR000905 | Ocean - epipelagic | 89 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 75
Shi et al., 2011 | NCBI SRA | SRR000906 | Ocean - epipelagic | 89 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 75
Shi et al., 2011 | NCBI SRA | SRR000907 | Ocean - epipelagic | 89 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 75
Shi et al., 2011 | NCBI SRA | SRR000859 | Ocean - epipelagic | 89 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 75
Shi et al., 2011 | NCBI SRA | SRR000860 | Ocean - epipelagic | 89 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 75
Shi et al., 2011 | NCBI SRA | SRR000861 | Ocean - epipelagic | 89 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 75
Shi et al., 2011 | NCBI SRA | SRR020490 | Ocean - epipelagic | 90 | DNA | 454 | | | | | | | | 110
Shi et al., 2011 | NCBI SRA | SRR516509 | Ocean - epipelagic | 90 | RNA | 454 | No | No | No | | | | | 110
Shi et al., 2011 | NCBI SRA | SRR020493 | Ocean - epipelagic | 91 | DNA | 454 | | | | | | | | 25
Shi et al., 2011 | NCBI SRA | SRR020494 | Ocean - epipelagic | 91 | DNA | 454 | | | | | | | | 25
Shi et al., 2011 | NCBI SRA | SRR020488 | Ocean - epipelagic | 93 | DNA | 454 | | | | | | | | 75
Shi et al., 2011 | NCBI SRA | SRR020489 | Ocean - epipelagic | 93 | DNA | 454 | | | | | | | | 75
Shi et al., 2011 | NCBI SRA | SRR010906 | Ocean - mesopelagic | 88 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 500
Shi et al., 2011 | NCBI SRA | SRR010907 | Ocean - mesopelagic | 88 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 500
Shi et al., 2011 | NCBI SRA | SRR010908 | Ocean - mesopelagic | 88 | DNA | 454 | | | | | Free | 0.2 | 1.6 | 500
Shi et al., 2011 | NCBI SRA | SRR010909 | Ocean - mesopelagic | 88 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 500
Shi et al., 2011 | NCBI SRA | SRR010910 | Ocean - mesopelagic | 88 | RNA | 454 | No | No | No | | Free | 0.2 | 1.6 | 500
Shi et al., 2011 | NCBI SRA | SRR020491 | Ocean - mesopelagic | 92 | DNA | 454 | | | | | | | | 500
Shi et al., 2011 | NCBI SRA | SRR020492 | Ocean - mesopelagic | 92 | DNA | 454 | | | | | | | | 500
Shi et al., 2011 | NCBI SRA | SRR516508 | Ocean - mesopelagic | 92 | RNA | 454 | No | No | No | | | | | 500
Shi et al., 2011 | NCBI SRA | SRR1298978 | Ocean - mesopelagic | 94 | DNA | 454 | | | | | | | | 770
Shi et al., 2011 | NCBI SRA | SRR1298979 | Ocean - mesopelagic | 95 | DNA | 454 | | | | | | | | 770
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR341454 | Ocean - bathypelagic | 11 | DNA | 454 | | | | 3 | | 0.2 | | 1600
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR490081 | Ocean - bathypelagic | 11 | RNA | 454 | No | No | No | 3 | | 0.2 | | 1600
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR453184 | Ocean - bathypelagic | 11 | RNA | Illumina | No | No | No | 3 | | 0.2 | | 1600
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR341453 | Ocean - bathypelagic | 36 | DNA | 454 | | | | 2.6 | | 0.2 | | 1900
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR495365 | Ocean - bathypelagic | 36 | RNA | 454 | No | No | No | 2.6 | | 0.2 | | 1900
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR488331 | Ocean - hydrothermal vent | 6 | DNA | 454 | | | | 3 | | 0.2 | | 1775
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR488330 | Ocean - hydrothermal vent | 7 | DNA | 454 | | | | 3 | | 0.2 | | 1996
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR341455 | Ocean - hydrothermal vent | 10 | RNA | 454 | No | No | No | 3 | | 0.2 | | 1950
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR341456 | Ocean - hydrothermal vent | 10 | RNA | 454 | No | No | No | 3 | | 0.2 | | 1950
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR452448 | Ocean - hydrothermal vent | 10 | RNA | Illumina | No | No | No | 3 | | 0.2 | | 1950
Baker et al., 2012; Lesniewski et al., 2012 | NCBI SRA | SRR341457 | Ocean - hydrothermal vent | 29 | RNA | 454 | No | No | No | 2.9 | | 0.2 | | 1963
Sieradzki et al., 2017 | ENA | ERR2094672 | Ocean - epipelagic | 201 | DNA | Illumina | | | | 15.9 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089000 | Ocean - epipelagic | 201 | RNA | Illumina | No | Yes | Yes | 15.9 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094671 | Ocean - epipelagic | 202 | DNA | Illumina | | | | 14.1 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2088999 | Ocean - epipelagic | 202 | RNA | Illumina | No | Yes | Yes | 14.1 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094669 | Ocean - epipelagic | 203 | DNA | Illumina | | | | 19 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2088997 | Ocean - epipelagic | 203 | RNA | Illumina | No | Yes | Yes | 19 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094670 | Ocean - epipelagic | 204 | DNA | Illumina | | | | 20.6 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2088998 | Ocean - epipelagic | 204 | RNA | Illumina | No | Yes | Yes | 20.6 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094676 | Ocean - epipelagic | 206 | DNA | Illumina | | | | 14.5 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089004 | Ocean - epipelagic | 206 | RNA | Illumina | No | Yes | Yes | 14.5 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094675 | Ocean - epipelagic | 207 | DNA | Illumina | | | | 14.5 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089003 | Ocean - epipelagic | 207 | RNA | Illumina | No | Yes | Yes | 14.5 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094673 | Ocean - epipelagic | 208 | DNA | Illumina | | | | 16 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089001 | Ocean - epipelagic | 208 | RNA | Illumina | No | Yes | Yes | 16 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094674 | Ocean - epipelagic | 209 | DNA | Illumina | | | | 20 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089002 | Ocean - epipelagic | 209 | RNA | Illumina | No | Yes | Yes | 20 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094680 | Ocean - epipelagic | 210 | DNA | Illumina | | | | 15.7 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089008 | Ocean - epipelagic | 210 | RNA | Illumina | No | Yes | Yes | 15.7 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094679 | Ocean - epipelagic | 211 | DNA | Illumina | | | | 15.2 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089007 | Ocean - epipelagic | 211 | RNA | Illumina | No | Yes | Yes | 15.2 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094677 | Ocean - epipelagic | 212 | DNA | Illumina | | | | 18 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089005 | Ocean - epipelagic | 212 | RNA | Illumina | No | Yes | Yes | 18 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2094678 | Ocean - epipelagic | 213 | DNA | Illumina | | | | 20.6 | Free | 0.2 | 1.0 | 0
Sieradzki et al., 2017 | ENA | ERR2089006 | Ocean - epipelagic | 213 | RNA | Illumina | No | Yes | Yes | 20.6 | Free | 0.2 | 1.0 | 0
Gilbert et al., 2010 | ENA | ERR010492 | Ocean - epipelagic | 160 | DNA | 454 | | | | 9.6 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010493 | Ocean - epipelagic | 160 | RNA | 454 | No | Yes | No | 9.6 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010488 | Ocean - epipelagic | 161 | DNA | 454 | | | | 9.7 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010491 | Ocean - epipelagic | 161 | RNA | 454 | No | Yes | No | 9.7 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010489 | Ocean - epipelagic | 162 | DNA | 454 | | | | 9.7 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010490 | Ocean - epipelagic | 162 | RNA | 454 | No | Yes | No | 9.7 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010500 | Ocean - epipelagic | 163 | DNA | 454 | | | | 15.8 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010501 | Ocean - epipelagic | 163 | RNA | 454 | No | Yes | No | 15.8 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010498 | Ocean - epipelagic | 164 | DNA | 454 | | | | 15.9 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010499 | Ocean - epipelagic | 164 | RNA | 454 | No | Yes | No | 15.9 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010496 | Ocean - epipelagic | 165 | DNA | 454 | | | | 15.8 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010497 | Ocean - epipelagic | 165 | RNA | 454 | No | Yes | No | 15.8 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010494 | Ocean - epipelagic | 166 | DNA | 454 | | | | 15.7 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010495 | Ocean - epipelagic | 166 | RNA | 454 | No | Yes | No | 15.7 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010482 | Ocean - epipelagic | 167 | DNA | 454 | | | | 10.1 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010483 | Ocean - epipelagic | 167 | DNA | 454 | | | | 10.1 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010484 | Ocean - epipelagic | 167 | RNA | 454 | No | Yes | No | 10.1 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010485 | Ocean - epipelagic | 167 | RNA | 454 | No | Yes | No | 10.1 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010486 | Ocean - epipelagic | 168 | DNA | 454 | | | | 10.1 | Free | 0.2 | 1.6 | 7
Gilbert et al., 2010 | ENA | ERR010487 | Ocean - epipelagic | 168 | RNA | 454 | No | Yes | No | 10.1 | Free | 0.2 | 1.6 | 7
Franzosa et al., 2014 | NCBI SRA | SRR769532 | Human gut | 169 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769428 | Human gut | 169 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769427 | Human gut | 169 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769520 | Human gut | 170 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769436 | Human gut | 170 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769404 | Human gut | 170 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769511 | Human gut | 171 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769514 | Human gut | 172 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769403 | Human gut | 172 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769396 | Human gut | 172 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769538 | Human gut | 173 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769420 | Human gut | 173 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769438 | Human gut | 173 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769509 | Human gut | 174 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769433 | Human gut | 174 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769405 | Human gut | 174 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769535 | Human gut | 175 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769515 | Human gut | 176 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769426 | Human gut | 176 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769398 | Human gut | 176 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769516 | Human gut | 177 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769432 | Human gut | 177 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769397 | Human gut | 177 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769527 | Human gut | 178 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769407 | Human gut | 178 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769395 | Human gut | 178 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769539 | Human gut | 179 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769513 | Human gut | 180 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769417 | Human gut | 180 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769440 | Human gut | 180 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769530 | Human gut | 181 | DNA | Illumina | | | | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769411 | Human gut | 181 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa et al., 2014 | NCBI SRA | SRR769423 | Human gut | 181 | RNA | Illumina | No | Yes | Yes | 37 | Free | | |
Franzosa
et al., 2014
NCBI
SRA SRR769528 Human gut 182 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769413 Human gut 182 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769441 Human gut 182 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769522 Human gut 183 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769529 Human gut 184 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769422 Human gut 184 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769400 Human gut 184 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769525 Human gut 185 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769409 Human gut 185 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769406 Human gut 185 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769510 Human gut 186 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769416 Human gut 186 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769419 Human gut 186 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769512 Human gut 187 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769518 Human gut 188 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769410 Human gut 188 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769424 Human gut 188 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769534 Human gut 189 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769421 Human gut 189 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769399 Human gut 189 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769526 Human gut 190 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769435 Human gut 190 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769429 Human gut 190 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769521 Human gut 191 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769540 Human gut 192 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769412 Human gut 192 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769442 Human gut 192 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769524 Human gut 193 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769414 Human gut 193 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769437 Human gut 193 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769531 Human gut 194 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769418 Human gut 194 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769439 Human gut 194 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769517 Human gut 195 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769537 Human gut 196 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769415 Human gut 196 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769402 Human gut 196 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769519 Human gut 197 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769408 Human gut 197 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769430 Human gut 197 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769533 Human gut 198 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769431 Human gut 198 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769401 Human gut 198 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769536 Human gut 199 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769523 Human gut 200 DNA Illumina
37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769425 Human gut 200 RNA Illumina No Yes Yes 37 Free
Franzosa
et al., 2014
NCBI
SRA SRR769434 Human gut 200 RNA Illumina No Yes Yes 37 Free
Hultman
et al., 2015
NCBI
SRA SRR5009555 Permafrost 55 DNA Illumina
Attached
Hultman
et al., 2015 JGI 2242.4.1834 Permafrost 55 RNA Illumina No No No
Attached
Hultman
et al., 2015
NCBI
SRA SRR4027934 Permafrost 56 DNA Illumina
Attached
Hultman
et al., 2015
NCBI
SRA SRR441745 Permafrost 56 RNA Illumina No No No
Attached
Hultman
et al., 2015 JGI 1141.4.1206 Permafrost 98 DNA Illumina
Attached
Hultman
et al., 2015 JGI 2209.1.1816 Permafrost 98 RNA Illumina No No No
Attached
Hultman
et al., 2015 JGI 1141.7.1206 Permafrost 99 DNA Illumina
Attached
Hultman
et al., 2015 JGI 2144.7.1786 Permafrost 99 RNA Illumina No No No
Attached
Hultman
et al., 2015 JGI 1311.2.1306
Thermokarst
bog 57 DNA Illumina
Attached
Hultman
et al., 2015 JGI 2242.7.1834
Thermokarst
bog 57 RNA Illumina No No No
Attached
Hultman
et al., 2015 JGI 1311.4.1306
Thermokarst
bog 58 DNA Illumina
Attached
Hultman
et al., 2015 JGI 2242.5.1834
Thermokarst
bog 58 RNA Illumina No No No
Attached
Maus et
al., 2016 ENA ERR1299062 Biogas plant 147 DNA Illumina
54 Attached
Maus et
al., 2016 ENA ERR1299063 Biogas plant 147 DNA Illumina
54 Attached
Maus et
al., 2016 ENA ERR1299064 Biogas plant 147 DNA Illumina
54 Attached
Maus et
al., 2016 ENA ERR1299065 Biogas plant 147 DNA Illumina
54 Attached
Maus et
al., 2016 ENA ERR1299066 Biogas plant 147 DNA Illumina
54 Attached
Maus et
al., 2016 ENA ERR1299182 Biogas plant 147 RNA Illumina No No Yes 54 Attached
Li et al.,
2015
NCBI
SRA SRR2046222
Ocean -
bathypelagic 66 DNA Illumina
0.2
2327
Li et al.,
2015
NCBI
SRA SRR2044911
Ocean -
bathypelagic 66 RNA Illumina No No No
0.2
2327
Li et al.,
2015
NCBI
SRA SRR2044912
Ocean -
bathypelagic 67 RNA Illumina No No No
0.2
4850
Li et al.,
2015
NCBI
SRA SRR2046221
Ocean -
bathypelagic 68 DNA Illumina
0.2
4950
Li et al.,
2015
NCBI
SRA SRR2044914
Ocean -
bathypelagic 68 RNA Illumina No No No
0.2
4950
Li et al.,
2015
NCBI
SRA SRR453184
Ocean -
bathypelagic 77 RNA Illumina No No No 3
0.2
1600
Li et al.,
2015
NCBI
SRA SRR2044844
Ocean -
bathypelagic 96 RNA Illumina No No No
0.2
2291
Li et al.,
2015
NCBI
SRA SRR2044846
Ocean -
bathypelagic 97 RNA Illumina No No No
0.2
2291
Li et al.,
2015
NCBI
SRA SRR1217566
Ocean -
bathypelagic 152 DNA Illumina
0.2
2315
Li et al.,
2015
NCBI
SRA SRR1217567
Ocean -
bathypelagic 153 DNA Illumina
0.2
1785
Li et al.,
2015
NCBI
SRA SRR1217461
Ocean -
bathypelagic 156 DNA Illumina
0.2
800
Li et al.,
2015
NCBI
SRA SRR1217463
Ocean -
bathypelagic 158 DNA Illumina
0.8
2155
Li et al.,
2015
NCBI
SRA SRR2044842
Ocean -
hydrothermal
vent 59 RNA Illumina No No No
0.2
2042
Li et al.,
2015
NCBI
SRA SRR2046235
Ocean -
hydrothermal
vent 60 DNA Illumina
0.2
2041
Li et al.,
2015
NCBI
SRA SRR2044843
Ocean -
hydrothermal
vent 60 RNA Illumina No No No
0.2
2041
Li et al.,
2015
NCBI
SRA SRR2046236
Ocean -
hydrothermal
vent 61 DNA Illumina
0.2
2238
Li et al.,
2015
NCBI
SRA SRR2044878
Ocean -
hydrothermal
vent 61 RNA Illumina No No No
0.2
2238
Li et al.,
2015
NCBI
SRA SRR2044888
Ocean -
hydrothermal
vent 62 RNA Illumina No No No
0.2
2256
Li et al.,
2015
NCBI
SRA SRR2044889
Ocean -
hydrothermal
vent 63 RNA Illumina No No No
0.2
4316
Li et al.,
2015
NCBI
SRA SRR2046238
Ocean -
hydrothermal
vent 64 DNA Illumina
0.2
4869
Li et al.,
2015
NCBI
SRA SRR2044891
Ocean -
hydrothermal
vent 64 RNA Illumina No No No
0.2
4869
Li et al.,
2015
NCBI
SRA SRR2046237
Ocean -
hydrothermal
vent 65 DNA Illumina
0.2
4946
Li et al.,
2015
NCBI
SRA SRR2044892
Ocean -
hydrothermal
vent 65 RNA Illumina No No No
0.2
4946
Li et al.,
2015
NCBI
SRA SRR3577362
Ocean -
hydrothermal
vent 75 DNA Illumina
0.2
1993
Li et al.,
2015
NCBI
SRA SRR452448
Ocean -
hydrothermal
vent 76 RNA Illumina No No No
0.2
1950
Li et al.,
2015
NCBI
SRA SRR1217459
Ocean -
hydrothermal
vent 148 DNA Illumina
0.8
2440
Li et al.,
2015
NCBI
SRA SRR1217564
Ocean -
hydrothermal
vent 149 DNA Illumina
0.8
2639
Li et al.,
2015
NCBI
SRA SRR1217460
Ocean -
hydrothermal
vent 150 DNA Illumina
0.8
1960
Li et al.,
2015
NCBI
SRA SRR1217565
Ocean -
hydrothermal
vent 151 DNA Illumina
0.8
2159
Li et al.,
2015
NCBI
SRA SRR1217367
Ocean -
hydrothermal
vent 154 DNA Illumina
0.8
2605
Li et al.,
2015
NCBI
SRA SRR1217452
Ocean -
hydrothermal
vent 155 DNA Illumina
0.8
1890
Li et al.,
2015
NCBI
SRA SRR1217462
Ocean -
hydrothermal
vent 157 DNA Illumina
0.8
1919
Li et al.,
2015
NCBI
SRA SRR1217465
Ocean -
hydrothermal
vent 159 DNA Illumina
0.8
2229
Satinsky et
al., 2015
NCBI
SRA SRR1786279 Freshwater 17 DNA Illumina
Free 0.2 2.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1786608 Freshwater 17 DNA Illumina
Free 0.2 2.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1785209 Freshwater 17 RNA Illumina No Yes No
Free 0.2 2.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1785351 Freshwater 17 RNA Illumina No Yes No
Free 0.2 2.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1786281 Freshwater 18 DNA Illumina
Attached 2.0 297.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1786616 Freshwater 18 DNA Illumina
Attached 2.0 297.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1785350 Freshwater 18 RNA Illumina No Yes No
Attached 2.0 297.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1785352 Freshwater 18 RNA Illumina No Yes No
Attached 2.0 297.0 14
Satinsky et
al., 2015
NCBI
SRA SRR1787940 Freshwater 19 DNA Illumina
Free 0.2 2.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1788318 Freshwater 19 DNA Illumina
Free 0.2 2.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1784299 Freshwater 19 RNA Illumina No Yes No
Free 0.2 2.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1784305 Freshwater 19 RNA Illumina No Yes No
Free 0.2 2.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1787943 Freshwater 20 DNA Illumina
Attached 2.0 297.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1790487 Freshwater 20 DNA Illumina
Attached 2.0 297.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1784304 Freshwater 20 RNA Illumina No Yes No
Attached 2.0 297.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1785207 Freshwater 20 RNA Illumina No Yes No
Attached 2.0 297.0 10
Satinsky et
al., 2015
NCBI
SRA SRR1790489 Freshwater 21 DNA Illumina
Free 0.2 2.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1790646 Freshwater 21 DNA Illumina
Free 0.2 2.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1781844 Freshwater 21 RNA Illumina No Yes No
Free 0.2 2.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1781804 Freshwater 21 RNA Illumina No Yes No
Free 0.2 2.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1790644 Freshwater 22 DNA Illumina
Attached 2.0 297.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1790647 Freshwater 22 DNA Illumina
Attached 2.0 297.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1781811 Freshwater 22 RNA Illumina No Yes No
Attached 2.0 297.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1781915 Freshwater 22 RNA Illumina No Yes No
Attached 2.0 297.0 19
Satinsky et
al., 2015
NCBI
SRA SRR1790676 Freshwater 23 DNA Illumina
Free 0.2 2.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1790679 Freshwater 23 DNA Illumina
Free 0.2 2.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1781945 Freshwater 23 RNA Illumina No Yes No
Free 0.2 2.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1782602 Freshwater 23 RNA Illumina No Yes No
Free 0.2 2.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1790678 Freshwater 24 DNA Illumina
Attached 2.0 297.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1790680 Freshwater 24 DNA Illumina
Attached 2.0 297.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1782579 Freshwater 24 RNA Illumina No Yes No
Attached 2.0 297.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1782604 Freshwater 24 RNA Illumina No Yes No
Attached 2.0 297.0 33
Satinsky et
al., 2015
NCBI
SRA SRR1796116 Freshwater 25 DNA Illumina
Free 0.2 2.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1796234 Freshwater 25 DNA Illumina
Free 0.2 2.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1777513 Freshwater 25 RNA Illumina No Yes No
Free 0.2 2.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1779189 Freshwater 25 RNA Illumina No Yes No
Free 0.2 2.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1792674 Freshwater 26 DNA Illumina
Free 0.2 2.0 15
Satinsky et
al., 2015
NCBI
SRA SRR1793861 Freshwater 26 DNA Illumina
Free 0.2 2.0 15
Satinsky et
al., 2015
NCBI
SRA SRR1779221 Freshwater 26 RNA Illumina No Yes No
Free 0.2 2.0 15
Satinsky et
al., 2015
NCBI
SRA SRR1781714 Freshwater 26 RNA Illumina No Yes No
Free 0.2 2.0 15
Satinsky et
al., 2015
NCBI
SRA SRR1796118 Freshwater 27 DNA Illumina
Attached 2.0 297.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1796236 Freshwater 27 DNA Illumina
Attached 2.0 297.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1778024 Freshwater 27 RNA Illumina No Yes No
Attached 2.0 297.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1779203 Freshwater 27 RNA Illumina No Yes No
Attached 2.0 297.0 0.5
Satinsky et
al., 2015
NCBI
SRA SRR1792852 Freshwater 28 DNA Illumina
Attached 2.0 297.0 15
Satinsky et
al., 2015
NCBI
SRA SRR1793862 Freshwater 28 DNA Illumina
Attached 2.0 297.0 15
Satinsky et
al., 2015
NCBI
SRA SRR1781711 Freshwater 28 RNA Illumina No Yes No
Attached 2.0 297.0 15
Satinsky et
al., 2015
NCBI
SRA SRR1781802 Freshwater 28 RNA Illumina No Yes No
Attached 2.0 297.0 15
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1186930
Freshwater
plume 100 RNA Illumina No No No 29.362795 Free 0.2 2.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193190
Freshwater
plume 101 RNA Illumina No No No 29.362795 Attached 2.0 156.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199277
Freshwater
plume 101 RNA Illumina Yes No No 29.362795 Attached 2.0 156.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199271
Freshwater
plume 102 DNA Illumina
29.362795 Free 0.2 2.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1200525
Freshwater
plume 102 RNA Illumina No No No 29.362795 Free 0.2 2.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1205250
Freshwater
plume 103 DNA Illumina
29.362795 Attached 2.0 156.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1209977
Freshwater
plume 104 DNA Illumina
29.362795 Free 0.2 2.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1209978
Freshwater
plume 105 DNA Illumina
29.362795 Attached 2.0 156.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1200526
Freshwater
plume 105 RNA Illumina No No No 29.362795 Attached 2.0 156.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1185391
Freshwater
plume 105 RNA Illumina Yes No No 29.362795 Attached 2.0 156.0
4.26
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193205
Freshwater
plume 123 RNA Illumina No No No 28.6307375 Free 0.2 2.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199270
Freshwater
plume 123 DNA Illumina
28.6307375 Free 0.2 2.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193177
Freshwater
plume 124 RNA Illumina No No No 28.6307375 Attached 2.0 156.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199281
Freshwater
plume 124 RNA Illumina Yes No No 28.6307375 Attached 2.0 156.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1182512
Freshwater
plume 125 DNA Illumina
28.6307375 Free 0.2 2.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1182511
Freshwater
plume 126 DNA Illumina
28.6307375 Attached 2.0 156.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1185390
Freshwater
plume 126 RNA Illumina Yes No No 28.6307375 Attached 2.0 156.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204563
Freshwater
plume 127 RNA Illumina No No No 28.6307375 Free 0.2 2.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1202081
Freshwater
plume 128 DNA Illumina
28.6307375 Attached 2.0 156.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199282
Freshwater
plume 128 RNA Illumina No No No 28.6307375 Attached 2.0 156.0
4.47
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1186214
Freshwater
plume 106 DNA Illumina
29.21834 Free 0.2 2.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1202089
Freshwater
plume 107 DNA Illumina
29.21834 Attached 2.0 156.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199279
Freshwater
plume 107 RNA Illumina Yes No No 29.21834 Attached 2.0 156.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193632
Freshwater
plume 108 RNA Illumina No No No 29.21834 Free 0.2 2.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199285
Freshwater
plume 108 RNA Illumina No No No 29.21834 Free 0.2 2.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193237
Freshwater
plume 109 RNA Illumina No No No 29.21834 Attached 2.0 156.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1185392
Freshwater
plume 109 RNA Illumina Yes No No 29.21834 Attached 2.0 156.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204564
Freshwater
plume 109 RNA Illumina No No No 29.21834 Attached 2.0 156.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1205252
Freshwater
plume 110 DNA Illumina
29.21834 Free 0.2 2.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204580
Freshwater
plume 111 DNA Illumina
29.21834 Attached 2.0 156.0
3.64
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204581
Freshwater
plume 112 DNA Illumina
29.21451739 Free 0.2 2.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1205247
Freshwater
plume 112 RNA Illumina No No No 29.21451739 Free 0.2 2.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1205800
Freshwater
plume 113 RNA Illumina No No No 29.21451739 Attached 2.0 156.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199278
Freshwater
plume 113 RNA Illumina Yes No No 29.21451739 Attached 2.0 156.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1202090
Freshwater
plume 114 DNA Illumina
29.21451739 Free 0.2 2.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204579
Freshwater
plume 114 RNA Illumina No No No 29.21451739 Free 0.2 2.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204509
Freshwater
plume 115 RNA Illumina Yes No No 29.21451739 Attached 2.0 156.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1205251
Freshwater
plume 115 DNA Illumina
29.21451739 Attached 2.0 156.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204565
Freshwater
plume 116 RNA Illumina No No No 29.21451739 Attached 2.0 156.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1209976
Freshwater
plume 116 DNA Illumina
29.21451739 Attached 2.0 156.0
3.93
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1183650
Freshwater
plume 117 DNA Illumina
28.39964667 Free 0.2 2.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199286
Freshwater
plume 118 RNA Illumina No No No 28.39964667 Attached 2.0 156.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1202091
Freshwater
plume 119 DNA Illumina
28.39964667 Free 0.2 2.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1205249
Freshwater
plume 119 RNA Illumina No No No 28.39964667 Free 0.2 2.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1183643
Freshwater
plume 120 DNA Illumina
28.39964667 Attached 2.0 156.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1185399
Freshwater
plume 120 RNA Illumina Yes No No 28.39964667 Attached 2.0 156.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193627
Freshwater
plume 121 RNA Illumina No No No 28.39964667 Free 0.2 2.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1202095
Freshwater
plume 122 DNA Illumina
28.39964667 Attached 2.0 156.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193629
Freshwater
plume 122 RNA Illumina No No No 28.39964667 Attached 2.0 156.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1204512
Freshwater
plume 122 RNA Illumina Yes No No 28.39964667 Attached 2.0 156.0
3.89
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199283
Freshwater
plume 129 RNA Illumina No No No 28.85325 Free 0.2 2.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1185414
Freshwater
plume 130 DNA Illumina
28.85325 Attached 2.0 156.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193226
Freshwater
plume 130 RNA Illumina No No No 28.85325 Attached 2.0 156.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199280
Freshwater
plume 130 RNA Illumina Yes No No 28.85325 Attached 2.0 156.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1185413
Freshwater
plume 131 DNA Illumina
28.85325 Free 0.2 2.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199284
Freshwater
plume 132 RNA Illumina No No No 28.85325 Attached 2.0 156.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1205253
Freshwater
plume 133 DNA Illumina
28.85325 Free 0.2 2.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193215
Freshwater
plume 133 RNA Illumina No No No 28.85325 Free 0.2 2.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1199272
Freshwater
plume 134 DNA Illumina
28.85325 Attached 2.0 156.0
3.76
m
Satinsky et
al., 2014a,
2014b
NCBI
SRA SRR1193634
Freshwater
plume 134 RNA Illumina Yes No No 28.85325 Attached 2.0 156.0
3.76
m
This study
Ocean -
benthic zone 1 DNA Illumina
5.22 Free 0.2 1.0 868
This study
Ocean -
benthic zone 1 RNA Illumina No Yes Yes 5.22 Free 0.2 1.0 868
This study
Ocean -
benthic zone 2 DNA Illumina
5.21 Free 0.2 1.0 885.2
This study
Ocean -
benthic zone 2 RNA Illumina No Yes Yes 5.21 Free 0.2 1.0 885.2
This study
Ocean -
benthic zone 3 DNA Illumina
5.14 Free 0.2 1.0 885
This study
Ocean -
benthic zone 3 RNA Illumina No Yes Yes 5.14 Free 0.2 1.0 885
This study
Ocean -
benthic zone 4 DNA Illumina
5.16 Free 0.2 1.0 885.9
This study
Ocean -
benthic zone 4 RNA Illumina No Yes Yes 5.16 Free 0.2 1.0 885.9
This study
Ocean -
benthic zone 5 DNA Illumina
Free 0.2 1.0 885
This study
Ocean -
benthic zone 5 RNA Illumina No Yes Yes
Free 0.2 1.0 885
This study
Ocean -
benthic zone 8 DNA Illumina
5.16 Free 0.2 1.0 885.6
This study
Ocean -
benthic zone 8 RNA Illumina No Yes Yes 5.16 Free 0.2 1.0 885.6
This study
Ocean -
benthic zone 9 DNA Illumina
Free 0.2 1.0 885
This study
Ocean -
benthic zone 9 RNA Illumina No Yes Yes
Free 0.2 1.0 885
This study
Ocean -
benthic zone 12 DNA Illumina
Free 0.2 1.0 885
This study
Ocean -
benthic zone 12 RNA Illumina No Yes Yes
Free 0.2 1.0 885
This study
Ocean -
benthic zone 13 DNA Illumina
5.2 Free 0.2 1.0 885
This study
Ocean -
benthic zone 13 RNA Illumina No Yes Yes 5.2 Free 0.2 1.0 885
This study
Ocean -
benthic zone 14 DNA Illumina
5.19 Free 0.2 1.0 885
This study
Ocean -
benthic zone 14 RNA Illumina No Yes Yes 5.19 Free 0.2 1.0 885
This study
Ocean -
benthic zone 15 DNA Illumina
5.16 Free 0.2 1.0 885
This study
Ocean -
benthic zone 15 RNA Illumina No Yes Yes 5.16 Free 0.2 1.0 885
This study
Ocean -
benthic zone 16 DNA Illumina
5.22 Free 0.2 1.0 886.2
This study
Ocean -
benthic zone 16 RNA Illumina No Yes Yes 5.22 Free 0.2 1.0 886.2
This study
Ocean -
benthic zone 34 DNA Illumina
5.22 Free 0.2 1.0 885
This study
Ocean -
benthic zone 34 RNA Illumina No Yes Yes 5.22 Free 0.2 1.0 885
This study
Ocean -
benthic zone 35 DNA Illumina
5.16 Free 0.2 1.0 885
This study
Ocean -
benthic zone 35 RNA Illumina No Yes Yes 5.16 Free 0.2 1.0 885
This study
Ocean -
benthic zone 37 DNA Illumina
5.22 Free 0.2 1.0 871.4
This study
Ocean -
benthic zone 37 RNA Illumina No Yes Yes 5.22 Free 0.2 1.0 871.4
This study
Ocean -
benthic zone 38 DNA Illumina
5.2 Free 0.2 1.0 885.7
This study
Ocean -
benthic zone 38 RNA Illumina No Yes Yes 5.2 Free 0.2 1.0 885.7
This study
Ocean -
benthic zone 39 DNA Illumina
5.16 Free 0.2 1.0 885
This study
Ocean -
benthic zone 39 RNA Illumina No Yes Yes 5.16 Free 0.2 1.0 885
This study
Ocean -
benthic zone 40 DNA Illumina
5.22 Free 0.2 1.0 885.3
This study
Ocean -
benthic zone 40 RNA Illumina No Yes Yes 5.22 Free 0.2 1.0 885.3
This study
Ocean -
benthic zone 41 DNA Illumina
5.22 Free 0.2 1.0 874.9
This study
Ocean -
benthic zone 42 DNA Illumina
5.18 Free 0.2 1.0 885
This study
Ocean -
benthic zone 42 RNA Illumina No Yes Yes 5.18 Free 0.2 1.0 885
This study
Ocean -
benthic zone 43 DNA Illumina
5.24 Free 0.2 1.0 884.9
This study
Ocean -
benthic zone 43 RNA Illumina No Yes Yes 5.24 Free 0.2 1.0 884.9
This study
Ocean -
benthic zone 44 DNA Illumina
5.25 Free 0.2 1.0 875.6
This study
Ocean -
benthic zone 44 RNA Illumina No Yes Yes 5.25 Free 0.2 1.0 875.6
This study
Ocean -
benthic zone 45 DNA Illumina
Free 0.2 1.0 885
This study
Ocean -
benthic zone 45 RNA Illumina No Yes Yes
Free 0.2 1.0 885
This study
Ocean -
benthic zone 46 DNA Illumina
5.25 Free 0.2 1.0 885.5
This study
Ocean -
benthic zone 46 RNA Illumina No Yes Yes 5.25 Free 0.2 1.0 885.5
This study
Ocean -
benthic zone 47 DNA Illumina
Free 0.2 1.0 885
This study
Ocean -
benthic zone 47 RNA Illumina No Yes Yes
Free 0.2 1.0 885
This study
Ocean -
benthic zone 48 DNA Illumina
5.27 Free 0.2 1.0 854.6
This study
Ocean -
benthic zone 48 RNA Illumina No Yes Yes 5.27 Free 0.2 1.0 854.6
This study
Ocean -
benthic zone 49 DNA Illumina
5.26 Free 0.2 1.0 885.5
This study
Ocean -
benthic zone 49 RNA Illumina No Yes Yes 5.26 Free 0.2 1.0 885.5
This study
Ocean -
benthic zone 50 DNA Illumina
Free 0.2 1.0 885
This study
Ocean -
benthic zone 50 RNA Illumina No Yes Yes
Free 0.2 1.0 885
This study
Ocean -
benthic zone 51 DNA Illumina
5.26 Free 0.2 1.0 885
This study
Ocean -
benthic zone 51 RNA Illumina No Yes Yes 5.26 Free 0.2 1.0 885
This study
Ocean -
benthic zone 52 DNA Illumina
5.17 Free 0.2 1.0 885.3
This study
Ocean -
benthic zone 53 DNA Illumina
Free 0.2 1.0 885
This study
Ocean -
benthic zone 53 RNA Illumina No Yes Yes
Free 0.2 1.0 885
This study
Ocean -
benthic zone 54 DNA Illumina
5.16 Free 0.2 1.0 885.5
This study
Ocean -
benthic zone 54 RNA Illumina No Yes Yes 5.16 Free 0.2 1.0 885.5
Bremges et
al., 2015 ENA ERR843249 Biogas plant 205 DNA Illumina
40 Attached
Bremges et
al., 2015 ENA ERR843250 Biogas plant 205 DNA Illumina
40 Attached
Bremges et
al., 2015 ENA ERR843255 Biogas plant 205 RNA Illumina No Yes No 40 Attached
Bremges et
al., 2015 ENA ERR843251 Biogas plant 205 DNA Illumina
40 Attached
Bremges et
al., 2015 ENA ERR843252 Biogas plant 205 DNA Illumina
40 Attached
Bremges et
al., 2015 ENA ERR843253 Biogas plant 205 DNA Illumina
40 Attached
Bremges et
al., 2015 ENA ERR843254 Biogas plant 205 DNA Illumina
40 Attached
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1328123
Ocean -
epipelagic 135 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 26.812225 Free 0.2 1.6 25
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1328124
Ocean -
epipelagic 135 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 26.812225 Free 0.2 1.6 25
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR599006
Ocean -
epipelagic 137 DNA Illumina
19.854083 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR599022
Ocean -
epipelagic 137 DNA Illumina
19.854083 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1001627
Ocean -
epipelagic 137 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 19.854083 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR599010
Ocean -
epipelagic 139 DNA Illumina
23.349542 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR599126
Ocean -
epipelagic 139 DNA Illumina
23.349542 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1001626
Ocean -
epipelagic 139 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 23.349542 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR598976
Ocean -
epipelagic 141 DNA Illumina
17.260108 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1328122
Ocean -
epipelagic 141 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 17.260108 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1328125
Ocean -
epipelagic 141 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 17.260108 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1328126
Ocean -
epipelagic 141 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 17.260108 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1328127
Ocean -
epipelagic 141 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 17.260108 Free 0.2 3.0 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1336911
Ocean -
epipelagic 145 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 99999 Free 0.2 1.6 5
Sunagawa
et al.,
2015;
Alberti et
al., 2017 ENA ERR1336912
Ocean -
epipelagic 145 RNA Illumina No Yes
Yes-
smarter-
soR1isSense 99999 Free 0.2 1.6 5
Alberti et
al., 2017 ENA ERR868454
Ocean -
epipelagic 136 DNA Illumina
26.812225 Attached 0.8 5.0 25
Alberti et
al., 2017 ENA ERR1712196
Ocean -
epipelagic 136 RNA Illumina Yes No No 26.812225 Attached 0.8 5.0 25
Alberti et
al., 2017 ENA ERR1711861
Ocean -
epipelagic 136 RNA Illumina Yes No No 26.812225 Attached 0.8 5.0 25
Alberti et
al., 2017 ENA ERR599237
Ocean -
epipelagic 138 DNA Illumina
19.854083
0.8
5
Alberti et
al., 2017 ENA ERR1007415
Ocean -
epipelagic 138 RNA Illumina Yes No No 19.854083
0.8
5
Alberti et
al., 2017 ENA ERR1711908
Ocean -
epipelagic 138 RNA Illumina Yes No No 19.854083
0.8
5
Alberti et
al., 2017 ENA ERR1712220
Ocean -
epipelagic 138 RNA Illumina Yes No No 19.854083
0.8
5
Alberti et
al., 2017 ENA ERR1740326
Ocean -
epipelagic 140 DNA Illumina
23.349542
0.8
5
Alberti et
al., 2017 ENA ERR599253
Ocean -
epipelagic 140 DNA Illumina
23.349542
0.8
5
Alberti et
al., 2017 ENA ERR1007416
Ocean -
epipelagic 140 RNA Illumina Yes No No 23.349542
0.8
5
Alberti et
al., 2017 ENA ERR1712046
Ocean -
epipelagic 140 RNA Illumina Yes No No 23.349542
0.8
5
Alberti et
al., 2017 ENA ERR1712031
Ocean -
epipelagic 140 RNA Illumina Yes No No 23.349542
0.8
5
Alberti et
al., 2017 ENA ERR1726589
Ocean -
epipelagic 142 DNA Illumina
17.260108
0.8
5
Alberti et
al., 2017 ENA ERR1719454
Ocean -
epipelagic 142 RNA Illumina Yes No No 17.260108
0.8
5
Alberti et
al., 2017 ENA ERR868459
Ocean -
epipelagic 143 DNA Illumina
17.260108 Attached 0.8 5.0 5
Alberti et
al., 2017 ENA ERR1719440
Ocean -
epipelagic 143 RNA Illumina Yes No No 17.260108 Attached 0.8 5.0 5
Alberti et
al., 2017 ENA ERR1700895
Ocean -
epipelagic 144 DNA Illumina
99999 Attached 5.0 20.0 5
Alberti et
al., 2017 ENA ERR1712063
Ocean -
epipelagic 144 RNA Illumina Yes No No 99999 Attached 5.0 20.0 5
Alberti et
al., 2017 ENA ERR1712137
Ocean -
epipelagic 144 RNA Illumina Yes No No 99999 Attached 5.0 20.0 5
Alberti et
al., 2017 ENA ERR1726597
Ocean -
epipelagic 146 DNA Illumina
99999
0.8
5
Alberti et
al., 2017 ENA ERR1719231
Ocean -
epipelagic 146 RNA Illumina Yes No No 99999
0.8
5
Alberti et
al., 2017 ENA ERR1719164
Ocean -
epipelagic 146 RNA Illumina Yes No No 99999
0.8
5
Thrash et
al., 2017
NCBI
SRA SRR4342129
Ocean -
epipelagic 69 DNA Illumina
27.27 Free 0.2 2.7 16.5
Thrash et
al., 2017
NCBI
SRA SRR4342137
Ocean -
epipelagic 69 RNA Illumina No Yes No 27.27 Free 0.2 2.7 16.5
Thrash et
al., 2017
NCBI
SRA SRR4342130
Ocean -
epipelagic 70 DNA Illumina
26.8 Free 0.2 2.7 14
Thrash et
al., 2017
NCBI
SRA SRR4342138
Ocean -
epipelagic 70 RNA Illumina No Yes No 26.8 Free 0.2 2.7 14
Thrash et
al., 2017
NCBI
SRA SRR4342133
Ocean -
epipelagic 71 DNA Illumina
26.32 Free 0.2 2.7 16.2
Thrash et
al., 2017
NCBI
SRA SRR4342139
Ocean -
epipelagic 71 RNA Illumina No Yes No 26.32 Free 0.2 2.7 16.2
Thrash et
al., 2017
NCBI
SRA SRR4342134
Ocean -
epipelagic 72 DNA Illumina
28.8 Free 0.2 2.7 6.7
Thrash et
al., 2017
NCBI
SRA SRR4342140
Ocean -
epipelagic 72 RNA Illumina No Yes No 28.8 Free 0.2 2.7 6.7
Thrash et
al., 2017
NCBI
SRA SRR4342135
Ocean -
epipelagic 73 DNA Illumina
26.03 Free 0.2 2.7 13.3
Thrash et
al., 2017
NCBI
SRA SRR4342131
Ocean -
epipelagic 73 RNA Illumina No Yes No 26.03 Free 0.2 2.7 13.3
Thrash et
al., 2017
NCBI
SRA SRR4342136
Ocean -
epipelagic 74 DNA Illumina
25.08 Free 0.2 2.7 28.3
Thrash et
al., 2017
NCBI
SRA SRR4342132
Ocean -
epipelagic 74 RNA Illumina No Yes No 25.08 Free 0.2 2.7 28.3
Supplementary Table 1. Metagenomes and metatranscriptomes used in this study, with relevant sample information.
Most data sources are identified by NCBI SRA or ENA accession numbers; some Hultman et al., 2015 libraries are identified by JGI
identifiers. Sequence libraries are characterized by nucleic acid source (DNA or RNA), sequencing technology, preparation
methods, source environment, attached or free-living lifestyle, and sample temperature. The pair bin groups matching DNA and
RNA samples. Pairing information was retrieved from the source databases and source publications. Public data were retrieved
from: Alberti et al., 2017 (Alberti et al. 2017), Baker et al., 2012 (Baker et al. 2012), Lesniewski et al., 2012 (Lesniewski et al.
2012), Bremges et al., 2015 (Bremges et al. 2015), Dupont et al., 2015 (Dupont et al. 2015), Franzosa et al., 2014 (Franzosa et al.
2014), Gilbert et al., 2010 (Gilbert et al. 2010), Hultman et al., 2015 (Hultman et al. 2015), Li et al., 2015 (Li et al. 2014), Maus
et al., 2016 (Maus et al. 2016), Satinsky et al., 2014a (Satinsky et al. 2014b), Satinsky et al., 2014b (Satinsky et al. 2014a),
Satinsky et al., 2015 (Satinsky et al. 2015), Shi et al., 2011 (Shi et al. 2009), Sieradzki et al., 2017 (Sieradzki et al. 2017),
Sunagawa et al., 2015 (Sunagawa et al. 2015), and Thrash et al., 2017 (Thrash et al. 2017).
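The pair bin is what links each metagenome to its matching metatranscriptome. As a minimal sketch, assuming the table has been parsed into (accession, pair bin, nucleic acid source) tuples, the pairing could be recovered in Python as follows. The rows are copied from the Franzosa et al., 2014 entries above; the function names are hypothetical, and crossing every DNA library with every RNA library in a bin is an illustrative assumption, not necessarily how replicate libraries were combined in this study.

```python
from collections import defaultdict

# Rows transcribed from Supplementary Table 1 (Franzosa et al., 2014):
# (accession, pair bin, nucleic acid source)
LIBRARIES = [
    ("SRR769509", 174, "DNA"),
    ("SRR769433", 174, "RNA"),
    ("SRR769405", 174, "RNA"),
    ("SRR769535", 175, "DNA"),
    ("SRR769515", 176, "DNA"),
    ("SRR769426", 176, "RNA"),
    ("SRR769398", 176, "RNA"),
]


def group_by_pair_bin(libraries):
    """Map each pair bin to the DNA and RNA accessions it contains."""
    bins = defaultdict(lambda: {"DNA": [], "RNA": []})
    for accession, pair_bin, source in libraries:
        bins[pair_bin][source].append(accession)
    return dict(bins)


def matched_pairs(libraries):
    """Yield (pair bin, DNA accession, RNA accession) for every DNA/RNA
    combination within a bin (hypothetical pairing rule). Bins lacking
    either a metagenome or a metatranscriptome yield nothing."""
    for pair_bin, groups in sorted(group_by_pair_bin(libraries).items()):
        for dna in groups["DNA"]:
            for rna in groups["RNA"]:
                yield pair_bin, dna, rna


if __name__ == "__main__":
    for pair_bin, dna, rna in matched_pairs(LIBRARIES):
        print(pair_bin, dna, rna)
```

Bin 175 has a metagenome but no metatranscriptome, so it yields no pair; the table contains similar unpaired entries for other sources (for example, several Li et al., 2015 bins list only RNA libraries).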
Environment | DNA_RNA ID | DNA (%) | RNA (%) | Specific activity (RNA:DNA) | Rare/abundant | Taxonomy (LCA) | RefSeq description | KEGG gene description
Ocean - epipelagic | GGAGAA_CAT2013April_TAGCTT | 0.01 | 0.22 | 21.9 | Abundant | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae | WP_023648072.1 biphenyl 2,3-dioxygenase [Candidatus Pelagibacter ubique] |
Ocean - epipelagic | AAGAGG_CAT2013Jan_CAGATC | 0.00 | 0.48 | | Rare | root;cellular organisms;Bacteria;Proteobacteria | WP_035614803.1 pyridine nucleotide-disulfide oxidoreductase [Hyphomonas johnsonii] | 3-phenylpropionate/trans-cinnamate dioxygenase ferredoxin reductase component [EC:1.18.1.3]
Ocean - epipelagic | CAT_Jul_12_AGCATG_CAT2012July_AGTCAA | 0.01 | 1.72 | 297.7 | Abundant | root | WP_011131862.1 photosystem II protein D1 [Prochlorococcus marinus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Ocean - epipelagic | BC14_CAT2012Oct_TGACCA | 0.00 | 0.80 | | Rare | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae;unclassified Pelagibacteraceae;alpha proteobacterium HIMB5 | WP_041487608.1 hypothetical protein [alpha proteobacterium HIMB5] |
Ocean - epipelagic | GTCGTA_POLA2013April_GGCTAC | 0.01 | 0.77 | 89.2 | Abundant | root;cellular organisms;Eukaryota | YP_009093360.1 photosystem II protein D1 (chloroplast) [Cerataulina daemon] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Ocean - epipelagic | POLA_Jan_13_AAGCCT_POLA2013Jan_ACTTGA | 0.00 | 0.53 | 186.0 | Abundant | root;cellular organisms;Eukaryota | NP_043625.1 photosystem II protein D1 [Odontella sinensis] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Ocean - epipelagic | POLA_Jul_12_TCAGAG_POLA2012July_CGATGT | 0.00 | 2.32 | 810.8 | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae | WP_086029885.1 pyridoxal phosphate-dependent aminotransferase [Tenacibaculum holothuriorum] | aspartate aminotransferase [EC:2.6.1.1]
Ocean - epipelagic | AAGCCT_POLA2012Oct_ACAGTG | 0.01 | 0.29 | 28.2 | Abundant | root;cellular organisms;Eukaryota | YP_002808606.1 photosystem II D1 protein (chloroplast) [Micromonas commoda] >YP_002808637.1 photosystem II D1 protein (chloroplast) [Micromonas commoda] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Ocean - epipelagic | BC15_SPOT2013April_CTTGTA | 0.00 | 0.23 | 84.1 | Rare | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae | WP_023648072.1 biphenyl 2,3-dioxygenase [Candidatus Pelagibacter ubique] |
Ocean - epipelagic | BC13_SPOT2013Jan_GATCAG | 0.00 | 0.18 | 41.2 | Abundant | root;cellular organisms;Eukaryota | YP_002808606.1 photosystem II D1 protein (chloroplast) [Micromonas commoda] >YP_002808637.1 photosystem II D1 protein (chloroplast) [Micromonas commoda] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Ocean - epipelagic | SPOT_Jul_12_GAGTCA_SPOT2012July_TTAGGC | 0.01 | 0.16 | 20.8 | Abundant | root | WP_011131862.1 photosystem II protein D1 [Prochlorococcus marinus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Ocean - epipelagic | BC16_SPOT2012Oct_GCCAAT | 0.00 | 0.51 | | Rare | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae;unclassified Pelagibacteraceae;alpha proteobacterium HIMB5 | WP_041487608.1 hypothetical protein [alpha proteobacterium HIMB5] |
Human gut | SRR769532_SRR769428 | 0.00 | 1.41 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus;Cucumber green mottle mosaic virus | NP_044579.1 movement protein [Cucumber green mottle mosaic virus] |
Human gut | SRR769532_SRR769427 | 0.00 | 1.44 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus;Cucumber green mottle mosaic virus | NP_044579.1 movement protein [Cucumber green mottle mosaic virus] |
Human gut | SRR769520_SRR769436 | 0.00 | 2.01 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus;Cucumber green mottle mosaic virus | NP_044579.1 movement protein [Cucumber green mottle mosaic virus] |
Human gut | SRR769520_SRR769404 | 0.00 | 1.98 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus;Cucumber green mottle mosaic virus | NP_044579.1 movement protein [Cucumber green mottle mosaic virus] |
Human gut | SRR769514_SRR769403 | 0.00 | 1.98 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus;Cucumber green mottle mosaic virus | NP_044579.1 movement protein [Cucumber green mottle mosaic virus] |
Human gut | SRR769514_SRR769396 | 0.00 | 1.93 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus;Cucumber green mottle mosaic virus | NP_044579.1 movement protein [Cucumber green mottle mosaic virus] |
Human gut | SRR769538_SRR769420 | 0.00 | 0.84 | | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter | WP_011954236.1 MULTISPECIES: methyl-coenzyme M reductase I operon protein C [Methanobrevibacter] | methyl-coenzyme M reductase subunit C
Human gut | SRR769538_SRR769438 | 0.00 | 0.81 | | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter | WP_011954236.1 MULTISPECIES: methyl-coenzyme M reductase I operon protein C [Methanobrevibacter] | methyl-coenzyme M reductase subunit C
Human gut | SRR769509_SRR769433 | 0.00 | 1.12 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;unclassified Clostridiales;unclassified Clostridiales (miscellaneous);Clostridiales bacterium | WP_093957158.1 ribosomal subunit interface protein [Clostridiales bacterium] | putative sigma-54 modulation protein
Human gut | SRR769509_SRR769405 | 0.00 | 1.19 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;unclassified Clostridiales;unclassified Clostridiales (miscellaneous);Clostridiales bacterium | WP_093957158.1 ribosomal subunit interface protein [Clostridiales bacterium] | putative sigma-54 modulation protein
Human gut | SRR769515_SRR769426 | 0.00 | 0.38 | 129.0 | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;Methanobrevibacter smithii | WP_019267456.1 adhesin [Methanobrevibacter smithii] |
Human gut | SRR769515_SRR769398 | 0.00 | 0.41 | 136.2 | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;Methanobrevibacter smithii | WP_019267456.1 adhesin [Methanobrevibacter smithii] |
Human gut | SRR769516_SRR769432 | 0.00 | 0.43 | 131.1 | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;Prevotella copri | WP_089544802.1 energy transducer TonB [Prevotella copri] | periplasmic protein TonB
Human gut | SRR769516_SRR769397 | 0.00 | 0.44 | 132.8 | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;Prevotella copri | WP_089544802.1 energy transducer TonB [Prevotella copri] | periplasmic protein TonB
Human gut | SRR769527_SRR769407 | 0.00 | 0.35 | 563.3 | Rare | root;cellular organisms;Bacteria | WP_087275201.1 MULTISPECIES: ATP-dependent chaperone ClpB [Muribaculum] | ATP-dependent Clp protease ATP-binding subunit ClpB
Human gut | SRR769527_SRR769395 | 0.00 | 0.38 | | Rare | root;cellular organisms;Bacteria | WP_087276736.1 MULTISPECIES: NADP-specific glutamate dehydrogenase [Muribaculum] | glutamate dehydrogenase (NADP+) [EC:1.4.1.4]
Human gut | SRR769513_SRR769417 | 0.00 | 0.94 | 727.1 | Rare | root;cellular organisms;Bacteria | WP_087275201.1 MULTISPECIES: ATP-dependent chaperone ClpB [Muribaculum] | ATP-dependent Clp protease ATP-binding subunit ClpB
Human gut | SRR769513_SRR769440 | 0.00 | 0.96 | 742.6 | Rare | root;cellular organisms;Bacteria | WP_087275201.1 MULTISPECIES: ATP-dependent chaperone ClpB [Muribaculum] | ATP-dependent Clp protease ATP-binding subunit ClpB
Human gut | SRR769530_SRR769411 | 0.00 | 0.63 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Ruminiclostridium;[Clostridium] viride | WP_035139549.1 acetylornithine transaminase [[Clostridium] viride] | acetylornithine/N-succinyldiaminopimelate aminotransferase [EC:2.6.1.11 2.6.1.17]
Human gut | SRR769530_SRR769423 | 0.00 | 0.66 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Ruminiclostridium;[Clostridium] viride | WP_035139549.1 acetylornithine transaminase [[Clostridium] viride] | acetylornithine/N-succinyldiaminopimelate aminotransferase [EC:2.6.1.11 2.6.1.17]
Human gut | SRR769528_SRR769413 | 0.00 | 0.64 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales | WP_005861100.1 MULTISPECIES: hypothetical protein [Parabacteroides] |
Human gut | SRR769528_SRR769441 | 0.00 | 0.65 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales | WP_005861100.1 MULTISPECIES: hypothetical protein [Parabacteroides] |
Human gut | SRR769529_SRR769422 | 0.00 | 0.40 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales | WP_005861100.1 MULTISPECIES: hypothetical protein [Parabacteroides] |
Human gut | SRR769529_SRR769400 | 0.00 | 0.37 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales | WP_005861100.1 MULTISPECIES: hypothetical protein [Parabacteroides] |
Human gut | SRR769525_SRR769409 | 0.00 | 0.77 | 226.6 | Abundant | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales | WP_027433499.1 flagellin [Lachnospiraceae bacterium MD2004] | flagellin
Human gut | SRR769525_SRR769406 | 0.00 | 0.82 | 239.5 | Abundant | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales | WP_027433499.1 flagellin [Lachnospiraceae bacterium MD2004] | flagellin
Human gut | SRR769510_SRR769416 | 0.00 | 0.39 | 196.8 | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales | WP_008401245.1 carbohydrate ABC transporter substrate-binding protein [Clostridium sp. L2-50] | multiple sugar transport system substrate-binding protein
Human gut | SRR769510_SRR769419 | 0.00 | 0.40 | 201.5 | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales | WP_008401245.1 carbohydrate ABC transporter substrate-binding protein [Clostridium sp. L2-50] | multiple sugar transport system substrate-binding protein
Human gut | SRR769518_SRR769410 | 0.00 | 0.49 | 281.9 | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides | WP_005842327.1 MULTISPECIES: molecular chaperone GroEL [Bacteroides] | chaperonin GroEL
Human gut | SRR769518_SRR769424 | 0.00 | 0.52 | 299.5 | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides | WP_005842327.1 MULTISPECIES: molecular chaperone GroEL [Bacteroides] | chaperonin GroEL
Human gut | SRR769534_SRR769421 | 0.00 | 3.41 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078446.1 unnamed protein product [Tomato mosaic virus] | replicase polyprotein 1ab [EC:2.1.1.- 2.7.7.48 3.4.22.- 3.6.4.13]
Human gut | SRR769534_SRR769399 | 0.00 | 3.48 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078446.1 unnamed protein product [Tomato mosaic virus] | replicase polyprotein 1ab [EC:2.1.1.- 2.7.7.48 3.4.22.- 3.6.4.13]
Human gut | SRR769526_SRR769435 | 0.00 | 3.89 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078448.1 unnamed protein product [Tomato mosaic virus] |
Human gut | SRR769526_SRR769429 | 0.00 | 4.06 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078448.1 unnamed protein product [Tomato mosaic virus] |
Human gut | SRR769540_SRR769412 | 0.00 | 10.19 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078448.1 unnamed protein product [Tomato mosaic virus] |
Human gut | SRR769540_SRR769442 | 0.00 | 10.13 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078448.1 unnamed protein product [Tomato mosaic virus] |
Human gut | SRR769524_SRR769414 | 0.00 | 1.17 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales | WP_048925332.1 hypothetical protein [Clostridium sp. 1_1_41A1FAA] |
Human gut | SRR769524_SRR769437 | 0.00 | 1.27 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales | WP_048925332.1 hypothetical protein [Clostridium sp. 1_1_41A1FAA] |
Human gut | SRR769531_SRR769418 | 0.00 | 0.70 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales | WP_036615314.1 hypothetical protein [Parabacteroides distasonis] |
Human gut | SRR769531_SRR769439 | 0.00 | 0.66 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales | WP_036615314.1 hypothetical protein [Parabacteroides distasonis] |
Human gut | SRR769537_SRR769415 | 0.00 | 0.31 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;Blautia | WP_021650398.1 rRNA pseudouridine synthase [Blautia sp. KLE 1732] | 16S rRNA pseudouridine516 synthase [EC:5.4.99.19]
Human gut | SRR769537_SRR769402 | 0.00 | 0.31 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales | WP_036615314.1 hypothetical protein [Parabacteroides distasonis] |
Human gut | SRR769519_SRR769408 | 0.00 | 0.80 | | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;Methanobrevibacter smithii | WP_004036631.1 MULTISPECIES: 5,10-methylenetetrahydromethanopterin reductase [Methanobrevibacter] | 5,10-methylenetetrahydromethanopterin reductase [EC:1.5.98.2]
Human gut | SRR769519_SRR769430 | 0.00 | 0.81 | | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;Methanobrevibacter smithii | WP_004036631.1 MULTISPECIES: 5,10-methylenetetrahydromethanopterin reductase [Methanobrevibacter] | 5,10-methylenetetrahydromethanopterin reductase [EC:1.5.98.2]
Human gut | SRR769533_SRR769431 | 0.00 | 0.45 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides | WP_057097810.1 pirin family protein [Bacteroides uniformis] | uncharacterized protein
Human gut | SRR769533_SRR769401 | 0.00 | 0.46 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides | WP_057097810.1 pirin family protein [Bacteroides uniformis] | uncharacterized protein
Human gut | SRR769523_SRR769425 | 0.00 | 0.33 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078448.1 unnamed protein product [Tomato mosaic virus] |
Human gut | SRR769523_SRR769434 | 0.00 | 0.31 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Virgaviridae;Tobamovirus | NP_078448.1 unnamed protein product [Tomato mosaic virus] |
Permafrost | SRR5009555_2242.4.1834 | 0.00 | 1.17 | | Rare | root;cellular organisms;Bacteria;Acidobacteria | WP_020720103.1 hypothetical protein [Acidobacteriaceae bacterium KBS 96] |
Permafrost | SRR4027934_SRR441745 | 0.00 | 1.12 | | Rare | root;cellular organisms | XP_003088200.1 hypothetical protein CRE_03576 [Caenorhabditis remanei] |
Thermokarst bog | 1311.2.1306_2242.7.1834 | 0.00 | 1.49 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Actinobacteria;Acidimicrobiia;Acidimicrobiales;Acidimicrobiaceae;Acidithrix;Acidithrix ferrooxidans | WP_082058357.1 hypothetical protein [Acidithrix ferrooxidans] | D-xylose transport system substrate-binding protein
Thermokarst bog | 1311.4.1306_2242.5.1834 | 0.00 | 3.01 | 931.0 | Rare | root;cellular organisms;Archaea;Euryarchaeota | WP_010879741.1 argininosuccinate synthase [Archaeoglobus fulgidus] | argininosuccinate synthase [EC:6.3.4.5]
Permafrost | 1141.4.1206_2209.1.1816 | 0.00 | 1.07 | 392.5 | Rare | root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon subdivisions;Deltaproteobacteria | WP_071897565.1 tRNA pseudouridine(38-40) synthase TruA [Cystobacter ferrugineus] | tRNA pseudouridine38-40 synthase [EC:5.4.99.12]
Permafrost | 1141.7.1206_2144.7.1786 | 0.01 | 5.50 | 709.8 | Rare | root;cellular organisms;Bacteria;PVC group;Planctomycetes;Planctomycetia;Planctomycetales | WP_012910427.1 hypothetical protein [Pirellula staleyi] |
Biogas plant | ERR1299062_ERR1299182 | 0.01 | 1.68 | 213.5 | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanoculleus | WP_066957221.1 methyl-coenzyme M reductase operon protein D [Methanoculleus thermophilus] | methyl-coenzyme M reductase subunit D
Biogas plant | ERR1299063_ERR1299182 | 0.01 | 1.68 | 176.5 | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanoculleus | WP_066957221.1 methyl-coenzyme M reductase operon protein D [Methanoculleus thermophilus] | methyl-coenzyme M reductase subunit D
Biogas plant | ERR1299064_ERR1299182 | 0.01 | 1.68 | 219.0 | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanoculleus | WP_066957221.1 methyl-coenzyme M reductase operon protein D [Methanoculleus thermophilus] | methyl-coenzyme M reductase subunit D
Biogas plant | ERR1299065_ERR1299182 | 0.01 | 1.68 | 146.1 | Abundant | root;cellular organisms;Archaea;Euryarchaeota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanoculleus | WP_066957221.1 methyl-coenzyme M reductase operon protein D [Methanoculleus thermophilus] | methyl-coenzyme M reductase subunit D
Biogas plant | ERR1299066_ERR1299182 | 0.00 | 1.68 | 437.8 | Rare | root;cellular organisms;Archaea;Euryarchaeota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanoculleus | WP_066957221.1 methyl-coenzyme M reductase operon protein D [Methanoculleus thermophilus] | methyl-coenzyme M reductase subunit D
Freshwater plume | SRR1182511_SRR1185390 | 0.00 | 0.25 | 54.8 | Rare | root;cellular organisms;Eukaryota;Stramenopiles;Bacillariophyta | XP_002295258.1 fucoxanthin chlorophyll a/c protein, LI818 clade [Thalassiosira pseudonana CCMP1335] | light-harvesting complex I chlorophyll a/b binding protein 1
Freshwater plume | SRR1185414_SRR1193226 | 0.00 | 1.75 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Formosa;Formosa sp. Hel3_A1_48 | WP_069674824.1 porphobilinogen synthase [Formosa sp. Hel3_A1_48] | porphobilinogen synthase [EC:4.2.1.24]
Freshwater plume | SRR1185414_SRR1199280 | 0.00 | 0.45 | | Rare | root;cellular organisms;Eukaryota;Stramenopiles | XP_002294210.1 heat shock protein, partial [Thalassiosira pseudonana CCMP1335] | HSP20 family protein
Freshwater plume | SRR1204581_SRR1205247 | 0.00 | 3.09 | | Rare | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales | WP_069299837.1 aromatic ring-hydroxylating dioxygenase subunit alpha [Rhodobacteraceae bacterium CY02] | choline monooxygenase [EC:1.14.15.7]
Freshwater plume | SRR1205251_SRR1204509 | 0.00 | 0.49 | 399.2 | Rare | root;cellular organisms;Eukaryota;Stramenopiles;Bacillariophyta;Coscinodiscophyceae;Thalassiosirophycidae;Thalassiosirales;Thalassiosiraceae;Thalassiosira;Thalassiosira pseudonana;Thalassiosira pseudonana CCMP1335 | XP_002290908.1 predicted protein [Thalassiosira pseudonana CCMP1335] | cullin-associated NEDD8-dissociated protein 1
Freshwater plume | SRR1199270_SRR1193205 | 0.00 | 7.83 | | Rare | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae | WP_075534999.1 chaperonin GroEL [Candidatus Pelagibacter ubique] | chaperonin GroEL
Freshwater plume | SRR1202081_SRR1199282 | 2.67 | 3.12 | 1.2 | Abundant | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae | WP_075534999.1 chaperonin GroEL [Candidatus Pelagibacter ubique] | chaperonin GroEL
Freshwater plume | SRR1199271_SRR1200525 | 0.01 | 2.19 | 293.1 | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae | WP_067031510.1 transcription-repair coupling factor [Muricauda sp. CP2A] | transcription-repair coupling factor (superfamily II helicase) [EC:3.6.4.-]
Freshwater plume | SRR1209978_SRR1200526 | 0.01 | 6.39 | 666.3 | Rare | root;cellular organisms;Eukaryota | YP_874574.1 photosystem II reaction center protein D1 [Thalassiosira pseudonana] >YP_874620.1 photosystem II reaction center protein D1 [Thalassiosira pseudonana] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater plume | SRR1209978_SRR1185391 | 0.00 | 0.66 | | Rare | root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Lophotrochozoa;Mollusca | YP_009353851.1 cytochrome c oxidase subunit III (mitochondrion) [Lyonsia norwegica] | cytochrome c oxidase subunit 3
Freshwater plume | SRR1202089_SRR1199279 | 0.00 | 1.03 | | Rare | root;cellular organisms;Eukaryota | XP_002788268.1 translation elongation factor EF-1, subunit alpha,, putative [Perkinsus marinus ATCC 50983] | elongation factor 1-alpha
Freshwater plume | SRR1205253_SRR1193215 | 0.00 | 1.44 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Flavobacteriia;Flavobacteriales;Flavobacteriaceae;Formosa;Formosa sp. Hel3_A1_48 | WP_069674824.1 porphobilinogen synthase [Formosa sp. Hel3_A1_48] | porphobilinogen synthase [EC:4.2.1.24]
Freshwater plume | SRR1199272_SRR1193634 | 0.00 | 0.76 | | Rare | root;cellular organisms;Eukaryota;Stramenopiles | XP_002294210.1 heat shock protein, partial [Thalassiosira pseudonana CCMP1335] | HSP20 family protein
Freshwater plume | SRR1183643_SRR1185399 | 0.00 | 0.56 | | Rare | root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria | YP_003345138.1 cytochrome c oxidase subunit I (mitochondrion) (mitochondrion) [Monognathus jesperseni] | cytochrome c oxidase subunit 1 [EC:1.9.3.1]
Freshwater plume | SRR1202090_SRR1204579 | 0.00 | 1.11 | | Rare | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales | WP_069299837.1 aromatic ring-hydroxylating dioxygenase subunit alpha [Rhodobacteraceae bacterium CY02] | choline monooxygenase [EC:1.14.15.7]
Freshwater plume | SRR1209976_SRR1204565 | 0.00 | 2.95 | 1302.7 | Rare | root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales | WP_069299837.1 aromatic ring-hydroxylating dioxygenase subunit alpha [Rhodobacteraceae bacterium CY02] | choline monooxygenase [EC:1.14.15.7]
Freshwater plume | SRR1202091_SRR1205249 | 0.00 | 2.25 | | Rare | root;cellular organisms;Bacteria;PVC group;Verrucomicrobia;Opitutae;Puniceicoccales;Puniceicoccaceae;Coraliomargarita;Coraliomargarita akajimensis | WP_013043993.1 phosphate ABC transporter permease subunit PstC [Coraliomargarita akajimensis] | phosphate transport system permease protein
Freshwater plume | SRR1202095_SRR1193629 | 0.03 | 5.79 | 221.3 | Abundant | root;cellular organisms;Bacteria;PVC group;Verrucomicrobia;Opitutae;Puniceicoccales;Puniceicoccaceae;Coraliomargarita;Coraliomargarita akajimensis | WP_013043993.1 phosphate ABC transporter permease subunit PstC [Coraliomargarita akajimensis] | phosphate transport system permease protein
Freshwater plume | SRR1202095_SRR1204512 | 0.00 | 0.27 | | Rare | root;cellular organisms | XP_014287708.1 PREDICTED: 14-3-3 protein zeta isoform X1 [Halyomorpha halys] >XP_014287711.1 PREDICTED: 14-3-3 protein zeta isoform X1 [Halyomorpha halys] >XP_014287712.1 PREDICTED: 14-3-3 protein zeta isoform X1 [Halyomorpha halys] | 14-3-3 protein beta/theta/zeta
Freshwater | SRR1786279_SRR1785209 | 0.00 | 1.63 | 411.0 | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1786279_SRR1785351 | 0.00 | 1.44 | 364.1 | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1786281_SRR1785350 | 0.00 | 1.75 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1786281_SRR1785352 | 0.00 | 1.27 | | Rare | root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria | WP_067069115.1 tripartite tricarboxylate transporter permease [Mitsuaria chitosanitabida] | putative tricarboxylic transport membrane protein
Freshwater | SRR1786608_SRR1785209 | 0.00 | 1.63 | 841.4 | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1786608_SRR1785351 | 0.00 | 1.44 | 745.3 | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1786616_SRR1785350 | 0.01 | 1.75 | 225.0 | Abundant | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1786616_SRR1785352 | 0.00 | 1.27 | | Rare | root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria | WP_067069115.1 tripartite tricarboxylate transporter permease [Mitsuaria chitosanitabida] | putative tricarboxylic transport membrane protein
Freshwater | SRR1787940_SRR1784299 | 0.00 | 1.95 | 974.3 | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1787940_SRR1784305 | 0.00 | 0.53 | 265.4 | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1787943_SRR1784304 | 0.00 | 1.91 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1787943_SRR1785207 | 0.00 | 2.02 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1788318_SRR1784299 | 0.00 | 1.95 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1788318_SRR1784305 | 0.00 | 0.53 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1790487_SRR1784304 | 0.00 | 1.91 | 961.9 | Abundant | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1790487_SRR1785207 | 0.00 | 2.02 | 1017.1 | Abundant | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1790489_SRR1781844 | 0.01 | 2.50 | 319.5 | Abundant | root;cellular organisms | YP_009028932.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] >YP_009028986.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater | SRR1790489_SRR1781804 | 0.00 | 1.88 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1790644_SRR1781811 | 0.05 | 28.98 | 618.2 | Abundant | root;cellular organisms | YP_009028932.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] >YP_009028986.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater | SRR1790644_SRR1781915 | 0.05 | 22.39 | 477.6 | Abundant | root;cellular organisms | YP_009028932.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] >YP_009028986.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater | SRR1790646_SRR1781844 | 0.01 | 2.50 | 422.7 | Abundant | root;cellular organisms | YP_009028932.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] >YP_009028986.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater | SRR1790646_SRR1781804 | 0.01 | 1.88 | 239.7 | Abundant | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1790647_SRR1781811 | 0.04 | 28.98 | 777.1 | Abundant | root;cellular organisms | YP_009028932.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] >YP_009028986.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater | SRR1790647_SRR1781915 | 0.04 | 22.39 | 600.3 | Abundant | root;cellular organisms | YP_009028932.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] >YP_009028986.1 photosystem II reaction center protein D1 (chloroplast) [Coscinodiscus radiatus] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater | SRR1790676_SRR1781945 | 0.00 | 1.07 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota;unclassified Thaumarchaeota | WP_048189101.1 multicopper oxidase [Candidatus Nitrosotenuis cloacae] | nitrite reductase (NO-forming) [EC:1.7.2.1]
Freshwater | SRR1790676_SRR1782602 | 0.00 | 0.38 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia | WP_082189506.1 glycosyltransferase family 1 protein [Lentimicrobium saccharophilum] | alpha-1,3-rhamnosyl/mannosyltransferase [EC:2.4.1.-]
Freshwater | SRR1790678_SRR1782579 | 0.00 | 1.40 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Narnaviridae;Mitovirus | YP_009182163.1 RNA dependent RNA polymerase [Botrytis cinerea mitovirus 4] |
Freshwater | SRR1790678_SRR1782604 | 0.00 | 0.79 | 390.7 | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1790679_SRR1781945 | 0.00 | 1.07 | 219.3 | Abundant | root;cellular organisms;Archaea;TACK group;Thaumarchaeota;unclassified Thaumarchaeota | WP_048189101.1 multicopper oxidase [Candidatus Nitrosotenuis cloacae] | nitrite reductase (NO-forming) [EC:1.7.2.1]
Freshwater | SRR1790679_SRR1782602 | 0.00 | 0.38 | | Rare | root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Bacteroidia | WP_082189506.1 glycosyltransferase family 1 protein [Lentimicrobium saccharophilum] | alpha-1,3-rhamnosyl/mannosyltransferase [EC:2.4.1.-]
Freshwater | SRR1790680_SRR1782579 | 0.00 | 1.40 | | Rare | root;Viruses;ssRNA viruses;ssRNA positive-strand viruses, no DNA stage;Narnaviridae;Mitovirus | YP_009182163.1 RNA dependent RNA polymerase [Botrytis cinerea mitovirus 4] |
Freshwater | SRR1790680_SRR1782604 | 0.00 | 0.79 | | Rare | root;cellular organisms;Archaea;TACK group;Thaumarchaeota | WP_086907103.1 ammonia monooxygenase [Candidatus Nitrosomarinus catalina] | methane/ammonia monooxygenase subunit A [EC:1.14.18.3 1.14.99.39]
Freshwater | SRR1792674_SRR1779221 | 0.01 | 0.66 | 111.4 | Abundant | root;cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;Chlorophyceae | YP_635978.1 putative site-specific DNA endonuclease (chloroplast) [Tetradesmus obliquus] |
Freshwater | SRR1792674_SRR1781714 | 0.00 | 0.13 | | Rare | root;cellular organisms;Bacteria;PVC group;Verrucomicrobia | WP_082083102.1 alpha/beta hydrolase [Verrucomicrobia bacterium IMCC26134] | endo-1,4-beta-xylanase [EC:3.2.1.8]
Freshwater | SRR1792852_SRR1781711 | 0.00 | 1.06 | | Rare | root;cellular organisms;Eukaryota | YP_009059259.1 photosystem II reaction center protein D1 (chloroplast) [Eunotia naegelii] >YP_009059315.1 photosystem II reaction center protein D1 (chloroplast) [Eunotia naegelii] | photosystem II P680 reaction center D1 protein [EC:1.10.3.9]
Freshwater | SRR1792852_SRR1781802 | 0.00 | 2.16 | | Rare | root;cellular organisms;Bacteria;Terrabacteria group;Actinobacteria;Actinobacteria | WP_086830248.1 type II toxin-antitoxin system PemK/MazF family toxin [Streptomyces sp. NRRL B-24572] |
Freshwater | SRR1793861_SRR1779221 | 0.00 | 0.66 | | Rare | root;cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;Chlorophyceae | YP_635978.1 putative site-specific DNA endonuclease (chloroplast) [Tetradesmus obliquus] |
Freshwater | SRR1793861_SRR1781714 | 0.00 | 0.13 | | Rare | root;cellular organisms;Bacteria;PVC group;Verrucomicrobia | WP_082083102.1 alpha/beta hydrolase [Verrucomicrobia bacterium IMCC26134] | endo-1,4-beta-xylanase [EC:3.2.1.8]
Freshwater SRR1793862_SRR1781711 0.00 1.06
Rare root;cellular organisms;Eukaryota
YP_009059259.1
photosystem II reaction
center protein D1
(chloroplast) [Eunotia
naegelii]
>YP_009059315.1
photosystem II reaction
center protein D1
(chloroplast) [Eunotia
naegelii]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Freshwater SRR1793862_SRR1781802 0.00 2.16
Rare root;cellular organisms;Bacteria;Terrabacteria group;Actinobacteria;Actinobacteria
WP_086830248.1 type
II toxin-antitoxin system
PemK/MazF family toxin
[Streptomyces sp. NRRL
B-24572]
Freshwater SRR1796116_SRR1777513 0.00 0.29
Rare root;cellular organisms;Eukaryota
YP_009106012.1 D1
reaction center protein of
photosystem II
(chloroplast) [Choricystis
parasitica]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Freshwater SRR1796116_SRR1779189 0.00 0.79
Rare
root;cellular organisms;Bacteria;PVC
group;Planctomycetes;Planctomycetia;Planctomycetales;Planctomycetaceae;Schlesneria;Schlesneria paludicola
WP_010582935.1
cytochrome-c peroxidase
[Schlesneria paludicola]
cytochrome c peroxidase
[EC:1.11.1.5]
Freshwater SRR1796118_SRR1778024 0.00 0.60 409.3 Rare root;cellular organisms;Eukaryota
YP_009059259.1
photosystem II reaction
center protein D1
(chloroplast) [Eunotia
naegelii]
>YP_009059315.1
photosystem II reaction
center protein D1
(chloroplast) [Eunotia
naegelii]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Freshwater SRR1796118_SRR1779203 0.00 1.30 687.4 Rare root;cellular organisms;Bacteria;Terrabacteria group;Actinobacteria;Actinobacteria
WP_086830248.1 type
II toxin-antitoxin system
PemK/MazF family toxin
[Streptomyces sp. NRRL
B-24572]
Freshwater SRR1796234_SRR1777513 0.00 0.29
Rare root;cellular organisms;Eukaryota
YP_009106012.1 D1
reaction center protein of
photosystem II
(chloroplast) [Choricystis
parasitica]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Freshwater SRR1796234_SRR1779189 0.00 0.79
Rare
root;cellular organisms;Bacteria;PVC
group;Planctomycetes;Planctomycetia;Planctomycetales;Planctomycetaceae;Schlesneria;Schlesneria paludicola
WP_010582935.1
cytochrome-c peroxidase
[Schlesneria paludicola]
cytochrome c peroxidase
[EC:1.11.1.5]
Freshwater SRR1796236_SRR1778024 0.00 0.60
Rare root;cellular organisms;Eukaryota
YP_009059259.1
photosystem II reaction
center protein D1
(chloroplast) [Eunotia
naegelii]
>YP_009059315.1
photosystem II reaction
center protein D1
(chloroplast) [Eunotia
naegelii]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Freshwater SRR1796236_SRR1779203 0.00 1.30
Rare root;cellular organisms;Bacteria;Terrabacteria group;Actinobacteria;Actinobacteria
WP_086830248.1 type
II toxin-antitoxin system
PemK/MazF family toxin
[Streptomyces sp. NRRL
B-24572]
Ocean - benthic
zone 1.15.14.mg_1.15.14.mt 0.01 0.20 21.9 Abundant
root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;unclassified
Gammaproteobacteria
WP_066044781.1
rhodanese-like domain-
containing protein
[Bathymodiolus
septemdierum
thioautotrophic gill
symbiont]
Ocean - benthic
zone 1.15.15.mg_1.15.15.mt 0.00 0.19 61.4 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 1.16.13.mg_1.16.13.mt 0.00 0.43 156.9 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 10.1.14.mg_10.1.14.mt 0.00 0.30 217.0 Rare root;cellular organisms;Bacteria;Proteobacteria
WP_029132648.1 C4-
dicarboxylate ABC
transporter
[Sedimenticola
selenatireducens]
Ocean - benthic
zone 10.15.13.mg_10.15.13.mt 0.00 0.17
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_048079293.1
MULTISPECIES:
ammonia
monooxygenase [Marine
Group I]
methane/ammonia
monooxygenase subunit
A [EC:1.14.18.3
1.14.99.39]
Ocean - benthic
zone 11.12.14.mg_11.12.14.mt 0.00 0.29
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_086907101.1
ammonia
monooxygenase
[Candidatus
Nitrosomarinus catalina]
methane/ammonia
monooxygenase subunit
C
Ocean - benthic
zone 11.13.13.mg_11.13.13.mt 0.00 0.29
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 12.19.13.mg_12.19.13.mt 0.00 0.27 305.7 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 12.23.13.mg_12.23.13.mt 0.00 0.32
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 12.8.14.mg_12.8.14.mt 0.00 0.15 61.2 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_086907101.1
ammonia
monooxygenase
[Candidatus
Nitrosomarinus catalina]
methane/ammonia
monooxygenase subunit
C
Ocean - benthic
zone 2.13.13.mg_2.13.13.mt 0.00 0.40 215.3 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 2.18.15.mg_2.18.15.mt 0.00 0.20 50.0 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 3.12.15.mg_3.12.15.mt 0.00 0.23 56.3 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 3.13.13.mg_3.13.13.mt 0.00 0.30 308.2 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 4.10.14.mg_4.10.14.mt 0.00 0.18
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 4.22.15.mg_4.22.15.mt 0.00 0.22 86.3 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_086907101.1
ammonia
monooxygenase
[Candidatus
Nitrosomarinus catalina]
methane/ammonia
monooxygenase subunit
C
Ocean - benthic
zone 4.24.13.mg_4.24.13.mt 0.00 0.35 362.3 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 5.20.15.mg_5.20.15.mt 0.01 0.21 33.7 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 5.22.13.mg_5.22.13.mt 0.00 0.34 365.8 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 6.17.15.mg_6.17.15.mt 0.01 0.31 60.3 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 6.18.14.mg_6.18.14.mt 0.01 0.27 27.1 Abundant root;cellular organisms;Bacteria;Proteobacteria
WP_029132648.1 C4-
dicarboxylate ABC
transporter
[Sedimenticola
selenatireducens]
Ocean - benthic
zone 6.19.13.mg_6.19.13.mt 0.00 0.29
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 7.14.15.mg_7.14.15.mt 0.00 0.32 159.1 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 7.18.13.mg_7.18.13.mt 0.00 0.31 344.0 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 7.23.14.mg_7.23.14.mt 0.01 0.28 43.6 Abundant root;cellular organisms;Bacteria;Proteobacteria
WP_029132648.1 C4-
dicarboxylate ABC
transporter
[Sedimenticola
selenatireducens]
Ocean - benthic
zone 8.13.14.mg_8.13.14.mt 0.00 0.57
Rare
root;cellular organisms;Bacteria;Terrabacteria
group;Chloroflexi;Thermomicrobia;Thermomicrobiales;Thermomicrobiaceae;Thermomicrobium;Thermomicrobium roseum
WP_015922893.1 ABC
transporter permease
[Thermomicrobium
roseum]
peptide/nickel transport
system permease protein
Ocean - benthic
zone 8.14.13.mg_8.14.13.mt 0.00 0.46 506.6 Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 8.5.15.mg_8.5.15.mt 0.00 0.20
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Ocean - benthic
zone 9.18.13.mg_9.18.13.mt 0.01 0.24 16.3 Abundant
root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;unclassified
Gammaproteobacteria
WP_066044781.1
rhodanese-like domain-
containing protein
[Bathymodiolus
septemdierum
thioautotrophic gill
symbiont]
Ocean - benthic
zone 9.9.15.mg_9.9.15.mt 0.00 0.30 96.6 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_014964032.1
MULTISPECIES:
ammonium transporter
[Nitrosopumilus]
ammonium transporter,
Amt family
Biogas plant ERR843249_ERR843255 0.00 0.59
Rare
root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon
subdivisions;Deltaproteobacteria
WP_071897565.1 tRNA
pseudouridine(38-40)
synthase TruA
[Cystobacter
ferrugineus]
tRNA pseudouridine38-
40 synthase
[EC:5.4.99.12]
Biogas plant ERR843250_ERR843255 0.00 0.59
Rare
root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon
subdivisions;Deltaproteobacteria
WP_071897565.1 tRNA
pseudouridine(38-40)
synthase TruA
[Cystobacter
ferrugineus]
tRNA pseudouridine38-
40 synthase
[EC:5.4.99.12]
Biogas plant ERR843251_ERR843255 0.00 0.59
Rare
root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon
subdivisions;Deltaproteobacteria
WP_071897565.1 tRNA
pseudouridine(38-40)
synthase TruA
[Cystobacter
ferrugineus]
tRNA pseudouridine38-
40 synthase
[EC:5.4.99.12]
Biogas plant ERR843252_ERR843255 0.00 0.59
Rare
root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon
subdivisions;Deltaproteobacteria
WP_071897565.1 tRNA
pseudouridine(38-40)
synthase TruA
[Cystobacter
ferrugineus]
tRNA pseudouridine38-
40 synthase
[EC:5.4.99.12]
Biogas plant ERR843253_ERR843255 0.00 0.59
Rare
root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon
subdivisions;Deltaproteobacteria
WP_071897565.1 tRNA
pseudouridine(38-40)
synthase TruA
[Cystobacter
ferrugineus]
tRNA pseudouridine38-
40 synthase
[EC:5.4.99.12]
Biogas plant ERR843254_ERR843255 0.00 0.59
Rare
root;cellular organisms;Bacteria;Proteobacteria;delta/epsilon
subdivisions;Deltaproteobacteria
WP_071897565.1 tRNA
pseudouridine(38-40)
synthase TruA
[Cystobacter
ferrugineus]
tRNA pseudouridine38-
40 synthase
[EC:5.4.99.12]
Ocean -
epipelagic ERR599010_ERR1001626 0.00 2.55 523.3 Abundant root
WP_002805533.1
MULTISPECIES:
photosystem II protein
D1 [Prochlorococcus]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR599126_ERR1001626 0.01 2.55 260.6 Abundant root
WP_002805533.1
MULTISPECIES:
photosystem II protein
D1 [Prochlorococcus]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR599006_ERR1001627 0.01 1.85 154.1 Abundant root
WP_002805533.1
MULTISPECIES:
photosystem II protein
D1 [Prochlorococcus]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR599022_ERR1001627 0.01 1.85 272.1 Abundant root
WP_002805533.1
MULTISPECIES:
photosystem II protein
D1 [Prochlorococcus]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR598976_ERR1328122 0.04 1.87 47.1 Abundant root
WP_011131862.1
photosystem II protein
D1 [Prochlorococcus
marinus]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR598976_ERR1328125 0.04 2.44 61.4 Abundant root
WP_011131862.1
photosystem II protein
D1 [Prochlorococcus
marinus]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic SRR4342129_SRR4342137 0.01 0.97 188.0 Abundant
root;cellular organisms;Archaea;TACK
group;Thaumarchaeota;Nitrosopumilales;Nitrosopumilaceae
WP_067958638.1
copper oxidase
[Nitrosopumilus sp.
Nsub]
nitrite reductase (NO-
forming) [EC:1.7.2.1]
Ocean -
epipelagic SRR4342130_SRR4342138 0.00 1.03 239.1 Rare
root;cellular organisms;Archaea;TACK
group;Thaumarchaeota;Nitrosopumilales;Nitrosopumilaceae
WP_048097520.1
hypothetical protein
[Candidatus
Nitrosopumilus salaria]
Ocean -
epipelagic SRR4342133_SRR4342139 0.00 0.71 147.8 Rare
root;cellular organisms;Archaea;TACK
group;Thaumarchaeota;Nitrosopumilales;Nitrosopumilaceae
WP_048097520.1
hypothetical protein
[Candidatus
Nitrosopumilus salaria]
Ocean -
epipelagic SRR4342134_SRR4342140 0.00 0.48
Rare root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria
WP_008938183.1
succinate dehydrogenase
flavoprotein subunit
[Marinobacter
santoriniensis]
succinate dehydrogenase
/ fumarate reductase,
flavoprotein subunit
[EC:1.3.5.1 1.3.5.4]
Ocean -
epipelagic SRR4342135_SRR4342131 0.00 0.13
Rare root;cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales
WP_002584949.1
MULTISPECIES: MFS
transporter [Bacteria]
MFS transporter, DHA3
family, macrolide efflux
protein
Ocean -
epipelagic SRR4342136_SRR4342132 0.01 1.16 163.6 Abundant root;cellular organisms;Archaea;TACK group;Thaumarchaeota
WP_048071300.1
ammonium transporter
[Marine Group I
thaumarchaeote SCGC
AAA799-P11]
ammonium transporter,
Amt family
Ocean -
hydrothermal
vent SRR2046235_SRR2044843 0.00 0.13
Rare
root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Oceanospirillales;Oceanospirillaceae
WP_025265317.1
transcriptional repressor
LexA [Thalassolituus
oleivorans]
repressor LexA
[EC:3.4.21.88]
Ocean -
hydrothermal
vent SRR2046236_SRR2044878 0.00 0.08
Rare
root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;unclassified
Gammaproteobacteria
WP_071563583.1
chaperonin GroEL
[Bathymodiolus
thermophilus
thioautotrophic gill
symbiont] chaperonin GroEL
Ocean -
bathypelagic SRR2046222_SRR2044911 0.00 0.03
Rare root;cellular organisms;Archaea;TACK group;Thaumarchaeota;unclassified Thaumarchaeota
WP_048084089.1
ammonium transporter
[Marine Group I
thaumarchaeote SCGC
AAA799-D07]
ammonium transporter,
Amt family
Ocean -
hydrothermal
vent SRR2046238_SRR2044891 0.00 0.22
Rare
root;cellular organisms;Archaea;TACK group;Thaumarchaeota;unclassified
Thaumarchaeota;Marine Group I;Marine Group I thaumarchaeote SCGC AAA799-D07
WP_048084089.1
ammonium transporter
[Marine Group I
thaumarchaeote SCGC
AAA799-D07]
ammonium transporter,
Amt family
Ocean -
hydrothermal
vent SRR2046237_SRR2044892 0.00 0.17
Rare root;cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria
WP_083728642.1 50S
ribosomal protein L3
[Pseudomonas
pachastrellae]
large subunit ribosomal
protein L3
Ocean -
bathypelagic SRR2046221_SRR2044914 0.00 0.08
Rare root;cellular organisms;Eukaryota
YP_874444.1
photosystem II reaction
center protein D1
[Phaeodactylum
tricornutum]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR1726597_ERR1719231 0.00 0.34 147.9 Rare root;cellular organisms;Eukaryota
XP_003080744.1
Glucose-repressible
alcohol dehydrogenase
transcriptional effector
CCR4 and related
proteins (ISS)
[Ostreococcus tauri]
Ocean -
epipelagic ERR1726597_ERR1719164 0.00 0.32 124.4 Rare root;cellular organisms;Eukaryota
XP_009037400.1
polyubiquitin
[Aureococcus
anophagefferens] ubiquitin C
Ocean -
epipelagic ERR1700895_ERR1712063 0.00 1.19
Rare root;cellular organisms;Eukaryota
XP_011399024.1
Elongation factor 1-alpha
[Auxenochlorella
protothecoides] elongation factor 1-alpha
Ocean -
epipelagic ERR1700895_ERR1712137 0.00 1.37
Rare root;cellular organisms;Eukaryota
XP_011399024.1
Elongation factor 1-alpha
[Auxenochlorella
protothecoides] elongation factor 1-alpha
Ocean -
epipelagic ERR868454_ERR1712196 0.00 0.31
Rare
root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Holometabola
YP_003735174.1
cytochrome c oxidase
subunit III
(mitochondrion) [Apis
cerana]
cytochrome c oxidase
subunit 3
Ocean -
epipelagic ERR868454_ERR1711861 0.00 0.33
Rare
root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Holometabola
YP_003735174.1
cytochrome c oxidase
subunit III
(mitochondrion) [Apis
cerana]
cytochrome c oxidase
subunit 3
Ocean -
epipelagic ERR1740326_ERR1007416 0.00 0.28
Rare root;cellular organisms
YP_005088651.1
photosystem II protein
D1 [Phaeocystis
antarctica]
>YP_008145425.1
photosystem II protein
D1 (chloroplast)
[Phaeocystis globosa]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR1740326_ERR1712046 0.00 0.39
Rare root;cellular organisms;Eukaryota
XP_016655124.1
cytochrome b
(mitochondrion)
[Plasmodium chabaudi
chabaudi]
ubiquinol-cytochrome c
reductase cytochrome b
subunit
Ocean -
epipelagic ERR1740326_ERR1712031 0.00 0.40
Rare root;cellular organisms;Eukaryota
XP_016655124.1
cytochrome b
(mitochondrion)
[Plasmodium chabaudi
chabaudi]
ubiquinol-cytochrome c
reductase cytochrome b
subunit
Ocean -
epipelagic ERR599253_ERR1007416 0.00 0.28
Rare root;cellular organisms
YP_005088651.1
photosystem II protein
D1 [Phaeocystis
antarctica]
>YP_008145425.1
photosystem II protein
D1 (chloroplast)
[Phaeocystis globosa]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR599253_ERR1712046 0.00 0.39
Rare root;cellular organisms;Eukaryota
XP_016655124.1
cytochrome b
(mitochondrion)
[Plasmodium chabaudi
chabaudi]
ubiquinol-cytochrome c
reductase cytochrome b
subunit
Ocean -
epipelagic ERR599253_ERR1712031 0.00 0.40
Rare root;cellular organisms;Eukaryota
XP_016655124.1
cytochrome b
(mitochondrion)
[Plasmodium chabaudi
chabaudi]
ubiquinol-cytochrome c
reductase cytochrome b
subunit
Ocean -
epipelagic ERR599237_ERR1007415 0.00 0.23 47.2 Abundant root;cellular organisms;Eukaryota;Haptophyceae
YP_005088651.1
photosystem II protein
D1 [Phaeocystis
antarctica]
>YP_008145425.1
photosystem II protein
D1 (chloroplast)
[Phaeocystis globosa]
photosystem II P680
reaction center D1
protein [EC:1.10.3.9]
Ocean -
epipelagic ERR599237_ERR1711908 0.00 0.24
Rare
root;cellular organisms;Bacteria;Terrabacteria group;Cyanobacteria/Melainabacteria
group;Cyanobacteria;Oscillatoriophycideae;Chroococcales;Aphanothecaceae;Candidatus
Atelocyanobacterium;Candidatus Atelocyanobacterium thalassa
WP_012954002.1
nitrogenase iron protein
[Candidatus
Atelocyanobacterium
thalassa]
nitrogenase iron protein
NifH [EC:1.18.6.1]
Ocean -
epipelagic ERR599237_ERR1712220 0.00 0.26
Rare
root;cellular organisms;Bacteria;Terrabacteria group;Cyanobacteria/Melainabacteria
group;Cyanobacteria;Oscillatoriophycideae;Chroococcales;Aphanothecaceae;Candidatus
Atelocyanobacterium;Candidatus Atelocyanobacterium thalassa
WP_012954002.1
nitrogenase iron protein
[Candidatus
Atelocyanobacterium
thalassa]
nitrogenase iron protein
NifH [EC:1.18.6.1]
Ocean -
epipelagic ERR1726589_ERR1719454 0.00 0.24
Rare
root;cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Saccharomycotina;Saccharomycetes;Saccharomycetales;Metschnikowiaceae;Clavispora;Clavispora lusitaniae;Clavispora lusitaniae ATCC 42720
XP_002616000.1 actin
[Clavispora lusitaniae
ATCC 42720]
Ocean -
epipelagic ERR868459_ERR1719440 0.00 0.10
Rare root;cellular organisms;Eukaryota
XP_001705719.1
Hypothetical protein
GL50803_112080
[Giardia lamblia ATCC
50803]
Ocean -
epipelagic
BATS_MG_256_C10_5A_BATS_MT_256_C10_5A 0.00 0.56
Rare
root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae;Candidatus Pelagibacter;Candidatus Pelagibacter ubique
WP_029453648.1
MULTISPECIES: TRAP
transporter large
permease [Candidatus
Pelagibacter]
TRAP-type transport
system large permease
protein
Ocean -
epipelagic
BATS_MG_256_C3_1B_BATS_MT_256_C3_1B 0.00 0.58
Rare root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria
WP_015459239.1 RimK
family alpha-L-glutamate
ligase [Sphingomonas sp.
MM-1]
gamma-F420-2:alpha-L-
glutamate ligase
[EC:6.3.2.32]
Ocean -
epipelagic
BATS_MG_261_C2_2_BATS_MT_261_C2_B2 0.00 0.89
Rare
root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi
group;Bacteroidetes;Flavobacteriia
WP_019670706.1
quinol:cytochrome C
oxidoreductase
[Eudoraea adriatica]
prokaryotic
molybdopterin-
containing
oxidoreductase family,
iron-sulfur binding
subunit
Ocean -
epipelagic
BATS_MG_261_C8_2_BATS_MT_261_C8_F2 0.00 0.79 326.7 Rare
root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi
group;Bacteroidetes;Flavobacteriia
WP_019670706.1
quinol:cytochrome C
oxidoreductase
[Eudoraea adriatica]
prokaryotic
molybdopterin-
containing
oxidoreductase family,
iron-sulfur binding
subunit
Supplementary Table 2.
RefSeq and KEGG annotations of the most expressed ORFs from each sample pairing that had a RefSeq match
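The taxonomy column of Supplementary Table 2 records each ORF's assignment as a semicolon-delimited lineage, root first (e.g. "root;cellular organisms;Archaea;TACK group;Thaumarchaeota"), and several rows are resolved only to "root" or "root;cellular organisms". A minimal Python sketch for reading these strings follows; it assumes only that ranks are separated by semicolons and that lineages wrapped across table cells have first been rejoined onto one line. The function names are hypothetical and are not part of the original analysis.

```python
def parse_lineage(lineage: str) -> list[str]:
    """Split a semicolon-delimited lineage (root first) into its ranks."""
    return [rank.strip() for rank in lineage.split(";") if rank.strip()]


def deepest_taxon(lineage: str) -> str:
    """Return the most specific taxon assigned (the last rank in the lineage)."""
    ranks = parse_lineage(lineage)
    return ranks[-1] if ranks else ""


def count_in_clade(lineages: list[str], clade: str) -> int:
    """Count lineages that pass through the named clade at any rank."""
    return sum(clade in parse_lineage(lineage) for lineage in lineages)


# Lineage strings copied from rows of Supplementary Table 2.
rows = [
    "root;cellular organisms;Archaea;TACK group;Thaumarchaeota",
    "root;cellular organisms;Bacteria;Proteobacteria",
    "root;cellular organisms;Eukaryota",
    "root",
]
print([deepest_taxon(r) for r in rows])
# ['Thaumarchaeota', 'Proteobacteria', 'Eukaryota', 'root']
print(count_in_clade(rows, "Archaea"))
# 1
```

Matching whole ranks, rather than substrings, keeps a query for "Archaea" from matching an unrelated name that merely contains that text.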
Supplemental references
Alberti, A., J. Poulain, S. Engelen, and others. 2017. Viral to metazoan marine plankton
nucleotide sequences from the Tara Oceans expedition. Sci Data 4: 170093.
Baker, B. J., R. A. Lesniewski, and G. J. Dick. 2012. Genome-enabled transcriptomics reveals
archaeal populations that drive nitrification in a deep-sea hydrothermal plume. ISME J.
6: 2269–2279.
Bremges, A., I. Maus, P. Belmann, and others. 2015. Deeply sequenced metagenome and
metatranscriptome of a biogas-producing microbial community from an agricultural
production-scale biogas plant. Gigascience 4: 33.
Dupont, C. L., J. P. McCrow, R. Valas, and others. 2015. Genomes and gene expression
across light and productivity gradients in eastern subtropical Pacific microbial
communities. ISME J. 9: 1076–1092.
Faghihi, M. A., and C. Wahlestedt. 2009. Regulatory roles of natural antisense transcripts.
Nat. Rev. Mol. Cell Biol. 10: 637–643.
Franzosa, E. A., X. C. Morgan, N. Segata, and others. 2014. Relating the metatranscriptome
and metagenome of the human gut. Proc. Natl. Acad. Sci. U. S. A. 111: E2329–38.
Gilbert, J. A., F. Meyer, L. Schriml, I. R. Joint, M. Mühling, and D. Field. 2010.
Metagenomes and metatranscriptomes from the L4 long-term coastal monitoring
station in the Western English Channel. Stand. Genomic Sci. 3: 183–193.
Hultman, J., M. P. Waldrop, R. Mackelprang, and others. 2015. Multi-omics of permafrost,
active layer and thermokarst bog soil microbiomes. Nature 521: 208–212.
Lesniewski, R. A., S. Jain, K. Anantharaman, P. D. Schloss, and G. J. Dick. 2012. The
metatranscriptome of a deep-sea hydrothermal plume is dominated by water column
methanotrophs and lithotrophs. ISME J. 6: 2257–2268.
Li, M., B. M. Toner, B. J. Baker, J. A. Breier, C. S. Sheik, and G. J. Dick. 2014. Microbial iron
uptake as a mechanism for dispersing iron from deep-sea hydrothermal vents. Nat.
Commun. 5: 3192.
Maus, I., D. E. Koeck, K. G. Cibis, and others. 2016. Unraveling the microbiome of a
thermophilic biogas plant by metagenome and metatranscriptome analysis
complemented by characterization of bacterial and archaeal isolates. Biotechnol.
Biofuels 9: 171.
Satinsky, B. M., B. C. Crump, C. B. Smith, and others. 2014a. Microspatial gene expression
patterns in the Amazon River Plume. Proc. Natl. Acad. Sci. U. S. A. 111: 11085–11090.
Satinsky, B. M., C. S. Fortunato, M. Doherty, and others. 2015. Metagenomic and
metatranscriptomic inventories of the lower Amazon River, May 2011. Microbiome 3:
39.
Satinsky, B. M., B. L. Zielinski, M. Doherty, C. B. Smith, S. Sharma, J. H. Paul, B. C. Crump,
and M. A. Moran. 2014b. The Amazon continuum dataset: quantitative metagenomic
and metatranscriptomic inventories of the Amazon River plume, June 2010.
Microbiome 2: 17.
Shi, Y., G. W. Tyson, and E. F. DeLong. 2009. Metatranscriptomics reveals unique microbial
small RNAs in the ocean’s water column. Nature 459: 266–269.
Sieradzki, E. T., J. C. Ignacio-Espinoza, D. M. Needham, E. B. Fichot, and J. A. Fuhrman.
2017. Dynamic marine viral infections and major contribution to photosynthetic
processes shown by regional and seasonal picoplankton metatranscriptomes. bioRxiv
176644. doi:10.1101/176644
Sunagawa, S., L. P. Coelho, S. Chaffron, and others. 2015. Ocean plankton. Structure and
function of the global ocean microbiome. Science 348: 1261359.
Thrash, J. C., K. W. Seitz, B. J. Baker, B. Temperton, L. E. Gillies, N. N. Rabalais, B.
Henrissat, and O. U. Mason. 2017. Metabolic roles of uncultivated bacterioplankton
lineages in the northern Gulf of Mexico “Dead Zone.” mBio 8.
doi:10.1128/mBio.01017-17
Chapter 3
Multi-year temporal dynamics of highly resolved microbial communities in an oxygen
minimum zone.
Introduction
Global ocean oxygen (O₂) concentrations are declining (Schmidtko et al. 2017). The
ocean has regions and depths characterized by low oxygen. These oxygen minimum
zones (OMZs) experience conditions ranging from occasional to chronic hypoxia and
anoxia (Karstensen et al. 2008). OMZs result from the combination of high aerobic
respiration rates and low oxygen supply. High primary productivity, often driven by high
nutrient concentrations supplied by upwelling, can produce high respiration rates that
draw down oxygen (Gilly et al. 2013). At the same time, poor ventilation and weak
circulation can limit the supply of oxygen (Ulloa et al. 2012). OMZs are expanding as
the ocean warms, because warming decreases the solubility of oxygen in seawater
(Stramma et al. 2008). Warming also has the potential to increase ocean stratification,
limiting the mixing of oxygen-replete surface waters with deeper, oxygen-deficient waters.
OMZs fundamentally alter the flow of energy as oxygen, the most efficient terminal
electron acceptor, becomes scarce. Aerobic respiration can decline or shut down
completely, leading to the death of obligately aerobic animals and microbes (Wright et
al. 2012). Decreased oxygen supply can also alter biogeochemical cycles, especially the
nitrogen cycle. For example, nitrification, the complete aerobic oxidation of ammonia
(NH₃) to nitrate (NO₃⁻), is potentially reduced in OMZs, affecting the distribution of
inorganic nitrogen species (Wright et al. 2012).
Removal of oxygen as a terminal electron acceptor increases microbial respiration that
relies on less oxidizing nitrogen-based electron acceptors, leading to a loss of fixed
nitrogen. Denitrification is a microbially mediated process that uses oxidized forms of
nitrogen as alternative electron acceptors, reducing nitrate → nitrite (NO₂⁻) → nitric
oxide (NO) → nitrous oxide (N₂O) → dinitrogen gas (N₂). Complete denitrification
results in the loss of bioavailable nitrogen, and partial denitrification results in the
accumulation of nitrous oxide, an ozone-depleting greenhouse gas (Portmann et al.
2012). Anammox, the anaerobic oxidation of ammonium (NH₄⁺) by nitrite, produces N₂
and likewise removes fixed nitrogen. Dissimilatory nitrate reduction to ammonium
(DNRA), the reduction of nitrate to ammonium, shifts the availability of nitrogen
species. After nitrogen-based oxidants are depleted, other electron acceptors, such as
sulfate (SO₄²⁻), are used for anaerobic respiration (Canfield et al. 2010). Reduction of
sulfate to hydrogen sulfide (H₂S) leads to the accumulation of potentially toxic
hydrogen sulfide and selects for sulfate-reducing microorganisms. Microbial diversity
has been shown to decrease inside the core of OMZs relative to surrounding
oxygenated waters (Beman and Carolan 2013; Ganesh et al. 2014), and microbial
community structure can also shift in response to lower oxygen in OMZs (Stevens and
Ulloa 2008; Orsi et al. 2012).
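The ordering of nitrogen species above follows the oxidation state of the nitrogen atom, which falls at each step as the species accepts electrons. As a worked illustration, the Python sketch below tallies the formal oxidation state of nitrogen along the denitrification chain and, for comparison, a simplified DNRA chain (nitrate → nitrite → ammonium). It is electron bookkeeping only, using standard oxidation states; it does not model rates, energetics, or which pathway a community favors, and the species labels and function name are illustrative shorthand.

```python
# Formal oxidation state of the nitrogen atom in each species named in the text.
OXIDATION_STATE = {
    "NH4+": -3,  # ammonium
    "NO3-": +5,  # nitrate
    "NO2-": +3,  # nitrite
    "NO": +2,    # nitric oxide
    "N2O": +1,   # nitrous oxide
    "N2": 0,     # dinitrogen gas
}

# Denitrification as written in the text: nitrate -> nitrite -> NO -> N2O -> N2.
DENITRIFICATION = ["NO3-", "NO2-", "NO", "N2O", "N2"]
# DNRA, simplified here to its two net reductions: nitrate -> nitrite -> ammonium.
DNRA = ["NO3-", "NO2-", "NH4+"]


def electrons_per_n(pathway):
    """Return (oxidized, reduced, electrons accepted per N atom) for each step."""
    return [
        (ox, red, OXIDATION_STATE[ox] - OXIDATION_STATE[red])
        for ox, red in zip(pathway, pathway[1:])
    ]


for name, pathway in [("Denitrification", DENITRIFICATION), ("DNRA", DNRA)]:
    steps = electrons_per_n(pathway)
    for ox, red, e in steps:
        print(f"{name}: {ox} -> {red} accepts {e} e- per N")
    print(f"{name} total: {sum(e for _, _, e in steps)} e- per N")
```

Under this bookkeeping, complete denitrification accepts 5 electrons per nitrogen atom and DNRA accepts 8.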
To better understand the role of oxygen in structuring microbial communities and their
activity over time, we sampled an OMZ at 890 m in the Southern California Bight as part
of the San Pedro Ocean Time series (SPOT). The OMZ at SPOT was consistently hypoxic
during this study, with oxygen concentrations < 7.8 µM. The Southern California Bight
experiences seasonal upwelling in the late spring and early summer (Haskell et al. 2015),
which is the major source of new nutrients in the spring and drives the export of carbon
from the euphotic zone (Collins et al. 2011; Haskell et al. 2016). The OMZ is at 890 m
(~10 m above the seafloor) in a basin that restricts circulation (Huh et al. 1990), which
possibly contributes to the accumulation of suspended particles in the OMZ. The waters
of the SPOT OMZ are episodically flushed with cold, nutrient-rich, oxygenated water
every 1 to 2 years (Berelson 1991; Hickey 1992). The combination of hydrography and
seasonal upwelling likely contributes to low oxygen levels at the SPOT OMZ.
Examinations of the microbial communities at SPOT have shown that abundance-based
microbial community structure is seasonal and repeating at the surface and at 890 m
(Fuhrman et al. 2006; Chow et al. 2013; Cram et al. 2015). Seasonality is less
pronounced at the seafloor, where it likely results from the deposition of particles from
surface waters, which directly links the seafloor communities to the seasonality of the
surface communities. Still, some portions of the microbial community are not seasonal
at the surface but are seasonal at 890 m, suggesting that independent dynamics control
the communities at depth. For example, Parada and Fuhrman (2017) demonstrated that
Marine Group II euryarchaeal communities are seasonal only at 890 m in the OMZ.
To better understand the temporal dynamics and influence of oxygen on the
structure and activity of microbial communities, we assessed microbial communities in
the benthic OMZ at SPOT approximately monthly over a period of 2.8 years (n = 30).
Microbial community structure was assessed using metagenomic sequencing, and the
structure of microbial activity was determined using metatranscriptomic sequencing of
the same samples. To directly probe the genetic content of microbial communities, we
assembled all metagenomes and metatranscriptomes at ~99% average nucleotide
identity (ANI). We define DNA abundance, RNA expression, and specific activity
(RNA:DNA ratio) based on the frequencies of DNA and RNA reads mapped to these
contiguous assemblies. This approach allowed us to overcome the limitations of
reference-based approaches and single-marker-gene studies.
Methods
Sample collection and processing
Samples were collected approximately monthly (n = 33) between 1/16/2013 and
9/9/2015 from 890 m at the San Pedro Ocean Time-series (SPOT) in the San Pedro Channel.
Temperature, salinity, oxygen, ammonium, nitrite, nitrate, phosphate, total cell counts,
total virus counts, and leucine-based heterotrophic production were measured and
processed as previously described (Strickland and Parsons 1972; Parada and Fuhrman
2017). Picoplankton were collected and nucleic acids were extracted as described in
chapters 1 and 2. Libraries were prepared as described in chapters 1 and 2 and
sequenced using 2 × 250 bp Illumina paired-end sequencing.
Sequence quality control and assembly
Sequences were adapter and quality trimmed as described in chapter 2 for Illumina
sequences. Genome and External RNA Controls Consortium (ERCC) controls were
removed as in chapter 2. Adapter- and quality-trimmed reads from each metagenomic
and metatranscriptomic sample were individually assembled using MEGAHIT (Li et al.
2016) in paired-end mode to produce assemblies with 99% average nucleotide identity
(ANI) (MEGAHIT v1.0.6; megahit --merge-level 20,0.99 --k-min 21 --k-max 255
--k-step 6). The resulting assemblies were combined and deduplicated using a semi-global
alignment that absorbed each assembly into any larger or equal-size assembly that
contained it at 99% identity (BBMap v36.19 (Bushnell 2016); dedupe.sh
minidentity=99). Assemblies >525 bp were retained.
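The containment logic of this dereplication step can be illustrated with a small sketch. It is not dedupe.sh and not the code used here: as a simplifying assumption, a contig is absorbed only when it occurs as an exact substring (100% identity) of a longer or equal-length contig on either strand, whereas dedupe.sh minidentity=99 also tolerates about 1% mismatches. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dereplicate(contigs: dict[str, str], min_len: int = 526) -> dict[str, str]:
    """Absorb contigs contained in a longer or equal-length contig, then length-filter.

    Simplification (assumption): containment is an exact substring match on
    either strand. min_len=526 mirrors 'assemblies >525 bp were retained'.
    """
    # Longest first, so every potential container is kept before what it contains.
    ordered = sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True)
    kept: dict[str, str] = {}
    for name, seq in ordered:
        rc = revcomp(seq)
        if not any(seq in other or rc in other for other in kept.values()):
            kept[name] = seq
    return {name: seq for name, seq in kept.items() if len(seq) >= min_len}
```

Because the longest contigs are processed first, the first copy of two identical equal-length contigs is kept and the second is absorbed, matching the "smaller or equal size into larger or equal size" rule.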
Annotation
Open reading frames (ORFs) were predicted using NCBI ORFfinder (Jenuth 1999;
ORFfinder -s 1 -n T), and ORFs >100 amino acids in length were retained. The resulting
amino acid sequences were searched against the Kyoto Encyclopedia of Genes and
Genomes (KEGG; Kanehisa and Goto 2000) and Clusters of Orthologous Groups (COG;
Galperin et al. 2015) protein databases (DIAMOND (Buchfink et al. 2015); diamond
blastp -e 1e-5 --sensitive). Hits with E-value > 1 × 10⁻⁵ were removed. A best hit for
each ORF was determined by sequentially sorting by E-value, bit score, and percent
identity. Taxonomy was assigned to each assembly using a lowest common ancestor
(LCA) approach based on the ORF matches for that assembly. ORFs were searched
against the complete non-redundant NCBI RefSeq Release 83 (NCBI Resource
Coordinators 2016) protein database (DIAMOND (Buchfink et al. 2015); diamond blastp
-e 1e-5), and hits with E-value < 1 × 10⁻¹⁰ were retained. RefSeq hits for each ORF that
were within 5% of the bit score of the best hit were used to assign the LCA for each
assembly. Nitrogen cycle genes and pathways were assigned to KEGG annotations using
the KEGG API.
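The best-hit ranking and the LCA assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the original pipeline. The tuple layouts and the `taxonomy` lookup (RefSeq subject to root-to-leaf lineage) are hypothetical. Pooling the retained hits of all ORFs in an assembly before taking one LCA is also an assumption about how the per-ORF hits were combined.

```python
def best_hits(hits, max_evalue=1e-5):
    """Return {orf: subject} for the best hit of each ORF.

    hits: iterable of (orf, subject, evalue, bitscore, pident) tuples (assumed layout).
    Ranking: lowest E-value, then highest bit score, then highest percent identity.
    """
    best = {}
    for orf, subject, evalue, bitscore, pident in hits:
        if evalue > max_evalue:
            continue  # drop hits with E-value > 1e-5
        key = (evalue, -bitscore, -pident)
        if orf not in best or key < best[orf][0]:
            best[orf] = (key, subject)
    return {orf: subject for orf, (_, subject) in best.items()}


def lowest_common_ancestor(lineages):
    """Longest shared prefix of root-to-leaf lineage tuples."""
    lineages = list(lineages)
    if not lineages:
        return ()
    shared = []
    for ranks in zip(*lineages):
        if any(rank != ranks[0] for rank in ranks):
            break
        shared.append(ranks[0])
    return tuple(shared)


def assembly_lca(orf_hits, taxonomy, max_evalue=1e-10, window=0.05):
    """Assign one LCA per assembly from the RefSeq hits of all its ORFs.

    orf_hits: {orf: [(subject, evalue, bitscore), ...]} for a single assembly.
    taxonomy: {subject: lineage tuple}, a hypothetical lookup table.
    Per ORF, hits with E-value < 1e-10 and a bit score within 5% of that ORF's
    best hit are kept; kept hits are pooled across ORFs (assumption).
    """
    lineages = []
    for hits in orf_hits.values():
        kept = [(subject, bits) for subject, evalue, bits in hits if evalue < max_evalue]
        if not kept:
            continue
        top = max(bits for _, bits in kept)
        lineages.extend(taxonomy[s] for s, bits in kept if bits >= (1 - window) * top)
    return lowest_common_ancestor(lineages)
```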
Sequence mapping and counting
Read mapping and counting were performed identically for metagenomic and
metatranscriptomic samples. Only samples with paired metagenomes and
metatranscriptomes were considered (n = 30). Small subunit (16S and 18S) and large
subunit (5S, 5.8S, 23S, and 28S) rRNAs were removed from each sequence library prior
to mapping reads from each sample to the dereplicated set of assemblies
(SortMeRNA v2.1 (Kopylova et al. 2012); sortmerna --paired_out --fastx --ref
silva-bac-16s-id90.fasta silva-arc-16s-id95.fasta silva-euk-18s-id95.fasta
rfam-5s-database-id98.fasta rfam-5.8s-database-id98.fasta silva-arc-23s-id98.fasta
silva-bac-23s-id98.fasta silva-euk-28s-id98.fasta). The rRNA-filtered reads were mapped
to the full dereplicated set of assemblies, which provided a common reference for
comparing count distributions among samples, and all multi-mapping reads were retained
(BBMap v36.19; bbmap.sh ambiguous=all maxsites=1000000000 maxsites2=1000000000
nhtag=t). The resulting BAM files were then filtered to retain only alignments with
>99% identity (BBMap v36.19; reformat.sh idfilter=0.99).
Paired-end counts were determined as fragments for each sample. Both ends were
required to map in the proper orientation, unless one read extended beyond the bounds
of the assembly. A minimum overlap of 100 bp between each fragment and an assembly
was required, and all multi-mapping fragments were counted (featureCounts (Liao et al.
2014) v1.5.3; featureCounts -f -O -M -p --minOverlap 100 -s 0). For ORFs, counts were
performed in forward (featureCounts v1.5.3; featureCounts -f -O -M --minOverlap 50 -s 1)
and reverse mode (featureCounts v1.5.3; featureCounts -f -O -M --minOverlap 50 -s 2).
The reverse orientation was considered sense and the forward orientation anti-sense.
Each assembly was decontaminated by searching against the human genome, the PhiX
genome, and the added ERCC and genome controls (NCBI BLAST v2.8 (Altschul et al.
1990); blastn -outfmt '6 std qcovs'). Assemblies with matches at an E-value of 0 were
removed prior to further processing.
Following decontamination, counts <10 were removed for each assembly or ORF. All
counts were divided by assembly or ORF length to account for the higher probability
that longer assemblies recruit reads. Length-adjusted counts were subsampled to the
total length-adjusted count of the smallest sample. This was performed independently
for assemblies, forward ORFs, and reverse ORFs. The length-adjusted subsampled counts
were normalized to the total length-adjusted count of each sample to provide relative
frequencies. Sense and anti-sense relative frequencies were calculated by normalizing to
the combined sense and anti-sense length-adjusted count total. For RNA:DNA ratio
calculations only, DNA relative frequencies of 0 were replaced with the minimum DNA
frequency detected across all samples (i.e., the limit of detection), separately for
assembly and ORF counts, to avoid division by zero (Satinsky et al. 2017). All data
manipulations were performed with Python (Van Rossum 2007) and Pandas (McKinney
2010).
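The normalization steps above (count filter, length adjustment, subsampling, relative frequencies, limit-of-detection replacement and RNA:DNA) can be sketched with the standard library. This is a sketch under explicit assumptions, not the Pandas code used in this study. Counts are plain `{feature: count}` dicts. Length-adjusted counts are scaled per kb and rounded so they can be rarefied, which the text does not specify. Subsampling draws without replacement from a seeded generator. The limit of detection is taken as the smallest non-zero DNA frequency in any sample.

```python
import bisect
import itertools
import random
from collections import Counter


def length_adjust(counts, lengths, min_count=10):
    """Drop counts <10, divide by length, and scale to integer counts per kb (assumption)."""
    adjusted = {f: round(c * 1000 / lengths[f]) for f, c in counts.items() if c >= min_count}
    return {f: c for f, c in adjusted.items() if c > 0}


def rarefy(counts, depth, seed=0):
    """Subsample integer counts to `depth` total without replacement."""
    features = list(counts)
    cumulative = list(itertools.accumulate(counts[f] for f in features))
    # Draw positions in the concatenated pool without materializing it.
    picks = random.Random(seed).sample(range(cumulative[-1]), depth)
    return dict(Counter(features[bisect.bisect_right(cumulative, i)] for i in picks))


def relative(counts):
    """Normalize counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def limit_of_detection(dna_tables):
    """Smallest non-zero DNA relative frequency across all samples."""
    return min(v for table in dna_tables for v in table.values() if v > 0)


def specific_activity(rna, dna, lod):
    """RNA:DNA ratio per feature, with zero or missing DNA replaced by the LOD."""
    features = set(rna) | set(dna)
    return {f: rna.get(f, 0.0) / (dna.get(f, 0.0) or lod) for f in features}
```

In use, the subsampling depth would be the smallest sample total, for example `depth = min(sum(c.values()) for c in adjusted_samples)`, applied separately to assemblies, forward ORFs and reverse ORFs.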
Small subunit ribosomal RNA (SSU rRNA) gene similarity
SSU rRNA sequences were retrieved from each metagenome by searching the rRNA
sequences retrieved during rRNA filtering (see Sequence mapping and counting) for SSU
rRNA sequences only (SortMeRNA v2.1; sortmerna --paired_out --fastx --ref
silva-bac-16s-id90.fasta silva-arc-16s-id95.fasta silva-euk-18s-id95.fasta). Only forward
reads were used for downstream analysis. The resulting reads were screened to remove
sequences <150 bp with >10 homopolymers. These reads were classified against the full
SILVA v123 reference database using mothur (Schloss et al. 2009) (mothur v1.36;
classify.seqs cutoff=60). Sequences that could not be classified were removed (mothur
v1.36; remove.lineage(taxon=unknown)). Because the reads did not cover a common,
fully overlapping region, they were grouped into “species”-level phylotype bins based on
their SILVA classification (mothur v1.36;
phylotype(taxonomy=join.good.nr_v123.wang.pick.taxonomy, cutoff=1)). The resulting list
file was subsampled to the count of the smallest sample, and the counts in each sample
were normalized to that sample's total count.
Assembly free community similarity
Assembly-free, k-mer-based similarities between samples were calculated from the
abundances of 31-mers using sourmash (Titus Brown and Irber 2016). sourmash counts
the k-mers in each sample and their abundances. This profile is converted into a hashed
sketch, and the Jaccard similarity between sketches is calculated to provide sample
similarity. All samples were subsampled to the number of adapter- and quality-trimmed
read pairs in the smallest sample prior to sourmash analysis (BBMap v36.19;
reformat.sh samplereadstarget=8092232). sourmash was run as follows: sourmash
compute --track-abundance -k 31; sourmash compare.
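The idea behind assembly-free similarity can be shown with an exact k-mer Jaccard index. This sketch is a named simplification of what sourmash does. sourmash estimates similarity from hashed MinHash sketches, whereas this version compares the full sets of canonical (strand-independent) k-mers. It also ignores k-mer abundances, although the runs above used --track-abundance. The function names are hypothetical.

```python
COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Strand-independent k-mer: lexicographic minimum of k-mer and reverse complement."""
    return min(kmer, kmer.translate(COMP)[::-1])


def kmer_set(reads, k=31):
    """All canonical k-mers present in a collection of reads."""
    return {canonical(r[i:i + k]) for r in reads for i in range(len(r) - k + 1)}


def jaccard(reads_a, reads_b, k=31):
    """Exact Jaccard index |A & B| / |A | B| between two samples' k-mer sets."""
    a, b = kmer_set(reads_a, k), kmer_set(reads_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

On real metagenomes the full k-mer sets are far too large to hold in memory, which is why sketching tools keep only a small, hashed subset of k-mers.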
Statistics and ecological metrics
Mann-Whitney U tests and Spearman’s rank correlations were calculated using the R
(R Core Team 2014) stats package. Mann-Kendall tests were calculated using the Kendall
R package. P-values were corrected for multiple testing using the Bonferroni correction.
Loess smoothing curves for plots were generated using ggplot2 (Wickham 2006). Mantel
and partial Mantel tests were calculated using Spearman’s correlations and 999
permutations using skbio and the vegan R package, respectively. Pielou’s evenness (J′)
was calculated using vegan. Non-metric multidimensional scaling (NMDS) plots were
generated using vegan with the Bray-Curtis dissimilarity as the distance metric (vegan
v2.4-4 (Dixon 2003); metaMDS(distance = "bray")). For partial Mantel tests, the absolute
number of days between samples was used as the third (control) distance matrix.
Seasonality tests were performed as described by Cram et al. (2015). Euclidean
distances between samples were calculated for each environmental and biological
parameter using skbio. Bray-Curtis distances were used for community dissimilarities
between all samples and were calculated using skbio.
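For readers unfamiliar with the Mantel test, the following standard-library sketch shows Bray-Curtis distances and a permutation Mantel test based on Spearman's ρ. It is an illustrative re-implementation, not the skbio or vegan code used in the analyses, and its results may differ from theirs in detail. The test is one-tailed (permuted r ≥ observed r), and it permutes the sample labels of the second matrix. The partial Mantel variant that controls for time is not shown.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u - v| / sum(u + v)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(profiles, metric):
    """Square distance matrix for a list of per-sample profiles."""
    return [[metric(p, q) for q in profiles] for p in profiles]


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def mantel(dx, dy, permutations=999, seed=0):
    """One-tailed Mantel test (permuted r >= observed r) using Spearman's rho."""
    n = len(dx)
    pairs = list(combinations(range(n), 2))  # upper triangle, diagonal excluded
    x = [dx[i][j] for i, j in pairs]
    r_obs = spearman(x, [dy[i][j] for i, j in pairs])
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)  # permute the sample labels of the second matrix
        if spearman(x, [dy[perm[i]][perm[j]] for i, j in pairs]) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```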
Results
Environment and biology of the SPOT OMZ
An OMZ in the Eastern North Pacific off the Southern California coast was sampled
approximately monthly over 33 months for biological, biogeochemical, and molecular
parameters as part of SPOT. The SPOT OMZ remained hypoxic throughout the 33-month
sampling period, with oxygen between 1.7 and 7.8 µM (Figure 1). Oxygen peaked on
11/24/14, with minima on 9/18/13 and 8/13/14 (Figure 1). Oxygen showed a slight upward
trend through the time series (Mann-Kendall τ = 0.26, P = 0.05). Temperature was
consistently low (5.14 to 5.27 ˚C) and had a repeating pattern in 2014 and 2015, with
maxima in the summer (Figure 1). Temperature increased monotonically over time
(Mann-Kendall τ = 0.44; P = 0.001) and was anti-correlated with oxygen (Spearman
ρ = -0.37; P = 0.04). Total cell and virus counts peaked in the summer of 2014 (Figure 1),
and virus counts were anti-correlated with oxygen (Spearman ρ = 0.46; P = 0.01).
Combined nitrite and nitrate, the oxidized species of inorganic nitrogen, peaked in the
winters (Figure 1) and were positively correlated with oxygen (Spearman ρ = 0.55;
P = 0.002).
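The Mann-Kendall τ values reported for the oxygen and temperature trends measure monotonic trends. The test statistic S sums the signs of all later-minus-earlier differences, and τ rescales S by the number of pairs. The sketch below is for illustration only. It applies no tie correction and uses a normal approximation with continuity correction for the two-sided P-value. The Kendall R package used in this study handles ties and P-values in its own way, so its values may differ.

```python
import math
from itertools import combinations


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    Returns (tau, p): tau = S / (n(n-1)/2) and a two-sided P-value from the
    normal approximation with continuity correction.
    """
    n = len(series)
    s = sum((series[j] > series[i]) - (series[j] < series[i])
            for i, j in combinations(range(n), 2))
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return tau, p
```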
Patterns of abundance and activity
RNA expression of sense transcripts was 1.4 times that of antisense transcripts. Sense
and antisense expression were also positively correlated (Spearman ρ = 0.42, P = 0).
Antisense expression was higher in lower-abundance open reading frames (ORFs)
(Mann-Whitney U; P = 0). COG categories involving cell motility, prophages and
transposons, nucleotide transport and metabolism, and defense mechanisms had the
highest ratios of antisense to sense transcription. The most highly expressed taxon that
could be classified by lowest common ancestor (LCA) was a Candidatus Nitrosomarinus
catalina (Ahlgren et al. 2017), and Thaumarchaea accounted for the highest RNA
frequencies (Table 1). The most abundant taxon (highest DNA frequency) was a
Nitrospina sp. (Table 2). The most active taxon (highest specific activity, i.e., RNA:DNA
ratio) was a Variovorax sp. (Table 3).
All samples had highly skewed DNA rank abundance curves. More than 96.8% of
assemblies were always rare, 0.16% remained abundant throughout the time series, and
0.5% cycled between abundant and rare (abundant >50% of the time) (Figure 2). Rare
microbial RNA expression was defined as the RNA expression of assemblies in the tail
of the DNA rank abundance curve, in this study beyond the 10,000th rank (Figure 2).
RNA expression was dominated by rare microbes (Figures 3 and 4) and was generally
stable, but the share of rare RNA expression varied among dates, ranging from 88.1 to
99.1% (Figure 5). Rare microbial RNA expression mostly remained between ~98 and
99%. Starting on 3/16/2014, rare microbial expression dropped until reaching a low of
88% on 8/16/2014. Rare microbial expression then began to increase and returned to
98% on 11/16/2014 (Figure 5).
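The rare-expression metric can be made concrete with a short sketch: assemblies in one sample are ranked by DNA frequency, and the share of that sample's RNA coming from assemblies beyond the rank cutoff is summed. The code is illustrative and not the study's code. Treating assemblies with RNA but no DNA signal as rare is an assumption. The three-way persistence labels simplify the categories reported above and ignore the ">50% of the time" criterion.

```python
def rare_expression_fraction(dna, rna, rank_cutoff=10_000):
    """Share of RNA expression from assemblies beyond `rank_cutoff` in DNA rank.

    dna, rna: {assembly: relative frequency} for one sample. Assemblies with no
    DNA signal are treated as rare (assumption).
    """
    ranked = sorted(dna, key=dna.get, reverse=True)  # most abundant first
    abundant = set(ranked[:rank_cutoff])
    total = sum(rna.values())
    return sum(v for a, v in rna.items() if a not in abundant) / total


def persistence_class(abundant_by_sample):
    """Label an assembly from per-sample flags: abundant (True) or rare (False)."""
    if all(abundant_by_sample):
        return "always abundant"
    if not any(abundant_by_sample):
        return "always rare"
    return "cycling"
```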
Community structure and activity
DNA similarity based on metagenomes, i.e. the abundance of microbes, decreased overall
with increasing time between samples (Mantel r = 0.62; P = 0.001), with similarity
ranging from 24 to 73% (Figure 6). Intervals of 1 to 6 months maintained comparable
levels of similarity of ~60%. Median community similarity was highest at intervals
<30 days, i.e. 0 months, and showed its largest reduction from 0- to 1-month intervals.
Communities sampled closer in time were more similar, and the decline in community
similarity leveled off for samples ~18 months apart. Community similarity then
increased at intervals of ~25 months, but only to ~45% similarity. In 12/2013, two
samples collected 4 days apart had a similarity of 51%, lower than many pairs of
samples separated by months. Small subunit rRNA (SSU rRNA) gene sequences
retrieved from the metagenomes did show seasonality (Mantel r = 0.3; P < 0.01), with
peak similarities at 9 and 17 months (Figure 7). Minimum similarities were at 6, 15,
and 22 months. Mantel tests between the DNA distance matrix and the RNA and
RNA:DNA distance matrices revealed correlations of 0.44 and 0.48, respectively
(Spearman ρ; P = 0.001).
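The interval-based summaries above group every pair of samples by the number of months separating them and take the median similarity within each group. The following minimal sketch assumes that intervals are computed as floor(days / 30), so pairs less than 30 days apart fall in the 0-month bin, and that `similarity` is a square matrix such as 1 minus Bray-Curtis. Sampling dates would first be converted to days since the first sample, for example `[(d - dates[0]).days for d in dates]`. The function name is hypothetical.

```python
import statistics
from collections import defaultdict
from itertools import combinations


def similarity_by_interval(sample_days, similarity, days_per_month=30):
    """Median pairwise similarity per whole-month interval between samples.

    sample_days: days since the first sample for each sample.
    similarity: square matrix of pairwise similarities (e.g. 1 - Bray-Curtis).
    Months are floor(days / 30) (assumption), so pairs <30 days apart fall in month 0.
    """
    groups = defaultdict(list)
    for i, j in combinations(range(len(sample_days)), 2):
        months = abs(sample_days[j] - sample_days[i]) // days_per_month
        groups[months].append(similarity[i][j])
    return {m: statistics.median(v) for m, v in sorted(groups.items())}
```

Long intervals are represented by few sample pairs, so their medians are less reliable than those for short intervals.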
RNA community similarity, i.e. similarity in the expression of sequence assemblies,
decreased more strongly over shorter periods of time than community structure
similarity did. RNA similarity was lowest at intervals of ~15 months and then
consistently increased over intervals from 14 to 32 months (Figure 8). At the longest
intervals, similarities approached those at shorter intervals. RNA expression profiles
became less similar with increasing time between samples up to ~13 months, beyond
which more distant samples became more similar again. Community similarity of the
ratios of RNA and DNA, i.e. specific activity, followed a pattern similar to that of RNA
community similarity over time. RNA:DNA community similarity was highest at an
interval of 0 months and reached its minimum at ~15 months (Figure 9). DNA and RNA
community similarities based on assembly-free k-mer frequencies followed patterns
similar to those based on assembly frequencies (Figures 10 and 11). There was no
detectable seasonal pattern in DNA, RNA, or RNA:DNA similarities based on either
assemblies or k-mers (Mantel test P > 0.05).
Environment, community structure and activity
We examined correlations of DNA, RNA, and RNA:DNA community dissimilarities with
individual environmental and biological parameters using partial Mantel tests.
Distances between samples for each individual biological and environmental parameter
were correlated with distances between samples in DNA, RNA, and RNA:DNA
community structure while controlling for the effect of time. The parameters used were
inorganic nitrogen species (ammonium, nitrite, and nitrate), phosphate, oxygen,
temperature, total cell and virus counts, cell and virus ratios, and heterotrophic
production. Oxygen was the only parameter correlated with DNA community profiles.
All inorganic nitrogen species, oxygen, and viral counts were significantly correlated
with RNA-based community structure (Table 4). Nitrate, nitrite + nitrate, oxygen, and
viral counts were correlated with RNA:DNA-based community structure (Table 4).
Among all parameters, oxygen had the highest correlation with DNA-, RNA-, and
RNA:DNA-based community profiles. Samples likewise clustered along oxygen
gradients based on DNA, RNA, and RNA:DNA (Figures 11, 12, and 13). Mean oxygen
distances per time interval followed mean DNA community dissimilarities for intervals
up to ~17 months but not beyond (Spearman ρ = 0.59; P = 1 × 10⁻⁴) (Figure 14). Mean
oxygen distances per time interval closely tracked mean RNA and RNA:DNA community
dissimilarities for most time intervals (Spearman ρ = 0.87 and 0.89, respectively;
P < 4 × 10⁻⁶) (Figures 15 and 16). Oxygen was also correlated with the number of
unique RNA transcripts (the richness of RNA expression) and with the richness of
RNA:DNA ratios (Spearman ρ = 0.66 and 0.66, respectively; P < 9 × 10⁻⁵), but not with
DNA richness. Similarly, oxygen was correlated with the evenness of RNA expression
and of RNA:DNA ratios (Spearman ρ = 0.72 and 0.76, respectively; P < 1 × 10⁻⁶).
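Richness and evenness, as used above, can be computed as follows. Richness counts features with non-zero frequency. Pielou's evenness is J′ = H′ / ln(S), where H′ is the Shannon index (natural log) and S the richness. This matches the usual vegan idiom `diversity(x) / log(specnumber(x))`, but the sketch below is a stand-alone illustration, not the code used in the study. Returning NaN when S ≤ 1 reflects the fact that J′ is undefined in that case.

```python
import math


def richness(freqs):
    """Number of features with non-zero frequency."""
    return sum(1 for v in freqs.values() if v > 0)


def pielou_evenness(freqs):
    """Pielou's J' = H' / ln(S), with H' the Shannon index (natural log)."""
    values = [v for v in freqs.values() if v > 0]
    total = sum(values)
    h = -sum((v / total) * math.log(v / total) for v in values)
    s = len(values)
    return h / math.log(s) if s > 1 else math.nan  # undefined for S <= 1
```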
Nitrogen cycle dynamics
Over the time series, anammox, nitrification, denitrification, and DNRA pathways were
detected in DNA and RNA samples. Anammox had low abundance (0.01%) and low
RNA expression (0.1%), but high specific activity (10). Nitrification was the most
expressed pathway (115%) and also had the highest specific activity (14) (Figure 17).
Denitrification and DNRA had values similar to those of nitrification, but lower RNA
expression and specific activity. Among nitrification genes, ammonia oxidation genes
were highly expressed and active, but hao expression (0.02%) and activity (0.25) were
low. Nitrification and denitrification were correlated with each other in terms of DNA,
RNA, and RNA:DNA, and both steadily increased in abundance over time (Mann-Kendall
τ = 0.47 and 0.31, respectively; P < 0.02) (Figure 18). The two pathways were more
strongly correlated in DNA than in RNA or RNA:DNA (Tables 5, 6, and 7). Nitrification
expression did not increase over time, so nitrification specific activity decreased
(Figures 19 and 20). Denitrification expression did increase with time, but not enough to
offset the increase in abundance, so its specific activity also decreased (Figure 20).
Anammox abundance and expression spiked in the summer of 2014, but anammox
specific activity was reduced relative to other times (Figure 20). This coincided with an
increase in DNRA specific activity. Anammox and DNRA expression were correlated and
both peaked in the summer of 2014; this coincided with a decrease in expression of
nitrification and denitrification (Table 6).
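The anammox values above illustrate how pathway-level specific activity follows from abundance and expression: 0.1% RNA / 0.01% DNA = 10. The sketch below assumes that pathway values are sums of gene-level relative frequencies and that pathway specific activity is summed RNA divided by summed DNA. The text does not state how genes were aggregated. In practice, gene-to-pathway assignments would come from the KEGG annotations described in the Methods. The mapping and the DNA value for hao used in the usage example are hypothetical.

```python
import math


def pathway_activity(dna, rna, pathway_genes):
    """Per-pathway (DNA %, RNA %, RNA:DNA) from gene-level relative frequencies.

    Assumption: pathway values are sums over member genes, and pathway specific
    activity is summed RNA divided by summed DNA (NaN when no DNA is detected).
    """
    result = {}
    for pathway, genes in pathway_genes.items():
        d = sum(dna.get(g, 0.0) for g in genes)
        r = sum(rna.get(g, 0.0) for g in genes)
        result[pathway] = (d, r, r / d if d > 0 else math.nan)
    return result
```

For example, `pathway_activity({"hzsA": 0.01}, {"hzsA": 0.1}, {"anammox": ["hzsA"]})` gives an anammox specific activity of about 10.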
Among the measured environmental and biological parameters, oxygen was the
parameter most strongly correlated with the abundance of nitrification and
denitrification pathways. Heterotrophic production was the most correlated with
anammox, and phosphate was the most correlated with DNRA (Table 8). Oxygen was also
the most correlated with the expression of denitrification and with the specific activity
of anammox and nitrification. The sequence composition of the genes comprising each
component of the nitrogen cycle, in terms of abundance, expression, and specific activity,
was also compared with each environmental and biological parameter. This was
performed using partial Mantel tests of the DNA, RNA, or RNA:DNA of the genes
comprising each major component of the nitrogen cycle, controlling for temporal
autocorrelation. Using DNA, the underlying sequence types for nitrification and
denitrification were most correlated with oxygen (Table 9). Anammox and DNRA had no
significant correlations with any environmental or biological parameter. Using
expression, oxygen was most correlated with nitrification and, along with total cell count
and viral count, with DNRA (Table 9). Using specific activities, oxygen was the most
correlated with nitrification (Table 9).
Discussion
Using DNA (abundance)-based community similarity of assemblies and of assembly-free
k-mers, there was no apparent seasonality in the SPOT OMZ over 33 months.
Community similarity mostly decreased with increasing time between samples, except for
the longest time comparisons, where there was less statistical power because fewer
sample pairs were available. This decreasing pattern suggests that, as the time between
samples increases, ecological and evolutionary forces have more time to act, allowing
novel or different microbial genomic variants to emerge. In contrast, SSU rRNA gene
sequences retrieved from the metagenomes showed repeating seasonality, as previously
reported at 890 m by Cram et al. (2015). Community similarities based on SSU rRNA
transcripts could not be calculated from the metatranscriptomes because rRNA depletion
was performed prior to sequencing.
The different temporal dynamics likely reflect the different representations of microbial
populations that SSU rRNA genes and whole genomes provide. SSU rRNA genes diverge
at very low rates, 1% every 50 million years (Ochman and Wilson 1987; Swan et al. 2013),
and are under strong purifying selection (Smit et al. 2006) relative to other portions of
the genome. Therefore, an SSU rRNA gene sequence likely represents a composite of
closely related microbial populations. That is, SSU rRNA gene sequences do not
represent individual extant microbial populations, but groupings of them. These
populations potentially form a combined operational unit that may respond to the
environment seasonally, even though the underlying genomic and transcriptomic
sequences, which diverge by >1%, do not. Note that the SSU rRNA analysis was
performed at the “species” phylotype level because an overlapping region was not
sequenced. Higher SSU rRNA resolution may better reflect the genomic content of
microbial communities. For example, Needham et al. (2017) showed that increasing the
resolution of SSU rRNA gene sequences revealed more ecological associations. Taxa with
highly similar single marker genes have been shown to harbor different genomes with
the potential for different functionality (Acinas et al. 2004; Kashtan et al. 2014). Coleman
et al. (2006) found that Prochlorococcus strains with >99% 16S rRNA gene identity can
have different phenotypes and harbor different genomic islands as a result of horizontal
gene transfer (HGT) from phages. Indeed, taxa with identical 16S rRNA gene sequences
can have different ecologies (Hahn and Pöckl 2005) and genomic content (Jaspers and
Overmann 2004).
Community similarity measured as abundance showed temporal patterns that differed
from those of similarity based on expression and specific activity. Although patterns were
similar at shorter time intervals, longer-interval comparisons revealed a disconnect
between abundance-based community structure and expression/specific activity.
Although abundance-based community structure did not directly correspond to the
expression or specific activity of communities, abundance and expression/activity
similarities were correlated. This suggests that the composition of microbial
communities has some power to predict community phenotype.
Among the environmental and biological parameters, the main driver of community
similarity across time was oxygen concentration. Community dissimilarities and the
clustering of communities reflected oxygen concentrations. Further, the divergence
between abundance and expression/activity was apparently driven by oxygen. Mean
oxygen distances per time interval closely tracked mean expression and specific activity
dissimilarities. This indicates that microbial expression and activity respond more
closely to certain environmental stimuli than abundance-based community structure
does. This suggests that although microbial communities may not respond to certain
environmental factors in terms of genotypic presence, communities may adjust
expression and activity to respond to the environment.
The influence of oxygen on community structure, in terms of abundance,
expression, and specific activity, underscores its importance in shaping the ecology of
OMZs. Oxygen has a direct role in mediating microbial metabolism as a terminal
electron acceptor in respiration. Depletion of oxygen further influences community
metabolism as respiration shifts to other electron acceptors, such as nitrate and sulfate.
Oxygen also has the potential to alter other processes that are not directly
mechanistically linked to oxygen availability. For example, Rastelli et al. (2016) found that
viral activity can increase in response to environmental oxygen depletion.
We finally examined the dynamics of nitrogen cycling in the OMZ. Nitrification and
denitrification genes both had high expression and specific activity, particularly
nitrification. Both nitrification and denitrification are therefore potentially keystone
processes, in the sense that low-abundance genes account for a disproportionate share
of expression. Microbes controlling these processes are potentially more susceptible to
perturbation, as microbes with smaller population sizes may be more easily removed.
This could affect the whole ecosystem, as nitrogen cycling directly mediates the
availability of bioreactive nitrogen. Further, nitrification and denitrification were
correlated in abundance, expression, and specific activity, potentially because
nitrification produces nitrite, and to a lesser extent nitrate, which serve as oxidized
substrates (electron acceptors) for denitrification. The linkage between these two
processes also indicates that alterations to one may affect the other, potentially
disrupting the whole nitrogen cycle (Francis et al. 2007). Oxygen was important in
changing bulk abundance, expression, and activity of the components of the nitrogen
cycle. Oxygen was also important in structuring the underlying sequences or variants
that comprise each gene involved in the nitrogen cycle.
Conclusion
Temporal metagenomic and metatranscriptomic analysis of microbial communities in
the SPOT OMZ demonstrates that oxygen is important in shaping microbial community
structure at the genomic and transcriptomic levels. This was especially true of the
structure of gene expression and activity, which closely tracked changes in oxygen over
time. Additionally, oxygen was important in controlling the genomic and transcriptomic
components of the nitrogen cycle, both in terms of total abundance and activity and in
terms of the gene variants involved. The results of this study also show that temporal
structuring of community abundance can be independent of expression and activity.
Abundance-based community structure may not respond to certain environmental
factors, e.g. oxygen, but community expression and activity may compensate. This also
highlights that microbial communities have structure at a fine scale, i.e. genotypes, that
does respond to environmental factors. It further suggests that microbes should be
studied at extremely fine resolutions, e.g. single cells, similar to examinations of animal
population structure (Hartl and Clark 1997). Our findings indicate that alterations to
oxygen concentrations in a rapidly changing ocean will have a measurable effect on the
activity and structure of microbial communities.
References
Acinas, S. G., V. Klepac-Ceraj, D. E. Hunt, C. Pharino, I. Ceraj, D. L. Distel, and M. F. Polz.
2004. Fine-scale phylogenetic architecture of a complex bacterial community. Nature
430: 551–554.
Ahlgren, N. A., Y. Chen, D. M. Needham, A. E. Parada, R. Sachdeva, V. Trinh, T. Chen, and
J. A. Fuhrman. 2017. Genome and epigenome of a novel marine Thaumarchaeota strain
suggest viral infection, phosphorothioation DNA modification and multiple restriction
systems. Environ. Microbiol. 19: 2434–2452.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local
alignment search tool. J. Mol. Biol. 215: 403–410.
Beman, J. M., and M. T. Carolan. 2013. Deoxygenation alters bacterial diversity and
community composition in the ocean’s largest oxygen minimum zone. Nat. Commun. 4:
2705.
Berelson, W. M. 1991. The flushing of two deep-sea basins, southern California borderland.
Limnol. Oceanogr. 36: 1150–1166.
Buchfink, B., C. Xie, and D. H. Huson. 2015. Fast and sensitive protein alignment using
DIAMOND. Nat. Methods 12: 59–60.
Bushnell, B. 2016. BBMap short read aligner. University of California, Berkeley, California.
URL http://sourceforge.net/projects/bbmap.
Canfield, D. E., F. J. Stewart, B. Thamdrup, L. De Brabandere, T. Dalsgaard, E. F. Delong, N.
P. Revsbech, and O. Ulloa. 2010. A Cryptic Sulfur Cycle in Oxygen-Minimum–Zone
Waters off the Chilean Coast. Science 330: 1375–1378.
Chow, C.-E. T., R. Sachdeva, J. A. Cram, J. A. Steele, D. M. Needham, A. Patel, A. E. Parada,
and J. A. Fuhrman. 2013. Temporal variability and coherence of euphotic zone bacterial
communities over a decade in the Southern California Bight. ISME J. 7: 2259–2273.
Coleman, M. L., M. B. Sullivan, A. C. Martiny, C. Steglich, K. Barry, E. F. Delong, and S. W.
Chisholm. 2006. Genomic islands and the ecology and evolution of Prochlorococcus.
Science 311: 1768–1770.
Collins, L. E., W. Berelson, D. E. Hammond, A. Knapp, R. Schwartz, and D. Capone. 2011.
Particle fluxes in San Pedro Basin, California: A four-year record of sedimentation and
physical forcing. Deep Sea Res. Part I 58: 898–914.
Cram, J. A., C.-E. T. Chow, R. Sachdeva, D. M. Needham, A. E. Parada, J. A. Steele, and J. A.
Fuhrman. 2015. Seasonal and interannual variability of the marine bacterioplankton
community throughout the water column over ten years. ISME J. 9: 563–580.
Dixon, P. 2003. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14:
927–930.
Francis, C. A., J. M. Beman, and M. M. M. Kuypers. 2007. New processes and players in the
nitrogen cycle: the microbial ecology of anaerobic and archaeal ammonia oxidation.
ISME J. 1: 19–27.
Fuhrman, J. A., I. Hewson, M. S. Schwalbach, J. A. Steele, M. V. Brown, and S. Naeem.
2006. Annually reoccurring bacterial communities are predictable from ocean
conditions. Proc. Natl. Acad. Sci. U. S. A. 103: 13104–13109.
Galperin, M. Y., K. S. Makarova, Y. I. Wolf, and E. V. Koonin. 2015. Expanded microbial
genome coverage and improved protein family annotation in the COG database. Nucleic
Acids Res. 43: D261–9.
Ganesh, S., D. J. Parris, E. F. DeLong, and F. J. Stewart. 2014. Metagenomic analysis of size-
fractionated picoplankton in a marine oxygen minimum zone. ISME J. 8: 187–211.
Gilly, W. F., J. M. Beman, S. Y. Litvin, and B. H. Robison. 2013. Oceanographic and
biological effects of shoaling of the oxygen minimum zone. Ann. Rev. Mar. Sci. 5: 393–
420.
Hahn, M. W., and M. Pöckl. 2005. Ecotypes of planktonic actinobacteria with identical 16S
rRNA genes adapted to thermal niches in temperate, subtropical, and tropical
freshwater habitats. Appl. Environ. Microbiol. 71: 766–773.
Hartl, D. L., and A. G. Clark. 1997. Principles of population genetics. Sinauer
Associates, Sunderland.
Haskell, W. Z., II, D. E. Hammond, and M. G. Prokopenko. 2015. A dual-tracer approach to
estimate upwelling velocity in coastal Southern California. Earth Planet. Sci. Lett. 422:
138–149.
Haskell, W. Z., II, M. G. Prokopenko, D. E. Hammond, R. H. R. Stanley, W. M. Berelson, J.
J. Baronas, J. C. Fleming, and L. Aluwihare. 2016. An organic carbon budget for coastal
Southern California determined by estimates of vertical nutrient flux, net community
production and export. Deep Sea Res. Part I 116: 49–76.
Hickey, B. M. 1992. Circulation over the Santa Monica-San Pedro Basin and Shelf. Plan.
Perspect. 37: 115.
Huh, C.-A., L. F. Small, S. Niemnil, B. P. Finney, B. M. Hickey, N. B. Kachel, D. S. Gorsline,
and P. M. Williams. 1990. Sedimentation dynamics in the Santa Monica-San Pedro
Basin off Los Angeles: radiochemical, sediment trap and transmissometer studies. Cont.
Shelf Res. 10: 137–164.
Jaspers, E., and J. Overmann. 2004. Ecological significance of microdiversity: identical 16S
rRNA gene sequences can be found in bacteria with highly divergent genomes and
ecophysiologies. Appl. Environ. Microbiol. 70: 4831–4839.
Jenuth, J. P. 1999. The NCBI: publicly available tools and resources on the web. In
Bioinformatics Methods and Protocols. Springer.
Kanehisa, M., and S. Goto. 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic
Acids Res. 28: 27–30.
Karstensen, J., L. Stramma, and M. Visbeck. 2008. Oxygen minimum zones in the eastern
tropical Atlantic and Pacific oceans. Prog. Oceanogr. 77: 331–350.
Kashtan, N., S. E. Roggensack, S. Rodrigue, and others. 2014. Single-cell genomics reveals
hundreds of coexisting subpopulations in wild Prochlorococcus. Science 344: 416–420.
Kopylova, E., L. Noé, and H. Touzet. 2012. SortMeRNA: fast and accurate filtering of
ribosomal RNAs in metatranscriptomic data. Bioinformatics 28: 3211–3217.
Liao, Y., G. K. Smyth, and W. Shi. 2014. featureCounts: an efficient general purpose program
for assigning sequence reads to genomic features. Bioinformatics 30: 923–930.
Li, D., R. Luo, C.-M. Liu, C.-M. Leung, H.-F. Ting, K. Sadakane, H. Yamashita, and T.-W.
Lam. 2016. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by
advanced methodologies and community practices. Methods 102: 3–11.
McKinney, W. 2010. Data structures for statistical computing in Python. Proceedings of the
9th Python in Science Conference.
NCBI Resource Coordinators. 2016. Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res. 44: D7–19.
Needham, D. M., R. Sachdeva, and J. A. Fuhrman. 2017. Ecological dynamics and co-
occurrence among marine phytoplankton, bacteria and myoviruses shows
microdiversity matters. ISME J. 11: 1614–1629.
Ochman, H., and A. C. Wilson. 1987. Evolution in bacteria: evidence for a universal
substitution rate in cellular genomes. J. Mol. Evol. 26: 74–86.
Orsi, W., Y. C. Song, S. Hallam, and V. Edgcomb. 2012. Effect of oxygen minimum zone
formation on communities of marine protists. ISME J. 6: 1586–1601.
Parada, A. E., and J. A. Fuhrman. 2017. Marine archaeal dynamics and interactions with the
microbial community over 5 years from surface to seafloor. ISME J.
doi:10.1038/ismej.2017.104
Portmann, R. W., J. S. Daniel, and A. R. Ravishankara. 2012. Stratospheric ozone depletion
due to nitrous oxide: influences of other gases. Philos. Trans. R. Soc. Lond. B Biol. Sci.
367: 1256–1264.
Rastelli, E., C. Corinaldesi, B. Petani, A. Dell’Anno, I. Ciglenečki, and R. Danovaro. 2016.
Enhanced viral activity and dark CO2 fixation rates under oxygen depletion: the case
study of the marine Lake Rogoznica. Environ. Microbiol. 18: 4511–4522.
Satinsky, B. M., C. B. Smith, S. Sharma, and others. 2017. Expression patterns of elemental
cycling genes in the Amazon River Plume. ISME J. 11: 1852–1864.
Schloss, P. D., S. L. Westcott, T. Ryabin, and others. 2009. Introducing mothur: open-
source, platform-independent, community-supported software for describing and
comparing microbial communities. Appl. Environ. Microbiol. 75: 7537–7541.
Schmidtko, S., L. Stramma, and M. Visbeck. 2017. Decline in global oceanic oxygen content
during the past five decades. Nature 542: 335–339.
Smit, S., M. Yarus, and R. Knight. 2006. Natural selection is not required to explain
universal compositional patterns in rRNA secondary structure categories. RNA 12: 1–
14.
Stevens, H., and O. Ulloa. 2008. Bacterial diversity in the oxygen minimum zone of the
eastern tropical South Pacific. Environ. Microbiol. 10: 1244–1259.
Stramma, L., G. C. Johnson, J. Sprintall, and V. Mohrholz. 2008. Expanding oxygen-
minimum zones in the tropical oceans. Science 320: 655–658.
Strickland, J. D. H., and T. R. Parsons. 1972. A practical handbook of seawater analysis.
Swan, B. K., B. Tupper, A. Sczyrba, and others. 2013. Prevalent genome streamlining and
latitudinal divergence of planktonic bacteria in the surface ocean. Proc. Natl. Acad. Sci.
U. S. A. 110: 11463–11468.
R Core Team. 2014. R: A language and environment for statistical computing. R Foundation
for Statistical Computing, Vienna, Austria.
Brown, C. T., and L. Irber. 2016. sourmash: a library for MinHash sketching of DNA. J. Open
Source Softw. doi:10.21105/joss.00027
Ulloa, O., D. E. Canfield, E. F. DeLong, R. M. Letelier, and F. J. Stewart. 2012. Microbial
oceanography of anoxic oxygen minimum zones. Proc. Natl. Acad. Sci. U. S. A. 109:
15996–16003.
Van Rossum, G. 2007. Python Programming Language. Proc. USENIX Annu. Tech. Conf.
Wickham, H. 2006. ggplot: an implementation of the Grammar of Graphics. R package version
0.4.0.
Wright, J. J., K. M. Konwar, and S. J. Hallam. 2012. Microbial ecology of expanding oxygen
minimum zones. Nat. Rev. Microbiol. 10: 381–394.
Figures
Figure 1. Environmental and biological parameters measured over the sampling
period. Units are as follows: leucine incorporation, i.e. heterotrophic production (cells
ml⁻¹ day⁻¹), NH4 (µM), NO2 (µM), NO3 (µM), NOx, i.e. NO2 + NO3 (µM), oxygen (µM),
PO4 (µM), salinity (PSU), temperature (°C), total cell count (cells ml⁻¹), viral count
(viruses ml⁻¹), virus to bacteria ratio (viruses per total cell count).
Figure 2. Rank abundance curves by sample using abundance of assemblies. Rare
assemblies are colored yellow and abundant assemblies are blue.
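Figure 2 splits each sample's assemblies into rare and abundant fractions, and the abstract reports that the rare fraction dominates activity. The bookkeeping behind such a comparison can be sketched in Python as below. This is a minimal illustration, not the dissertation's code: it assumes per-sample DNA and RNA read counts keyed by assembly ID, and the 0.1% relative-abundance cutoff (`RARE_CUTOFF`) and the function names are hypothetical, since the threshold used for Figure 2 is not given in this section.

```python
"""Sketch: classify assemblies as rare or abundant by relative DNA abundance
within one sample, then compute the share of RNA expression from the rare
fraction. The cutoff below is a hypothetical placeholder."""

RARE_CUTOFF = 0.001  # hypothetical: below 0.1% of sample DNA counts as "rare"


def relative(counts: dict[str, float]) -> dict[str, float]:
    """Normalize raw counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def rank_abundance(dna: dict[str, float]) -> list[tuple[int, str, float]]:
    """(rank, assembly, relative abundance), most abundant first, i.e. the
    points of a rank abundance curve for one sample."""
    ordered = sorted(relative(dna).items(), key=lambda kv: kv[1], reverse=True)
    return [(i + 1, k, v) for i, (k, v) in enumerate(ordered)]


def rare_activity_share(dna: dict[str, float], rna: dict[str, float],
                        cutoff: float = RARE_CUTOFF) -> float:
    """Fraction of the sample's total RNA expression contributed by
    assemblies whose relative DNA abundance falls below `cutoff`."""
    dna_rel = relative(dna)
    rna_rel = relative(rna)
    rare = {k for k, v in dna_rel.items() if v < cutoff}
    return sum(rna_rel.get(k, 0.0) for k in rare)
```

For example, with DNA counts `{"a": 9000, "b": 995, "c": 5}` and RNA counts `{"a": 100, "b": 100, "c": 800}`, only assembly `c` falls below the cutoff, yet it accounts for 80% of the sample's expression.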
Figure 3. RNA expression of assemblies plotted as a function of DNA abundance of
assemblies. Points are colored by the lowest common ancestor (LCA) of each assembly at
the domain level. Only assemblies that could be classified are shown.
Figure 4. RNA expression of assemblies plotted as a function of DNA abundance of
assemblies. All assemblies are shown, regardless of taxonomic classification.
Figure 5. RNA expression of rare assemblies over time.
Figure 6. Pairwise community similarity of DNA abundance of assemblies at all
monthly time lags. Data points are mean Bray-Curtis similarities with standard error of
the mean (SEM). Trend line is the fitted loess curve with the 95% confidence interval.
Figure 7. Pairwise community similarity of SSU rRNA abundance at all monthly time
lags. Data points are mean Bray-Curtis similarities with standard error of the mean
(SEM). Trend line is the fitted loess curve with the 95% confidence interval.
Figure 8. Pairwise community similarity of RNA expression of assemblies at all
monthly time lags. Data points are mean Bray-Curtis similarities with standard error of
the mean (SEM). Trend line is the fitted loess curve with the 95% confidence interval.
Figure 9. Pairwise community similarity of RNA:DNA specific activities of assemblies
at all monthly time lags. Data points are mean Bray-Curtis similarities with standard
error of the mean (SEM). Trend line is the fitted loess curve with the 95% confidence
interval.
Figure 10. Pairwise community similarity of DNA abundance of assembly-free k-mers at
all monthly time lags. Data points are mean Bray-Curtis similarities with standard error
of the mean (SEM). Trend line is the fitted loess curve with the 95% confidence interval.
Figure 11. Pairwise community similarity of RNA expression of assembly-free k-mers at
all monthly time lags. Data points are mean Bray-Curtis similarities with standard error
of the mean (SEM). Trend line is the fitted loess curve with the 95% confidence interval.
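The time-lag summary shared by Figures 6 to 11 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the dissertation's code: each month is keyed by a hypothetical integer index, every abundance vector lists the same features (assemblies, SSU rRNA, or k-mers) in the same order, and the loess trend line is omitted. Bray-Curtis similarity is taken as one minus the Bray-Curtis dissimilarity, the sum of absolute differences divided by the sum of both vectors.

```python
import math
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev


def bray_curtis_similarity(x: list[float], y: list[float]) -> float:
    """1 - Bray-Curtis dissimilarity for two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return 1.0 - num / den


def similarity_by_lag(samples: dict[int, list[float]]) -> dict[int, tuple[float, float]]:
    """Map each monthly lag to (mean Bray-Curtis similarity, SEM).

    `samples` maps a month index (0, 1, 2, ...) to that month's abundance
    vector; all vectors must describe the same features in the same order.
    """
    by_lag: dict[int, list[float]] = defaultdict(list)
    # Every unordered pair of months contributes once, at lag |m2 - m1|.
    for (m1, v1), (m2, v2) in combinations(sorted(samples.items()), 2):
        by_lag[abs(m2 - m1)].append(bray_curtis_similarity(v1, v2))
    out = {}
    for lag, sims in sorted(by_lag.items()):
        # SEM needs at least two pairs; report 0.0 when only one exists.
        sem = stdev(sims) / math.sqrt(len(sims)) if len(sims) > 1 else 0.0
        out[lag] = (mean(sims), sem)
    return out
```

Because every pair of samples contributes exactly once, short lags are averaged over many more pairs than the longest lags, so the SEM is expected to widen toward the end of the lag axis.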
Figure 12. Non-metric multidimensional scaling (NMDS) plot of samples clustered by
Bray-Curtis dissimilarities of DNA abundance of assemblies. Each sample is a point
colored by its corresponding O2 concentration.
Figure 13. Non-metric multidimensional scaling (NMDS) plot of samples clustered by
Bray-Curtis dissimilarities of RNA expression of assemblies. Each sample is a point
colored by its corresponding O2 concentration.
Figure 14. Pairwise mean Bray-Curtis DNA dissimilarities of samples (green) and O2
distances of samples (green) plotted as a function of time lag.
Figure 15. Pairwise mean Bray-Curtis RNA dissimilarities of samples (green) and O2
distances of samples (green) plotted as a function of time lag.
Figure 16. Pairwise mean Bray-Curtis RNA:DNA dissimilarities of samples (green) and
O2 distances of samples (green) plotted as a function of time lag.
Figure 17. Genes involved in different parts of the nitrogen cycle. Red numbers are
RNA expression of genes. Blue numbers are DNA abundance of genes.
Figure 18. Abundance of nitrogen cycle pathways over time. Points are sized by RNA
expression.
Figure 19. RNA expression of nitrogen cycle pathways over time. Points are sized by
DNA abundance.
Figure 20. Specific activities (RNA:DNA) of nitrogen cycle pathways over time. Points
are sized by RNA expression.
Tables
NCBI species taxonomy Total abundance (% DNA)
Nitrospina 77.3
Caldithrix abyssi 9.6
Pedosphaera 7.2
Flavobacteriaceae 6.5
Chlorobiaceae 5.0
Candidatus Pelagibacter ubique 3.9
Archangiaceae 2.7
Candidatus Nitrosomarinus catalina 2.5
Candidatus Ruthia magnifica 2.2
Ilumatobacter 1.9
Table 1. Top 10 most abundant taxa grouped by NCBI “species” rank. Taxonomies were
determined using LCA of RefSeq ORF matches.
NCBI species taxonomy Total expression (% RNA)
Candidatus Nitrosomarinus catalina 129.5
Nitrospina 118.8
Flavobacteriaceae 5.3
Candidatus Nitrosopumilus salaria 3.8
Candidatus Microthrix 3.5
Candidatus Nitrosopumilus adriaticus 2.5
Caldithrix abyssi 2.2
Pedosphaera 1.9
Candidatus Pelagibacter ubique 1.5
Candidatus Nitrosoarchaeum koreensis 1.5
Table 2. Top 10 highest expressing taxa grouped by NCBI “species” rank. Taxonomies
were determined using LCA of RefSeq ORF matches.
NCBI species taxonomy Total specific activity (RNA:DNA)
Variovorax sp. CF313 8683.700955
Methanocaldococcus jannaschii 2191.479517
Leptothrix 1574.587979
Azospira oryzae 1255.240352
Pucciniomycetes 1162.801338
Halorubrum sp. SAH-A6 997.5385234
Azospirillum sp. B4 891.8055812
Taphrinomycotina incertae sedis 847.566321
Lysobacter spongiicola 809.9070313
Leucocytozoon 602.0052942
Table 3. Top 10 taxa with highest specific activities (RNA:DNA) grouped by NCBI
“species” rank. Taxonomies were determined using LCA of RefSeq ORF matches.
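Table 3 reports a summed RNA:DNA specific activity per taxon. How the per-assembly ratios were normalized and aggregated is not stated in this section, so the Python sketch below assumes relative DNA abundance and relative RNA expression per assembly, divides expression by abundance, and sums the ratios within each LCA taxon. The function and argument names are hypothetical.

```python
from collections import defaultdict


def specific_activity_by_taxon(
    dna: dict[str, float],
    rna: dict[str, float],
    taxon_of: dict[str, str],
) -> list[tuple[str, float]]:
    """Sum per-assembly RNA:DNA ratios within each taxon, highest first.

    dna: relative DNA abundance keyed by assembly ID.
    rna: relative RNA expression keyed by assembly ID.
    taxon_of: assembly ID -> LCA taxon label (assumed mapping).
    Assemblies with no DNA signal are skipped to avoid dividing by zero.
    """
    totals: dict[str, float] = defaultdict(float)
    for asm, d in dna.items():
        if d <= 0:
            continue
        totals[taxon_of.get(asm, "unclassified")] += rna.get(asm, 0.0) / d
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Because each ratio is divided by DNA abundance, assemblies that are barely detected in the metagenome can yield very large values, which is one way taxa that are minor in Table 1 can rank highly by this measure.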
DNA RNA RNA:DNA
NH4 (µM)
0.491028613
NO2 + NO3 (µM)
0.415081352 0.387016774
NO3 (µM)
0.378738327 0.356880469
O2 (µM) 0.502864683 0.529754211 0.540533748
Viral count (viruses ml⁻¹)
0.353606398 0.320745895
Table 4. Partial Mantel Spearman correlations between community profiles and
environmental/biological parameters with P < 0.05. All P values were Bonferroni
corrected for multiple testing.
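Table 4 reports partial Mantel Spearman correlations with Bonferroni correction. The section does not say which matrix was partialled out, so the Python sketch below takes it as an input (`dz`), for instance a matrix of time separation between samples, which is an assumption. It is written from the textbook definition of the test, not taken from the author's pipeline: it rank-transforms the upper triangles of the three distance matrices, computes the partial correlation of the ranks, and estimates a one-tailed p-value by relabeling the samples of the community matrix.

```python
import math
import random


def upper(m):
    """Flatten the upper triangle (i < j) of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def ranks(v):
    """1-based average ranks, so tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_spearman(x, y, z):
    """Spearman correlation of x and y with z partialled out.

    Undefined (division by zero) if z is perfectly rank-correlated with x or y.
    """
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    rxy, rxz, ryz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(dx, dy, dz, n_perm=999, seed=0):
    """Return (partial Mantel statistic, one-tailed permutation p-value).

    dx: community distance matrix (e.g. Bray-Curtis), dy: environmental
    distance matrix, dz: control matrix; all square and sample-aligned.
    """
    rng = random.Random(seed)
    y, z = upper(dy), upper(dz)
    observed = partial_spearman(upper(dx), y, z)
    n = len(dx)
    hits = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # Relabel samples in dx only: rows and columns move together.
        permuted = [[dx[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if partial_spearman(upper(permuted), y, z) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

Each parameter in Table 4 would be one call to `partial_mantel`, and the resulting p-values would be passed together through `bonferroni` before applying the P < 0.05 threshold.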
                 Anammox  Denitrification  DNRA     Nitrification
Anammox          1
Denitrification           1                -0.478   0.8889
DNRA                      -0.478           1
Nitrification             0.8889                    1
Table 5. Spearman correlations between DNA of nitrogen cycle pathways with
Bonferroni corrected P < 0.05.
                 Anammox  Denitrification  DNRA     Nitrification
Anammox          1        -0.5293          0.5049   -0.4762
Denitrification  -0.5293  1                         0.5263
DNRA             0.5049                    1
Nitrification    -0.4762  0.5263                    1
Table 6. Spearman correlations between RNA of nitrogen cycle pathways with
Bonferroni corrected P < 0.05.
                 Anammox  Denitrification  DNRA     Nitrification
Anammox          1
Denitrification           1                         0.5922
DNRA                                       1
Nitrification             0.5922                    1
Table 7. Spearman correlations between RNA:DNA of nitrogen cycle pathways with
Bonferroni corrected P < 0.05.
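Tables 5 to 7 keep only the pathway pairs whose Spearman correlation survives Bonferroni correction. That filtering can be sketched in plain Python as below. The sketch swaps in a permutation p-value for the analytic p-value a statistics library would report, and the function names and the example series are illustrative, not the dissertation's data or code.

```python
import random
from itertools import combinations


def ranks(v):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of average ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def perm_pvalue(a, b, n_perm=999, seed=0):
    """Two-tailed permutation p-value for Spearman's rho between a and b."""
    rng = random.Random(seed)
    observed = abs(spearman(a, b))
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(a, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def significant_pairs(series, alpha=0.05, n_perm=999):
    """Spearman rho for every pair of named time series, keeping only pairs
    whose Bonferroni-corrected p-value is below alpha; dropped pairs are the
    blank cells of a table like Tables 5 to 7."""
    pairs = list(combinations(sorted(series), 2))
    m = len(pairs)
    kept = {}
    for x, y in pairs:
        p_adj = min(1.0, perm_pvalue(series[x], series[y], n_perm) * m)
        if p_adj < alpha:
            kept[(x, y)] = spearman(series[x], series[y])
    return kept
```

With four pathways there are six pairs, so each raw p-value is multiplied by six; with 999 permutations the smallest attainable corrected p-value is 0.006, comfortably below 0.05.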
Pathway Parameter DNA RNA RNA:DNA
Anammox NO2+NO3
-0.64
Anammox NO3
-0.604130435
Denitrification Oxygen
0.817487685
Nitrification Oxygen
0.781773399
Anammox NO2+NO3 -0.64
Anammox NO3 -0.604130435
Denitrification Oxygen 0.817487685
Nitrification Oxygen 0.781773399
Anammox NO2+NO3
-0.64
Anammox NO3
-0.604130435
Denitrification Oxygen
0.817487685
Nitrification Oxygen
0.781773399
Table 8. Spearman correlations between total DNA, RNA, and RNA:DNA of nitrogen cycle
pathways and environmental/biological parameters with Bonferroni corrected P < 0.05.
Pathway Parameter DNA RNA RNA:DNA
Nitrification NO2+NO3
0.33
Nitrification NO3
0.34
Nitrification Oxygen 0.56 0.40 0.38
Denitrification NH4
0.49
Denitrification NO2
0.45 0.46
Denitrification Oxygen 0.62 0.25 0.31
Denitrification Viral count
0.34
DNRA Bacterial count
0.33
DNRA Oxygen
0.31
DNRA Viral count
0.34
Table 9. Spearman correlations between community profiles of DNA, RNA, and RNA:DNA
of nitrogen cycle pathways and environmental/biological parameters with Bonferroni
corrected P < 0.05.
Abstract
Microorganisms are the dominant form of life on the planet and are crucial mediators of the major chemical cycles that are necessary for all life. The vast majority of microorganisms are virtually uncultivable and, therefore, cannot be directly studied. This dissertation focuses on developing and applying techniques to study microbial life in the planet’s major biomes, with a special focus on the ocean. Metagenomics, the direct sequencing of microbial genomic DNA from the environment, is used to provide phylogenetic and functional information about these otherwise uncultivable microbes. A typical metagenomic pipeline involves DNA extraction, library preparation, sequencing, assembly, genome binning, bin selection, bin refinement, and functional and structural annotation. Notably, every step of this pipeline except bin refinement can be automated using robotics and software. A novel phylogenetics-based metric for genome contamination was developed and applied to automatically refine metagenome-assembled genomes. This was packaged into a software tool, PhyLigo, and applied to in silico metagenomes, a time series of infant gut metagenomes, and complex metagenomes from the deep ocean. Automated PhyLigo refinement reduced genome contamination to an extent that could outperform human-guided curation.
To understand the role of microbes in the environment, metagenomic (DNA) and metatranscriptomic (RNA) data were sourced from publicly available and novel shotgun-sequenced environmental, host-associated, and human-engineered communities. Samples encompass the ocean, the Amazon River and its plume into the ocean, the human gut, permafrost soil layers, a thermokarst bog, and human-engineered biogas plants. Ocean samples span the water column: the sunlit epipelagic (0 - 200 m), the dimly lit mesopelagic (200 - 1,000 m), the dark bathypelagic (1,000 - 4,000 m), the benthic zone (near the seafloor), and hydrothermal vent plumes.
Using metagenomes and metatranscriptomes as proxies for abundance and activity, respectively, revealed that the rare fractions of microbial communities dominate activity across all environments sampled. The extent to which microbial activity is concentrated in the rare fraction was anticorrelated with environmental temperature, indicating that temperature is important in controlling the fundamental relationship between microbial abundance and activity. Rare fractions were not only important to total community activity but were also overrepresented in functions related to growth, defense, infectious disease, and biogeochemical cycling. This suggests that rare microbes can act as keystone community members by controlling processes that are important to total ecosystem function. Microbial abundance and activity were examined over time in an oxygen minimum zone (OMZ) off the coast of Southern California as part of the San Pedro Ocean Time-series (SPOT). OMZs are expanding in the ocean alongside global declines in ocean oxygen. Previous work using DNA fingerprinting has shown that the microbial community in the SPOT OMZ follows a repeating seasonal pattern. The SPOT OMZ was sampled monthly over 2.8 years (n = 30) as a model for understanding the impact of changing oxygen on microbial community structure and activity. Microbial community activity was structured by oxygen content rather than by season, indicating that global declines in oxygen will cause a shift in microbial community activity. Community structure, measured as the abundance of genomic content, did not show the repeating seasonality previously reported. Ribosomal RNA (rRNA) genes from the metagenomes did show seasonality, indicating that core portions of the community are seasonal but accessory elements are not. The results of this dissertation provide new methods and insights for understanding the role of microbial communities and their activities in the environment.
Conceptually similar
Microbial ecology in the deep terrestrial biosphere: a geochemical, metagenomic and culture-based approach
Changes in the community composition of marine microbial eukaryotes across multiple temporal scales of measurement
Temporal variability of marine archaea across the water column at SPOT
Microbe to microbe: monthly microbial community dynamics and interactions at the San Pedro Ocean Time-series
Genetic characterization of microbial eukaryotic diversity and metabolic potential
Spatial and temporal dynamics of marine microbial communities and their diazotrophs in the Southern California Bight
Unexplored microbial communities in marine sediment porewater
Characterizing protistan diversity and quantifying protistan grazing in the North Pacific Subtropical Gyre
Enhancing recovery of understudied and uncultured lineages from metagenomes
Using sequencing techniques to explore the microbial communities associated with ferromanganese nodules and sediment from the South Pacific gyre
Spatial and temporal investigations of protistan grazing impact on microbial communities in marine ecosystems
Annual pattern and response of the bacterial and microbial eukaryotic communities in an aquatic ecosystem restructured by disturbance
Dynamics of marine bacterial communities from surface to bottom and the factors controling them
Big data analytics in metagenomics: integration, representation, management, and visualization
Developing statistical and algorithmic methods for shotgun metagenomics and time series analysis
Ecological patterns of free-living and particle-associated prokaryotes, protists, and viruses at the San Pedro Ocean Time-series between 2005 and 2018
Marine protistan diversity, spatiotemporal dynamics, and physiological responses to environmental cues
Marine bacterioplankton biogeography over short to medium spatio-temporal scales
Thermal diversity within marine phytoplankton communities
Metagenomic analysis of the microbial changes following non-surgical periodontal therapy in aggressive periodontitis
Asset Metadata
Creator
Sachdeva, Rohan (author)
Core Title
Patterns of molecular microbial activity across time and biomes
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Biology (Marine Biology and Biological Oceanography)
Publication Date
03/19/2020
Defense Date
12/04/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
climate change, Marine biology, metagenomics, metatranscriptomics, microbial ecology, microbial genomics, microbiology, OAI-PMH Harvest, oxygen minimum zone
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Heidelberg, John (committee chair), Fuhrman, Jed (committee member), Ouellette, Andre (committee member)
Creator Email
rohansach@gmail.com,rsachdev@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-486290
Unique identifier
UC11267782
Identifier
etd-SachdevaRo-6123.pdf (filename),usctheses-c40-486290 (legacy record id)
Legacy Identifier
etd-SachdevaRo-6123.pdf
Dmrecord
486290
Document Type
Dissertation
Rights
Sachdeva, Rohan
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA