Enhancing recovery of understudied and uncultured lineages from metagenomes
by
Elaina D. Graham
A dissertation presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In partial fulfillment of the
requirements for the degree
DOCTOR OF PHILOSOPHY
BIOLOGY (MARINE BIOLOGY AND BIOLOGICAL OCEANOGRAPHY)
December 2021
Copyright 2021 Elaina Graham
Table of Contents
Abstract
Dissertation Introduction
    Metagenomics for microbial discovery and diversity
    Understanding the uncultured majority
    Microbial diversity in the global ocean
    References
Chapter One: BinSanity: unsupervised clustering of environmental microbial assemblies using
coverage and affinity propagation
Chapter Two: Potential for primary productivity in a globally distributed bacterial anoxygenic
phototroph and targeted enrichment from the San Pedro Ocean Time Series (SPOT)
    Abstract
    Introduction
    Methods
    Discussion Part 1: Metagenomics
    Discussion Part 2: Culturing
    Conclusion
    References
    Main Figures & Tables
Chapter Three: Marine Dadabacteria exhibit genome streamlining and phototrophy-driven niche
partitioning
Chapter Four: Phylogenetic and functional diversity within ‘Candidatus Nitrosopelagicus brevis’
    Abstract
    Introduction
    Methods
    Results
    Discussion
    Conclusion
    References
    Main Figures & Tables
Conclusions
    References
Bibliography
Appendix
    Chapter 1 Supplemental Figures and Tables
    Chapter 2 Supplemental Figures and Tables
    Chapter 3 Supplemental Figures and Tables
    Chapter 4 Supplemental Figures and Tables
Abstract
Metagenomics, the direct sequencing of the genomic content of an entire community, is
increasingly being used to better understand the uncultivated majority in the environment. This
dissertation develops a novel method to extract near-complete draft genomes from metagenomic datasets
and evaluates the usefulness of subsampling approaches for assembling large metagenomic datasets.
Using these frameworks, I discovered a unique aerobic anoxygenic phototroph, designated ‘Ca.
Luxescamonaceae’, that possesses the genomic potential for anoxygenic phototrophy and carbon fixation.
I uncovered the largest set of marine Dadabacteria genomes to date, showing phototrophy-driven depth
partitioning and genome streamlining. Finally, I analyze the largest set of ammonia-oxidizing
Thaumarchaeota genomes to date, yielding insight into the distinctions between the unique clades and
potential partitioning due to substrate affinity.
Dissertation Introduction
Metagenomics for microbial discovery and diversity
In 1985, Staley and Konopka [1] identified a phenomenon they termed “the great plate count
anomaly”, which describes the discrepancy between the number of cells from the environment that form
colonies on agar-based media and the number enumerated under a microscope [2]. This ultimately led to
the paradigm in microbiology that only 1% of microbes are culturable. Although this number has recently
come under debate, it is still thought that most bacterial taxa remain uncultured [3,4]. As recently as
2017, models comparing global bacterial diversity indices to described or named species showed that >99%
of bacteria remained uncultivated [5]. To contend with this issue, techniques using high-throughput
sequencing were implemented, allowing the study of microbes in the environment in a culture-independent
manner. The first approaches in high-throughput sequencing implemented techniques based on a single
gene or locus (such as the rRNA gene), e.g., denaturing gradient gel electrophoresis (DGGE) and
automated ribosomal intergenic spacer analysis (ARISA) [6].
Eventually this led to the implementation of “next-gen” sequencing of the hypervariable regions
of the 16S rRNA gene, which has been widely used to assess diversity and biogeography [7]. These
datasets are limited, though, in that they provide only phylogenetic operational taxonomic unit (OTU)
profiles and ignore functional potential. Further improvements in high-throughput sequencing have led
to the advancements necessary to perform metagenomics, the direct sequencing of the genomic content of
an entire community. Metagenomics has given us a way to study microbes within the context of their
community and to assess biogeochemical potential and population ecology. Since some of the earliest
metagenomes, from the Sargasso Sea [8] and acid mine drainage [9], over 20,000 metagenomic BioProjects
have been submitted to NCBI, including global efforts such as the Global Ocean Sampling expedition [10],
Tara Oceans [11], and bioGEOTRACES [12]. These types of datasets have greatly expanded our view of the
tree of life and even led to the discovery of the candidate phyla radiation (CPR), which was initially
proposed to comprise more than 15% of the phylogenetic diversity of the domain Bacteria [13,14]. In my
dissertation I have continued to push the limits of our understanding of phylogenetic and functional
diversity through “binning” of metagenome-assembled genomes (MAGs) from metagenomic datasets
and the subsequent identification of novel and unique clades representing widely understudied or
uncultured lineages. This is important because evidence has previously shown that many microbes are
distantly related to their cultured “representatives” [15–18].
Understanding the uncultured majority
While it has long been said that only around 1% of microbes are currently culturable, recent
assessments have amended this by comparing environmental 16S rRNA gene amplicon datasets to 16S
rRNA genes from cultured representatives in the SILVA [19] and RDP [20] databases, indicating that
between 2-18% of 16S rRNA genes from a diverse array of environments are represented in databases by
cultured representatives [3]. Metagenomic methodologies are often employed to study microorganisms that
have yet to be brought into culture. Metagenomics is the process of sequencing DNA directly from
environmental samples [21]. Successful genome assembly from metagenomes requires deep coverage of
closely related strains. Recovery of incomplete “draft” genomes from metagenomes has been demonstrated
in many cases [13,14,22–29]. Most of the bioinformatic methods to extract draft genomes from
metagenomic datasets rely on coverage-based binning, which makes multiple samples important for the
accurate identification of phylogenetically related units. My dissertation works to expand the options
for recovering draft genomes from metagenomic data by developing a novel program called BinSanity [30].
BinSanity was built to leverage multi-sample metagenomes and differentiate closely related organisms
using a machine learning algorithm called affinity propagation.
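To make the idea concrete, the following is a minimal, standard-library sketch of affinity propagation applied to contig coverage profiles. It is an illustration under stated assumptions, not BinSanity's implementation: the function names, the toy coverage values, the use of negative squared Euclidean distance as the similarity, the median-similarity preference, and the damping and iteration settings are all chosen here for illustration only.

```python
from statistics import median


def similarity_matrix(profiles):
    """Negative squared Euclidean distance between per-contig coverage profiles."""
    n = len(profiles)
    S = [[-sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[k]))
          for k in range(n)] for i in range(n)]
    # Assumption: every contig gets the same "preference" (diagonal value),
    # set to the median of the off-diagonal similarities.
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.8, iterations=300):
    """Return, for each item, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival exemplar.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: accumulated support for k as an exemplar from other items.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Hypothetical coverage of six contigs across three samples (two organisms).
profiles = [[10, 2, 30], [11, 2, 31], [13, 3, 33],
            [40, 25, 5], [41, 26, 5], [43, 27, 6]]
labels = affinity_propagation(similarity_matrix(profiles))
print(labels)  # contigs sharing an exemplar index form one putative bin
```

Because each contig is assigned to an exemplar rather than to one of a fixed number of centroids, the number of bins does not have to be specified in advance; it emerges from the similarities and the preference value.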
Even with the improvements in binning draft genomes from metagenomic datasets provided by
BinSanity, generating high-quality draft microbial genomes remains difficult in environments that
combine high genomic richness with a relatively even community composition. Beyond this issue,
organisms with high heterogeneity in the environment often cannot be assembled [31,32]. SAR11 and the
ammonia-oxidizing archaea (AOA) are known to be highly abundant and widely distributed, but often do
not assemble well with short-read technology such as Illumina [8,29]. The fourth chapter of my
dissertation attempts to implement subsampling assembly protocols to reduce the microdiversity of our
metagenomic datasets prior to assembly and improve the recovery of high-abundance, high-heterogeneity
communities. This approach did not appear to significantly improve recovery in the predicted manner, but
by combining metagenome-assembled genomes (MAGs) from this approach with draft genomes collected from
publicly available sources, we were able to produce the largest Nitrosopelagicus draft genome dataset
to date. This dataset illuminates unique features differentiating the group at the population level,
including a unique set of Nitrosopelagicus groups that appear to lack the potential for ammonia
oxidation.
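The general idea of subsampling reads before assembly can be sketched as follows. This is only a hedged illustration: the function name, the sampling fraction, and the fixed seed are assumptions introduced here, and the actual subsampling protocol used in Chapter Four is not reproduced. For paired-end data, each element of `reads` would need to represent a read pair so that mates are kept together.

```python
import random


def subsample_reads(reads, fraction, seed=0):
    """Return a reproducible random subset holding roughly `fraction` of the reads."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = round(len(reads) * fraction)
    rng = random.Random(seed)  # private generator so the result is repeatable
    return rng.sample(reads, k)


# Hypothetical read identifiers; real input would come from FASTQ records.
reads = [f"read_{i}" for i in range(20)]
subset = subsample_reads(reads, fraction=0.1, seed=42)
print(subset)
```

Fixing the seed keeps a given subsample reproducible, while varying it yields independent subsamples of the same dataset that can be assembled separately.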
Microbial diversity in the global ocean
It is estimated that the total number of prokaryotic cells (i.e., bacteria and archaea) on Earth is
4-6 × 10³⁰, with a cellular carbon content of 350-550 Pg of C (1 Pg = 10¹⁵ g) [33]. Most of the
prokaryotes are found in natural environments such as the open ocean, where millions of microbial cells
exist per mL of seawater [34]. These microbes play important roles in the broader marine ecosystem as
important members of the food chain (e.g., the microbial loop) [35,36] and via genes coding for the
major redox reactions essential for modulating biogeochemical cycles [37]. While much of my dissertation
has focused on the identification of novel and uncultivated lineages, these discoveries are ultimately
made significant by our ability to move beyond phylogenetic novelty to analyzing the functional
“inventory” of a group of related microbes. By leveraging publicly available draft genomes as well as
new MAGs generated from the Tara Oceans [24] and bioGEOTRACES [12] datasets, we gain enough
representatives from key uncultivated groups to make robust predictions regarding the functional
potential of novel groups of microbes. As a whole, my dissertation aims not only to improve the
protocols and reproducibility of current metagenomic approaches, but also to leverage publicly
available metagenomes to better understand the swaths of uncultured microbes we have yet to discover.
These include the Dadabacteria, which until 2015 were known only as the “Candidate phylum SBR1093”
because no genome representatives were available, and the aerobic anoxygenic phototrophs, which I
demonstrate in my dissertation have the potential for a novel lithoautotrophic metabolism. In addition,
leveraging public datasets allowed me to collect the largest set of genomes to date for
Nitrosopelagicus, a genus that currently has only two cultured isolates.
References
1. Staley, J. T. & Konopka, A. Measurement of in Situ Activities of Nonphotosynthetic
Microorganisms in Aquatic and Terrestrial Habitats. Annu. Rev. Microbiol. 39, 321–346 (1985).
2. Jannasch, H. W. & Jones, G. E. Bacterial Populations in Sea Water as Determined by Different
Methods of Enumeration. Limnol. Oceanogr. 4, 128–139 (1959).
3. Steen, A. D. et al. High proportions of bacteria and archaea across most biomes remain
uncultured. ISME J. 13, 3126–3130 (2019).
4. Martiny, A. C. High proportions of bacteria are culturable across major biomes. ISME J. 13, 2125–
2128 (2019).
5. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev. Microbiol.
71, 711–730 (2017).
6. Hugerth, L. W. & Andersson, A. F. Analysing microbial community composition through amplicon
sequencing: From sampling to hypothesis testing. Frontiers in Microbiology vol. 8 (2017).
7. Sogin, M. L. et al. Microbial diversity in the deep sea and the underexplored ‘rare biosphere’.
Proc. Natl. Acad. Sci. U. S. A. 103, 12115–12120 (2006).
8. Venter, J. C. et al. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science
304, 66–74 (2004).
9. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial
genomes from the environment. Nature 428, 37–43 (2004).
10. Rusch, D. B. et al. The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through
eastern tropical Pacific. PLoS Biol. 5, 0398–0431 (2007).
11. Bork, P. et al. Tara Oceans. Tara Oceans studies plankton at planetary scale. Introduction. Science
348, 873 (2015).
12. Biller, S. J. et al. Data descriptor: Marine microbial metagenomes sampled across space and time.
Sci. Data 5, (2018).
13. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
14. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria.
Nature 523, 208–211 (2015).
15. Lee, M. D. et al. Marine Synechococcus isolates representing globally abundant genomic lineages
demonstrate a unique evolutionary path of genome reduction without a decrease in GC content.
Environ. Microbiol. 21, 1677–1686 (2019).
16. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev. Microbiol.
71, 711–730 (2017).
17. Palkova, Z. Multicellular microorganisms: Laboratory versus nature. EMBO Reports vol. 5 470–
476 (2004).
18. Hobman, J. L., Penn, C. W. & Pallen, M. J. Laboratory strains of Escherichia coli: Model citizens or
deceitful delinquents growing old disgracefully? Molecular Microbiology vol. 64 881–885 (2007).
19. Quast, C. et al. The SILVA ribosomal RNA gene database project: Improved data processing and
web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
20. Cole, J. R. et al. Ribosomal Database Project: Data and tools for high throughput rRNA analysis.
Nucleic Acids Res. 42, D633–D642 (2014).
21. Bragg, L. & Tyson, G. W. Metagenomics using next-generation sequencing. Methods Mol. Biol.
1096, 183–201 (2014).
22. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11,
1144–1146 (2014).
23. Kang, D. D. et al. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome
reconstruction from metagenome assemblies. PeerJ 2019, (2019).
24. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-
assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
25. Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are
abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).
26. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509 (2021).
27. Haroon, M. F., Thompson, L. R., Parks, D. H., Hugenholtz, P. & Stingl, U. A catalogue of 136
microbial draft genomes from Red Sea metagenomes. Sci. Data 3, 1–6 (2016).
28. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially
expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
29. Hug, L. A. et al. Critical biogeochemical functions in the subsurface are associated with bacteria
from new phyla and little studied lineages. Environ. Microbiol. 18, 159–173 (2016).
30. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised clustering of environmental
microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035 (2017).
31. Nelson, W. C., Tully, B. J. & Mobberley, J. M. Biases in genome reconstruction from metagenomic
data. PeerJ 8, e10119 (2020).
32. Hedlund, B. P., Dodsworth, J. A., Murugapiran, S. K., Rinke, C. & Woyke, T. Impact of single-cell
genomics and metagenomics on the emerging view of extremophile “microbial dark matter”.
Extremophiles vol. 18 865–875 (2014).
33. Whitman, W. B., Coleman, D. C. & Wiebe, W. J. Prokaryotes: The unseen majority. Proceedings of
the National Academy of Sciences of the United States of America vol. 95 6578–6583 (1998).
34. Kai, W., Peisheng, Y., Rui, M., Wenwen, J. & Zongze, S. Diversity of culturable bacteria in deep-sea
water from the South Atlantic Ocean. Bioengineered 8, 572–584 (2017).
35. Fenchel, T. The microbial loop - 25 years later. J. Exp. Mar. Bio. Ecol. 366, 99–103 (2008).
36. Azam, F. et al. The Ecological Role of Water-Column Microbes in the Sea. Mar. Ecol. Prog. Ser. 10,
257–263 (1983).
37. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive earth’s
biogeochemical cycles. Science vol. 320 1034–1039 (2008).
38. Hunter, C. N., Daldal, F., Thurnauer, M. C. & Beatty, J. T. The purple phototrophic bacteria. vol. 28
(Springer Science & Business Media, 2008).
39. Pfennig, N. & Trüper, H. G. Taxonomy of phototrophic green and purple bacteria: A review. Ann.
l’Institut Pasteur / Microbiol. 134, 9–20 (1983).
40. Madigan, M. T. & Jung, D. O. An Overview of Purple Bacteria: Systematics, Physiology, and
Habitats. in 1–15 (Springer, Dordrecht, 2009). doi:10.1007/978-1-4020-8815-5_1.
41. Ehrenreich, A. & Widdel, F. Anaerobic oxidation of ferrous iron by purple bacteria, a new type of
phototrophic metabolism. Appl. Environ. Microbiol. 60, 4517–26 (1994).
42. Griffin, B. M., Schott, J. & Schink, B. Nitrite, an Electron Donor for Anoxygenic Photosynthesis.
Science 316, 1870–1870 (2007).
43. Harashima, K., Shiba, T., Totsuka, T., Simidu, U. & Taga, N. Occurrence of
bacteriochlorophyll a in a strain of an aerobic heterotrophic bacterium. Agric. Biol. Chem. 42,
1627–1628 (1978).
44. Shiba, T., Simidu, U. & Taga, N. Distribution of aerobic bacteria which contain bacteriochlorophyll
a. Appl. Environ. Microbiol. 38, 43–5 (1979).
45. Beatty, J. T. On the natural selection and evolution of the aerobic phototrophic bacteria.
Photosynth. Res. 73, 109–114 (2002).
46. Nishimura, Y. et al. DNA relatedness and chemotaxonomic feature of aerobic
bacteriochlorophyll-containing bacteria isolated from coasts of Australia. J. Gen. Appl. Microbiol.
40, 287–296 (1994).
47. Arai, H., Roh, J. H. & Kaplan, S. Transcriptome dynamics during the transition from anaerobic
photosynthesis to aerobic respiration in Rhodobacter sphaeroides 2.4.1. J. Bacteriol. 190, 286–99
(2008).
48. Swingley, W. D. et al. The complete genome sequence of Roseobacter denitrificans reveals a
mixotrophic rather than photosynthetic metabolism. J. Bacteriol. 189, 683–90 (2007).
49. Yurkov, V. V & Beatty, J. T. Aerobic anoxygenic phototrophic bacteria. Microbiol. Mol. Biol. Rev.
62, 695–724 (1998).
50. Tang, K.-H., Feng, X., Tang, Y. J. & Blankenship, R. E. Carbohydrate Metabolism and Carbon
Fixation in Roseobacter denitrificans OCh114. PLoS One 4, e7233 (2009).
51. Wood, H. G., Werkman, C. H., Hemingway, A. & Nier, A. O. The position of carbon dioxide carbon in
succinic acid synthesized by heterotrophic bacteria. J. Biol. Chem. 139, 377–381 (1941).
52. Cohen-Bazire, G., Sistrom, W. R. & Stanier, R. Y. Kinetic studies of pigment synthesis by non-sulfur
purple bacteria. J. Cell. Comp. Physiol. 49, 25–68 (1957).
53. Kolber, Z. S. et al. Contribution of aerobic photoheterotrophic bacteria to the carbon cycle in the
ocean. Science (2001) doi:10.1126/science.1059707.
54. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled genomes
from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
55. Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site identification.
BMC Bioinformatics 11, 1–11 (2010).
56. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for Functional
Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 428, 726–731 (2016).
57. Graham, E. D., Heidelberg, J. F. & Tully, B. J. Binsanity: Unsupervised clustering of environmental
microbial assemblies using coverage and affinity propagation. PeerJ 2017, e3035 (2017).
58. Eren, A. M. et al. Anvi’o: An advanced analysis and visualization platform for ’omics data. PeerJ
2015, (2015).
59. Santos, S. R. & Ochman, H. Identification and phylogenetic sorting of bacterial lineages with
universally conserved genes and proteins. Environ. Microbiol. 6, 754–759 (2004).
60. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank. Nucleic Acids
Res. 33, D34 (2005).
61. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: Interactive sequence similarity
searching. Nucleic Acids Res. 39, W29–W37 (2011).
62. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222-30 (2014).
63. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids
Res. 31, 371–3 (2003).
64. Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space
complexity. BMC Bioinformatics 5, 1–19 (2004).
65. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated alignment
trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
66. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - Approximately maximum-likelihood trees for
large alignments. PLoS One 5, (2010).
67. Yutin, N., Suzuki, M. T. & Béjà, O. Novel primers reveal wider diversity among marine aerobic
anoxygenic phototrophs. Appl. Environ. Microbiol. 71, 8958–62 (2005).
68. Yutin, N. et al. Assessing diversity and biogeography of aerobic anoxygenic phototrophic bacteria
in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling expedition
metagenomes. Environ. Microbiol. 9, 1464–1475 (2007).
69. Béjà, O. et al. Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature 415,
630–633 (2002).
70. Markowitz, V. M. et al. IMG: The integrated microbial genomes database and comparative
analysis system. Nucleic Acids Res. 40, D115 (2012).
71. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature
Methods vol. 12 59–60 (2014).
72. Tabita, F. R. et al. Function, Structure, and Evolution of the RubisCO-Like Proteins and Their
RubisCO Homologs. Microbiol. Mol. Biol. Rev. 71, 576–599 (2007).
73. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential
coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
74. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search
programs. Nucleic Acids Research vol. 25 3389–3402 (1997).
75. Schwalbach, M. S. & Fuhrman, J. A. Wide-ranging abundances of aerobic anoxygenic
phototrophic bacteria in the world ocean revealed by epifluorescence microscopy and
quantitative PCR. Limnol. Oceanogr. 50, 620–628 (2005).
76. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled genomes
from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
77. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the
quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome
Res. 25, 1043–55 (2015).
78. Moran, M. A. The global ocean microbiome. Science vol. 350 (2015).
79. Hamada, M., Toyofuku, M., Miyano, T. & Nomura, N. cbb3-type cytochrome c oxidases, aerobic
respiratory enzymes, impact the anaerobic life of Pseudomonas aeruginosa PAO1. J. Bacteriol.
196, 3881–3889 (2014).
80. Giuffrè, A., Borisov, V. B., Arese, M., Sarti, P. & Forte, E. Cytochrome bd oxidase and bacterial
tolerance to oxidative and nitrosative stress. Biochimica et Biophysica Acta - Bioenergetics vol.
1837 1178–1187 (2014).
81. Boldareva-Nuianzina, E. N., Bláhová, Z., Sobotka, R. & Koblížek, M. Distribution and origin of
oxygen-dependent and oxygen-independent forms of mg-protoporphyrin monomethylester
cyclase among phototrophic proteobacteria. Appl. Environ. Microbiol. 79, 2596–2604 (2013).
82. Gough, S. P., Petersen, B. O. & Duus, J. Anaerobic chlorophyll isocyclic ring formation in
Rhodobacter capsulatus requires a cobalamin cofactor. Proc. Natl. Acad. Sci. U. S. A. 97, 6908–
6913 (2000).
83. Yurkov, V., Gad’on, N. & Drews, G. The major part of polar carotenoids of the aerobic bacteria
Roseococcus thiosulfatophilus RB3 and Erythromicrobium ramosum E5 is not bound to the
bacteriochlorophyll a-complexes of the photosynthetic apparatus. Arch. Microbiol. 160, 372–376
(1993).
84. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
85. Gómez-Consarnau, L. et al. Light stimulates growth of proteorhodopsin-containing marine
Flavobacteria. Nature 445, 210–213 (2007).
86. Johnston, W. et al. Carbon mass balance methodology to characterize the growth of pigmented
marine bacteria under conditions of light cycling. Bioprocess Biosyst. Eng. 27, 163–174 (2005).
87. Tomasch, J., Gohl, R., Bunk, B., Diez, M. S. & Wagner-Döbler, I. Transcriptional response of the
photoheterotrophic marine bacterium Dinoroseobacter shibae to changing light regimes. ISME J.
5, 1957–68 (2011).
88. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
89. Tang, K. et al. An Aerobic Anoxygenic Phototrophic Bacterium Fixes CO2 via the Calvin-Benson-
Bassham Cycle. bioRxiv 2021.04.29.441244 (2021) doi:10.1101/2021.04.29.441244.
Chapter One: BinSanity: unsupervised clustering of environmental
microbial assemblies using coverage and affinity propagation
Chapter one was previously published in PeerJ on March 8th, 2017 (doi:10.7717/peerj.3035).
Included below is the main text as it was published. Supplemental information can be found in the
appendix to this dissertation.
Chapter Two: Potential for primary productivity in a globally distributed
bacterial anoxygenic phototroph and targeted enrichment from the San
Pedro Ocean Time Series (SPOT)
Chapter two is partially published in the ISME Journal (March 9th, 2018;
https://doi.org/10.1038/s41396-018-0091-3). Included below is material encompassing both the
published data and subsequent experiments in metagenome-guided enrichment. Supplemental
information can be found in the appendix to this dissertation.
Abstract
Aerobic anoxygenic phototrophs (AAnPs) are considered common in the global oceans and have been
canonically assumed to be photoheterotrophic. Using the Tara Oceans metagenomic dataset we identified
nine high-quality bacterial draft genomes within the Alphaproteobacteria that possess the genomic
potential for anoxygenic phototrophy, carbon fixation via the Calvin-Benson-Bassham (CBB) cycle, and
the oxidation of sulfite and/or thiosulfate. These organisms have tentatively been assigned the designation
‘Ca. Luxescamonaceae’ and are globally distributed with relative abundances of 0.1-1% in the North
Pacific, Mediterranean Sea, East Africa Coastal Province, Atlantic, Hawaiian Ocean Time Series (HOT),
Bermuda Atlantic Time Series (BATS), and the San Pedro Ocean Time Series (SPOT). Using surface
water samples from SPOT collected in the winter of 2018 and metagenome-assembled genome (MAG)-guided
defined minimal media configurations, we enriched for ‘Ca. Luxescamonaceae’ and measured
bulk carbon fixation rates under dark and light conditions by incorporation of isotopically
labeled sodium bicarbonate (NaH¹⁴CO₃).
Introduction
Anoxygenic phototrophy is a process in which photons are captured and converted into ATP, but
unlike oxygenic photosynthesis, oxygen is not produced because water is not used as an electron donor.
The AAnPs utilize type-II photochemical reaction centers (RCIIs) and bacteriochlorophyll (BChl) to
capture photons that energize electrons and generate proton motive force. The anoxygenic phototrophs
canonically belong to the green sulfur bacteria (GSB), green non-sulfur bacteria (GNSB), purple sulfur
bacteria (PSB), purple non-sulfur bacteria (PNSB), and heliobacteria. These groups were named using the
presence of specific bacteriochlorophyll variants (e.g., bacteriochlorophyll a, b, c, d, e, and g). The
PSB are the most phylogenetically diverse group of anoxygenic phototrophs, spanning the Proteobacteria
in a polyphyletic distribution. The PNSB belong to the Alphaproteobacteria and the Betaproteobacteria
and tend to be capable of growth via aerobic or anaerobic respiration, fermentation, and anoxygenic
phototrophy, making them more metabolically versatile than the PSB [38]. The primary difference between
the PSB and PNSB is that the former are primarily photoautotrophs with limited potential for
photoheterotrophy, while the latter are primarily photoheterotrophs that are conditionally capable of
photoautotrophy. For the most part, when these organisms are grown under photoautotrophic conditions
they use sulfide, thiosulfate, or H₂ as an electron donor [39,40]. There is also limited evidence for
specific species that can use ferrous iron (Fe²⁺) [41] or nitrite [42] as an electron donor.
Recently, focus has shifted to a subset of the PSB, namely the aerobic anoxygenic phototrophs
(AAnPs). The AAnPs were not discovered until the 1970s [43], when researchers spectrophotometrically
detected a signal for BChl-a in pigmented isolates grown aerobically on an organic-rich medium [44].
Unlike canonical anoxygenic phototrophs, in which pigment synthesis is entirely repressed by oxygen,
AAnPs require oxygen and can produce pigments in oxygen-rich environments but tend to do so only in
the dark, which is theorized to be a mechanism to avoid oxidative damage [45,46]. The AAnPs have been
identified as photoheterotrophs that require a supply of organic carbon substrates. Some species, such
as Rhodobacter sphaeroides or Roseobacter denitrificans, have been shown to be facultative
photoautotrophs when grown under anaerobic or microaerobic conditions, but transition back to
heterotrophy when exposed to aerobic conditions [47]. AAnPs lack the enzyme ribulose-1,5-bisphosphate
carboxylase (RuBisCO) and other proteins in the Calvin-Benson-Bassham (CBB) cycle that are key to
carbon fixation and found in many closely related anaerobic purple bacteria [48]. Several cultured
marine AAnPs [49] can incorporate a significant amount of inorganic carbon through anaplerotic
pathways [50]. Anaplerotic reactions are catalyzed by pyruvate carboxylase, phosphoenolpyruvate (PEP)
carboxylase, PEP carboxykinase, and malic enzymes [51]. Specifically, PEP carboxylase attaches CO₂ to
PEP, forming oxaloacetate. In photosynthetic organisms this would supply CO₂ as a substrate for
RuBisCO, but in anaplerotic reactions the oxaloacetate is fed directly into the tricarboxylic acid
(TCA) cycle [48]. A study attempting to quantify anaplerotic carbon fixation in Roseobacter
denitrificans found that this process could represent 10-15% of the carbon biomass, but that this
group could not survive with CO₂ as a sole carbon source. Currently, all known autotrophic anoxygenic
phototrophs exist exclusively under anoxic conditions, and there is evidence that pigment synthesis
pathways (e.g., bacteriochlorophyll) are repressed by molecular oxygen [52].
There have been various phylogenetic and diversity-based studies conducted on AAnPs, but
limited efforts have been made to assess their contribution to the global carbon cycle [53]. With the
increasing number of AAnP genomes being identified by metagenomics, there is a need to reassess our
basic assumptions about the potential functionality and evolution of the group. In work on the Tara
Oceans dataset, over 60 metagenome-assembled genomes (MAGs) belonging to the anoxygenic
phototrophs were identified. These MAGs exhibited a wide phylogenetic distribution and
facilitated the identification of a group of six AAnP MAGs in the surface ocean that indicate a
potential for aerobic carbon fixation, contrary to previous research, and are being tentatively named
‘Ca. Luxescamonaceae.’
Methods
Genome Identification & Annotation
A collection of 3,655 non-redundant MAGs generated from several studies using the Tara Oceans
metagenomic dataset
24,25,54
and from the Red Sea
27
had putative DNA coding sequences (CDSs) predicted
using Prodigal
55
(v.2.6.2; -m -p meta) and annotated by the KEGG database
56
using BlastKOALA
(taxonomy group, Prokaryotes; database, genus_prokaryotes + family_eukaryotes; accessed March 2017).
Pathways and metabolisms of interest were assessed using the script KEGG-decoder.py
(www.github.com/bjtully/BioData/tree/master/KEGGDecoder). Genomes were screened for the predicted
presence of genes assigned as the M and L subunits of type-II photochemical reaction center (PufML) and
the large and small units of ribulose-1,5-bisphosphate carboxylase (RbcLS). After identification of the
AAnP genomes of interest, genome quality was assessed manually. For genomes from
Tully et al. (2018)
24
, read coverage and DNA compositional data were used to bin additional putative
contigs into the genomes of interest using CONCOCT
22
(v.0.4.1; parameters: -c 800 -I 500) on contigs
>5 kb from each province with an AAnP genome. To improve completion estimates, overlapping
CONCOCT and BinSanity
57
bins were visualized using anvi’o
58
(v.2.1.0) and manually refined to
improve genome completion and minimize contamination estimates. The anvi’o visualizations of the four
genomes that required manual refinement (MED800, EAC638, SAT68, and NP970) are shown
in Supplemental Figures S1-S4. Genomes from Delmont et al. (2018)
25
were visualized in anvi’o and
manually curated based on DNA composition (%G+C and tetranucleotide frequencies).
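
The screen for PufML and RbcLS described in this section can be sketched as a simple set test. This is a minimal illustration and not the KEGG-decoder.py implementation: it assumes a hypothetical dictionary that maps each genome to the KEGG Ontology (KO) identifiers assigned by BlastKOALA. K08929 (PufM) is the identifier given in this chapter; K08928 (PufL), K01601 (RbcL), and K01602 (RbcS) are assumed here and should be checked against the current KEGG release. Genome names are made up.

```python
# KO identifiers: only K08929 (PufM) is stated in the text; the rest are assumed.
PUF_ML = {"K08928", "K08929"}   # assumed: PufL, PufM
RBC_LS = {"K01601", "K01602"}   # assumed: RbcL, RbcS


def screen_genomes(genome_kos):
    """Return the genomes that encode both PufML and RbcLS."""
    required = PUF_ML | RBC_LS
    return sorted(g for g, kos in genome_kos.items() if required <= set(kos))


# Toy input with hypothetical genome names and annotations.
example = {
    "MAG_A": {"K08928", "K08929", "K01601", "K01602", "K00001"},
    "MAG_B": {"K08928", "K08929"},   # reaction center without RuBisCO
    "MAG_C": {"K01601", "K01602"},   # RuBisCO without reaction center
}
print(screen_genomes(example))
```

Requiring all four subunits is a strict choice; for incomplete MAGs a looser rule (for example, either subunit of each pair) may be more appropriate.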
Phylogenetics
To determine phylogeny, a set of 31 single-copy marker genes
59
was identified in the MAGs of
interest. The set consisted of ribosomal proteins, often used for phylogenomic analysis
13
, and proteins
essential for cellular metabolism (Supplemental Information 2). Markers were identified in the MAGs and in a total of 1,099 reference genomes
(Supplemental Information 3) accessed from NCBI GenBank
60
using HMMER
61
(v3.1b2; hmmsearch -E
1e-5) and hidden Markov models collected from the Pfam
62
and TIGRfam
63
databases (accessed March
2017). Genomes with ≥17 markers were used for phylogenetic placement. Each individual marker gene
was aligned using MUSCLE
64
(v3.8.31; parameter: -maxiters 32), trimmed using trimAl
65
(v.1.2rev59;
parameter: -automated1), manually assessed, and concatenated. A maximum likelihood tree was
generated using FastTree
66
(v.2.1.10; parameters: -lg -gamma).
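
The filtering and concatenation steps can be sketched as follows. This is an illustration under stated assumptions, not the script used in this work: it assumes each marker has already been aligned and trimmed, that alignments are held in a hypothetical nested dictionary, and that a genome missing a marker is padded with gaps. Only the threshold of 17 markers comes from the text; the marker and genome names in the toy data are made up.

```python
MIN_MARKERS = 17  # threshold stated in the text


def concatenate_markers(alignments, min_markers=MIN_MARKERS):
    """Concatenate per-marker alignments, padding missing markers with gaps.

    alignments: {marker_name: {genome_name: aligned_sequence}}. Every marker
    must contain at least one sequence, and all sequences within one marker
    must share the same aligned length.
    """
    markers = sorted(alignments)

    # Count how many markers each genome carries.
    counts = {}
    for marker in markers:
        for genome in alignments[marker]:
            counts[genome] = counts.get(genome, 0) + 1
    kept = sorted(g for g, n in counts.items() if n >= min_markers)

    concatenated = {}
    for genome in kept:
        parts = []
        for marker in markers:
            seqs = alignments[marker]
            width = len(next(iter(seqs.values())))
            parts.append(seqs.get(genome, "-" * width))
        concatenated[genome] = "".join(parts)
    return concatenated


# Toy data: three hypothetical markers, threshold lowered to 2 for the demo.
toy = {
    "rpsA": {"g1": "MK-L", "g2": "MKAL"},
    "rplB": {"g1": "AV", "g3": "AI"},
    "rpsC": {"g1": "G", "g2": "G"},
}
print(concatenate_markers(toy, min_markers=2))
```

Markers are concatenated in sorted name order so that every genome receives the same column layout.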
Sequences for PufM were collected from previously described lineages
67,68
and bacterial artificial
chromosomes
69
and from genomes in Integrated Microbial Genomes
70
based on their KEGG Ontology
annotation (K08929; Supplemental Information 5 and 6). Global Ocean Survey (GOS) assemblies
8
with
predicted CDS (as above) were searched using the collected PufM sequences with a DIAMOND
BLASTP search
71
(v.0.8.36.98; default settings; Supplemental Information 2 and 3). RbcL sequences
were collected from previously described lineages
72
(Supplemental Information 7 and 8). Two separate
phylogenetic trees were constructed (as above).
Relative Abundance
The relative contribution of the ‘Ca. Luxescamonaceae’ genomes to the overall measured
planktonic communities was calculated by two different methods in their respective manuscripts
24,25
.
Briefly, for MAGs described in Tully et al. (2018)
24
, an approximate relative abundance was determined
based on the total bacterial and archaeal community signature as detected by a set of 100 single-copy
gene markers
73
. This was achieved by searching for all copies of the gene markers in the total, assembled
metagenomes from Tully et al. (2018)
24
and identifying which of these markers could be assigned to a
MAG. A length normalized relative abundance value was calculated for each MAG in each sample
24
.
Single-copy marker genes for the Delmont et al. (2018)
25
MAGs were detected in the Tully et al. (2018)
24
assembled metagenomes from the Mediterranean and North Atlantic using BLAST
74
and used to calculate
relative abundance. For MAGs described in Delmont et al. (2018)
25
, reads were recruited to the
contigs of each MAG and used to calculate the relative fraction of aligned reads from the total
metagenome.
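
The read-recruitment measure described for the Delmont et al. MAGs can be sketched as a simple ratio. This is a minimal illustration that assumes read counts per MAG have already been obtained from a read mapper; the function name, MAG names, and counts are hypothetical, and the length-normalized marker-gene method used for the Tully et al. MAGs is not reproduced here.

```python
def relative_read_fraction(recruited_reads, total_reads):
    """Percent of a metagenome's reads that were recruited to each MAG.

    recruited_reads: {mag_name: number of reads aligned to that MAG's contigs}
    total_reads: total number of reads in the metagenome
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {mag: 100.0 * n / total_reads for mag, n in recruited_reads.items()}


# Made-up counts for one hypothetical sample.
example = relative_read_fraction({"MAG_A": 40_000, "MAG_B": 1_000}, 1_000_000)
print(example)
```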
Enrichment Conditions
Surface seawater was collected during November and December at the San Pedro Ocean Time
Series (SPOT, 33°33′N, 118°24′W). For Media Configuration 1 (MC1), seawater was sterile filtered (0.1 μm)
and autoclaved, and the following nutrients were added at the given concentrations: NaNO₃ (200
μM), NH₄Cl (200 μM), KH₂PO₄ (200 μM), trace metals (1X), Na₂S₂O₃ (1000 μM), Na₂SO₃ (1000 μM),
NaHCO₃ (1500 μM), and vitamins (1X). Aliquots of 29.5 mL of media and 500 μL of SPOT surface water
were added to individual culture flasks. Half were incubated in the dark while the other half were
incubated under 950 nm and 850 nm infrared lights; cultures were transferred every 48-72 hours.
Growth assays were conducted over 48 hours by inoculating newly made media as indicated above,
collecting samples every two hours, and counting cells using a flow cytometer (Guava EasyCyte Plus,
MA, USA) equipped with a fluorescence detector. Briefly, every 2 hours 200 μL of enrichment was
preserved with paraformaldehyde (PFA; 2% final concentration), stained with SYBR Green I nucleic acid
gel stain (Thermo Fisher) and run on a Guava easyCyte 5HT System (EMD Millipore). Flow cytometry
measurements were validated periodically using epifluorescence microscopy as detailed in Schwalbach and
Fuhrman (2005)
75
for total bacterial enumeration.
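
Growth rates can be derived from the flow cytometry counts in several ways; the text does not state which calculation was used. The sketch below assumes exponential growth between two time points and uses the standard specific growth rate, μ = ln(N₂/N₁)/Δt. The function names and cell counts are hypothetical.

```python
import math


def specific_growth_rate(n_start, n_end, hours):
    """Specific growth rate (per hour), assuming exponential growth."""
    if n_start <= 0 or n_end <= 0 or hours <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return math.log(n_end / n_start) / hours


def doubling_time(mu):
    """Doubling time (hours) for a positive specific growth rate."""
    return math.log(2) / mu


# Made-up counts (cells per mL): an eightfold increase over 24 hours.
mu = specific_growth_rate(1.0e4, 8.0e4, 24.0)
print(mu, doubling_time(mu))
```

Comparing μ between light and dark incubations of the same enrichment gives a single number per treatment, which is easier to compare than raw growth curves.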
Bulk Carbon Fixation Rates
To calculate bulk carbon fixation rates, we used a Multi-Purpose Scintillation Counter (model
LS-6500, Beckman Coulter). Fresh cultures were inoculated and grown up in triplicate. When the
enrichment reached log phase (as determined via flow cytometry), labelled bicarbonate (H¹⁴CO₃;
56 mCi/mmol; final concentration 1.098 μCi/μL) was added into the samples. Samples were incubated for
24 hours, filtered on 0.45 μm HAWP membrane filters (Millipore), and measured on a multi-purpose
scintillation counter. Autotrophic carbon fixation rates were calculated by taking the ratio of radioactivity
(¹⁴C incorporation over time) to the total radioactivity of H¹⁴CO₃ and multiplying by the total dissolved
inorganic carbon concentration.
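
The rate calculation described above can be written out as a short function. This is a sketch under stated assumptions: the parameter names, units, and example values are hypothetical, and no isotope discrimination factor or dark/blank correction is applied because the text does not mention one (such corrections are common in ¹⁴C uptake protocols).

```python
def carbon_fixation_rate(dpm_sample, dpm_added, dic_umol_per_l, hours):
    """Bulk carbon fixation rate in umol C per litre per hour.

    dpm_sample: radioactivity retained on the filter
    dpm_added: total radioactivity of the labelled bicarbonate added
    dic_umol_per_l: total dissolved inorganic carbon concentration
    hours: incubation time
    """
    if dpm_added <= 0 or hours <= 0:
        raise ValueError("dpm_added and hours must be positive")
    return (dpm_sample / dpm_added) * dic_umol_per_l / hours


# Made-up values for a single 24-hour incubation.
rate = carbon_fixation_rate(
    dpm_sample=1200.0, dpm_added=1.0e6, dic_umol_per_l=2000.0, hours=24.0
)
print(rate)
```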
DNA extraction and Polymerase Chain Reaction
Growth rates were assessed using epifluorescence microscopy and flow cytometry under light and
dark conditions. Using these initial growth curves six enrichment replicates (enrichments: 7, 13, 19, 29,
30, 42) were selected for DNA extraction using the PowerWater DNA Isolation Kit (Qiagen) per the
manufacturer’s protocols. DNA was quantified using the Qubit dsDNA HS Assay kit (Thermo Fisher
Scientific). PCR was performed using the two primer sets in Table 1 according to the
thermocycler protocols in Yutin et al. (2005)
67
and Béjà et al. (2002)
69
. These results were used to
determine if anoxygenic phototrophs were present in enrichments.
Discussion Part 1: Metagenomics
Between 2009 and 2013, the Tara Oceans Expedition circumnavigated the globe collecting more than
200 metagenomic samples targeting viral and microbial life at the surface, deep chlorophyll maximum
(DCM), and mesopelagic
11
. There have been various efforts geared at reconstructing microbial genomes
from these metagenomes
25,28,76
. These studies varied in methodology and scope, however, with one study
examining only the microbial fraction (0.2-3 μm) and another analyzing only samples from the Mediterranean.
I reconstructed genomes from 234 samples collected from 61 stations falling into the ‘viral’ (<0.22 μm),
‘girus’ (0.22-0.8 μm), ‘bacterial’ (0.22-1.6 μm), and ‘protistan’ (0.8-5.0 μm) fractions. The source
metagenomes were available at multiple depths with the surface (~5 m) and DCM being sequenced at
every station. I binned contigs from these metagenomes into metagenome-assembled genomes (MAGs)
using BinSanity
30
with a detailed protocol described in Tully et al. (2018)
24
. In total, 2,631 MAGs were
produced with >50% completion as determined using single copy genes via CheckM
77
. Of the 2,631
MAGs, 1,491 were >70% complete and 420 were >90% complete with less than 5% contamination.
Despite the large number of high-quality genomes produced it should be noted that the MAGs are only a
small percentage of the entire metagenomic dataset (Figure 1B).
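
The quality tiers reported above can be tallied from per-MAG completeness and contamination estimates. This is a minimal sketch, not CheckM itself: it assumes the estimates are already available as pairs of percentages, and it assumes the <5% contamination cutoff applies only to the >90% tier, which is one reading of the text. The tier names and toy values are hypothetical.

```python
def summarize_quality(mags):
    """Count MAGs per quality tier from (completeness %, contamination %) pairs."""
    tiers = {"over_50": 0, "over_70": 0, "over_90_low_contam": 0}
    for completeness, contamination in mags:
        if completeness > 50:
            tiers["over_50"] += 1
        if completeness > 70:
            tiers["over_70"] += 1
        # Assumption: the contamination cutoff applies to this tier only.
        if completeness > 90 and contamination < 5:
            tiers["over_90_low_contam"] += 1
    return tiers


# Made-up estimates for five hypothetical MAGs.
toy = [(95.0, 2.1), (92.0, 6.0), (75.5, 3.0), (55.0, 1.0), (40.0, 0.5)]
print(summarize_quality(toy))
```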
Each of these 2,631 MAGs was screened for phylogeny and function. Using phylogenomics
24
,
we identified MAGs for multiple groups that lack cultured representatives including a novel marine clade
of Dadabacteria, Marine Group II and III Euryarchaea, Marinimicrobia, and SAR324 (Figure 2). In
addition to these groups, six MAGs belonging to an unknown lineage of Alphaproteobacteria (Figure 3)
were uncovered. These Alphaproteobacteria were shown to belong to a group of uncultured
Rhodobacteraceae.
These six ‘Ca. Luxescamonaceae’ MAGs encoded PufML and RbcLS, which are
markers for the type-II photochemical reaction center and RuBisCO, respectively. It is well documented
that AAnPs are incapable of carbon fixation
78
, so this group warranted further investigation. Utilizing
MAGs produced by Delmont et al. (2018)
25
, 12 additional phylogenetically related genomes were
identified, of which three contained genes encoding PufML and RbcLS. The lack of these genes in some
MAGs could be due to variable levels of completion (38.24%-94.77%; <5% contamination). As is
common in MAGs, none of the 18 genomes contained a full-length 16S rRNA gene sequence, but a
partial 90 bp 16S rRNA gene fragment in one MAG was linked to a clade of uncultured organisms related
to Rhodobacteraceae.
Using a combination of the KEGG
56
, Pfam
62
, and TIGRfam
63
databases, these MAGs were
annotated (Figure 4). The annotations showed that all of the genomes containing RbcLS have the potential for
complete carbon fixation via the Calvin-Benson-Bassham (CBB) cycle. Based on the functional heatmap,
MAGs belonging to ‘Ca. Luxescamonaceae’ were lacking genes for the cytochrome c oxidase cbb3-
type
79
and cytochrome bd complex
80
, which represent a high-oxygen-affinity cytochrome and a
microaerophilic cytochrome, respectively. These genes are typically found in anoxygenic
phototrophs like Rhodobacter sphaeroides, which facultatively fix carbon only under microaerobic or
anaerobic conditions
47
. These MAGs also contain the potential for the complete pathway of
bacteriochlorophyll synthesis, including the oxygen-dependent ring cyclase (acsF; K04035), an
oxidative cyclase necessary for bacteriochlorophyll synthesis in oxic environments
81
. They
lack the oxygen-independent ring cyclase (bchE; K04034), which is typically found in AAnPs that
facultatively fix carbon under microaerobic conditions
82
. ‘Ca. Luxescamonaceae’ also has the
potential to oxidize inorganic sulfur compounds, either thiosulfate via the SOX system and/or
sulfite via sulfite dehydrogenase, and lacks the ability to use a variety of alternative electron acceptors
including nitrite, nitrate, and sulfate. These MAGs also contain the complete bacteriochlorophyll metabolism
pathway and RCII complex (Figure 5).
To confirm carbon fixation potential, a phylogenetic analysis of the RuBisCO sequences from the
nine ‘Ca. Luxescamonaceae’ MAGs was performed. The sequences clustered with the Type-IC/D RuBisCOs,
which are considered bona fide RuBisCOs capable of carbon fixation (Figure 6). This is in contrast to the
Type IV RuBisCO-like proteins that some known AAnPs in the Rhodobacteraceae contain
48
, which are
incapable of facilitating carbon fixation. A detailed phylogenetic analysis of the PufM sequences from our
MAGs (Figure 7 & Supplemental Figure S5) indicated that they fell into two monophyletic groups
represented solely by environmental sequences pulled from the Global Ocean Survey (GOS)
10
,
bioGEOTRACES
12
, and from the San Pedro Ocean Time Series (unpublished data).
Within a global context ‘Ca. Luxescamonaceae’ have been identified in metagenomes from
GEOTRACES
12
(Figure 8) and Tara Oceans
11
(Figure 9) covering vast swaths of the ocean. The
relative fraction of the community for this group peaks at almost 4% in samples from the
South Atlantic. In the Tara Oceans dataset, the highest abundances for ‘Ca. Luxescamonaceae’ occur at
the surface. Interestingly, samples from the bioGEOTRACES (cruises GA10, GA03, GP13) and BATS
show detection of the group at depths ranging from 5-100 m at select stations. Infrared (IR) light
attenuates quickly in the surface ocean, suggesting that while peak absorbance
of bacteriochlorophyll occurs in the IR range, some AAnPs may be utilizing light in the 350 nm range, where
a secondary absorption peak exists. In addition, carotenoids play a large role in anoxygenic phototrophs,
with many containing carotenoids that absorb in the blue-green region of the spectrum. The number of
carotenoids in any one cell varies; for example, Erythromicrobium ramosum has been shown to contain 20
different carotenoids, of which only a fraction are structurally characterized
83
. It is thought that the
primary role of most of the carotenoids in AAPs is photoprotection to prevent oxidative damage by
scavenging singlet oxygen or free radicals
49
. In vitro the BChl to carotenoid ratio is typically around 1:9.
Further analysis of ‘Ca. Luxescamonaceae’ from SPOT (Figure 10) indicates a seasonality to
their proliferation, with abundances peaking in the winter. In addition, a release of >12,000
single-cell amplified genomes (SAGs) from the GEOTRACES cruise tracks included 31 SAGs belonging to ‘Ca.
Luxescamonaceae,’ 14 of which have a complete 16S rRNA gene sequence
84
. Together, these observations suggest the
identification of a globally distributed, potentially photolithoautotrophic AAnP whose contribution to
global carbon cycling is unknown.
Discussion Part 2: Culturing
Based on the identification of a globally distributed, novel, potentially lithoautotrophic anoxygenic phototroph
via metagenomics, a defined minimal medium was designed. During November and December of 2018,
surface water samples were collected from SPOT, and 500 μL was added to 29.5 mL of MC1 media
(configuration found in Methods) and incubated under 850 and 950 nm infrared light fixtures. In total, 42
enrichments were started and numbered 1 through 42. Enrichments
were spot checked for growth using epifluorescence microscopy according to the protocols from
Schwalbach and Fuhrman (2005)
75
for total bacterial enumeration.
Once cells were detected in each of the 42 enrichments, growth curves were determined under
both light and dark conditions over the course of 32 hours with time points at 8, 16, 24, and 32 hours
(Figure 11) using flow cytometry. Evidence suggests that proteorhodopsin-containing bacteria
have increased growth rates under light
85
, and various anoxygenic phototrophs in culture have also
been shown to grow faster under light
86,87
. The growth curves indicated that several of our enrichments showed
increased growth rates under 950 nm and 850 nm IR light sources, suggesting the potential presence of
bacteriochlorophyll-containing organisms (Figure 11 & Supplemental Figure 6).
From these 42 enrichments, six (enrichments 7, 13, 19, 29, 30, 42) were selected for PCR
of the pufM gene using two sets of primers. Reactions using primers from Yutin et al. (2005)
67
yielded no positive results. In contrast, the pufM primers from Béjà et al. (2002)
69
yielded positive reactions
for all six selected enrichments (Figure 12).
Once it was confirmed that pufM was present in the six selected enrichments, we measured bulk
carbon fixation rates. Briefly, these enrichments were inoculated into fresh MC1 media (see Methods),
amended with labeled bicarbonate (H¹⁴CO₃), and incubated for 24 hours under both light and dark
conditions in triplicate (Figure 13). The results were largely inconclusive, as enrichments 19
and 30 had higher ¹⁴C incorporation under dark conditions than under light conditions. This could be
indicative of some form of anaplerotic uptake, as shown previously in Roseobacter denitrificans
50
.
Enrichments 13, 29, and 42 showed minor differences between light and dark incubations, with each
incorporating slightly more under light conditions. Enrichment 7 showed a more obvious difference
between light and dark incubations, with the dark incubations incorporating nearly zero labeled ¹⁴C.
Subsequent 16S rRNA gene amplicon analysis (unpublished) for each of the six enrichments used for ¹⁴C
incorporation showed that some of the enrichments that exhibited carbon fixation had amplicon
sequence variants (ASVs) that could be assigned to putative lithoautotrophic sulfur oxidizers, which could
further explain the ¹⁴C incorporation in the dark enrichments. This analysis also showed that I had
enriched at least one ASV that clustered within 95% identity of the 16S rRNA gene sequences identified
in a ‘Ca. Luxescamonaceae’ SAG produced by Pachiadaki et al. (2019)
88
. To avoid potential enrichment
of sulfur oxidizers in the future, experimental designs should carefully assess how much sulfur is
added to the enrichment media.
These results indicate the possible presence of anoxygenic photolithoautotrophs at SPOT.
While a marine representative remains elusive, the first cultured isolate of an AAnP exhibiting this
metabolism was recently identified from moss-dominated soil crusts in China
89
, indicating that this metabolism
may be more widespread across different biomes than previously thought.
Conclusion
The novel alphaproteobacterium identified here, with the potential for aerobic carbon fixation
and anoxygenic phototrophy in the surface ocean, may represent a previously unconstrained contributor to
global carbon cycling. These organisms were predominantly recovered from the 0.2-3 μm fraction of the
Tara Oceans dataset, primarily at the DCM or surface depths, indicating that ‘Ca. Luxescamonaceae’
is a member of the ‘free-living’ fraction of the photic zone. This, paired with the genomic potential for
complete carbon fixation via the CBB cycle, biosynthesis of bacteriochlorophyll, oxidation of inorganic
sulfur compounds (SOX or sulfite dehydrogenase), and lack of microaerobic cytochromes, indicates that
this group has the potential for lithoautotrophic growth. Using the functional potential predicted in the
MAGs, media MC1 was designed to enrich for ‘Ca. Luxescamonaceae’. Initial enrichments indicated
the presence of the phototrophy marker gene pufM via PCR, an overall higher growth rate
under IR light, and, in enrichments 7, 13, 29, and 42, a higher ¹⁴C incorporation rate with light.
References
1. Staley, J. T. & Konopka, A. Measurement of in Situ Activities of Nonphotosynthetic
Microorganisms in Aquatic and Terrestrial Habitats. Annu. Rev. Microbiol. 39, 321–346 (1985).
2. Jannasch, H. W. & Jones, G. E. Bacterial Populations in Sea Water as Determined by
Different Methods of Enumeration. Limnol. Oceanogr. 4, 128–139 (1959).
3. Steen, A. D. et al. High proportions of bacteria and archaea across most biomes remain
uncultured. ISME J. 13, 3126–3130 (2019).
4. Martiny, A. C. High proportions of bacteria are culturable across major biomes. ISME J. 13, 2125–
2128 (2019).
5. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev.
Microbiol. 71, 711–730 (2017).
6. Hugerth, L. W. & Andersson, A. F. Analysing microbial community composition through
amplicon sequencing: From sampling to hypothesis testing. Frontiers in Microbiology vol. 8
(2017).
7. Sogin, M. L. et al. Microbial diversity in the deep sea and the underexplored ‘rare biosphere’.
Proc. Natl. Acad. Sci. U. S. A. 103, 12115–12120 (2006).
8. Venter, J. C. et al. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science
304, 66–74 (2004).
9. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial
genomes from the environment. Nature 428, 37–43 (2004).
10. Rusch, D. B. et al. The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through
eastern tropical Pacific. PLoS Biol. 5, 0398–0431 (2007).
11. Bork, P. et al. Tara Oceans. Tara Oceans studies plankton at planetary scale. Introduction. Science
348, 873 (2015).
12. Biller, S. J. et al. Data descriptor: Marine microbial metagenomes sampled across space and time.
Sci. Data 5, (2018).
13. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
14. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain
Bacteria. Nature 523, 208–211 (2015).
15. Lee, M. D. et al. Marine Synechococcus isolates representing globally abundant genomic lineages
demonstrate a unique evolutionary path of genome reduction without a decrease in GC content.
Environ. Microbiol. 21, 1677–1686 (2019).
16. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev.
Microbiol. 71, 711–730 (2017).
17. Palkova, Z. Multicellular microorganisms: Laboratory versus nature. EMBO Reports vol. 5 470–
476 (2004).
18. Hobman, J. L., Penn, C. W. & Pallen, M. J. Laboratory strains of Escherichia coli: Model citizens
or deceitful delinquents growing old disgracefully? Molecular Microbiology vol. 64 881–885
(2007).
19. Quast, C. et al. The SILVA ribosomal RNA gene database project: Improved data processing and
web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
20. Cole, J. R. et al. Ribosomal Database Project: Data and tools for high throughput rRNA analysis.
Nucleic Acids Res. 42, D633–D642 (2014).
21. Bragg, L. & Tyson, G. W. Metagenomics using next-generation sequencing. Methods Mol. Biol.
1096, 183–201 (2014).
22. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11,
1144–1146 (2014).
23. Kang, D. D. et al. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome
reconstruction from metagenome assemblies. PeerJ 2019, (2019).
24. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-
assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
25. Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are
abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).
26. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509
(2021).
27. Haroon, M. F., Thompson, L. R., Parks, D. H., Hugenholtz, P. & Stingl, U. A catalogue of 136
microbial draft genomes from Red Sea metagenomes. Sci. Data 3, 1–6 (2016).
28. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially
expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
29. Hug, L. A. et al. Critical biogeochemical functions in the subsurface are associated with bacteria
from new phyla and little studied lineages. Environ. Microbiol. 18, 159–173 (2016).
30. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035
(2017).
31. Nelson, W. C., Tully, B. J. & Mobberley, J. M. Biases in genome reconstruction from
metagenomic data. PeerJ 8, e10119 (2020).
32. Hedlund, B. P., Dodsworth, J. A., Murugapiran, S. K., Rinke, C. & Woyke, T. Impact of single-
cell genomics and metagenomics on the emerging view of extremophile “microbial dark matter”.
Extremophiles vol. 18 865–875 (2014).
33. Whitman, W. B., Coleman, D. C. & Wiebe, W. J. Prokaryotes: The unseen majority. Proceedings
of the National Academy of Sciences of the United States of America vol. 95 6578–6583 (1998).
34. Kai, W., Peisheng, Y., Rui, M., Wenwen, J. & Zongze, S. Diversity of culturable bacteria in deep-
sea water from the South Atlantic Ocean. Bioengineered 8, 572–584 (2017).
35. Fenchel, T. The microbial loop - 25 years later. J. Exp. Mar. Bio. Ecol. 366, 99–103 (2008).
36. Azam, F. et al. The Ecological Role of Water-Column Microbes in the Sea. Mar. Ecol. Prog. Ser.
10, 257–263 (1983).
37. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive earth’s
biogeochemical cycles. Science vol. 320 1034–1039 (2008).
38. Hunter, C. N., Daldal, F., Thurnauer, M. C. & Beatty, J. T. The purple phototrophic bacteria. vol.
28 (Springer Science & Business Media, 2008).
39. Pfennig, N. & Trüper, H. G. Taxonomy of phototrophic green and purple bacteria: A review. Ann.
l’Institut Pasteur / Microbiol. 134, 9–20 (1983).
40. Madigan, M. T. & Jung, D. O. An Overview of Purple Bacteria: Systematics, Physiology, and
Habitats. in 1–15 (Springer, Dordrecht, 2009). doi:10.1007/978-1-4020-8815-5_1.
41. Ehrenreich, A. & Widdel, F. Anaerobic oxidation of ferrous iron by purple bacteria, a new type of
phototrophic metabolism. Appl. Environ. Microbiol. 60, 4517–26 (1994).
42. Griffin, B. M., Schott, J. & Schink, B. Nitrite, an Electron Donor for Anoxygenic Photosynthesis.
Science 316, 1870 (2007).
43. Harashima, K., Shiba, T., Totsuka, T., Simidu, U. & Taga, N. Occurrence of
bacteriochlorophyll a in a strain of an aerobic heterotrophic bacterium. Agric. Biol. Chem. 42,
1627–1628 (1978).
44. Shiba, T., Simidu, U. & Taga, N. Distribution of aerobic bacteria which contain
bacteriochlorophyll a. Appl. Environ. Microbiol. 38, 43–5 (1979).
45. Beatty, J. T. On the natural selection and evolution of the aerobic phototrophic bacteria.
Photosynth. Res. 73, 109–114 (2002).
46. Nishimura, Y. et al. DNA relatedness and chemotaxonomic feature of aerobic
bacteriochlorophyll-containing bacteria isolated from coasts of Australia. J. Gen. Appl. Microbiol.
40, 287–296 (1994).
47. Arai, H., Roh, J. H. & Kaplan, S. Transcriptome dynamics during the transition from anaerobic
photosynthesis to aerobic respiration in Rhodobacter sphaeroides 2.4.1. J. Bacteriol. 190, 286–99
(2008).
48. Swingley, W. D. et al. The complete genome sequence of Roseobacter denitrificans reveals a
mixotrophic rather than photosynthetic metabolism. J. Bacteriol. 189, 683–90 (2007).
49. Yurkov, V. V & Beatty, J. T. Aerobic anoxygenic phototrophic bacteria. Microbiol. Mol. Biol.
Rev. 62, 695–724 (1998).
50. Tang, K.-H., Feng, X., Tang, Y. J. & Blankenship, R. E. Carbohydrate Metabolism and Carbon
Fixation in Roseobacter denitrificans OCh114. PLoS One 4, e7233 (2009).
51. Wood, H., Werkman, C., Hemingway, A. & Nier, A. The position of carbon dioxide carbon in
succinic acid synthesized by heterotrophic bacteria. J. Biol. Chem. 139, 377–381 (1941).
52. Cohen-Bazire, G., Sistrom, W. R. & Stanier, R. Y. Kinetic studies of pigment synthesis by non-
sulfur purple bacteria. J. Cell. Comp. Physiol. 49, 25–68 (1957).
53. Kolber, Z. S. et al. Contribution of aerobic photoheterotrophic bacteria to the carbon cycle in the
ocean. Science (2001) doi:10.1126/science.1059707.
54. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled
genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
55. Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site identification.
BMC Bioinformatics 11, 1–11 (2010).
56. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for
Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 428, 726–731
(2016).
57. Graham, E. D., Heidelberg, J. F. & Tully, B. J. Binsanity: Unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 2017, e3035
(2017).
58. Eren, A. M. et al. Anvi’o: An advanced analysis and visualization platform for ’omics data. PeerJ
2015, (2015).
59. Santos, S. R. & Ochman, H. Identification and phylogenetic sorting of bacterial lineages with
universally conserved genes and proteins. Environ. Microbiol. 6, 754–759 (2004).
60. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank. Nucleic
Acids Res. 33, D34 (2005).
61. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: Interactive sequence similarity
searching. Nucleic Acids Res. 39, W29–W37 (2011).
62. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222-30 (2014).
63. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic
Acids Res. 31, 371–3 (2003).
64. Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space
complexity. BMC Bioinformatics 5, 1–19 (2004).
65. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
66. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - Approximately maximum-likelihood trees
for large alignments. PLoS One 5, (2010).
67. Yutin, N., Suzuki, M. T. & Béjà, O. Novel primers reveal wider diversity among marine aerobic
anoxygenic phototrophs. Appl. Environ. Microbiol. 71, 8958–62 (2005).
68. Yutin, N. et al. Assessing diversity and biogeography of aerobic anoxygenic phototrophic bacteria
in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling expedition
metagenomes. Environ. Microbiol. 9, 1464–1475 (2007).
69. Béjà, O. et al. Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature 415,
630–633 (2002).
70. Markowitz, V. M. et al. IMG: The integrated microbial genomes database and comparative
analysis system. Nucleic Acids Res. 40, D115 (2012).
71. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND.
Nature Methods vol. 12 59–60 (2014).
72. Tabita, F. R. et al. Function, Structure, and Evolution of the RubisCO-Like Proteins and Their
RubisCO Homologs. Microbiol. Mol. Biol. Rev. 71, 576–599 (2007).
73. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential
coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
74. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: A new generation of protein database
search programs. Nucleic Acids Research vol. 25 3389–3402 (1997).
75. Schwalbach, M. S. & Fuhrman, J. A. Wide-ranging abundances of aerobic anoxygenic
phototrophic bacteria in the world ocean revealed by epifluorescence microscopy and quantitative
PCR. Limnol. Oceanogr. 50, 620–628 (2005).
76. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled
genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
77. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing
the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome
Res. 25, 1043–55 (2015).
78. Moran, M. A. The global ocean microbiome. Science vol. 350 (2015).
79. Hamada, M., Toyofuku, M., Miyano, T. & Nomura, N. cbb3-type cytochrome c oxidases, aerobic
respiratory enzymes, impact the anaerobic life of Pseudomonas aeruginosa PAO1. J. Bacteriol.
196, 3881–3889 (2014).
80. Giuffrè, A., Borisov, V. B., Arese, M., Sarti, P. & Forte, E. Cytochrome bd oxidase and bacterial
tolerance to oxidative and nitrosative stress. Biochimica et Biophysica Acta - Bioenergetics vol.
1837 1178–1187 (2014).
81. Boldareva-Nuianzina, E. N., Bláhová, Z., Sobotka, R. & Koblížek, M. Distribution and origin of
oxygen-dependent and oxygen-independent forms of mg-protoporphyrin monomethylester cyclase
among phototrophic proteobacteria. Appl. Environ. Microbiol. 79, 2596–2604 (2013).
82. Gough, S. P., Petersen, B. O. & Duus, J. Anaerobic chlorophyll isocyclic ring formation in
Rhodobacter capsulatus requires a cobalamin cofactor. Proc. Natl. Acad. Sci. U. S. A. 97, 6908–
6913 (2000).
83. Yurkov, V., Gad’on, N. & Drews, G. The major part of polar carotenoids of the aerobic bacteria
Roseococcus thiosulfatophilus RB3 and Erythromicrobium ramosum E5 is not bound to the
bacteriochlorophyll a-complexes of the photosynthetic apparatus. Arch. Microbiol. 160, 372–376
(1993).
84. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
85. Gómez-Consarnau, L. et al. Light stimulates growth of proteorhodopsin-containing marine
Flavobacteria. Nature 445, 210–213 (2007).
86. Johnston, W. et al. Carbon mass balance methodology to characterize the growth of pigmented
marine bacteria under conditions of light cycling. Bioprocess Biosyst. Eng. 27, 163–174 (2005).
87. Tomasch, J., Gohl, R., Bunk, B., Diez, M. S. & Wagner-Döbler, I. Transcriptional response of the
photoheterotrophic marine bacterium Dinoroseobacter shibae to changing light regimes. ISME J.
5, 1957–68 (2011).
88. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
89. Tang, K. et al. An Aerobic Anoxygenic Phototrophic Bacterium Fixes CO2 via the Calvin-Benson-Bassham
Cycle. bioRxiv 2021.04.29.441244 (2021) doi:10.1101/2021.04.29.441244.
Main Figures & Tables
Figure 1. (A) Map depicting approximate locations of the Tara Oceans sampling stations from which metagenomic data
were collected. Stations are grouped based on Longhurst provinces and site proximity. Province abbreviations are used for
MAG IDs. (B) Violin plots illustrating the fraction of the estimated total bacterial and archaeal community represented
by the draft genomes for samples from the different size fractions.
Table 1. Primers to be used in screening
Sequence Target | Primers | Original Paper
pufM | pufM_uniF (GGNAAYYTNTWYTAYAAYCCNTTYCA); pufM_uniR* (5’-YCCATNGTCCANCKCCARAA-3’); pufM_WAW (5’-AYNGCRAACCACCANGCCCA-3’) | Yutin et al. (2005) [67]
pufM | pufM Forward (5′-TACGGSAACCTGTWCTAC-3′); pufM Reverse (5′-CCATSGTCCAGCGCCAGAA-3′) | Béjà et al. (2002) [69]
Figure 2. A maximum likelihood tree of the MAGs produced using 16 concatenated single copy markers [10]. Bootstrap
values >0.75 are shown. Circle size representing the bootstrap value is scaled from 0.75-1.0. Nodes where the average
branch length distance is <0.5 were collapsed and the number of draft genomes in each node is provided. The image was
generated using the Interactive Tree of Life (iTOL) web tool [37].
Figure 3. Phylogenomic tree of 31 concatenated marker genes for the Alphaproteobacteria [38]. Numbers in parentheses
represent the number of genomes collapsed within a branch. Black stars denote genomes within the ‘Ca.
Luxescamonaceae’ that possess PufM and RbcL. MAGs highlighted in red were produced from the Tara Oceans dataset
by our lab and MAGs highlighted in yellow were produced by Delmont et al. (2019). Bootstrap values >0.75 are shown.
Circle size representing the bootstrap value is scaled from 0.75-1.0.
Figure 4. Heat map showing completion of various key pathways from KEGG generated using KEGGDecoder
(https://github.com/bjtully/BioData/tree/master/KEGGDecoder). MAGs within ‘Ca. Luxescamonaceae’ are ordered
based on phylogeny using 31 concatenated marker genes. Common AAPs and AnAPs are included for comparison.
Figure 5. Cellular schematic comparing the six AAnP genomes.
Figure 6. Phylogenetic tree of the ribulose-1,5-bisphosphate carboxylase large subunit (RuBisCO, RbcL) with the major
forms denoted. Inset – A zoomed in view of the Form IC/D RuBisCO. Purple sequence names denote RbcL proteins from
the Global Ocean Survey. Orange sequence names denote RbcL proteins detected in genomes from this study.
Figure 7. Phylogenetic tree of the M subunit of type-II photochemical reaction center (PufM). Purple sequence names denote
PufM proteins from the Global Ocean Survey. Orange sequence names denote PufM proteins detected in genomes from this
study. Green sequence names denote PufM from Béjà et al. (2002). Bootstrap values >0.75 are shown. Circle size
representing the bootstrap value is scaled from 0.75-1.0.
Figure 8. Global plot indicating relative fraction of the community attributed to ‘Ca. Luxescamonaceae’ calculated as the
number of reads mapped to MAGs divided by the total number of reads in the sample. All stations where the relative
fraction is greater than 0.005 are shown. Figure made using the Python library Plotly.
Figure 9. Global map illustrating the Tara Oceans sampling sites. Sites at which the AAnP members of the ‘Ca.
Luxescamonaceae’ were detected at >0.01% relative abundance are depicted. For each site, filter size fractions that were
not collected are represented by an ‘X’ and each column represents one of the three Tara Oceans filter size fractions. Red
asterisks denote filter fractions in which the relative abundance of genomes from Delmont et al. [25] contributed at least
0.01% (max. 0.04%) of the total relative abundance (this study). Squares highlighted in red denote filter samples in which
‘Ca. Luxescamonaceae’ genomes from Delmont et al. [25] had a reported relative fraction of the metagenome of 0.01–0.03%.
Figure 10. Line graphs showing the relative fraction of the six most complete ‘Ca. Luxescamonaceae’ at SPOT across four
years by month.
Figure 11. Growth curves for enrichments 1-42 over the course of 32 hours with each time point corresponding to ~8
hours. These results indicated overall that most of the samples had higher growth rates under light conditions than under
dark conditions.
Figure 12. Gel electrophoresis of pufM PCR results. In order from left to right the image shows a DNA ladder (1kb),
sample blank (control), enrichment 7, enrichment 13, enrichment 19, enrichment 29, enrichment 30, enrichment 42, and
enrichment 7.
Figure 13. H¹⁴CO₃ bulk incorporation for enrichments 7, 13, 19, 29, 39, and 42.
These results indicate a possible potential for anoxygenic photolithoautotrophs at the San Pedro
Ocean Time Series and, while a marine representative remains elusive, the first cultured isolate of an AAP
exhibiting this metabolism was recently identified from moss-dominated soil crusts in China [89], indicating
this metabolism may be more widespread across different biomes than previously thought.
Chapter Three: Marine Dadabacteria exhibit genome streamlining and
phototrophy-driven niche partitioning
Chapter three was previously published in the ISME Journal on 23rd November 2020
(https://doi.org/10.1038/s41396-020-00834-5). Included below is the main text as it was published.
Supplemental information can be found in the appendix to this dissertation.
Chapter Four: Phylogenetic and functional diversity within ‘Candidatus
Nitrosopelagicus brevis’
Elaina D. Graham¹, John F. Heidelberg¹, and Benjamin J. Tully¹,²
¹ Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
² Center for Dark Biosphere Investigations, University of Southern California, Los Angeles, CA, USA
Abstract
The Nitrososphaerales are highly abundant archaea that play key roles in global biogeochemical
cycling in the ocean. Most Nitrososphaerales are considered chemolithoautotrophic ammonia oxidizers.
Here we describe a collection of 1,384 Nitrososphaerales draft genomes >50% complete, of which 663
belonged to Nitrosopelagicus, which we identified as the most abundant Nitrososphaerales group within
the global oceans, as shown throughout the GEOTRACES metagenomes. A subset of the 663
Nitrosopelagicus genomes (n=217) was used to identify 12 clades of Nitrosopelagicus, with
Nitrosopelagicus BG4 (which contains ‘Ca. Nitrosopelagicus brevis’) and Ca. Nitrosopelagicus BG6 being
the most abundant clades. BG4 was defined by the presence of the genes for nitrite reduction and in turn
was most highly correlated with nitrite concentrations across GEOTRACES cruise track GA02. BG6 was
defined by the presence of the ferrous iron transporter FeoB and was most correlated with temperature. A
particularly intriguing observation was that five Nitrosopelagicus groups (BG10, BG11, BG12, BG9, BG1)
all appear to lack the ammonia oxidation genes amoABC while still containing the genes
for the 4-hydroxybutyrate/3-hydroxypropionate carbon fixation pathway. Our findings highlight
previously uncharacterized diversity within the Nitrosopelagicus, specifically identifying groups that are
likely not oxidizing ammonia.
Introduction
Ammonia-oxidizing archaea (AOA) were first identified in the 1990s, before their capacity for
ammonia oxidation was known, based on 16S rRNA sequences [1,2]. They were originally designated as the
Marine Group I (MGI) archaea and then assigned to the Crenarchaeota before being reclassified as the
Thaumarchaeota following the cultivation of the first representative, Nitrosopumilus maritimus [3], as well as
multiple genomic studies [4–6]. The Thaumarchaeota were recently reclassified by GTDB [7] into the phylum
Thermoproteota, class Nitrososphaeria, and order Nitrososphaerales. The order Nitrososphaerales now
encompasses what was once the Thaumarchaeota. The Nitrososphaerales are considered amongst the
most abundant microorganisms, representing up to 20% of all microbial life in the ocean and comprising
up to 39% of the total microbial community below the photic zone [8]. There are nine named groups of
AOAs within the Nitrososphaerales order including the Nitrososphaera, Nitrosocosmicus, Nitrosocaldus,
Nitrosotalea, Nitrosopumilus, Nitrosopelagicus, Nitrosoarchaeum, Nitrosotenuis, and Nitrososphaera.
The marine Thaumarchaeota are thought to be ammonia-oxidizing archaea with the exception of some
basal clades [9] such as the pSL12-like group identified in the North Pacific Subtropical Gyre at station
ALOHA [10] and the heterotrophic marine Thaumarchaeota (HMT) identified in hadopelagic (750-5000 m)
metagenomes from the Pacific and Atlantic [11]. Interestingly, the HMTs appear to have undergone genome
streamlining, having genomes of ~1 Mbp [11]. The Thaumarchaeota play an important role in the nitrogen cycle,
comprising all the known AOA [12]. Compared to the ammonia oxidizing bacteria (AOB), the AOA are
several orders of magnitude more abundant, particularly in oligotrophic marine systems and ammonium depleted
soils [13–19]. Ammonia oxidation is one of the rate-limiting steps in nitrification; thus Thaumarchaeota play
major roles facilitating nitrification in low ammonia environments.
Ammonia oxidation is performed by only three groups of organisms: the ammonia oxidizing
bacteria (AOB) [20], ammonia oxidizing archaea (AOA), and complete ammonia oxidizers (comammox) [21].
Similar to the AOB, Thaumarchaeota incorporate one oxygen from water into nitrite [22]. In the AOB, a
two-step oxidation pathway is proposed whereby ammonia is oxidized to hydroxylamine
(NH₂OH) by ammonia monooxygenase (Amo) and further to nitrite (NO₂⁻) by hydroxylamine
oxidoreductase. No homolog of hydroxylamine oxidoreductase has been identified in the
Thaumarchaeota despite marine archaea being found to produce hydroxylamine [23]. One possible pathway
to oxidize hydroxylamine could be via the nitroxyl pathway [24]. The archaea have a proposed three-step
process, as they produce nitric oxide (NO) as an obligate intermediate via nitrite reductase (NirK) [25,26],
which could potentially react with hydroxylamine to form nitrite. Both the AOA and AOB contain
homologs of the Amo subunits A, B, and C [27]. In addition, it has been shown that other
hypothetical genes are consistently linked to amoA [27,28]. Carini et al. (2018) [28] found 15 proteins that
were consistently co-expressed in ‘Candidatus Nitrosopelagicus brevis’, and other work has supported the
potential role for multiple conserved multicopper oxidases and F420-dependent enzymes in oxidation of
ammonia or hydroxylamine [29]. Some Thaumarchaeota have even been shown to generate reduced nitrogen
from urea [30–32], while other (non-marine) groups have been shown to use cyanate [33]. In terms of carbon
fixation, most of the autotrophic AOAs have a version of the 3-hydroxypropionate/4-hydroxybutyrate
cycle [8].
Based on ammonia monooxygenase subunit A (amoA) sequences, the Thaumarchaeota fall into
three broad groups: the Nitrosopumilus maritimus group, the water column A (WCA) group, and
the water column B (WCB) group [8]. Within these, ‘Ca. Nitrosopelagicus brevis’ falls into the WCA group.
The WCA and Nitrosopumilus groups tend to represent the shallower depths while the WCB group is
dominant below around 300 meters of depth [34–38]. These broad groups are further broken down into eight
clades with the NP-Epsilon group corresponding to the WCA [39]. It has been suggested that ammonium
concentration dictates the presence of the different groups in the water column [40,41]. Specifically, AOA
depth distributions in the ocean reflect a pattern of low surface abundance increasing through the
mesopelagic and subsequently decreasing at deeper depths [8]. In surface waters, AOA abundance is variable
across space and time, as shown at the San Pedro Ocean Time Series (SPOT) [42], potentially being
correlated with periods of deep water upwelling [14,43].
The marine AOAs appear to be adapted to oligotrophic lifestyles, with isolates shown to have
extremely high substrate affinity for NH₃/NH₄⁺ uptake [44] and a small cell size with a high surface area to
volume ratio [3,22]. Despite the high substrate affinity they are still outcompeted by phytoplankton [45]. The
reliance on copper (Cu) dependent enzymes [46] rather than iron may also be a reason for the AOA success
in marine systems where iron is typically limiting [47]. Of the AOAs, the Nitrosopelagicus genus is
widespread in marine environments, but there has been limited genomic data on the group beyond the two
cultured isolates [34]. This is likely because this group is highly abundant and tends to have a
heterogeneous population that makes assembly from short read data difficult [48]. Here we implement a
subsampled assembly methodology using metagenomes from bioGEOTRACES, the Hawaii Ocean
Time-series (HOT), and the Bermuda Atlantic Time-series Study (BATS) [49] geared towards improving our
recovery of highly abundant, highly heterogeneous organisms such as Nitrosopelagicus. Additionally, using
metagenome assembled genomes (MAGs) from this work as well as MAGs from GTDB [50] (r95), Haroon
et al. (2016) [51], Nayfach et al. (2021) [52], Pachiadaki et al. (2019) [53], and Paoli et al. (2021) [54], we compiled
the largest collection of Thaumarchaeota genomes to date, comprising 1,384 draft genomes >50% complete and <10%
contaminated. Of these, 663 belonged to the Nitrosopelagicus, with 218 >80% complete. This large
number of genomes allowed us to conduct the first population level analysis of the Nitrosopelagicus.
Methods
Subsampled Assembly Approach
Metagenomes (~5 Tbp) produced by Biller et al. (2018) [49] encompassing four GEOTRACES
cruises (GA10, GP13, GA02, GA03) and two years of time series data from Station ALOHA and BATS
were accessed from NCBI. Reads were quality filtered using fastp (v0.18.0; flags=-3, -t 3) [55]. These reads
were subsampled at the following percentages for each individual metagenome: 1, 3, 5, 10, 25, 30, and
50%. Subsampled reads were assembled using MegaHit (v1.2.8; --presets=meta-sensitive) [56]. In addition,
the 100% read datasets were also assembled using the same parameters. As performed in Tully et al.
(2018) [57], all of the primary contigs from the MegaHit runs belonging to the same sample were passed
together through CD-HIT-EST (v4.8.1; -M 500000 -c 0.99 -n 10) [58] followed by a secondary assembly in
Minimus2 (AMOS v3.1.0; -D OVERLAP=1000 MINID=98) [59]. The Minimus2 generated contigs and the
primary contigs that did not assemble with Minimus2 were combined to create the final contigs for each
sample. The reads were mapped back to the assemblies using Bowtie2 (v2.3.5) [60] and filtered using
coverM (v0.4.0; --min-read-percent-identity 0.95 --min-read-aligned-percent 0.75) [61] prior to calculating
coverage. Coverage was calculated for each assembly using featureCounts [62] implemented through
BinSanity-profile (v0.5.2) [63]. Each assembly was binned using a custom BinSanity [63] pipeline
implementing the beta Binsanity2 script (https://github.com/edgraham/BinSanity/tree/master/Binsanity2-Beta)
that iterated sequentially through the following preference pairs: -p -25 --refine-preference -50,
-p -15 --refine-preference -35, -p -10 --refine-preference -25, -p -5 --refine-preference -25,
-p -3 --refine-preference -25. In addition, each metagenome was binned using MetaBat2 [64] (v2.215), generating
coverage values using the jgi_summarize_bam_contig_depths script provided with MetaBat2.
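The tool used to subsample reads is not named above, so the following is only a minimal Python sketch of one way the listed percentages could be drawn from a set of reads. The function name `subsample_records`, the fixed seed, and the independent per-read draw are illustrative assumptions, not the method used in this study.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

# Subsampling percentages listed in the text (the 100% set is assembled separately).
SUBSAMPLE_PERCENTAGES = [1, 3, 5, 10, 25, 30, 50]


def subsample_records(records: Iterable[T], fraction: float, seed: int = 0) -> List[T]:
    """Keep each record independently with probability `fraction`.

    Hypothetical helper: a record could be one read or one read pair.
    The returned size is only approximately fraction * number of records.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so a subsample can be regenerated
    return [rec for rec in records if rng.random() < fraction]


if __name__ == "__main__":
    reads = [f"read_{i}" for i in range(10_000)]  # stand-in for FASTQ records
    for pct in SUBSAMPLE_PERCENTAGES:
        kept = subsample_records(reads, pct / 100, seed=42)
        print(f"{pct}% target -> {len(kept)} reads kept")
```

Each subsample would then be assembled on its own, which is what allows lower-coverage views of a highly abundant, heterogeneous population to assemble more cleanly.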
Co-assembly Approaches
Following the protocol from Tully et al. (2018) [57], only the full unsubsampled (100% read)
assemblies generated in MegaHit were grouped by cruise track or time series (i.e. HOT,
BATS, GA03, GP13, GA10, GA02) and passed as primary contigs into CD-HIT-EST (v4.8.1; -M 500000 -c
0.99 -n 10) [58] followed by a secondary assembly in Minimus2 (AMOS v3.1.0; -D OVERLAP=1000
MINID=98) [59]. The Minimus2 generated contigs and the primary contigs that did not assemble with
Minimus2 were combined to create the final contigs for each cruise track. The reads were mapped back to
the assemblies using Bowtie2 (v2.3.5) [60] and filtered using coverM [61] (v0.4.0; --min-read-percent-identity
0.95 --min-read-aligned-percent 0.75). Coverage was calculated using the MetaBat2 provided script
jgi_summarize_bam_contig_depths and contigs were binned using MetaBat2 [64] (v2.215).
An additional co-assembly approach was applied to the HOT and BATS metagenomes,
whereby all reads were pooled and assembled in MegaHit (v1.2.8) [56]. For the GEOTRACES
cruise tracks, reads were pooled from across all cruises based on the following depth ranges in meters:
1-20, 20-50, 50-100, 100-175, and 175+ m. Each set of pooled reads was assembled in MegaHit (v1.2.8) [56].
For the resultant assemblies from these two approaches, reads were mapped back to the assemblies using kart
(version) [65], SAM files were converted to BAM via samtools [66], then coverage was calculated using the
MetaBat2 provided script jgi_summarize_bam_contig_depths and contigs were binned using MetaBat2 [64] (v2.215).
Genome assessment and cleaning
All genomes produced via the above methodology were pooled and assessed through MetaSanity
PhyloSanity [67] (v1.2.0), which implements GTDB-Tk [68] (R05-RS95), FastANI [69] (v1.32), and CheckM [70]
(v1.1.3). All draft MAGs were then processed with the program Charcoal
(https://github.com/dib-lab/charcoal; db: gtdb-r95.nucleotide-k31-scaled1000.sbt.zip, lineages: gtdb-r95-reps.lineages.csv,
match_rank: order) to remove contamination. All genomes assigned to the Nitrososphaerales by GTDB-Tk
were re-run through CheckM lineage_wf [70]. Nitrososphaerales MAGs >50% complete and
<10% contaminated were run through gtdbtk identify and gtdbtk align to generate an alignment of the
arc122 markers from GTDB-Tk, which was then trimmed with Trimal [71] (v1.4; -automated1). This alignment was run
through FastTree (v2.1.10; -lg, -gamma) to generate an initial tree, and this tree was used as an input for
IQ-TREE (v2.1.4-beta; --alrt 1000 -B 1000) to generate a final tree.
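The two quality thresholds used throughout this chapter (>50% complete and <10% contaminated for inclusion, >80% complete for the high-quality subset) can be sketched as a small Python helper. The function name and tier labels are hypothetical, and the inputs are assumed to be CheckM-style percentages.

```python
def quality_tier(completeness: float, contamination: float) -> str:
    """Assign a MAG to a tier from completeness and contamination percentages.

    Thresholds follow the text; the tier names are illustrative only.
    """
    if contamination >= 10.0 or completeness <= 50.0:
        return "excluded"          # fails the >50% / <10% inclusion criteria
    if completeness > 80.0:
        return "high_quality"      # used for dRep, read mapping, and pangenomes
    return "draft"                 # kept for phylogeny and functional analysis


if __name__ == "__main__":
    for comp, cont in [(92.3, 1.1), (63.0, 4.8), (48.0, 0.5), (88.0, 12.0)]:
        print(comp, cont, quality_tier(comp, cont))
```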
Functional Analysis
All MAGs >50% complete and <10% contaminated were run through anvi’o [72] to generate a
contigs database with gene calls generated within anvi’o via Prodigal [73]. Genes were annotated using
Prokka [74] (v1.14.6; --addgenes --addmrna --kingdom Archaea --metagenome), KofamScan [75] (v1.3.0),
dbCAN2 [76] (standalone: run_dbcan.py; --tools {hmmer,diamond,hotpep,all} --cgc_sig_genes all
--use_signalP True --gram all), MEROPS [77] (release 12.3) via DIAMOND [78] db (v2.0.9; --ultra-sensitive)
sorted for hits with a bitscore >150, eggNOG-mapper (v2.1.2; --itype proteins -m diamond --sensmode
ultra-sensitive --report_orthologs), antiSMASH [79] (v6.0.0; --cb-general --cb-knownclusters --cb-subclusters --rre
--cc-mibig --fullhmmer --asf --pfam2go --smcog-trees --tigrfam), RGI [80] (RGI version: 5.2.0; db version
v3.1.1; --clean -a BLAST -d wgs --include_loose --exclude_nudge), PSORTb [81] (v3.0), VirSorter2 [82] (v2.2.1;
--min-length 1500), and SignalP [83] (v5; -format long -org arch).
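As an illustration of the bitscore >150 filter applied to the MEROPS hits, the sketch below parses DIAMOND tabular output. It assumes the default 12-column BLAST-style tabular format (bitscore in the last column), which is not stated in the text; the function name and the example lines are hypothetical.

```python
from typing import Iterable, List, Tuple


def filter_hits(lines: Iterable[str], min_bitscore: float = 150.0) -> List[Tuple[str, str, float]]:
    """Return (query, subject, bitscore) for hits above the threshold, best first.

    Assumes tab-separated lines with 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        bitscore = float(fields[11])
        if bitscore > min_bitscore:  # strictly greater than, as in the text
            hits.append((fields[0], fields[1], bitscore))
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits


if __name__ == "__main__":
    example = [
        "geneA\tMER0001\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t310.5",
        "geneB\tMER0002\t35.0\t120\t78\t2\t1\t120\t5\t124\t1e-10\t62.0",
    ]
    for hit in filter_hits(example):
        print(hit)
```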
In addition to the standard annotation pipelines above, 11 genes identified by Carini et al. (2018) [28]
as the ammonia oxidation module in Nitrosopelagicus brevis CN25 via network analysis were searched
against all our MAGs using DIAMOND [78] (v2.0.9; --ultra-sensitive).
Read Mapping
MAGs that were deemed high quality (>80% complete and <10% contamination) were run
through dRep [84] (v3.2.0; -comp 0 -con 1000 --S_algorithm fastANI -sa 0.95 -nc 0.2) to generate a
non-redundant set. Metagenomes from Biller et al. (2018) [49] were mapped back to the non-redundant genome
set using Bowtie2 (v2.3.5) [60], filtered using coverM [61] (v0.4.0; --min-read-percent-identity 0.95
--min-read-aligned-percent 0.75), and converted from SAM to BAM using samtools [66] (v1.9; view; sort).
featureCounts [62] implemented through BinSanity-profile (v0.5.2) [63] was used to generate read counts for
each contig from the filtered BAM files. Read counts were used to calculate the relative fraction (Eq. 1) of
each MAG in the sample and the reads per kbp of each MAG per Mbp of metagenome (RPKM) (Eq. 2).
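\[
\text{Relative fraction} = \frac{\#\,\text{reads recruited to MAG}}{\text{total reads in sample}} \tag{1}
\]
\[
\text{RPKM} = \frac{\#\,\text{reads recruited to a MAG} \div (\text{MAG length in bp} \div 1000)}{\text{total bp in metagenome} \div 1{,}000{,}000} \tag{2}
\]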
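Equations 1 and 2 translate directly into code. The Python sketch below uses hypothetical function and argument names; the read counts are assumed to come from the filtered BAM files described above.

```python
def relative_fraction(reads_to_mag: int, total_reads: int) -> float:
    """Eq. 1: reads recruited to a MAG divided by total reads in the sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_to_mag / total_reads


def rpkm(reads_to_mag: int, mag_length_bp: int, metagenome_bp: int) -> float:
    """Eq. 2: reads per kbp of MAG per Mbp of metagenome."""
    if mag_length_bp <= 0 or metagenome_bp <= 0:
        raise ValueError("lengths must be positive")
    reads_per_kbp = reads_to_mag / (mag_length_bp / 1000)
    return reads_per_kbp / (metagenome_bp / 1_000_000)


if __name__ == "__main__":
    # Illustrative numbers only: a 1 Mbp MAG recruiting 1,000 reads
    # from a 2 Gbp metagenome containing 10,000,000 reads.
    print(relative_fraction(1_000, 10_000_000))
    print(rpkm(1_000, 1_000_000, 2_000_000_000))
```

Normalizing by MAG length keeps larger genomes from appearing more abundant simply because they offer more sequence for reads to map to.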
Pangenome
All Nitrososphaerales MAGs passing the initial quality thresholds (>50% complete and <10% contaminated) were
further filtered prior to species-level pangenome generation, keeping only high-quality (HQ) MAGs that were
greater than 80% complete. Using the dRep generated non-redundant clusters at a 95% average nucleotide
identity (ANI) cutoff (n = 62 non-redundant clusters), pangenomes were generated using Roary [85]
(v3.13.0; -e --mafft -cd 99 -qc) and GFF files generated in Prokka [74]. Using the same protocol, a
pangenome was also built for all MAGs assigned to the genus Nitrosopelagicus
by GTDB. In total there were 12 Nitrosopelagicus clusters defined by a 95% ANI cutoff, with
membership ranging from 1 MAG to 161 MAGs. In total there were 217 draft genomes and MAGs
assigned to the genus Nitrosopelagicus by GTDB.
For the cluster belonging to the Nitrosopelagicus brevis species (cluster_77_2), pangenome core
genes were extracted using the Roary `query_pan_genome` function (-c 90), aligned using MUSCLE [86]
(v3.8.1551; -maxiters 32), and trimmed using Trimal (v1.4; -automated1). Trimmed alignments were then
tested for evidence of recombination using PhiPack [87]. Recombinant bases were removed and the remaining
alignments were concatenated. A phylogenetic tree was built in FastTreeDbl [88] (v2.1.10; -nc
-gtr -gamma) to generate an initial tree, and this tree was used as an input for IQ-TREE (v2.1.4-beta; --alrt
1000 -B 1000) to generate a final tree. dRep representative genomes identified as Nitrosopelagicus were
used to determine approximate species-level clades with support from the IQ-TREE phylogenomic tree.
Clades within the N. brevis species cluster (cluster_77_2) were used to assign genes in the pangenome
names that reflect their occurrence across members of the clades, as done in Horesh et al. (2021) [89], using the
script classify_genes.R (https://github.com/ghoresh11/twilight; -c 0.90 -s 5), which sorts gene groups
identified by Roary into categories of core (present in ≥95% of genomes), multi-lineage (present in
multiple [but not all] lineages), lineage specific (present in a single lineage), intermediate (present in
≥15%, but <95% of genomes), and rare (present in <15% of genomes).
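The frequency thresholds quoted above can be illustrated with a short Python sketch. This is not the classify_genes.R script: it covers only the within-group frequency classes (core, intermediate, rare) and omits the lineage-based categories, and the function name is hypothetical.

```python
def classify_by_frequency(n_present: int, n_genomes: int,
                          core: float = 0.95, rare: float = 0.15) -> str:
    """Classify a gene cluster by the fraction of genomes that carry it.

    core:         present in >= 95% of genomes
    intermediate: present in >= 15% but < 95% of genomes
    rare:         present in < 15% of genomes
    """
    if n_genomes <= 0 or not 0 <= n_present <= n_genomes:
        raise ValueError("need 0 <= n_present <= n_genomes and n_genomes > 0")
    frequency = n_present / n_genomes
    if frequency >= core:
        return "core"
    if frequency >= rare:
        return "intermediate"
    return "rare"


if __name__ == "__main__":
    for present in (100, 60, 10):
        print(present, classify_by_frequency(present, 100))
```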
Ecological distribution and environmental correlations
Using the non-redundant set of MAGs generated via dRep [84] (v3.2.0; -comp 0 -con 1000
--S_algorithm fastANI -sa 0.95 -nc 0.2) and the RPKM calculated as above, transect plots were made using
Ocean Data View (v5.2.1; DIVA Gridding; Schlitzer, Reiner, Ocean Data View, https://odv.awi.de, 2020).
Bathymetry was pulled from the General Bathymetric Chart of the Oceans (GEBCO 2014;
https://doi.org/10.1564/PANGAEA.708081). Environmental data were accessed from the GEOTRACES
Intermediate Data Product 2017 (Version 2) [90] and prepped as indicated in Graham et al. (2021) [91]. RPKM
values from the 12 assigned Nitrosopelagicus brevis clades in this study and available environmental
data that were present in at least 80% of samples were used in a canonical correspondence analysis (CCA)
in Past4 (v4.01) [92]. RPKM was normalized (log (n + 1)) prior to CCA.
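The log (n + 1) normalization can be sketched as follows in Python. The base of the logarithm is not stated in the text, so the natural log is used by default here as an assumption, with an optional `base` argument; the function name is hypothetical.

```python
import math
from typing import Iterable, List, Optional


def log_transform(values: Iterable[float], base: Optional[float] = None) -> List[float]:
    """Return log(n + 1) for each value; zeros stay at zero.

    Natural log by default (an assumption); pass base=10 for log10(n + 1).
    """
    transformed = []
    for value in values:
        if value < 0:
            raise ValueError("RPKM values must be non-negative")
        result = math.log1p(value)  # log(1 + value), accurate for small values
        if base is not None:
            result /= math.log(base)
        transformed.append(result)
    return transformed


if __name__ == "__main__":
    print(log_transform([0.0, 0.5, 12.0]))
    print(log_transform([0.0, 9.0, 99.0], base=10))
```

Adding 1 before taking the log keeps samples with zero RPKM in the analysis while compressing the range of highly abundant clades.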
Results
Because the Nitrososphaerales are globally abundant archaea with important roles in nitrification, it is
important to understand their ecological distribution and functional potential, especially considering
the group can comprise up to 30% of the total prokaryotic community [93]. Here we describe a dataset of
1,384 Nitrososphaerales draft genomes >50% complete (determined via CheckM [94], with a mean
completion of 75.09% and mean contamination of 1.15%). Of these draft genomes, 12 were from Tully et al.
(2018) [57], 616 from Paoli et al. (2021) [54], 39 from Pachiadaki et al. (2019) [53], 359 from Nayfach et al.
(2021) [52], 8 from Haroon et al. (2016) [51], and 165 from this study, with the remaining genomes pulled from
the representative set of GTDB [7] (r95) (Figure 1). From this set of genomes and MAGs, 85 genomes were
determined to be taxonomically novel via relative evolutionary distance (RED) as determined by GTDB.
Most of these taxonomically novel genomes were assigned to the GTDB defined clades UBA141,
UBA1045, TA-21, and Nitrosocaldus. Most of the 85 genomes were derived from Nayfach et al.
(2021) [52] (n=64). The remainder were derived from Paoli et al. (2021) [54] (n=15) and this study (n=6).
Global marine abundance maps of RPKM (Figure 6) for the major Nitrososphaerales groups indicate that
the Nitrosopelagicus genus is the most abundant across large swaths of the ocean and throughout the
water column. The second most abundant group was the genus Nitrosopumilus, which dominated the
community in shallow samples above 150 meters in major coastal systems off the coasts of Argentina
and South Africa. These regions both correspond to areas of upwelling, which may contribute to the
increased abundance of Nitrosopumilus over Nitrosopelagicus. Nitrosopumilus has previously been found
to be dominant in intertidal environments [95], with various representative strains isolated from coastal
regions [96,97]. The only other group with a notable contribution to community composition is the
Nitrosocosmicus, which appears to be primarily abundant at the Bermuda Atlantic Time-series Study (BATS) site.
The largest group of draft genomes from the Nitrososphaerales was assigned to the
Nitrosopelagicus genus (n=663), with 217 high-quality genomes >80% complete. Using a phylogenomic
tree generated with 122 archaeal markers from GTDB [7] and supported by the dRep [84] generated 95% ANI
clusters, the Nitrosopelagicus could be grouped into 12 approximately species-level clades,
designated BG1-BG12 (Figure 3). Using these initial 12 groups as a guide, 14 additional species-level
clades were assigned using the same criteria for all Nitrosopelagicus genomes >50% complete
(n=663) (Figure 4). The largest group was assigned to Nitrosopelagicus clade BG4, which contains the
‘Ca. Nitrosopelagicus brevis CN25’ isolate (n=162). Clades BG4, BG8, BG7, and BG6 largely
correspond to samples collected at shallower depths (<300 m) and the BG5, BG3, BG2, BG1, and BG9
clusters correspond to samples collected at depths >300 m, with clades BG2, BG3, and BG5
corresponding to depths >500 m. The species clade that contains ‘Ca. Nitrosopelagicus brevis CN25’
(BG4) was filtered for completion (>80%) and further broken down into subclades based on a
phylogenetic tree of 413 core genes defined from Roary [85] (Figure 5). The 162 genomes >80% complete
within BG4 were assigned to eight subclades, with ‘Ca. N. brevis CN25’ [34] in subgroup BG4_B with
24 other genomes. The remaining groups primarily represent MAGs from the GEOTRACES cruise tracks
as produced by this study and Paoli et al. (2021) [54].
A phylogenetic tree built using the amoA sequences from every genome in this collection (as
identified via KEGG [75]) expanded on and refined classifications made by Alves et al. (2018) [39]. Genomes
from GTDB with representatives already included in the Alves et al. (2018) phylogenetic tree were
excluded to prevent duplication (Figure 2). The majority of novel amoA sequences from this study were
identified in genomes belonging to the Nitrosopelagicus within amoA clade NP-Epsilon. A large number
of genomes representing the amoA clades NP-Alpha and NS-Delta were also detected. These groups have
no cultured lineages within them. The NS-Delta clade largely corresponds to the GTDB genus TA-21.
While the NP-Alpha group has no cultured lineages, it has previously been associated with the Water
Column B (WCB) group of Nitrosopelagicus. Unfortunately, BG1 is the only NP-Alpha containing group
with MAGs >80% complete. Nitrosopelagicus groups BG48, BG44, BG28, BG34, and BG13
also contain the NP-Alpha amoA but only contain MAGs that are 50-80% complete. Group BG4,
which contains ‘Ca. Nitrosopelagicus brevis’, is assigned to NP-Epsilon along with groups BG7, BG8,
BG46, BG44, and BG6. The remaining Nitrosopelagicus clades predominantly lacked the amoA gene,
highlighted by BG10. In addition, this dataset produced a large number of new sequences for the
NC-Alpha clade, which corresponded with the Nitrosocaldus and the GTDB assigned UBA213 genus. These help to
expand our ability to link function and phylogeny by highlighting new MAGs assigned to the amoA
NC-Alpha clade.
Abundances of the 12 Nitrosopelagicus groups with MAGs >80% complete across the
GEOTRACES cruise track GA02 (which spans the Atlantic from the northeast coast of Newfoundland to
the southeast coast of Argentina) indicate unique depth distribution patterns (Figure 7). Most
interesting is the separation between BG4 (‘Ca. Nitrosopelagicus brevis’) and BG6. While BG4 is
the most abundant clade across most of the cruise track >100 m, during a segment of the
cruise near the equatorial upwelling zone off the coast of Brazil, BG6 becomes highly abundant and BG4
drops to near zero detection at depths >50 m. Canonical correspondence analysis of BG1-12 indicates
correlations to both major nutrients and trace metals (Figure 7). BG4, for example, is correlated with
nitrite concentrations, while BG6 is correlated with temperature. BG1, BG2, BG3, and BG5 are correlated
with nitrate and silicate concentrations. BG8 and BG9 appear to be correlated with nickel and molybdenum.
Due to the sporadic nature of the publicly available trace metal measurements
from the GEOTRACES Intermediate Data Product (2017) [90], the BG8 and BG9 correlations to nickel and
molybdenum need to be confirmed with a more robust and complete dataset, as three fourths of the
samples had to be dropped due to missing measurements throughout the cruise track.
Functional analysis of the Nitrosopelagicus clades (Figure 9) indicates variable amino acid
production potential with none of the genomes possessing the pathways for tyrosine, aspartate, glutamate,
cysteine, glutamine, or histidine production. Glycine and alanine production is variable across the
subgroups with no detection in BG4, BG6, BG7, and BG8. The high affinity ammonia transporter
(PF00909) was sporadically detected in the BG10 and BG11 clades. The ferrous iron transporter (feoB;
K04759) was only present in two subclades, specifically BG6 and BG11. No cytochromes were detected,
as has previously been reported for the Thaumarchaea
98,99
. The potential for nitrite reduction (nirK) was
detected largely within BG4. All the genomes contained the potential for the 4-hydroxybutyrate/3-
hydroxypropionate carbon fixation pathway, which is common in the Nitrososphaerales
11
. While these
features are noteworthy the most interesting feature is that subgroups BG10, BG11, BG12, BG9, and BG1
appear to largely lack the ammonia oxidation pathway including amoA (K10944), amoB (K10945), and
amoC (K10946).
In an effort to confirm the absence of the ammonia oxidation pathway in these subgroups
DIAMOND
78
BLAST was used to search for the 15 genes identified by Carini et al. (2018)
28
to be associated
with ammonia oxidation in ‘Ca. N. brevis CN25’ (Figure 10). This search provides further evidence that
amoABC are largely absent in BG10, BG11, BG12, and BG9. These subgroups also lacked two other genes
identified as a part of the ammonia oxidation module
28
including an iron-sulfur cluster assembly
accessory protein and a conserved hypothetical protein with unknown function. This may indicate that,
similar to other Nitrososphaerales
11
, these subgroups lack the ability for ammonia oxidation.
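A presence/absence screen of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow used in this study: it assumes the search results are in the standard 12-column BLAST tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), that the reference genes were the queries, and that each subject protein ID begins with a genome identifier followed by `|` (a hypothetical naming scheme). The e-value and identity cutoffs are placeholders, and the function names are invented for this example.

```python
import csv
import io
from collections import defaultdict


def presence_absence(tabular_text, max_evalue=1e-10, min_identity=30.0):
    """Map each genome to the set of query genes with at least one passing hit.

    The cutoffs are illustrative placeholders, not values from the study.
    """
    hits = defaultdict(set)
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        identity, evalue = float(row[2]), float(row[10])
        if evalue <= max_evalue and identity >= min_identity:
            # Assumed ID scheme: "<genome>|<protein>"
            genome = subject.split("|", 1)[0]
            hits[genome].add(query)
    return dict(hits)


def missing_genes(hits, genomes, genes):
    """List, for every genome, the reference genes that had no passing hit."""
    return {g: sorted(set(genes) - hits.get(g, set())) for g in genomes}
```

Genomes that never appear in the hit table are reported as missing every gene, which is why the full genome list is passed in separately. Absence in a draft genome should still be read alongside its completeness estimate.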
To understand the variation at the population level within the eight BG4 (‘Ca. Nitrosopelagicus
brevis’) subclades, a pangenome was generated in Roary
85
(Figure 13) and gene groups within the
pangenome were categorized as core, intermediate, or rare (Figures 11 & 12). For this assessment only subclades with
>5 members were included. This analysis detected a large core genome (n = 189 gene clusters) shared
between the subgroups, as expected, and a large number of collection intermediate gene clusters (n = 550
gene clusters) that appear in some but not all lineages. Interestingly, most of the variation occurs in
lineage specific rare gene clusters (n = 1,599 gene clusters) which make up a small portion of the average
genome within the subclades (Figure 12). A deeper dive into these gene clusters indicates the majority to
be hypothetical proteins with no assigned function via KEGG
75
, MEROPS
77
, dbCAN
76
, Prokka
74
, or
eggNOG
100
. Those that were annotated included peptidases assigned to MEROPS families (C26,
C44, M03, M10, M29, and M38) and carbohydrate-active enzyme families from dbCAN (Carbohydrate
Esterase Family 14, Glycosyl Transferase Family 2, Glycosyl Transferase Family 4, and Glycosyl
Transferase Family 66). Other clusters contained genes such as ABC-2 type transport system ATP-
binding protein, ammonium transporter (amt family), cold shock protein (cspA), and nitric oxide
reductase (norQ). As identified previously, two large genomic islands were identified in ‘Ca. N. brevis
CN25’
34
. These were previously shown (and further confirmed here) to contain genes responsible for cell
surface modifications via glycosylation
34
with the remaining genes lacking known functions. The cell
surface modifications were predicted by Santoro et al. (2015) to help deter grazers
101
or potentially reduce
phage infection. Further analysis of genomic islands identified in the environmental MAGs is under way.
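The core/intermediate/rare categorization described above can be illustrated with a short Python sketch. It is loosely modeled on frequency-based occurrence classes such as those of Horesh et al. and is not the code used for this analysis. The frequency thresholds (core at 95% or more of a lineage, rare below 15%) are assumptions chosen for illustration, and the function names and the "mixed" catch-all label are hypothetical.

```python
def frequency_class(freq, core_min=0.95, rare_max=0.15):
    """Classify a within-lineage frequency (0-1). Thresholds are assumed."""
    if freq >= core_min:
        return "core"
    if freq < rare_max:
        return "rare"
    return "intermediate"


def classify_cluster(cluster_genomes, lineages):
    """Return {lineage: class} for each lineage in which the cluster occurs.

    cluster_genomes: set of genome IDs carrying the gene cluster.
    lineages: dict mapping lineage name -> set of member genome IDs.
    """
    classes = {}
    for name, members in lineages.items():
        present = len(cluster_genomes & members)
        if present:
            classes[name] = frequency_class(present / len(members))
    return classes


def occurrence_label(classes, n_lineages):
    """Combine per-lineage classes into one label for the whole collection."""
    if not classes:
        return "absent"
    kinds = set(classes.values())
    if len(kinds) != 1:
        return "mixed"  # e.g. core in one lineage, rare in another
    kind = next(iter(kinds))
    if len(classes) == n_lineages:
        scope = "collection"
    elif len(classes) == 1:
        scope = "lineage specific"
    else:
        scope = "multi-lineage"
    return f"{scope} {kind}"
```

Under these assumptions, a cluster found in a single genome of one ten-member subclade is labeled "lineage specific rare", while a cluster present in about half the genomes of every subclade is labeled "collection intermediate".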
Discussion
Nitrification, the oxidation of ammonia to nitrate, is an essential process in the global nitrogen
cycle. Within the ocean the Nitrosopumilus and Nitrosopelagicus have been identified as abundant groups
of ammonia-oxidizing Archaea (AOA). Early classifications based on amoA sequences of the
Nitrososphaerales have indicated that there are three dominant clades associated with the AOA: the
shallow clade
96
, the WCA clade which contains ‘Ca. Nitrosopelagicus brevis’
34
, and the WCB clade
41
which represents a deep water clade. Recent efforts to link these initially described clades to the diversity
of observed amoA sequences have identified the NP-Epsilon as equivalent to the WCA clade and the
NP-Alpha as equivalent to the WCB clade
39
. Our data confirms that the ‘Ca. Nitrosopelagicus brevis’ clade (BG4)
indeed contains the NP-Epsilon form of amoA. Within our data, when only MAGs with >80% completion
are considered, clade BG1 is the only group of Nitrosopelagicus that contains the NP-Alpha amoA.
Relative abundance of BG1 (which is represented by MAG
bgeo_samn07136555_metag_ajngoidf in Figure 7 & Supplementary Figures 1-3) does suggest that this
clade is most abundant at depths <300 meters. The initial analysis excluded genomes <80% complete, but
many of the genomes excluded in this manner formed distinct species-level clades and predominantly
possessed the NP-Alpha amoA. While some evidence supports a link between NP-Alpha and the WCB
group due to the presence of BG1-3 in deep samples >300 m, we also find that group BG11 occurs in
deep samples > 300 meters (Supplemental Figures 1-9). BG11 interestingly was one of the groups that
appeared to lack amoA. This is the first reported genomic evidence for MAGs belonging to the NP-Alpha.
The former WCA clade encompasses Nitrosopelagicus clades BG4, BG6-8, BG38, BG46, BG44,
and BG19 based on presence of the NP-Epsilon amoA clade. Depth distributions of BG4 and BG6-8
(Figure 9 & Supplemental Figures 1-9) indicate all of these are most abundant between 100-300 m. An
interesting distinction between the WCB- and WCA-containing lineages is that the WCB-containing
groups (BG1-3) are correlated with depth-specific indicators, such as nitrate, phosphate, and silicate.
The WCA lineages, in contrast, appear to correlate with a broader set of environmental parameters:
the two most abundant groups, BG4 and BG6, are most highly correlated with nitrite and temperature,
respectively. The temperature correlation of BG6 is interesting because the only cruise track with a
significant contribution in abundance from BG6 is cruise track
GA02, specifically off the coast of Brazil at the Equator. This may indicate that while BG4 is more
cosmopolitan, BG6 predominates in more tropical environments, specifically along the equatorial
upwelling zone.
Currently the distinction between the NP-Alpha and NP-Epsilon clades is a phylogenetic one
39
but it is possible that the two variants correspond to different kinetic efficiencies of amoA. The
AOA have long been identified as having higher substrate affinities for ammonia making them more
competitive in oligotrophic marine environments
3,102
than ammonia-oxidizing bacteria (AOB), which tend to
have higher half-saturation constants. Recent data have challenged this notion, though, showing that multiple
AOA strains from soils and hot springs have much lower substrate affinities than expected when
compared to AOB
103,104
. Due to the limitation of only having a single isolate for the Nitrosopelagicus
(‘Ca. N. brevis CN25’), it is unclear whether the substrate affinity for NP-Epsilon is significantly
different from its counterpart NP-Alpha, but this could be an explanation for the partition within the
Nitrosopelagicus phylogenetic tree (Figure 4) which indicates that the more deeply branching lineages
contain the NP-Alpha whereas the more recently branching lineages contain the NP-Epsilon form.
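The affinity argument above is usually framed with Michaelis-Menten kinetics, v = Vmax · S / (Km + S), where a lower half-saturation constant Km corresponds to a higher substrate affinity. The sketch below only illustrates why a low Km matters at low ammonia concentrations. The two parameter sets are hypothetical placeholders in arbitrary units, not measured values for any NP-Epsilon or NP-Alpha lineage.

```python
def mm_rate(substrate, vmax, km):
    """Michaelis-Menten rate: v = vmax * S / (km + S)."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical organisms with equal vmax but different half-saturation constants.
HIGH_AFFINITY = {"vmax": 1.0, "km": 0.1}
LOW_AFFINITY = {"vmax": 1.0, "km": 10.0}

if __name__ == "__main__":
    for s in (0.05, 1.0, 100.0):
        high = mm_rate(s, **HIGH_AFFINITY)
        low = mm_rate(s, **LOW_AFFINITY)
        print(f"S={s:>6}: high-affinity v={high:.3f}, low-affinity v={low:.3f}")
```

At substrate concentrations well below both Km values the high-affinity variant is much faster, whereas at saturating concentrations the two rates converge on Vmax. Any kinetic difference between the two amoA forms would therefore be most consequential under oligotrophic conditions.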
Beyond this distinction between the NP-Alpha- and NP-Epsilon-containing Nitrosopelagicus
clades, we have identified multiple groups appearing to lack the potential for ammonia oxidation entirely.
The largest of these is clade BG10, for which amoABC was mostly absent. Pairing 16S rRNA and
amoA is a common strategy for assessing AOA abundance and diversity in the environment
22,105–107
. Our
data indicates that it may not be uncommon for some clades of Nitrosopelagicus to lack ammonia
oxidation genes. Estimates of diversity and abundance that rely on 16S rRNA and amoA
sequences alone are therefore insufficient unless the Nitrosopelagicus-assigned 16S rRNA sequences are placed
within the context of genomic potential as understood from draft genomes.
Overall a broad analysis is provided here for all available Nitrososphaerales genomes indicating a
wide array of diversity, including taxonomically novel genomes within the UBA141, UBA1045, TA-21,
and the Nitrosocaldus. The genus Nitrosopelagicus was the most abundant group across the global ocean,
followed by the Nitrosopumilus. As shown previously, ‘Ca. Nitrosopelagicus brevis’ and most of the
other Nitrosopelagicus were found to be assigned to the amoA clade NP-Epsilon. A set of
Nitrosopelagicus clades belonged to the amoA clade NP-Alpha. Due to the large number of
Nitrosopelagicus genomes collected for this study, we were able to identify 26 approximately
species-level Nitrosopelagicus clades using a 95% ANI cutoff (with only 12 of those containing genomes
>80% complete), the largest of which was the clade BG4 which encompassed the ‘Ca. Nitrosopelagicus
brevis’ isolates. The data revealed a depth-based distribution with the clades BG4, BG8, BG7, and BG6
recovered in samples <300 m and clades BG5, BG3, BG2, BG1, and BG9 recovered from depths >300 m.
BG4 and BG6 were the most abundant Nitrosopelagicus clades with contrasting abundances across the
GEOTRACES GA02 cruise track. BG4 was correlated with nitrite, potentially due to the identified
potential for nitrite reduction. Most notably BG10, BG11, BG12, BG9, and BG1 all appear to lack the
potential for ammonia oxidation, missing genes for amoABC, as well as two genes previously shown to
be correlated with ammonia oxidation expression in ‘Ca. Nitrosopelagicus brevis CN25’.
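Grouping genomes into approximately species-level clades at a 95% ANI cutoff can be sketched as follows. This is an assumption-laden illustration rather than the procedure used here: it takes a precomputed table of pairwise ANI values (the dictionary format is hypothetical) and applies simple single-linkage grouping with a union-find structure, whereas dedicated dereplication tools may use other linkage criteria and alignment-coverage filters.

```python
def cluster_by_ani(genomes, pairwise_ani, cutoff=95.0):
    """Single-linkage clusters of genomes joined by ANI >= cutoff.

    genomes: iterable of genome IDs.
    pairwise_ani: dict mapping (genome_a, genome_b) -> ANI in percent.
    Returns a sorted list of sorted member lists.
    """
    genomes = list(genomes)
    parent = {g: g for g in genomes}

    def find(g):
        # Walk to the cluster root, shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())
```

Single linkage can chain genomes together: two genomes sharing less than 95% ANI still land in one clade if a third genome bridges them. This is one reason clades defined this way are best described as approximately species-level.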
BG4 encompassing the known Nitrosopelagicus isolates was further broken down into eight
clusters. A population-level pangenome of the BG4 subclades indicated that most of the variation occurred at
the lineage specific or individual genome level, with a large number of intermediate genes present in
some but not all genomes. With ‘Ca. N. brevis CN25’ (BG4_B) as the only cultured representative for
the group, we produced the largest-scale pangenomic analysis. Here, we were able to identify that most of
the variation between the BG4 subspecies was in gene clusters considered “lineage specific rare” based on classifications
by Horesh et al. (2021)
89
, meaning they were unique to individual genomes. The majority of differences
between the genomes were related to heterotrophic processes (e.g., peptidases, carbohydrate-active
enzymes, and transporters), amino acid biosynthesis, and genes putatively assigned to exopolysaccharide
biosynthesis.
The largest genomic islands appeared in subgroups BG4_B and BG4_G. The genomic island in
BG4_B belonged to the cultured ‘Ca. Nitrosopelagicus brevis CN25’ and was previously identified by
Santoro et al. (2015)
34
. A previous study has shown that many genomic islands encode extracellular
products such as lipopolysaccharides or exopolysaccharide biosynthesis clusters
108
. Lipopolysaccharides
are often targeted by phage as receptors
109
. The exopolysaccharide structures often involve modification of
sugars on extracellular structures or extracellular components of motility proteins (e.g., pili and
flagella)
110,111
. Genomic islands in ‘Candidatus Pelagibacter ubique’ have also been shown to contain
transport and sensing genes
108
. Unfortunately a substantial portion of archaeal genes (up to 80%) encode
hypothetical proteins
112
making further interpretation difficult.
Conclusion
The Nitrososphaerales are a diverse group of globally abundant Archaea that play a key role in
the nitrogen cycle within the ocean. Here we have further clarified the previously existing distinction
between the WCA and WCB clades of the ammonia oxidizing archaea by assigning clade level
distinctions across the Nitrosopelagicus genus and showing that these clades support the assignment of
the NP-Alpha to the WCB clade and NP-Epsilon to the WCA clade. Previously the NP-Alpha lacked
genomic representatives and the NP-Epsilon had limited genomic representation beyond the cultured
isolate for ‘Ca. Nitrosopelagicus brevis’. The partition between the NP-Epsilon- and
NP-Alpha-containing lineages of the Nitrosopelagicus may suggest that there is a difference related to
substrate affinity for ammonia, as these affinities have been shown to be much more variable across the
AOAs than previously thought. More research would need to be conducted to confirm this relationship.
Our data also begins to suggest that certain Nitrosopelagicus clades lack ammonia oxidation capabilities,
which could require a reassessment of global AOA diversity and abundance estimates that often rely on 16S rRNA
or amoA abundance ratios. Finally, further in-depth analysis needs to be conducted on the genomic
islands differentiating the BG4 subclades by exhausting our annotation options for the largely
hypothetical gene content underlying these variations.
References
1. Fuhrman, J. A., McCallum, K. & Davis, A. A. Novel major archaebacterial group from marine
plankton. Nature 356, 148–149 (1992).
2. DeLong, E. F. Archaea in coastal marine environments. Proc. Natl. Acad. Sci. U. S. A. 89, 5685–9
(1992).
3. Könneke, M. et al. Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature 437,
543–546 (2005).
4. Hallam, S. J. et al. Pathways of carbon assimilation and ammonia oxidation suggested by
environmental genomic analyses of marine Crenarchaeota. PLoS Biol. 4, 520–536 (2006).
5. Francis, C. A., Roberts, K. J., Beman, J. M., Santoro, A. E. & Oakley, B. B. Ubiquity and diversity
of ammonia-oxidizing archaea in water columns and sediments of the ocean. Proc. Natl. Acad. Sci.
U. S. A. 102, 14683–14688 (2005).
6. Wuchter, C. et al. Archaeal nitrification in the ocean. Proc. Natl. Acad. Sci. U. S. A. 103, 12317–
12322 (2006).
7. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially
revises the tree of life. Nat. Biotechnol. 36, 996 (2018).
8. Santoro, A. E., Richter, R. A. & Dupont, C. L. Planktonic marine archaea. Annual Review of
Marine Science vol. 11 131–158 (2019).
9. Oton, E. V., Quince, C., Nicol, G. W., Prosser, J. I. & Gubry-Rangin, C. Phylogenetic congruence
and ecological coherence in terrestrial Thaumarchaeota. ISME J. 10, 85–96 (2016).
10. Reji, L. & Francis, C. A. Metagenome-assembled genomes reveal unique metabolic adaptations of
a basal marine Thaumarchaeota lineage. ISME J. 14, 2105–2115 (2020).
11. Aylward, F. O. & Santoro, A. E. Heterotrophic Thaumarchaea with Small Genomes Are
Widespread in the Dark Ocean. mSystems 5, (2020).
12. Pester, M., Schleper, C. & Wagner, M. The Thaumarchaeota: An emerging view of their
phylogeny and ecophysiology. Curr. Opin. Microbiol. 14, 300–306 (2011).
13. Leininger, S. et al. Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature
442, 806–809 (2006).
14. Mincer, T. J. et al. Quantitative distribution of presumptive archaeal and bacterial nitrifiers in
Monterey Bay and the North Pacific Subtropical Gyre. Environ. Microbiol. 9, 1162–1175 (2007).
15. Nicol, G. W., Leininger, S., Schleper, C. & Prosser, J. I. The influence of soil pH on the diversity,
abundance and transcriptional activity of ammonia oxidizing archaea and bacteria. Environ.
Microbiol. 10, 2966–2978 (2008).
16. Santoro, A. E., Casciotti, K. L. & Francis, C. A. Activity, abundance and diversity of nitrifying
archaea and bacteria in the central California Current. Environ. Microbiol. 12, 1989–2006 (2010).
17. Verhamme, D. T., Prosser, J. I. & Nicol, G. W. Ammonia concentration determines differential
growth of ammonia-oxidising archaea and bacteria in soil microcosms. ISME J. 5, 1067–1071
(2011).
18. Stewart, F. J., Ulloa, O. & Delong, E. F. Microbial metatranscriptomics in a permanent marine
oxygen minimum zone. Environ. Microbiol. 14, 23–40 (2012).
19. Horak, R. E. A. et al. Ammonia oxidation kinetics and temperature sensitivity of a natural marine
community dominated by Archaea. ISME J. 7, 2023–2033 (2013).
20. Kowalchuk, G. A. & Stephen, J. R. Ammonia-Oxidizing Bacteria: A Model for Molecular
Microbial Ecology. Annu. Rev. Microbiol. 55, 485–529 (2001).
21. Koch, H., Kessel, M. A. H. J. van & Lücker, S. Complete nitrification: insights into the
ecophysiology of comammox Nitrospira. Appl. Microbiol. Biotechnol. 103, 177–189
(2018).
22. Santoro, A. E. & Casciotti, K. L. Enrichment and characterization of ammonia-oxidizing archaea
from the open ocean: Phylogeny, physiology and stable isotope fractionation. ISME J. 5, 1796–
1808 (2011).
23. Vajrala, N. et al. Hydroxylamine as an intermediate in ammonia oxidation by globally abundant
marine archaea. Proc. Natl. Acad. Sci. U. S. A. 110, 1006–1011 (2013).
24. Walker, C. B. et al. Nitrosopumilus maritimus genome reveals unique mechanisms for nitrification
and autotrophy in globally distributed marine crenarchaea. Proc. Natl. Acad. Sci. U. S. A. 107,
8818–8823 (2010).
25. Kozlowski, J. A., Dimitri Kits, K. & Stein, L. Y. Comparison of nitrogen oxide metabolism among
diverse ammonia-oxidizing bacteria. Front. Microbiol. 7, 1090 (2016).
26. Martens-Habbena, W. et al. The production of nitric oxide by marine ammonia-oxidizing archaea
and inhibition of archaeal ammonia oxidation by a nitric oxide scavenger. Environ. Microbiol. 17,
2261–2274 (2015).
27. Kerou, M. et al. Genomes of Thaumarchaeota from deep sea sediments reveal specific adaptations
of three independently evolved lineages. ISME J. 1–17 (2021) doi:10.1038/s41396-021-00962-6.
28. Carini, P., Dupont, C. L. & Santoro, A. E. Patterns of thaumarchaeal gene expression in culture
and diverse marine environments. Environ. Microbiol. 20, 2112–2124 (2018).
29. Bartossek, R., Spang, A., Weidler, G., Lanzen, A. & Schleper, C. Metagenomic analysis of
ammonia-oxidizing archaea affiliated with the soil group. Front. Microbiol. 3, (2012).
30. Tolar, B. B., Wallsgrove, N. J., Popp, B. N. & Hollibaugh, J. T. Oxidation of urea-derived
nitrogen by thaumarchaeota-dominated marine nitrifying communities. Environ. Microbiol. 19,
4838–4850 (2017).
31. Tully, B. J., Nelson, W. C. & Heidelberg, J. F. Metagenomic analysis of a complex marine
planktonic thaumarchaeal community from the Gulf of Maine. Environ. Microbiol. 14, 254–267
(2012).
32. Bayer, B. et al. Physiological and genomic characterization of two novel marine thaumarchaeal
strains indicates niche differentiation. ISME J. 10, 1051–1063 (2016).
33. Palatinszky, M. et al. Cyanate as an energy source for nitrifiers. Nature 524, 105–108 (2015).
34. Santoro, A. E. et al. Genomic and proteomic characterization of ‘Candidatus Nitrosopelagicus
brevis’: An ammonia-oxidizing archaeon from the open ocean. Proc. Natl. Acad. Sci. U. S. A. 112,
1173–1178 (2015).
35. Shiozaki, T. et al. Nitrification and its influence on biogeochemical cycles from the equatorial
Pacific to the Arctic Ocean. ISME J. 10, 2184–2197 (2016).
36. Smith, J. M., Damashek, J., Chavez, F. P. & Francis, C. A. Factors influencing nitrification rates
and the abundance and transcriptional activity of ammonia-oxidizing microorganisms in the dark
northeast Pacific Ocean. Limnol. Oceanogr. 61, 596–609 (2016).
37. Santoro, A. E. et al. Thaumarchaeal ecotype distributions across the equatorial Pacific Ocean and
their potential roles in nitrification and sinking flux attenuation. Limnol. Oceanogr. 62, 1984–2003
(2017).
38. Sintes, E., De Corte, D., Haberleitner, E. & Herndl, G. J. Geographic distribution of archaeal
ammonia oxidizing ecotypes in the Atlantic Ocean. Front. Microbiol. 7, 77 (2016).
39. Alves, R. J. E., Minh, B. Q., Urich, T., Von Haeseler, A. & Schleper, C. Unifying the global
phylogeny and environmental distribution of ammonia-oxidising archaea based on amoA genes.
Nat. Commun. 9, 1–17 (2018).
40. Villanueva, L., Schouten, S. & Sinninghe Damsté, J. S. Depth-related distribution of a key gene of
the tetraether lipid biosynthetic pathway in marine Thaumarchaeota. Environ. Microbiol. 17,
3527–3539 (2015).
41. Sintes, E., Bergauer, K., De Corte, D., Yokokawa, T. & Herndl, G. J. Archaeal amoA gene
diversity points to distinct biogeography of ammonia-oxidizing Crenarchaeota in the ocean.
Environ. Microbiol. 15, 1647–1658 (2013).
42. Parada, A. E. & Fuhrman, J. A. Marine archaeal dynamics and interactions with the microbial
community over 5 years from surface to seafloor. ISME J. 11, 2510–2525 (2017).
43. Galand, P. E., Gutiérrez-Provecho, C., Massana, R., Gasol, J. M. & Casamayor, E. O. Inter-annual
recurrence of archaeal assemblages in the coastal NW Mediterranean Sea (Blanes Bay Microbial
Observatory). Limnol. Oceanogr. 55, 2117–2125 (2010).
44. Martens-Habbena, W., Berube, P. M., Urakawa, H., De La Torre, J. R. & Stahl, D. A. Ammonia
oxidation kinetics determine niche separation of nitrifying Archaea and Bacteria. Nature 461,
976–979 (2009).
45. Wan, X. S. et al. Ambient nitrate switches the ammonium consumption pathway in the euphotic
ocean. Nat. Commun. 9, 1–9 (2018).
46. Qin, W. et al. Stress response of a marine ammonia-oxidizing archaeon informs physiological
status of environmental populations. ISME J. 12, 508–519 (2018).
47. Amin, S. A. et al. Copper requirements of the ammonia-oxidizing archaeon Nitrosopumilus
maritimus SCM1 and implications for nitrification in the marine environment. Limnol. Oceanogr.
58, 2037–2045 (2013).
48. Nelson, W. C., Tully, B. J. & Mobberley, J. M. Biases in genome reconstruction from
metagenomic data. PeerJ 8, e10119 (2020).
49. Biller, S. J. et al. Data descriptor: Marine microbial metagenomes sampled across space and time.
Sci. Data 5, (2018).
50. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially
revises the tree of life. Nat. Biotechnol. (2018) doi:10.1038/nbt.4229.
51. Haroon, M. F., Thompson, L. R., Parks, D. H., Hugenholtz, P. & Stingl, U. A catalogue of 136
microbial draft genomes from Red Sea metagenomes. Sci. Data 3, 1–6 (2016).
52. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509
(2021).
53. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
54. Paoli, L. et al. Uncharted biosynthetic potential of the ocean microbiome. bioRxiv (2021).
55. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics 34, i884–i890 (2018).
56. Li, D. et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced
methodologies and community practices. Methods 102, 3–11 (2016).
57. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-
assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
58. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation
sequencing data. Bioinformatics 28, 3150–3152 (2012).
59. Treangen, T. J., Sommer, D. D., Angly, F. E., Koren, S. & Pop, M. Next generation sequence
assembly with AMOS. Curr. Protoc. Bioinforma. 1–18 (2011)
doi:10.1002/0471250953.bi1108s33.
60. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–
359 (2012).
61. Woodcroft, B. J. CoverM. CoverM https://github.com/wwood/CoverM (2007).
62. Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: An efficient general purpose program for
assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
63. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035
(2017).
64. Kang, D. D. et al. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome
reconstruction from metagenome assemblies. PeerJ 2019, (2019).
65. Lin, H. N. & Hsu, W. L. Kart: A divide-and-conquer algorithm for NGS read alignment.
Bioinformatics 33, 2281–2287 (2017).
66. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079
(2009).
67. Neely, C. J., Graham, E. D. & Tully, B. J. MetaSanity: an integrated microbial genome evaluation
and annotation pipeline. Bioinformatics 36, 4341–4344 (2020).
68. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify
genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).
69. Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput
ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9,
(2018).
70. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: Assessing
the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome
Res. 25, 1043–1055 (2015).
71. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
72. Eren, A. M. et al. Anvi’o: An advanced analysis and visualization platform for ’omics data. PeerJ
2015, (2015).
73. Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site identification.
BMC Bioinformatics 11, 1–11 (2010).
74. Seemann, T. Prokka: Rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
75. Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and
adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
76. Zhang, H. et al. DbCAN2: A meta server for automated carbohydrate-active enzyme annotation.
Nucleic Acids Res. 46, W95–W101 (2018).
77. Rawlings, N. D., Barrett, A. J. & Bateman, A. MEROPS: The peptidase database. Nucleic Acids
Res. 38, D227–D233 (2009).
78. Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using
DIAMOND. Nat. Methods 18, 366–368 (2021).
79. Medema, M. H. et al. AntiSMASH: Rapid identification, annotation and analysis of secondary
metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids
Res. 39, W339 (2011).
80. Alcock, B. P. et al. CARD 2020: Antibiotic resistome surveillance with the comprehensive
antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).
81. Yu, N. Y. et al. PSORTb 3.0: Improved protein subcellular localization prediction with refined
localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26, 1608–
1615 (2010).
82. Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and
RNA viruses. Microbiome 9, 1–13 (2021).
83. Almagro Armenteros, J. J. et al. SignalP 5.0 improves signal peptide predictions using deep neural
networks. Nat. Biotechnol. 37, 420–423 (2019).
84. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. DRep: A tool for fast and accurate
genomic comparisons that enables improved genome recovery from metagenomes through de-
replication. ISME J. 11, 2864–2868 (2017).
85. Page, A. J. et al. Roary: Rapid large-scale prokaryote pan genome analysis. Bioinformatics 31,
3691–3693 (2015).
86. Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space
complexity. BMC Bioinformatics 5, 1–19 (2004).
87. Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the
presence of recombination. Genetics 172, 2665–2681 (2006).
88. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - Approximately maximum-likelihood trees
for large alignments. PLoS One 5, (2010).
89. Horesh, G. et al. Different evolutionary trends form the twilight zone of the bacterial pan-genome.
bioRxiv 2021.02.15.431222 (2021) doi:10.1101/2021.02.15.431222.
90. Schlitzer, R. et al. The GEOTRACES Intermediate Data Product 2017. Chem. Geol. 493, 210–223
(2018).
91. Graham, E. D. & Tully, B. J. Marine Dadabacteria exhibit genome streamlining and phototrophy-
driven niche partitioning. ISME J. 15, 1248–1256 (2021).
92. Hammer, O., Harper, D. & Ryan, P. PAST: Paleontological Statistics Software Package for
Education and Data Analysis. Palaeontol. Electron. 4, 1–9 (2001).
93. DeLong, E. F. Exploring Marine Planktonic Archaea: Then and Now. Front. Microbiol. 11,
616086 (2021).
94. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing
the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome
Res. 25, 1043–55 (2015).
95. Hu, J. et al. Ecological Success of the Nitrosopumilus and Nitrosospira Clusters in the Intertidal
Zone. Microb. Ecol. 78, 555–564 (2019).
96. Bayer, B. et al. Physiological and genomic characterization of two novel marine thaumarchaeal
strains indicates niche differentiation. ISME J. 10, 1051–1063 (2016).
97. Qin, W. et al. Nitrosopumilus maritimus gen. nov., sp. nov., Nitrosopumilus cobalaminigenes sp.
nov., Nitrosopumilus oxyclinae sp. nov., and Nitrosopumilus ureiphilus sp. nov., four marine
ammonia-oxidizing archaea of the phylum Thaumarchaeota. Int. J. Syst. Evol. Microbiol. 67, 5067–
5079 (2017).
98. Murali, R., Gennis, R. B. & Hemp, J. Evolution of the cytochrome bd oxygen reductase
superfamily and the function of CydAA’ in Archaea. ISME J. 1–15 (2021) doi:10.1038/s41396-
021-01019-4.
99. Kletzin, A. et al. Cytochromes c in Archaea: Distribution, maturation, cell architecture, and the
special case of Ignicoccus hospitalis. Front. Microbiol. 6, 439 (2015).
100. Huerta-Cepas, J. et al. EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated
orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314
(2019).
101. Palenik, B. et al. The genome of a motile marine Synechococcus. Nature 424, 1037–1042 (2003).
102. Stahl, D. A. & Torre, J. R. de la. Physiology and Diversity of Ammonia-Oxidizing Archaea.
Annu. Rev. Microbiol. 66, 83–101 (2012).
103. Kits, K. D. et al. Kinetic analysis of a complete nitrifier reveals an oligotrophic lifestyle. Nature
549, 269 (2017).
104. Jung, M.-Y. et al. Ammonia-oxidizing archaea possess a wide range of cellular ammonia
affinities. bioRxiv 2021.03.02.433310 (2021) doi:10.1101/2021.03.02.433310.
105. Xu, M., Schnorr, J., Keibler, B. & Simon, H. M. Comparative Analysis of 16S rRNA and amoA
Genes from Archaea Selected with Organic and Inorganic Amendments in Enrichment Culture.
Appl. Environ. Microbiol. 78, 2137 (2012).
106. Li, J. et al. amoA gene abundances and nitrification potential rates suggest that benthic ammonia-
oxidizing bacteria and not archaea dominate N cycling in the Colne estuary, United Kingdom.
Appl. Environ. Microbiol. 81, 159–165 (2015).
107. Park, S.-J., Park, B.-J. & Rhee, S.-K. Comparative analysis of archaeal 16S rRNA and amoA genes to estimate
the abundance and diversity of ammonia-oxidizing archaea in marine sediments. Extremophiles
12, 605–615 (2008).
108. Rodriguez-Valera, F. et al. Explaining microbial population genomics through phage predation.
Nat. Rev. Microbiol. 7, 828–836 (2009).
109. Sharma, R. S., Mishra, V., Mohmmed, A. & Babu, C. R. Phage specificity and lipopolysaccarides
of stem- and root-nodulating bacteria (Azorhizobium caulinodans, Sinorhizobium spp., and
Rhizobium spp.) of Sesbania spp. Arch. Microbiol. 189, 411–418 (2008).
110. Reva, O. & Tümmler, B. Think big - Giant genes in bacteria. Environ. Microbiol. 10, 768–777
(2008).
111. Ivars-Martinez, E. et al. Comparative genomics of two ecotypes of the marine planktonic
copiotroph Alteromonas macleodii suggests alternative lifestyles associated with different kinds of
particulate organic matter. ISME J. 2, 1194–1212 (2008).
112. Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Towards functional characterization of archaeal
genomic dark matter. Biochemical Society Transactions vol. 47 389–398 (2019).
Main Figures & Tables
Figure 1. A phylogenetic tree built using 122 archaeal markers using the GTDB-Tk de novo workflow and visualized in
iTOL. Sulfolobus acidocaldarius (GCF_000012285.1) was used as an outgroup. This tree includes 1,371 Nitrososphaerales
collected from a variety of sources including Paoli et al. 2021, Tully et al. 2018, Nayfach et al. 2021, Pachiadaki et al. 2019,
Haroon et al. 2016, and all Nitrososphaerales MAGs in GTDB as of January 2021. Tree branches are colored based on the
lowest assigned groups via GTDB. All branches representing draft genomes with a completion of greater than 80% are
marked with green circles and all branches representing draft genomes that were determined to be novel via GTDB
relative evolutionary distance (RED) scores are marked with red stars.
Figure 2. A phylogenetic tree was built using sequences described and classified by Alves et al. 2018. All amoA sequences
from every genome being analyzed for this study (that were not already contained within the Alves et al. 2018
classification schema) were identified via KEGG (K10944), aligned via muscle, trimmed via trimal, and a phylogenetic
tree was built using FastTree. The final tree was visualized in ITOL. amoA sequences added to the tree via this study are
highlighted with red stars along the tree leaves. Organisms that represent isolates or are currently enriched in the lab are
marked on the tree with grey triangles and the names of the organisms are linked to them.
Figure 3. A phylogenetic tree of all Nitrosopelagicus genomes built using 122 archaeal markers using the GTDB-Tk de
novo workflow and visualized in ITOL. In total this tree includes 217 Nitrosopelagicus genomes that are all >80%
complete based on CheckM. Colored ranges indicate assigned Nitrosopelagicus approximate species-level clades for each
genome. The first data concentric circle indicates genome source (indicated by Source) which highlights which study each
genome is derived from. The cultured Candidatus Nitrosopelagicus are indicated with red stars on the exterior of the tree.
The second data concentric circle indicates what oceanic province each genome was derived from. The third data
concentric circle indicates the depth that genome was derived from.
Figure 4. A phylogenetic tree of all Nitrosopelagicus genomes built using 122 archaeal markers using the GTDB-Tk de
novo workflow and visualized in ITOL. In total this tree includes 613 Nitrosopelagicus genomes that are all >50%
complete based on CheckM. Colored ranges indicate assigned Nitrosopelagicus approximate species-level clades for each
genome as determined using the phylogenetic tree here and ANI cutoffs of 95%. For MAGs containing AmoA the AmoA
type is shown with either a red arrow (NP-Alpha) or blue arrow (NP-Epsilon)
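The 95% ANI cutoff mentioned in this caption can be pictured as grouping genomes that are linked by pairwise ANI at or above the threshold. The Python sketch below is illustrative only: the function name, the genome labels, and the ANI values are hypothetical, and it uses simple single-linkage grouping. The caption states that clades were assigned using the tree together with ANI, so this sketch does not reproduce that procedure.

```python
def cluster_by_ani(genomes, ani_pairs, cutoff=95.0):
    """Group genomes linked by pairwise ANI >= cutoff (single linkage)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Walk up to the group representative, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in ani_pairs.items():
        if ani >= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical ANI values (percent) for four made-up genomes.
example_ani = {
    ("MAG_1", "MAG_2"): 97.2,
    ("MAG_2", "MAG_3"): 95.4,
    ("MAG_1", "MAG_3"): 94.1,
    ("MAG_3", "MAG_4"): 88.0,
}
clades = cluster_by_ani(["MAG_1", "MAG_2", "MAG_3", "MAG_4"], example_ani)
```

Single linkage places MAG_1 and MAG_3 in the same group through MAG_2 even though their direct ANI is below the cutoff, which is one reason a tree is a useful second line of evidence.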
Figure 5. A phylogenetic tree built on all Nitrosopelagicus brevis genomes greater than 80% complete using 413 core genes
defined by Roary that passed a test for recombination using PhiPack. The phylogenetic tree was built using FastTree
and IQ-Tree and visualized in ITOL. Nitrosopelagicus brevis groups were assigned using this tree with four genomes
being labeled BG4_Un as they represented unique branches on the tree. The cultured N. brevis is designated via a red star
and a single amplified genome that was >80% complete is marked with a blue circle.
Figure 6. RPKM for each major Nitrososphaerales group plotted across the GEOTRACES cruise. (A) <10 m. (B) 10-50 m.
(C) 50-150 m. (D) 150-500 m. (E) >500 m.
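For readers unfamiliar with the unit, RPKM normalizes a read count by sequence length (in kilobases) and by sequencing depth (in millions of reads). The Python sketch below shows that arithmetic under the standard definition; the function name and the example numbers are hypothetical and are not taken from the GEOTRACES data.

```python
def rpkm(mapped_reads, sequence_length_bp, total_reads):
    """Reads per kilobase of sequence per million reads in the sample."""
    if sequence_length_bp <= 0 or total_reads <= 0:
        raise ValueError("sequence length and total read count must be positive")
    kilobases = sequence_length_bp / 1_000
    millions_of_reads = total_reads / 1_000_000
    return mapped_reads / (kilobases * millions_of_reads)


# Hypothetical example: 5,000 reads recruit to a 1.25 Mbp genome
# in a sample containing 20 million reads.
example = rpkm(mapped_reads=5_000, sequence_length_bp=1_250_000, total_reads=20_000_000)
```

A group-level value, as plotted per Nitrososphaerales group, could be obtained by combining the values of its member genomes; how they were combined for this figure is not stated in the caption.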
Figure 7. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG1-BG12 along the
GEOTRACES transect GA02. Cruise Track for GA02 is shown below ODV maps with the red circle denoting the start of
the cruise track or 0km.
Figure 8. Canonical correspondence analysis of the nonredundant marine Nitrosopelagicus groups BG1-BG12. Vectors
denote correlations with environmental parameters and have been modified for easier visualization: trioplot amp 1.5,
scaling type 2. A contains associations primarily with trace elements, while B contains associations with general nutrients.
Figure 9. Heatmap showing average completion of various pathways and presence of select genes across all genomes
belonging to each of the 12 Nitrosopelagicus groups BG1-BG12.
Figure 10. A phylogenetic tree of all Nitrosopelagicus genomes built using 122 archaeal markers using the GTDB-Tk de
novo workflow and visualized in ITOL. Heatmap indicates the percent ID match determined using DIAMOND to each of
the 15 genes associated with an ammonia oxidation module identified by Carini et al. (2018)²⁸ in Nitrosopelagicus brevis
CN25. Genes with no match in a genome are shown in grey. Bar chart on exterior indicates genome completion (with
minimum completion being 80%) and in red contamination (maximum contamination being 5%).
Figure 11. Classification of pangenome gene clusters in the Nitrosopelagicus brevis groups BG4_A, BG4_B, BG4_C,
BG4_E, BG4_G, BG4_H, BG4_Un split by various categories (as defined by Horesh et al. 2021⁸⁹). Category definitions:
core (present in ≥95% of genomes); multi-lineage (multiple [but not all] lineages); lineage specific (present in a single
lineage); intermediate (present in ≥15%, but <95% of genomes); rare (present in <15% of genomes). Gene clusters were
identified using Roary and split into categories using Twilight.
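The category thresholds listed in this caption can be expressed as two small rules: one on the fraction of genomes carrying a gene cluster, and one on how many lineages it appears in. The Python sketch below encodes only those stated cutoffs. It is not Twilight's implementation (which combines per-lineage classifications), and the function names and example inputs are hypothetical.

```python
def frequency_class(n_present, n_genomes, core_cutoff=0.95, rare_cutoff=0.15):
    """Classify a gene cluster by the fraction of genomes that carry it."""
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    fraction = n_present / n_genomes
    if fraction >= core_cutoff:
        return "core"            # present in >=95% of genomes
    if fraction >= rare_cutoff:
        return "intermediate"    # present in >=15% but <95% of genomes
    return "rare"                # present in <15% of genomes


def lineage_spread(present_lineages, all_lineages):
    """Describe how widely a gene cluster is distributed across lineages."""
    observed = set(present_lineages) & set(all_lineages)
    if not observed:
        return "absent"
    if len(observed) == 1:
        return "lineage specific"
    if len(observed) < len(set(all_lineages)):
        return "multi-lineage"   # multiple, but not all, lineages
    return "all lineages"


# Hypothetical lineage labels borrowed from the group names in the caption.
lineages = ["BG4_A", "BG4_B", "BG4_C"]
spread = lineage_spread(["BG4_A", "BG4_C"], lineages)
```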
Figure 12. Classification of a typical genome in the Nitrosopelagicus brevis groups BG4_A, BG4_B, BG4_C, BG4_E,
BG4_G, BG4_H, BG4_Un split by various categories (as defined by Horesh et al. 2021⁸⁹). Category definitions: core
(present in ≥95% of genomes); multi-lineage (multiple [but not all] lineages); lineage specific (present in a single lineage);
intermediate (present in ≥15%, but <95% of genomes); rare (present in <15% of genomes). Gene clusters were identified
using Roary and split into categories using Twilight.
Figure 13. Presence or absence heatmap for every gene cluster in the Nitrosopelagicus brevis groups as determined by
Roary⁸⁵.
Conclusions
This dissertation develops approaches that improve our ability to extract near-complete
draft genomes from large metagenomic datasets by tackling two of the major hurdles and
limitations associated with next-generation sequencers. First, I developed a novel “binning” program
called BinSanity, which implements a clustering algorithm called Affinity Propagation and leverages
coverage and composition metrics across many samples to aid in draft genome generation. The
biphasic approach used by BinSanity achieved higher precision and recall than
commonly used methodologies. In addition, BinSanity outperformed all other methodologies tested in teasing
apart sequences at the strain level. I then demonstrated BinSanity’s efficacy on a set of publicly available
marine metagenomes collected from across the globe during the Tara Oceans expedition¹. This led to the
discovery of a unique aerobic anoxygenic phototroph lineage, named ‘Ca. Luxescamonaceae’, that possesses the
genomic potential for anoxygenic phototrophy and carbon fixation. This contradicts the canonical idea
that aerobic anoxygenic phototrophs are exclusively photoheterotrophic. Estimates from our work
indicate the group may be globally abundant. As we work to assess broader biogeochemical cycling
in the ocean, particularly with respect to carbon, it is important that we are able to constrain every
potential metabolism that plays a role in CO2 uptake and primary productivity. In addition to this group
of novel aerobic anoxygenic phototrophs, I also identified the largest set of marine Dadabacteria genomes
to date, demonstrating genome streamlining and clear proteorhodopsin-driven depth partitioning of clades.
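To make the precision and recall comparison concrete, the Python sketch below computes both for a set of bins against known genome labels, using one common contig-count formulation from binning benchmarks. This is an assumption for illustration: the function name and the toy labels are hypothetical, and the exact definition used in the BinSanity evaluation (for example, whether contigs are weighted by length) is not restated here.

```python
from collections import Counter, defaultdict


def binning_precision_recall(true_genome, assigned_bin):
    """Contig-count precision and recall for a binning result.

    Both arguments map contig id -> label. Precision rewards bins made up
    of a single genome; recall rewards genomes kept together in one bin.
    """
    shared = [c for c in assigned_bin if c in true_genome]
    if not shared:
        raise ValueError("no contigs in common between truth and bins")

    by_bin = defaultdict(Counter)      # bin -> counts of true genomes
    by_genome = defaultdict(Counter)   # genome -> counts of bins
    for contig in shared:
        by_bin[assigned_bin[contig]][true_genome[contig]] += 1
        by_genome[true_genome[contig]][assigned_bin[contig]] += 1

    precision = sum(max(c.values()) for c in by_bin.values()) / len(shared)
    recall = sum(max(c.values()) for c in by_genome.values()) / len(shared)
    return precision, recall


# Toy example: genome G1 is split across two bins, G2 is recovered intact.
truth = {"c1": "G1", "c2": "G1", "c3": "G1", "c4": "G2", "c5": "G2", "c6": "G2"}
bins = {"c1": "B1", "c2": "B1", "c3": "B2", "c4": "B3", "c5": "B3", "c6": "B3"}
precision, recall = binning_precision_recall(truth, bins)
```

In this toy case every bin is pure, so precision is perfect, while splitting G1 lowers recall. Over-splitting and over-merging therefore pull the two measures in opposite directions.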
While “binning” represents one hurdle to metagenomic sequence analysis, another hurdle is
the difficulty short-read assemblers have in piecing together genomes from groups that are highly
abundant or heterogeneous. As a result, highly abundant marine groups like the ammonia-oxidizing
archaea, SAR11, and cyanobacteria are woefully underrepresented in metagenomic draft genome datasets.
To overcome this, I implemented a subsampled assembly approach and generated a significant
number of ammonia-oxidizing Thaumarchaeota genomes. Using genomes generated in my study as well
as all publicly available Thaumarchaeota genomes, I identified distinctions between the NP-Alpha- and
NP-Epsilon-containing AOA groups that were both depth partitioned and potentially linked to substrate
affinity.
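The core idea behind a subsampled assembly, reducing the number of reads so that very abundant populations are presented to the assembler at lower coverage, can be sketched as a reproducible random draw. The Python example below is a minimal illustration under that assumption: the function name, the fraction, and the read identifiers are hypothetical, and it does not reproduce the subsampling scheme, fractions, or assembler settings used in this dissertation.

```python
import random


def subsample_reads(read_ids, fraction, seed=0):
    """Return a reproducible random subset of reads, preserving input order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    if not read_ids:
        return []
    rng = random.Random(seed)  # fixed seed makes the draw repeatable
    n_keep = max(1, round(len(read_ids) * fraction))
    keep = set(rng.sample(range(len(read_ids)), n_keep))
    return [read for i, read in enumerate(read_ids) if i in keep]


# Hypothetical read identifiers; with paired-end data, mates would need
# to be sampled together (for example, by sampling pair indices).
reads = [f"read_{i}" for i in range(1000)]
subset = subsample_reads(reads, fraction=0.1, seed=42)
```

Each subset would then be assembled on its own, and the resulting contigs combined and dereplicated in later steps.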
Overall, metagenomic-enabled phylogenomics as conducted in this dissertation has enabled us to
gain insight into a vast diversity of bacteria and archaea that to date have eluded culturing. This allows us
to better understand the complex interactions between microbial metabolisms and abiotic or biotic
features that mediate ecosystem-scale biogeochemical transformations. These community analyses
facilitate a systems-level approach that will become essential as we delve into the effects a changing
climate² may have on the diversity, metabolism, and broader biogeochemical cycles within the ocean and
other environments. While long-read technologies such as Pacific Biosciences single-molecule real-time
(SMRT) and Oxford Nanopore Technologies (ONT) sequencing are becoming more common, the majority of
metagenomics studies are still conducted on short-read sequencers such as Illumina³. These short-read
metagenomes rely on complex assembly and binning algorithms to generate “draft” genomes that can be
used to better understand the phylogenetic and functional diversity within a community.
References
1. Bork, P. et al. Tara Oceans. Tara Oceans studies plankton at planetary scale. Introduction. Science (New
York, N.Y.) 348, 873 (2015).
2. Vicuña, R. & González, B. The microbial world in a changing environment. Revista Chilena de Historia
Natural 94, 1–5 (2021).
3. Xiao, T. & Zhou, W. The third generation sequencing: the advanced approach to genetic diseases.
Translational Pediatrics 9, 163 (2020).
Bibliography
1. Staley, J. T. & Konopka, A. Measurement of in Situ Activities of Nonphotosynthetic
Microorganisms in Aquatic and Terrestrial Habitats. Annu. Rev. Microbiol. 39, 321–346 (1985).
2. JANNASCH, H. W. & JONES, G. E. Bacterial Populations in Sea Water as Determined by
Different Methods of Enumeration. Limnol. Oceanogr. 4, 128–139 (1959).
3. Steen, A. D. et al. High proportions of bacteria and archaea across most biomes remain uncultured.
ISME J. 13, 3126–3130 (2019).
4. Martiny, A. C. High proportions of bacteria are culturable across major biomes. ISME J. 13, 2125–
2128 (2019).
5. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev.
Microbiol. 71, 711–730 (2017).
6. Hugerth, L. W. & Andersson, A. F. Analysing microbial community composition through
amplicon sequencing: From sampling to hypothesis testing. Frontiers in Microbiology vol. 8
(2017).
7. Sogin, M. L. et al. Microbial diversity in the deep sea and the underexplored ‘rare biosphere’. Proc.
Natl. Acad. Sci. U. S. A. 103, 12115–12120 (2006).
8. Venter, J. C. et al. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science (80-.
). 304, 66–74 (2004).
9. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial
genomes from the environment. Nature 428, 37–43 (2004).
10. Rusch, D. B. et al. The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic
through eastern tropical Pacific. PLoS Biol. 5, 0398–0431 (2007).
11. Bork, P. et al. Tara Oceans. Tara Oceans studies plankton at planetary scale. Introduction. Science
348, 873 (2015).
12. Biller, S. J. et al. Data descriptor: Marine microbial metagenomes sampled across space and time.
Sci. Data 5, (2018).
13. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
14. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain
Bacteria. Nature 523, 208–211 (2015).
15. Lee, M. D. et al. Marine Synechococcus isolates representing globally abundant genomic lineages
demonstrate a unique evolutionary path of genome reduction without a decrease in GC content.
Environ. Microbiol. 21, 1677–1686 (2019).
16. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev.
Microbiol. 71, 711–730 (2017).
17. Palkova, Z. Multicellular microorganisms: Laboratory versus nature. EMBO Reports vol. 5 470–
476 (2004).
18. Hobman, J. L., Penn, C. W. & Pallen, M. J. Laboratory strains of Escherichia coli: Model citizens
or deceitful delinquents growing old disgracefully? Molecular Microbiology vol. 64 881–885
(2007).
19. Quast, C. et al. The SILVA ribosomal RNA gene database project: Improved data processing and
web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
20. Cole, J. R. et al. Ribosomal Database Project: Data and tools for high throughput rRNA analysis.
Nucleic Acids Res. 42, D633–D642 (2014).
21. Bragg, L. & Tyson, G. W. Metagenomics using next-generation sequencing. Methods Mol. Biol.
1096, 183–201 (2014).
22. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11,
1144–1146 (2014).
23. Kang, D. D. et al. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome
reconstruction from metagenome assemblies. PeerJ 2019, (2019).
24. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-
assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
25. Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are
abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).
26. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509
(2021).
27. Haroon, M. F., Thompson, L. R., Parks, D. H., Hugenholtz, P. & Stingl, U. A catalogue of 136
microbial draft genomes from Red Sea metagenomes. Sci. Data 3, 1–6 (2016).
28. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially
expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
29. Hug, L. A. et al. Critical biogeochemical functions in the subsurface are associated with bacteria
from new phyla and little studied lineages. Environ. Microbiol. 18, 159–173 (2016).
30. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035
(2017).
31. Nelson, W. C., Tully, B. J. & Mobberley, J. M. Biases in genome reconstruction from
metagenomic data. PeerJ 8, e10119 (2020).
32. Hedlund, B. P., Dodsworth, J. A., Murugapiran, S. K., Rinke, C. & Woyke, T. Impact of single-
cell genomics and metagenomics on the emerging view of extremophile “microbial dark matter”.
Extremophiles vol. 18 865–875 (2014).
33. Whitman, W. B., Coleman, D. C. & Wiebe, W. J. Prokaryotes: The unseen majority. Proceedings
of the National Academy of Sciences of the United States of America vol. 95 6578–6583 (1998).
34. Kai, W., Peisheng, Y., Rui, M., Wenwen, J. & Zongze, S. Diversity of culturable bacteria in deep-
sea water from the South Atlantic Ocean. Bioengineered 8, 572–584 (2017).
35. Fenchel, T. The microbial loop - 25 years later. J. Exp. Mar. Bio. Ecol. 366, 99–103 (2008).
36. Azam, F. et al. The Ecological Role of Water-Column Microbes in the Sea. Mar. Ecol. Prog. Ser.
10, 257–263 (1983).
37. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive earth’s
biogeochemical cycles. Science vol. 320 1034–1039 (2008).
38. Hunter, C. N., Daldal, F., Thurnauer, M. C. & Beatty, J. T. The purple phototrophic bacteria. vol.
28 (Springer Science & Business Media, 2008).
39. Pfennig, N. & Trüper, H. G. Taxonomy of phototrophic green and purple bacteria: A review. Ann.
l’Institut Pasteur / Microbiol. 134, 9–20 (1983).
40. Madigan, M. T. & Jung, D. O. An Overview of Purple Bacteria: Systematics, Physiology, and
Habitats. in 1–15 (Springer, Dordrecht, 2009). doi:10.1007/978-1-4020-8815-5_1.
41. Ehrenreich, A. & Widdel, F. Anaerobic oxidation of ferrous iron by purple bacteria, a new type of
phototrophic metabolism. Appl. Environ. Microbiol. 60, 4517–26 (1994).
42. Griffin, B. M., Schott, J. & Schink, B. Nitrite, an Electron Donor for Anoxygenic Photosynthesis.
Science (80-. ). 316, 1870–1870 (2007).
43. HARASHIMA, K., SHIBA, T., TOTSUKA, T., SIMIDU, U. & TAGA, N. Occurrence of
bacteriochlorophyll a in a strain of an aerobic heterotrophic bacterium. Agric. Biol. Chem. 42,
1627–1628 (1978).
44. Shiba, T., Simidu, U. & Taga, N. Distribution of aerobic bacteria which contain
bacteriochlorophyll a. Appl. Environ. Microbiol. 38, 43–5 (1979).
45. Beatty, J. T. On the natural selection and evolution of the aerobic phototrophic bacteria.
Photosynth. Res. 73, 109–114 (2002).
46. NISHIMURA, Y. et al. DNA relatedness and chemotaxonomic feature of aerobic
bacteriochlorophyll-containing bacteria isolated from coasts of Australia. J. Gen. Appl.
Microbiol. 40, 287–296 (1994).
47. Arai, H., Roh, J. H. & Kaplan, S. Transcriptome dynamics during the transition from anaerobic
photosynthesis to aerobic respiration in Rhodobacter sphaeroides 2.4.1. J. Bacteriol. 190, 286–99
(2008).
48. Swingley, W. D. et al. The complete genome sequence of Roseobacter denitrificans reveals a
mixotrophic rather than photosynthetic metabolism. J. Bacteriol. 189, 683–90 (2007).
49. Yurkov, V. V & Beatty, J. T. Aerobic anoxygenic phototrophic bacteria. Microbiol. Mol. Biol.
Rev. 62, 695–724 (1998).
50. Tang, K.-H., Feng, X., Tang, Y. J. & Blankenship, R. E. Carbohydrate Metabolism and Carbon
Fixation in Roseobacter denitrificans OCh114. PLoS One 4, e7233 (2009).
51. Wood, H. G., Werkman, C. H., Hemingway, A. & Nier, A. O. The position of carbon dioxide carbon in
succinic acid synthesized by heterotrophic bacteria. J. Biol. Chem. 139, 377–381 (1941).
52. Cohen-Bazire, G., Sistrom, W. R. & Stanier, R. Y. Kinetic studies of pigment synthesis by non-
sulfur purple bacteria. J. Cell. Comp. Physiol. 49, 25–68 (1957).
53. Kolber, Z. S. et al. Contribution of aerobic photoheterotrophic bacteria to the carbon cycle in the
ocean. Science (80-. ). (2001) doi:10.1126/science.1059707.
54. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled
genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
55. Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site identification.
BMC Bioinformatics 11, 1–11 (2010).
56. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for
Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 428, 726–731
(2016).
57. Graham, E. D., Heidelberg, J. F. & Tully, B. J. Binsanity: Unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 2017, e3035
(2017).
58. Eren, A. M. et al. Anvi’o: An advanced analysis and visualization platform for ’omics data. PeerJ
2015, (2015).
59. Santos, S. R. & Ochman, H. Identification and phylogenetic sorting of bacterial lineages with
universally conserved genes and proteins. Environ. Microbiol. 6, 754–759 (2004).
60. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank. Nucleic
Acids Res. 33, D34 (2005).
61. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: Interactive sequence similarity
searching. Nucleic Acids Res. 39, W29–W37 (2011).
62. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222-30 (2014).
63. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic
Acids Res. 31, 371–3 (2003).
64. Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space
complexity. BMC Bioinformatics 5, 1–19 (2004).
65. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
66. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - Approximately maximum-likelihood trees
for large alignments. PLoS One 5, (2010).
67. Yutin, N., Suzuki, M. T. & Béjà, O. Novel primers reveal wider diversity among marine aerobic
anoxygenic phototrophs. Appl. Environ. Microbiol. 71, 8958–62 (2005).
68. Yutin, N. et al. Assessing diversity and biogeography of aerobic anoxygenic phototrophic bacteria
in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling expedition
metagenomes. Environ. Microbiol. 9, 1464–1475 (2007).
69. Béjà, O. et al. Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature 415,
630–633 (2002).
70. Markowitz, V. M. et al. IMG: The integrated microbial genomes database and comparative
analysis system. Nucleic Acids Res. 40, D115 (2012).
71. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND.
Nature Methods vol. 12 59–60 (2014).
72. Tabita, F. R. et al. Function, Structure, and Evolution of the RubisCO-Like Proteins and Their
RubisCO Homologs. Microbiol. Mol. Biol. Rev. 71, 576–599 (2007).
73. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential
coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
74. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: A new generation of protein database
search programs. Nucleic Acids Research vol. 25 3389–3402 (1997).
75. Schwalbach, M. S. & Fuhrman, J. A. Wide-ranging abundances of aerobic anoxygenic
phototrophic bacteria in the world ocean revealed by epifluorescence microscopy and quantitative
PCR. Limnol. Oceanogr. 50, 620–628 (2005).
76. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled
genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
77. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing
the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
Genome Res. 25, 1043–55 (2015).
78. Moran, M. A. The global ocean microbiome. Science vol. 350 (2015).
79. Hamada, M., Toyofuku, M., Miyano, T. & Nomura, N. cbb3-type cytochrome c oxidases, aerobic
respiratory enzymes, impact the anaerobic life of Pseudomonas aeruginosa PAO1. J. Bacteriol.
196, 3881–3889 (2014).
80. Giuffrè, A., Borisov, V. B., Arese, M., Sarti, P. & Forte, E. Cytochrome bd oxidase and bacterial
tolerance to oxidative and nitrosative stress. Biochimica et Biophysica Acta - Bioenergetics vol.
1837 1178–1187 (2014).
81. Boldareva-Nuianzina, E. N., Bláhová, Z., Sobotka, R. & Koblížek, M. Distribution and origin of
oxygen-dependent and oxygen-independent forms of mg-protoporphyrin monomethylester
cyclase among phototrophic proteobacteria. Appl. Environ. Microbiol. 79, 2596–2604 (2013).
82. Gough, S. P., Petersen, B. O. & Duus, J. Anaerobic chlorophyll isocyclic ring formation in
Rhodobacter capsulatus requires a cobalamin cofactor. Proc. Natl. Acad. Sci. U. S. A. 97, 6908–
6913 (2000).
83. Yurkov, V., Gad’on, N. & Drews, G. The major part of polar carotenoids of the aerobic bacteria
Roseococcus thiosulfatophilus RB3 and Erythromicrobium ramosum E5 is not bound to the
bacteriochlorophyll a-complexes of the photosynthetic apparatus. Arch. Microbiol. 160, 372–376
(1993).
84. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
85. Gómez-Consarnau, L. et al. Light stimulates growth of proteorhodopsin-containing marine
Flavobacteria. Nature 445, 210–213 (2007).
86. Johnston, W. et al. Carbon mass balance methodology to characterize the growth of pigmented
marine bacteria under conditions of light cycling. Bioprocess Biosyst. Eng. 27, 163–174 (2005).
87. Tomasch, J., Gohl, R., Bunk, B., Diez, M. S. & Wagner-Döbler, I. Transcriptional response of the
photoheterotrophic marine bacterium Dinoroseobacter shibae to changing light regimes. ISME J.
5, 1957–68 (2011).
88. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
89. Tang, K. et al. An Aerobic Anoxygenic Phototrophic Bacterium Fixes CO2 via the Calvin-
Benson-Bassham Cycle. bioRxiv 2021.04.29.441244 (2021) doi:10.1101/2021.04.29.441244.
90. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ,
Andersson AF, and Quince C. 2014. Binning metagenomic contigs by coverage and composition.
Nat Meth 11:1144-1146. 10.1038/nmeth.3103
91. Anantharaman K, Breier JA, and Dick GJ. 2016. Metagenomic resolution of microbial functions
in deep-sea hydrothermal plumes across the Eastern Lau Spreading Center. ISME J 10:225-239.
10.1038/ismej.2015.81
92. Bohlin J, Snipen L, Hardy SP, Kristoffersen AB, Lagesen K, Dønsvik T, Skjerve E, and Ussery
DW. 2010. Analysis of intra-genomic GC content homogeneity within prokaryotes. BMC
Genomics 11:1-8. 10.1186/1471-2164-11-464
93. Bowers RM, Clum A, Tice H, Lim J, Singh K, Ciobanu D, Ngan CY, Cheng J-F, Tringe SG, and
Woyke T. 2015. Impact of library preparation protocols and template quantity on the
metagenomic reconstruction of a mock microbial community. BMC Genomics 16:1-12.
10.1186/s12864-015-2063-6
94. Chen-Chia C, Yan-Cheng L, Jin-Tsong J, Chih-Kai C, and Zhi-Qian W. 2015. Feature genes
selection of adult ALL microarray data with affinity propagation clustering. Consumer
Electronics - Taiwan (ICCE-TW), 2015 IEEE International Conference on. p 230-231.
95. Chen SL, Lee W, Hottes AK, Shapiro L, and McAdams HH. 2004. Codon usage between
genomes is constrained by genome-wide mutational processes. Proceedings of the National
Academy of Sciences of the United States of America 101:3480-3485. 10.1073/pnas.0307827100
96. Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP, and Banfield JF. 2009.
Community-wide analysis of microbial genome sequence signatures. Genome Biology 10:1-16.
10.1186/gb-2009-10-8-r85
97. Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, and Delmont TO. 2015.
Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 3:e1319.
10.7717/peerj.1319
98. Flynn SD, and Moneypenny NF. 2013. Affinity propagation in adaptive network-based systems.
Google Patents.
99. Frey BJ, and Dueck D. 2007. Clustering by Passing Messages Between Data Points. Science
315:972-976. 10.1126/science.1136800
100. Fujiwara Y, Nakatsuji M, Shiokawa H, Ida Y, and Toyoda M. 2015. Adaptive Message Update
for Fast Affinity Propagation. Proceedings of the 21th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. Sydney, NSW, Australia: ACM. p 309-318.
101. Gan G, and Ng MK-P. 2015. Subspace clustering using affinity propagation. Pattern
Recognition 48:1455-1464.
102. Handelsman J, Rondon MR, Brady SF, Clardy J, and Goodman RM. 1998. Molecular biological
access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry
& Biology 5:R245-R249. 10.1016/S1074-5521(98)90108-9
103. Hassanabadi B, Shea C, Zhang L, and Valaee S. 2014. Clustering in Vehicular Ad Hoc
Networks using Affinity Propagation. Ad Hoc Networks 13, Part B:535-548.
104. Hubert L, and Arabie P. 1985. Comparing partitions. Journal of Classification 2:193-218.
10.1007/bf01908075
105. Imelfort M, Parks D, Woodcroft BJ, Dennis P, Hugenholtz P, and Tyson GW. 2014. GroopM:
an automated tool for the recovery of population genomes from related metagenomes. PeerJ
2:e603. 10.7717/peerj.603
106. Staley JT, and Konopka A. 1985. Measurement of in Situ Activities of Nonphotosynthetic
Microorganisms in Aquatic and Terrestrial Habitats. Annual Review of Microbiology 39:321-
346. doi:10.1146/annurev.mi.39.100185.001541
107. Kanaya S, Yamada Y, Kudo Y, and Ikemura T. 1999. Studies of codon usage and tRNA genes
of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level
and species-specific diversity of codon usage based on multivariate analysis. Gene 238:143-155.
108. Kang DD, Froula J, Egan R, and Wang Z. 2015. MetaBAT, an efficient tool for accurately
reconstructing single genomes from complex microbial communities. PeerJ 3:e1165.
109. Langmead B, and Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature
methods 9:357-359. 10.1038/nmeth.1923
110. Leone M, Sumedha, and Weigt M. 2007. Clustering by soft-constraint affinity propagation:
applications to gene-expression data. Bioinformatics 23:2708-2715.
10.1093/bioinformatics/btm414
111. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R,
and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and
SAMtools. Bioinformatics 25:2078-2079. 10.1093/bioinformatics/btp352
112. Lin H-H, and Liao Y-C. 2016. Accurate binning of metagenomic contigs via automated
clustering sequences using information of genomic signatures and marker genes. Scientific
Reports 6:24175. 10.1038/srep24175
113. Lu YY, Chen T, Fuhrman JA, and Sun F. 2016. COCACOLA: binning metagenomic contigs
using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge.
Bioinformatics. 10.1093/bioinformatics/btw290
114. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Pillay M, Ratner A, Huang J,
Woyke T, Huntemann M, Anderson I, Billis K, Varghese N, Mavromatis K, Pati A, Ivanova NN,
and Kyrpides NC. 2014. IMG 4 version of the integrated microbial genomes comparative analysis
system. Nucleic Acids Research 42:D560-D567. 10.1093/nar/gkt963
115. Mehmood R, and Bie R. 2015. Optimal Preference Detection Based on Golden Section and
Genetic Algorithm for Affinity Propagation Clustering. Wireless Algorithms, Systems, and
Applications: 10th International Conference, WASA 2015, Qufu, China, August 10-12, 2015,
Proceedings: Springer. p 253.
116. Meyer JL, Jaekel U, Tully BJ, Glazer BT, Wheat CG, Lin H-T, Hsieh C-C, Cowen JP, Hulme
SM, Girguis PR, and Huber JA. 2016. A distinct and active bacterial community in cold
oxygenated fluids circulating beneath the western flank of the Mid-Atlantic ridge. Scientific
Reports 6:22541. 10.1038/srep22541
117. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, and Tyson GW. 2015. CheckM: assessing
the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
Genome Research. 10.1101/gr.186072.114
118. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M,
Prettenhofer P, Weiss R, and Dubourg V. 2011. Scikit-learn: Machine learning in Python. The
Journal of Machine Learning Research 12:2825-2830.
119. Pride DT, Meinersmann RJ, Wassenaar TM, and Blaser MJ. 2003. Evolutionary Implications of
Microbial Genome Tetranucleotide Frequency Biases. Genome Research 13:145-158.
10.1101/gr.335003
120. Pruitt KD, Tatusova T, and Maglott DR. 2007. NCBI reference sequences (RefSeq): a curated
non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research
35:D61-D65. 10.1093/nar/gkl842
121. Quinlan AR, and Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic
features. Bioinformatics 26:841-842. 10.1093/bioinformatics/btq033
122. Rosenberg A, and Hirschberg J. 2007. V-Measure: A Conditional Entropy-Based External
Cluster Evaluation Measure. EMNLP-CoNLL. p 410-420.
123. Sandberg R, Winberg G, Bränden C-I, Kaske A, Ernberg I, and Cöster J. 2001. Capturing
Whole-Genome Characteristics in Short Sequences Using a Naïve Bayesian Classifier. Genome
Research 11:1404-1409. 10.1101/gr.186401
124. Santos JM, and Embrechts M. 2009. On the Use of the Adjusted Rand Index as a Metric for
Evaluating Supervised Classification. In: Alippi C, Polycarpou M, Panayiotou C, and Ellinas G,
eds. Artificial Neural Networks – ICANN 2009: 19th International Conference, Limassol,
Cyprus, September 14-17, 2009, Proceedings, Part II. Berlin, Heidelberg: Springer Berlin
Heidelberg, 175-184.
125. Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, and Banfield JF. 2013. Time
series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage
during infant gut colonization. Genome Research 23:111-120. 10.1101/gr.142315.112
126. Tully BJ, and Heidelberg JF. 2016. Potential Mechanisms for Microbial Energy Acquisition in
Oxic Deep-Sea Sediments. Applied and Environmental Microbiology 82:4232-4243.
10.1128/aem.01023-16
127. Tully BJ, Sachdeva R, Heidelberg KB, and Heidelberg JF. 2014. Comparative genomics of
planktonic Flavobacteriaceae from the Gulf of Maine using metagenomic data. Microbiome 2:34.
10.1186/2049-2618-2-34
128. Walter SF, Fischer B, and Buhmann JM. 2007. Clustering by affinity propagation. Master’s
thesis, Department of Physics, ETH Zurich, Switzerland. Procedia Computer Science 00 (2014)
1–9.
129. Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, and Singer SW. 2014. MaxBin: an automated
binning method to recover individual genomes from metagenomes using an expectation-
maximization algorithm. Microbiome 2:1.
130. Zhengdong L, and Carreira-Perpinan MA. 2008. Constrained spectral clustering through affinity
propagation. Computer Vision and Pattern Recognition, 2008 CVPR 2008 IEEE Conference on. p
1-8.
131. Zhou F, Olman V, and Xu Y. 2008. Barcodes for genomes and applications. BMC
Bioinformatics 9:1-11. 10.1186/1471-2105-9-546
132. Staley, J. T. & Konopka, A. Measurement of in Situ Activities of Nonphotosynthetic
Microorganisms in Aquatic and Terrestrial Habitats. Annu. Rev. Microbiol. 39, 321–346 (1985).
133. Jannasch, H. W. & Jones, G. E. Bacterial Populations in Sea Water as Determined by
Different Methods of Enumeration. Limnol. Oceanogr. 4, 128–139 (1959).
134. Steen, A. D. et al. High proportions of bacteria and archaea across most biomes remain
uncultured. ISME J. 13, 3126–3130 (2019).
135. Martiny, A. C. High proportions of bacteria are culturable across major biomes. ISME J. 13,
2125–2128 (2019).
136. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev.
Microbiol. 71, 711–730 (2017).
137. Hugerth, L. W. & Andersson, A. F. Analysing microbial community composition through
amplicon sequencing: From sampling to hypothesis testing. Frontiers in Microbiology vol. 8
(2017).
138. Sogin, M. L. et al. Microbial diversity in the deep sea and the underexplored ‘rare biosphere’.
Proc. Natl. Acad. Sci. U. S. A. 103, 12115–12120 (2006).
139. Venter, J. C. et al. Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science
304, 66–74 (2004).
140. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial
genomes from the environment. Nature 428, 37–43 (2004).
141. Rusch, D. B. et al. The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic
through eastern tropical Pacific. PLoS Biol. 5, 0398–0431 (2007).
142. Bork, P. et al. Tara Oceans. Tara Oceans studies plankton at planetary scale. Introduction.
Science 348, 873 (2015).
143. Biller, S. J. et al. Data descriptor: Marine microbial metagenomes sampled across space and
time. Sci. Data 5, (2018).
144. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
145. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain
Bacteria. Nature 523, 208–211 (2015).
146. Lee, M. D. et al. Marine Synechococcus isolates representing globally abundant genomic
lineages demonstrate a unique evolutionary path of genome reduction without a decrease in GC
content. Environ. Microbiol. 21, 1677–1686 (2019).
147. Overmann, J., Abt, B. & Sikorski, J. Present and Future of Culturing Bacteria. Annu. Rev.
Microbiol. 71, 711–730 (2017).
148. Palkova, Z. Multicellular microorganisms: Laboratory versus nature. EMBO Reports vol. 5 470–
476 (2004).
149. Hobman, J. L., Penn, C. W. & Pallen, M. J. Laboratory strains of Escherichia coli: Model
citizens or deceitful delinquents growing old disgracefully? Molecular Microbiology vol. 64 881–
885 (2007).
150. Quast, C. et al. The SILVA ribosomal RNA gene database project: Improved data processing
and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
151. Cole, J. R. et al. Ribosomal Database Project: Data and tools for high throughput rRNA analysis.
Nucleic Acids Res. 42, D633–D642 (2014).
152. Bragg, L. & Tyson, G. W. Metagenomics using next-generation sequencing. Methods Mol. Biol.
1096, 183–201 (2014).
153. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11,
1144–1146 (2014).
154. Kang, D. D. et al. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome
reconstruction from metagenome assemblies. PeerJ 2019, (2019).
155. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-
assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
156. Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are
abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).
157. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509
(2021).
158. Haroon, M. F., Thompson, L. R., Parks, D. H., Hugenholtz, P. & Stingl, U. A catalogue of 136
microbial draft genomes from Red Sea metagenomes. Sci. Data 3, 1–6 (2016).
159. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially
expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
160. Hug, L. A. et al. Critical biogeochemical functions in the subsurface are associated with bacteria
from new phyla and little studied lineages. Environ. Microbiol. 18, 159–173 (2016).
161. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035
(2017).
162. Nelson, W. C., Tully, B. J. & Mobberley, J. M. Biases in genome reconstruction from
metagenomic data. PeerJ 8, e10119 (2020).
163. Hedlund, B. P., Dodsworth, J. A., Murugapiran, S. K., Rinke, C. & Woyke, T. Impact of single-
cell genomics and metagenomics on the emerging view of extremophile “microbial dark matter”.
Extremophiles vol. 18 865–875 (2014).
164. Whitman, W. B., Coleman, D. C. & Wiebe, W. J. Prokaryotes: The unseen majority.
Proceedings of the National Academy of Sciences of the United States of America vol. 95 6578–
6583 (1998).
165. Kai, W., Peisheng, Y., Rui, M., Wenwen, J. & Zongze, S. Diversity of culturable bacteria in
deep-sea water from the South Atlantic Ocean. Bioengineered 8, 572–584 (2017).
166. Fenchel, T. The microbial loop - 25 years later. J. Exp. Mar. Bio. Ecol. 366, 99–103 (2008).
167. Azam, F. et al. The Ecological Role of Water-Column Microbes in the Sea. Mar. Ecol. Prog. Ser.
10, 257–263 (1983).
168. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive earth’s
biogeochemical cycles. Science vol. 320 1034–1039 (2008).
169. Hunter, C. N., Daldal, F., Thurnauer, M. C. & Beatty, J. T. The purple phototrophic bacteria.
vol. 28 (Springer Science & Business Media, 2008).
170. Pfennig, N. & Trüper, H. G. Taxonomy of phototrophic green and purple bacteria: A review.
Ann. l’Institut Pasteur / Microbiol. 134, 9–20 (1983).
171. Madigan, M. T. & Jung, D. O. An Overview of Purple Bacteria: Systematics, Physiology, and
Habitats. in 1–15 (Springer, Dordrecht, 2009). doi:10.1007/978-1-4020-8815-5_1.
172. Ehrenreich, A. & Widdel, F. Anaerobic oxidation of ferrous iron by purple bacteria, a new type
of phototrophic metabolism. Appl. Environ. Microbiol. 60, 4517–26 (1994).
173. Griffin, B. M., Schott, J. & Schink, B. Nitrite, an Electron Donor for Anoxygenic
Photosynthesis. Science 316, 1870 (2007).
174. Harashima, K., Shiba, T., Totsuka, T., Simidu, U. & Taga, N. Occurrence of
bacteriochlorophyll a in a strain of an aerobic heterotrophic bacterium. Agric. Biol. Chem. 42,
1627–1628 (1978).
175. Shiba, T., Simidu, U. & Taga, N. Distribution of aerobic bacteria which contain
bacteriochlorophyll a. Appl. Environ. Microbiol. 38, 43–5 (1979).
176. Beatty, J. T. On the natural selection and evolution of the aerobic phototrophic bacteria.
Photosynth. Res. 73, 109–114 (2002).
177. Nishimura, Y. et al. DNA relatedness and chemotaxonomic feature of aerobic
bacteriochlorophyll-containing bacteria isolated from coasts of Australia. J. Gen. Appl.
Microbiol. 40, 287–296 (1994).
178. Arai, H., Roh, J. H. & Kaplan, S. Transcriptome dynamics during the transition from anaerobic
photosynthesis to aerobic respiration in Rhodobacter sphaeroides 2.4.1. J. Bacteriol. 190, 286–99
(2008).
179. Swingley, W. D. et al. The complete genome sequence of Roseobacter denitrificans reveals a
mixotrophic rather than photosynthetic metabolism. J. Bacteriol. 189, 683–90 (2007).
180. Yurkov, V. V. & Beatty, J. T. Aerobic anoxygenic phototrophic bacteria. Microbiol. Mol. Biol.
Rev. 62, 695–724 (1998).
181. Tang, K.-H., Feng, X., Tang, Y. J. & Blankenship, R. E. Carbohydrate Metabolism and Carbon
Fixation in Roseobacter denitrificans OCh114. PLoS One 4, e7233 (2009).
182. Wood, H. G., Werkman, C. H., Hemingway, A. & Nier, A. O. The position of carbon dioxide carbon
in succinic acid synthesized by heterotrophic bacteria. J. Biol. Chem. 139, 377–381 (1941).
183. Cohen-Bazire, G., Sistrom, W. R. & Stanier, R. Y. Kinetic studies of pigment synthesis by non-
sulfur purple bacteria. J. Cell. Comp. Physiol. 49, 25–68 (1957).
184. Kolber, Z. S. et al. Contribution of aerobic photoheterotrophic bacteria to the carbon cycle in the
ocean. Science (2001) doi:10.1126/science.1059707.
185. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled
genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
186. Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site
identification. BMC Bioinformatics 11, 1–11 (2010).
187. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for
Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 428, 726–731
(2016).
188. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035
(2017).
189. Eren, A. M. et al. Anvi’o: An advanced analysis and visualization platform for ’omics data. PeerJ
2015, (2015).
190. Santos, S. R. & Ochman, H. Identification and phylogenetic sorting of bacterial lineages with
universally conserved genes and proteins. Environ. Microbiol. 6, 754–759 (2004).
191. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank.
Nucleic Acids Res. 33, D34 (2005).
192. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: Interactive sequence similarity
searching. Nucleic Acids Res. 39, W29–W37 (2011).
193. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222-30 (2014).
194. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic
Acids Res. 31, 371–3 (2003).
195. Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space
complexity. BMC Bioinformatics 5, 1–19 (2004).
196. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
197. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - Approximately maximum-likelihood trees
for large alignments. PLoS One 5, (2010).
198. Yutin, N., Suzuki, M. T. & Béjà, O. Novel primers reveal wider diversity among marine aerobic
anoxygenic phototrophs. Appl. Environ. Microbiol. 71, 8958–62 (2005).
199. Yutin, N. et al. Assessing diversity and biogeography of aerobic anoxygenic phototrophic
bacteria in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling
expedition metagenomes. Environ. Microbiol. 9, 1464–1475 (2007).
200. Béjà, O. et al. Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature
415, 630–633 (2002).
201. Markowitz, V. M. et al. IMG: The integrated microbial genomes database and comparative
analysis system. Nucleic Acids Res. 40, D115 (2012).
202. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using
DIAMOND. Nature Methods vol. 12 59–60 (2014).
203. Tabita, F. R. et al. Function, Structure, and Evolution of the RubisCO-Like Proteins and Their
RubisCO Homologs. Microbiol. Mol. Biol. Rev. 71, 576–599 (2007).
204. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential
coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
205. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: A new generation of protein database
search programs. Nucleic Acids Research vol. 25 3389–3402 (1997).
206. Schwalbach, M. S. & Fuhrman, J. A. Wide-ranging abundances of aerobic anoxygenic
phototrophic bacteria in the world ocean revealed by epifluorescence microscopy and quantitative
PCR. Limnol. Oceanogr. 50, 620–628 (2005).
207. Tully, B. J., Sachdeva, R., Graham, E. D. & Heidelberg, J. F. 290 metagenome-assembled
genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ 5, e3558 (2017).
208. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM:
assessing the quality of microbial genomes recovered from isolates, single cells, and
metagenomes. Genome Res. 25, 1043–55 (2015).
209. Moran, M. A. The global ocean microbiome. Science vol. 350 (2015).
210. Hamada, M., Toyofuku, M., Miyano, T. & Nomura, N. cbb3-type cytochrome c oxidases,
aerobic respiratory enzymes, impact the anaerobic life of Pseudomonas aeruginosa PAO1. J.
Bacteriol. 196, 3881–3889 (2014).
211. Giuffrè, A., Borisov, V. B., Arese, M., Sarti, P. & Forte, E. Cytochrome bd oxidase and bacterial
tolerance to oxidative and nitrosative stress. Biochimica et Biophysica Acta - Bioenergetics vol.
1837 1178–1187 (2014).
212. Boldareva-Nuianzina, E. N., Bláhová, Z., Sobotka, R. & Koblížek, M. Distribution and origin of
oxygen-dependent and oxygen-independent forms of Mg-protoporphyrin monomethylester
cyclase among phototrophic proteobacteria. Appl. Environ. Microbiol. 79, 2596–2604 (2013).
213. Gough, S. P., Petersen, B. O. & Duus, J. Anaerobic chlorophyll isocyclic ring formation in
Rhodobacter capsulatus requires a cobalamin cofactor. Proc. Natl. Acad. Sci. U. S. A. 97, 6908–
6913 (2000).
214. Yurkov, V., Gad’on, N. & Drews, G. The major part of polar carotenoids of the aerobic bacteria
Roseococcus thiosulfatophilus RB3 and Erythromicrobium ramosum E5 is not bound to the
bacteriochlorophyll a-complexes of the photosynthetic apparatus. Arch. Microbiol. 160, 372–376
(1993).
215. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
216. Gómez-Consarnau, L. et al. Light stimulates growth of proteorhodopsin-containing marine
Flavobacteria. Nature 445, 210–213 (2007).
217. Johnston, W. et al. Carbon mass balance methodology to characterize the growth of pigmented
marine bacteria under conditions of light cycling. Bioprocess Biosyst. Eng. 27, 163–174 (2005).
218. Tomasch, J., Gohl, R., Bunk, B., Diez, M. S. & Wagner-Döbler, I. Transcriptional response of
the photoheterotrophic marine bacterium Dinoroseobacter shibae to changing light regimes.
ISME J. 5, 1957–68 (2011).
219. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
220. Tang, K. et al. An Aerobic Anoxygenic Phototrophic Bacterium Fixes CO2 via the Calvin-
Benson-Bassham Cycle. bioRxiv 2021.04.29.441244 (2021)
doi:10.1101/2021.04.29.441244.
221. Fuhrman, J. A., McCallum, K. & Davis, A. A. Novel major archaebacterial group from marine
plankton. Nature 356, 148–149 (1992).
222. DeLong, E. F. Archaea in coastal marine environments. Proc. Natl. Acad. Sci. U. S. A. 89,
5685–9 (1992).
223. Könneke, M. et al. Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature 437,
543–546 (2005).
224. Hallam, S. J. et al. Pathways of carbon assimilation and ammonia oxidation suggested by
environmental genomic analyses of marine Crenarchaeota. PLoS Biol. 4, 520–536 (2006).
225. Francis, C. A., Roberts, K. J., Beman, J. M., Santoro, A. E. & Oakley, B. B. Ubiquity and
diversity of ammonia-oxidizing archaea in water columns and sediments of the ocean. Proc. Natl.
Acad. Sci. U. S. A. 102, 14683–14688 (2005).
226. Wuchter, C. et al. Archaeal nitrification in the ocean. Proc. Natl. Acad. Sci. U. S. A. 103,
12317–12322 (2006).
227. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially
revises the tree of life. Nat. Biotechnol. 36, 996 (2018).
228. Santoro, A. E., Richter, R. A. & Dupont, C. L. Planktonic marine archaea. Annual Review of
Marine Science vol. 11 131–158 (2019).
229. Oton, E. V., Quince, C., Nicol, G. W., Prosser, J. I. & Gubry-Rangin, C. Phylogenetic
congruence and ecological coherence in terrestrial Thaumarchaeota. ISME J. 10, 85–96 (2016).
230. Reji, L. & Francis, C. A. Metagenome-assembled genomes reveal unique metabolic adaptations
of a basal marine Thaumarchaeota lineage. ISME J. 14, 2105–2115 (2020).
231. Aylward, F. O. & Santoro, A. E. Heterotrophic Thaumarchaea with Small Genomes Are
Widespread in the Dark Ocean. mSystems 5, (2020).
232. Pester, M., Schleper, C. & Wagner, M. The Thaumarchaeota: An emerging view of their
phylogeny and ecophysiology. Curr. Opin. Microbiol. 14, 300–306 (2011).
233. Leininger, S. et al. Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature
442, 806–809 (2006).
234. Mincer, T. J. et al. Quantitative distribution of presumptive archaeal and bacterial nitrifiers in
Monterey Bay and the North Pacific Subtropical Gyre. Environ. Microbiol. 9, 1162–1175 (2007).
235. Nicol, G. W., Leininger, S., Schleper, C. & Prosser, J. I. The influence of soil pH on the
diversity, abundance and transcriptional activity of ammonia oxidizing archaea and bacteria.
Environ. Microbiol. 10, 2966–2978 (2008).
236. Santoro, A. E., Casciotti, K. L. & Francis, C. A. Activity, abundance and diversity of nitrifying
archaea and bacteria in the central California Current. Environ. Microbiol. 12, 1989–2006 (2010).
237. Verhamme, D. T., Prosser, J. I. & Nicol, G. W. Ammonia concentration determines differential
growth of ammonia-oxidising archaea and bacteria in soil microcosms. ISME J. 5, 1067–1071
(2011).
238. Stewart, F. J., Ulloa, O. & Delong, E. F. Microbial metatranscriptomics in a permanent marine
oxygen minimum zone. Environ. Microbiol. 14, 23–40 (2012).
239. Horak, R. E. A. et al. Ammonia oxidation kinetics and temperature sensitivity of a natural
marine community dominated by Archaea. ISME J. 7, 2023–2033 (2013).
240. Kowalchuk, G. A. & Stephen, J. R. Ammonia-Oxidizing Bacteria: A Model for Molecular
Microbial Ecology. Annu. Rev. Microbiol. 55, 485–529 (2001). doi:10.1146/annurev.micro.55.1.485
241. Koch, H., Kessel, M. A. H. J. van & Lücker, S. Complete nitrification: insights into the
ecophysiology of comammox Nitrospira. Appl. Microbiol. Biotechnol. 103, 177–189
(2018).
242. Santoro, A. E. & Casciotti, K. L. Enrichment and characterization of ammonia-oxidizing archaea
from the open ocean: Phylogeny, physiology and stable isotope fractionation. ISME J. 5, 1796–
1808 (2011).
243. Vajrala, N. et al. Hydroxylamine as an intermediate in ammonia oxidation by globally abundant
marine archaea. Proc. Natl. Acad. Sci. U. S. A. 110, 1006–1011 (2013).
244. Walker, C. B. et al. Nitrosopumilus maritimus genome reveals unique mechanisms for
nitrification and autotrophy in globally distributed marine crenarchaea. Proc. Natl. Acad. Sci. U.
S. A. 107, 8818–8823 (2010).
245. Kozlowski, J. A., Dimitri Kits, K. & Stein, L. Y. Comparison of nitrogen oxide metabolism
among diverse ammonia-oxidizing bacteria. Front. Microbiol. 7, 1090 (2016).
246. Martens-Habbena, W. et al. The production of nitric oxide by marine ammonia-oxidizing
archaea and inhibition of archaeal ammonia oxidation by a nitric oxide scavenger. Environ.
Microbiol. 17, 2261–2274 (2015).
247. Kerou, M. et al. Genomes of Thaumarchaeota from deep sea sediments reveal specific
adaptations of three independently evolved lineages. ISME J. 1–17 (2021) doi:10.1038/s41396-
021-00962-6.
248. Carini, P., Dupont, C. L. & Santoro, A. E. Patterns of thaumarchaeal gene expression in culture
and diverse marine environments. Environ. Microbiol. 20, 2112–2124 (2018).
249. Bartossek, R., Spang, A., Weidler, G., Lanzen, A. & Schleper, C. Metagenomic analysis of
ammonia-oxidizing archaea affiliated with the soil group. Front. Microbiol. 3, (2012).
250. Tolar, B. B., Wallsgrove, N. J., Popp, B. N. & Hollibaugh, J. T. Oxidation of urea-derived
nitrogen by thaumarchaeota-dominated marine nitrifying communities. Environ. Microbiol. 19,
4838–4850 (2017).
251. Tully, B. J., Nelson, W. C. & Heidelberg, J. F. Metagenomic analysis of a complex marine
planktonic thaumarchaeal community from the Gulf of Maine. Environ. Microbiol. 14, 254–267
(2012).
252. Bayer, B. et al. Physiological and genomic characterization of two novel marine thaumarchaeal
strains indicates niche differentiation. ISME J. 10, 1051–1063 (2016).
253. Palatinszky, M. et al. Cyanate as an energy source for nitrifiers. Nature 524, 105–108
(2015).
Santoro, A. E. et al. Genomic and proteomic characterization of ‘Candidatus
Nitrosopelagicus brevis’: An ammonia-oxidizing archaeon from the open ocean. Proc. Natl.
Acad. Sci. U. S. A. 112, 1173–1178 (2015).
254. Shiozaki, T. et al. Nitrification and its influence on biogeochemical cycles from the equatorial
Pacific to the Arctic Ocean. ISME J. 10, 2184–2197 (2016).
255. Smith, J. M., Damashek, J., Chavez, F. P. & Francis, C. A. Factors influencing nitrification rates
and the abundance and transcriptional activity of ammonia-oxidizing microorganisms in the dark
northeast Pacific Ocean. Limnol. Oceanogr. 61, 596–609 (2016).
256. Santoro, A. E. et al. Thaumarchaeal ecotype distributions across the equatorial Pacific Ocean
and their potential roles in nitrification and sinking flux attenuation. Limnol. Oceanogr. 62, 1984–
2003 (2017).
257. Sintes, E., De Corte, D., Haberleitner, E. & Herndl, G. J. Geographic distribution of archaeal
ammonia oxidizing ecotypes in the Atlantic Ocean. Front. Microbiol. 7, 77 (2016).
258. Alves, R. J. E., Minh, B. Q., Urich, T., Von Haeseler, A. & Schleper, C. Unifying the global
phylogeny and environmental distribution of ammonia-oxidising archaea based on amoA genes.
Nat. Commun. 9, 1–17 (2018).
259. Villanueva, L., Schouten, S. & Sinninghe Damsté, J. S. Depth-related distribution of a key gene
of the tetraether lipid biosynthetic pathway in marine Thaumarchaeota. Environ. Microbiol. 17,
3527–3539 (2015).
260. Sintes, E., Bergauer, K., De Corte, D., Yokokawa, T. & Herndl, G. J. Archaeal amoA gene
diversity points to distinct biogeography of ammonia-oxidizing Crenarchaeota in the ocean.
Environ. Microbiol. 15, 1647–1658 (2013).
261. Parada, A. E. & Fuhrman, J. A. Marine archaeal dynamics and interactions with the microbial
community over 5 years from surface to seafloor. ISME J. 11, 2510–2525 (2017).
262. Galand, P. E., Gutiérrez-Provecho, C., Massana, R., Gasol, J. M. & Casamayor, E. O. Inter-
annual recurrence of archaeal assemblages in the coastal NW Mediterranean Sea (Blanes Bay
Microbial Observatory). Limnol. Oceanogr. 55, 2117–2125 (2010).
263. Martens-Habbena, W., Berube, P. M., Urakawa, H., De La Torre, J. R. & Stahl, D. A. Ammonia
oxidation kinetics determine niche separation of nitrifying Archaea and Bacteria. Nature 461,
976–979 (2009).
264. Wan, X. S. et al. Ambient nitrate switches the ammonium consumption pathway in the euphotic
ocean. Nat. Commun. 9, 1–9 (2018).
265. Qin, W. et al. Stress response of a marine ammonia-oxidizing archaeon informs physiological
status of environmental populations. ISME J. 12, 508–519 (2018).
266. Amin, S. A. et al. Copper requirements of the ammonia-oxidizing archaeon Nitrosopumilus
maritimus SCM1 and implications for nitrification in the marine environment. Limnol. Oceanogr.
58, 2037–2045 (2013).
267. Nelson, W. C., Tully, B. J. & Mobberley, J. M. Biases in genome reconstruction from
metagenomic data. PeerJ 8, e10119 (2020).
268. Biller, S. J. et al. Data descriptor: Marine microbial metagenomes sampled across space and
time. Sci. Data 5, (2018).
269. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially
revises the tree of life. Nat. Biotechnol. (2018) doi:10.1038/nbt.4229.
270. Haroon, M. F., Thompson, L. R., Parks, D. H., Hugenholtz, P. & Stingl, U. A catalogue of 136
microbial draft genomes from Red Sea metagenomes. Sci. Data 3, 1–6 (2016).
271. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509
(2021).
272. Pachiadaki, M. G. et al. Charting the Complexity of the Marine Microbiome through Single-Cell
Genomics. Cell 179, 1623-1635.e11 (2019).
273. Paoli, L. et al. Uncharted biosynthetic potential of the ocean microbiome. bioRxiv (2021).
274. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor.
Bioinformatics 34, i884–i890 (2018).
275. Li, D. et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced
methodologies and community practices. Methods 102, 3–11 (2016).
276. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-
assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
277. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-
generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
278. Treangen, T. J., Sommer, D. D., Angly, F. E., Koren, S. & Pop, M. Next generation sequence
assembly with AMOS. Curr. Protoc. Bioinforma. 1–18 (2011)
doi:10.1002/0471250953.bi1108s33.
279. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9,
357–359 (2012).
280. Woodcroft, B. J. CoverM. CoverM https://github.com/wwood/CoverM (2007).
281. Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: An efficient general purpose program for
assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
282. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised clustering of
environmental microbial assemblies using coverage and affinity propagation. PeerJ 5, e3035
(2017).
283. Kang, D. D. et al. MetaBAT 2: An adaptive binning algorithm for robust and efficient genome
reconstruction from metagenome assemblies. PeerJ 2019, (2019).
284. Lin, H. N. & Hsu, W. L. Kart: A divide-and-conquer algorithm for NGS read alignment.
Bioinformatics 33, 2281–2287 (2017).
285. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079
(2009).
286. Neely, C. J., Graham, E. D. & Tully, B. J. MetaSanity: an integrated microbial genome
evaluation and annotation pipeline. Bioinformatics 36, 4341–4344 (2020).
287. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify
genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).
288. Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High
throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat.
Commun. 9, (2018).
289. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM:
Assessing the quality of microbial genomes recovered from isolates, single cells, and
metagenomes. Genome Res. 25, 1043–1055 (2015).
290. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
291. Eren, A. M. et al. Anvi’o: An advanced analysis and visualization platform for ’omics data. PeerJ
2015, (2015).
292. Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site
identification. BMC Bioinformatics 11, 1–11 (2010).
293. Seemann, T. Prokka: Rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069
(2014).
294. Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and
adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
295. Zhang, H. et al. DbCAN2: A meta server for automated carbohydrate-active enzyme annotation.
Nucleic Acids Res. 46, W95–W101 (2018).
296. Rawlings, N. D., Barrett, A. J. & Bateman, A. MEROPS: The peptidase database. Nucleic Acids
Res. 38, D227–D233 (2009).
297. Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using
DIAMOND. Nat. Methods 18, 366–368 (2021).
298. Medema, M. H. et al. AntiSMASH: Rapid identification, annotation and analysis of secondary
metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids
Res. 39, W339 (2011).
299. Alcock, B. P. et al. CARD 2020: Antibiotic resistome surveillance with the comprehensive
antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).
300. Yu, N. Y. et al. PSORTb 3.0: Improved protein subcellular localization prediction with refined
localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26,
1608–1615 (2010).
301. Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and
RNA viruses. Microbiome 9, 1–13 (2021).
302. Almagro Armenteros, J. J. et al. SignalP 5.0 improves signal peptide predictions using deep
neural networks. Nat. Biotechnol. 37, 420–423 (2019).
303. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. DRep: A tool for fast and accurate
genomic comparisons that enables improved genome recovery from metagenomes through de-
replication. ISME J. 11, 2864–2868 (2017).
304. Page, A. J. et al. Roary: Rapid large-scale prokaryote pan genome analysis. Bioinformatics 31,
3691–3693 (2015).
305. Edgar, R. C. MUSCLE: A multiple sequence alignment method with reduced time and space
complexity. BMC Bioinformatics 5, 1–19 (2004).
306. Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the
presence of recombination. Genetics 172, 2665–2681 (2006).
307. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - Approximately maximum-likelihood trees
for large alignments. PLoS One 5, (2010).
308. Horesh, G. et al. Different evolutionary trends form the twilight zone of the bacterial pan-
genome. bioRxiv 2021.02.15.431222 (2021) doi:10.1101/2021.02.15.431222.
309. Schlitzer, R. et al. The GEOTRACES Intermediate Data Product 2017. Chem. Geol. 493, 210–
223 (2018).
310. Graham, E. D. & Tully, B. J. Marine Dadabacteria exhibit genome streamlining and
phototrophy-driven niche partitioning. ISME J. 15, 1248–1256 (2021).
311. Hammer, O., Harper, D. & Ryan, P. PAST: Paleontological Statistics Software Package for
Education and Data Analysis. Palaeontol. Electron. 4, 1–9 (2001).
312. DeLong, E. F. Exploring Marine Planktonic Archaea: Then and Now. Front. Microbiol. 11,
616086 (2021).
313. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM:
assessing the quality of microbial genomes recovered from isolates, single cells, and
metagenomes. Genome Res. 25, 1043–55 (2015).
314. Hu, J. et al. Ecological Success of the Nitrosopumilus and Nitrosospira Clusters in the Intertidal
Zone. Microb. Ecol. 78, 555–564 (2019).
315. Bayer, B. et al. Physiological and genomic characterization of two novel marine thaumarchaeal
strains indicates niche differentiation. ISME J. 10, 1051–1063 (2016).
316. Qin, W. et al. Nitrosopumilus maritimus gen. nov., sp. nov., Nitrosopumilus cobalaminigenes sp.
nov., Nitrosopumilus oxyclinae sp. nov., and Nitrosopumilus ureiphilus sp. nov., four marine
ammonia-oxidizing archaea of the phylum Thaumarchaeota. Int. J. Syst. Evol. Microbiol. 67, 5067–
5079 (2017).
317. Murali, R., Gennis, R. B. & Hemp, J. Evolution of the cytochrome bd oxygen reductase
superfamily and the function of CydAA’ in Archaea. ISME J. 1–15 (2021)
doi:10.1038/s41396-021-01019-4.
318. Kletzin, A. et al. Cytochromes c in Archaea: Distribution, maturation, cell architecture, and the
special case of Ignicoccus hospitalis. Front. Microbiol. 6, 439 (2015).
319. Huerta-Cepas, J. et al. EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated
orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–
D314 (2019).
320. Palenik, B. et al. The genome of a motile marine Synechococcus. Nature 424, 1037–1042
(2003).
321. Stahl, D. A. & de la Torre, J. R. Physiology and Diversity of Ammonia-Oxidizing Archaea.
Annu. Rev. Microbiol. 66, 83–101 (2012) doi:10.1146/annurev-micro-092611-150128.
322. Kits, K. D. et al. Kinetic analysis of a complete nitrifier reveals an oligotrophic lifestyle. Nature
549, 269 (2017).
323. Jung, M.-Y. et al. Ammonia-oxidizing archaea possess a wide range of cellular ammonia
affinities. bioRxiv 2021.03.02.433310 (2021) doi:10.1101/2021.03.02.433310.
324. Xu, M., Schnorr, J., Keibler, B. & Simon, H. M. Comparative Analysis of 16S rRNA and amoA
Genes from Archaea Selected with Organic and Inorganic Amendments in Enrichment Culture.
Appl. Environ. Microbiol. 78, 2137 (2012).
325. Li, J. et al. amoA gene abundances and nitrification potential rates suggest that benthic
ammonia-oxidizing bacteria and not archaea dominate N cycling in the Colne estuary, United
Kingdom. Appl. Environ. Microbiol. 81, 159–165 (2015).
326. Park, S.-J., Park, B.-J. & Rhee, S.-K. Comparative analysis of archaeal 16S rRNA and amoA genes to estimate
the abundance and diversity of ammonia-oxidizing archaea in marine sediments. Extremophiles
12, 605–615 (2008).
327. Rodriguez-Valera, F. et al. Explaining microbial population genomics through phage predation.
Nat. Rev. Microbiol. 7, 828–836 (2009).
328. Sharma, R. S., Mishra, V., Mohmmed, A. & Babu, C. R. Phage specificity and
lipopolysaccarides of stem- and root-nodulating bacteria (Azorhizobium caulinodans,
Sinorhizobium spp., and Rhizobium spp.) of Sesbania spp. Arch. Microbiol. 189, 411–418
(2008).
329. Reva, O. & Tümmler, B. Think big - Giant genes in bacteria. Environ. Microbiol. 10, 768–777
(2008).
330. Ivars-Martinez, E. et al. Comparative genomics of two ecotypes of the marine planktonic
copiotroph Alteromonas macleodii suggests alternative lifestyles associated with different kinds
of particulate organic matter. ISME J. 2, 1194–1212 (2008).
331. Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Towards functional characterization of archaeal
genomic dark matter. Biochemical Society Transactions vol. 47 389–398 (2019).
332. Cole, J. J., Findlay, S. & Pace, M. L. Bacterial production in fresh and saltwater ecosystems: a
cross-system overview. Mar. Ecol. Prog. Ser. 43, 1–10 (1988).
333. Ducklow, H. W., Kirchman, D. L., Quinby, H. L., Carlson, C. A. & Dam, H. A. Stocks and
dynamics of bacterioplankton carbon during the spring bloom in the eastern North Atlantic
Ocean. Deep Sea Research II 40, 245–263 (1993).
334. Arnosti, C. Microbial Extracellular Enzymes and the Marine Carbon Cycle. Annu. Rev. Marine.
Sci. 3, 401–425 (2011).
335. Malik, A. A., Martiny, J. B. H., Brodie, E. L., Martiny, A. C., Treseder, K. K. & Allison, S. D.
Defining trait-based microbial strategies with consequences for soil carbon cycling under climate
change. ISME J 14, 1–9 (2019).
336. Vergin, K. L., Done, B., Carlson, C. A. & Giovannoni, S. J. Spatiotemporal distributions of rare
bacterioplankton populations indicate adaptive strategies in the oligotrophic ocean. Aquat.
Microb. Ecol. 71, 1–13 (2013).
337. Giovannoni, S. J., Cameron Thrash, J. & Temperton, B. Implications of streamlining theory for
microbial ecology. ISME J 8, 1553–1565 (2014).
338. Rocap, G., Larimer, F. W., Lamerdin, J., Malfatti, S., Chain, P., Ahlgren, N. A., Arellano, A.,
Coleman, M., Hauser, L., Hess, W. R., Johnson, Z. I., Land, M., Lindell, D., Post, A. F., Regala,
W., Shah, M., Shaw, S. L., Steglich, C., Sullivan, M. B., Ting, C. S., Tolonen, A., Webb, E. A.,
Zinser, E. R. & Chisholm, S. W. Genome divergence in two Prochlorococcus ecotypes reflects
oceanic niche differentiation. Nature 424, 1042–1047 (2003).
339. Giovannoni, S. J., Tripp, H. J., Givan, S., Podar, M., Vergin, K. L., Baptista, D., Bibbs, L., Eads,
J., Richardson, T. H., Noordewier, M., Rappe, M. S., Short, J. M., Carrington, J. C. & Mathur, E.
J. Genome streamlining in a cosmopolitan oceanic bacterium. Science 309, 1242–1245 (2005).
340. Luo, H., Swan, B. K., Stepanauskas, R., Hughes, A. L. & Moran, M. A. Evolutionary analysis of
a streamlined lineage of surface ocean Roseobacters. ISME J 8, 1428–1439 (2014).
341. Castelle, C. J., Wrighton, K. C., Thomas, B. C., Hug, L. A., Brown, C. T., Wilkins, M. J.,
Frischkorn, K. R., Tringe, S. G., Singh, A., Markillie, L. M., Taylor, R. C., Williams, K. H. &
Banfield, J. F. Genomic Expansion of Domain Archaea Highlights Roles for Organisms from
New Phyla in Anaerobic Carbon Cycling. Current Biology 25, 690–701 (2015).
342. Brewer, T. E., Handley, K. M., Carini, P., Gilbert, J. A. & Fierer, N. Genome reduction in an
abundant and ubiquitous soil bacterium 'Candidatus Udaeobacter copiosus'. Nature Microbiology
2, 16198–7 (2016).
343. Neuenschwander, S. M., Ghai, R., Pernthaler, J. & Salcher, M. M. Microdiversification in
genome-streamlined ubiquitous freshwater Actinobacteria. ISME J 12, 185–198 (2018).
344. Getz, E. W., Tithi, S. S., Zhang, L. & Aylward, F. O. Parallel Evolution of Genome Streamlining
and Cellular Bioenergetics across the Marine Radiation of a Bacterial Phylum. mBio 9, 1034–14
(2018).
345. Wang, Z., Guo, F., Liu, L. & Zhang, T. Evidence of Carbon Fixation Pathway in a Bacterium
from Candidate Phylum SBR1093 Revealed with Genomic Analysis. PLoS ONE 9, e109571–9
(2014).
346. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft
metagenome-assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).
347. Parks, D. H., Rinke, C., Chuvochina, M., Chaumeil, P.-A., Woodcroft, B. J., Evans, P. N.,
Hugenholtz, P. & Tyson, G. W. Recovery of nearly 8,000 metagenome-assembled genomes
substantially expands the tree of life. Nature Microbiology 2, 1–10 (2017).
348. Delmont, T. O., Quince, C., Shaiber, A., Esen, Ö. C., Lee, S. T. M., Rappé, M. S.,
McLellan, S. L., Lücker, S. & Eren, A. M. Nitrogen-fixing populations of Planctomycetes
and Proteobacteria are abundant in surface ocean metagenomes. Nature Microbiology 326, 1–12
(2018).
349. Hug, L. A., Baker, B. J., Anantharaman, K., Brown, C. T., Probst, A. J., Castelle, C. J.,
Butterfield, C. N., Hernsdorf, A. W., Amano, Y., Ise, K., Suzuki, Y., Dudek, N., Relman, D. A.,
Finstad, K. M., Amundson, R., Thomas, B. C. & Banfield, J. F. A new view of the tree of life.
Nature Microbiology 1, 16048 (2016).
350. Anantharaman, K., Brown, C. T., Hug, L. A., Sharon, I., Castelle, C. J., Probst, A. J., Thomas,
B. C., Singh, A., Wilkins, M. J., Karaoz, U., Brodie, E. L., Williams, K. H., Hubbard, S. S. &
Banfield, J. F. Thousands of microbial genomes shed light on interconnected biogeochemical
processes in an aquifer system. Nature Communications 7, 13219 (2016).
351. Hug, L. A., Thomas, B. C., Brown, C. T., Frischkorn, K. R., Williams, K. H., Tringe, S. G. &
Banfield, J. F. Aquifer environment selects for microbial species cohorts in sediment and
groundwater. ISME J 9, 1–11 (2019).
352. Kato, S., Sakai, S., Hirai, M., Tasumi, E., Nishizawa, M., Suzuki, K. & Takai, K. Long-Term
Cultivation and Metagenomics Reveal Ecophysiology of Previously Uncultivated Thermophiles
Involved in Biogeochemical Nitrogen Cycle. Microbes Environ. 33, 107–110 (2018).
353. Zhou, Z., Liu, Y., Xu, W., Pan, J., Luo, Z.-H. & Li, M. Genome- and Community-Level
Interaction Insights into Carbon Utilization and Element Cycling Functions of
Hydrothermarchaeota in Hydrothermal Sediment. mSystems 5, 16002–17 (2020).
354. Ward, L. M., Idei, A., Nakagawa, M., Ueno, Y., Fischer, W. W. & McGlynn, S. E. Geochemical
and Metagenomic Characterization of Jinata Onsen, a Proterozoic-Analog Hot Spring, Reveals
Novel Microbial Diversity including Iron-Tolerant Phototrophs and Thermophilic Lithotrophs.
Microbes Environ. 34, 278–292 (2019).
355. Graham, E. D., Heidelberg, J. F. & Tully, B. J. Potential for primary productivity in a globally-
distributed bacterial phototroph. ISME J 350, 1–6 (2018).
356. Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., Lahti, L.,
Loman, N. J., Andersson, A. F. & Quince, C. Binning metagenomic contigs by coverage and
composition. Nat Meth 11, 1144–1146 (2014).
357. Eren, A. M., Esen, Ö. C., Quince, C., Vineis, J. H., Morrison, H. G., Sogin, M. L. &
Delmont, T. O. Anvi’o: an advanced analysis and visualization platform for ’omics data.
PeerJ 3, e1319 (2015).
358. Neely, C. J., Graham, E. D. & Tully, B. J. MetaSanity: An integrated microbial genome
evaluation and annotation pipeline. Bioinformatics 10, D233 (2020).
359. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM:
assessing the quality of microbial genomes recovered from isolates, single cells, and
metagenomes. Genome Res. 25, 1043–1055 (2015).
360. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify
genomes with the Genome Taxonomy Database. Bioinformatics 1–3 (2019).
doi:10.1093/bioinformatics/btz848
361. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Res. 32, 1792–1797 (2004).
362. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2--approximately maximum-likelihood trees
for large alignments. PLoS ONE 5, e9490 (2010).
363. Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and
annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242–5 (2016).
364. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069
(2014).
365. Yin, Y., Mao, X., Yang, J., Chen, X., Mao, F. & Xu, Y. dbCAN: a web resource for automated
carbohydrate-active enzyme annotation. Nucleic Acids Res. 40, W445–W451 (2012).
366. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity
searching. Nucleic Acids Res. 39, W29–W37 (2011).
367. El-Gebali, S., Mistry, J., Bateman, A., Eddy, S. R., Luciani, A., Potter, S. C., Qureshi, M.,
Richardson, L. J., Salazar, G. A., Smart, A., Sonnhammer, E. L. L., Hirsh, L., Paladin, L.,
Piovesan, D., Tosatto, S. C. E. & Finn, R. D. The Pfam protein families database in 2019. Nucleic
Acids Res. 47, D427–D432 (2018).
368. Rawlings, N. D., Waller, M., Barrett, A. J. & Bateman, A. MEROPS: the database of proteolytic
enzymes, their substrates and inhibitors. Nucleic Acids Res. 42, D503–D509 (2013).
369. Yu, N. Y., Wagner, J. R., Laird, M. R., Melli, G., Rey, S. B., Lo, R., Dao, P., Sahinalp, S. C.,
Ester, M., Foster, L. J. & Brinkman, F. S. L. PSORTb 3.0: improved protein subcellular
localization prediction with refined localization subcategories and predictive capabilities for all
prokaryotes. Bioinformatics 26, 1608–1615 (2010).
370. Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal
peptides from transmembrane regions. Nat Meth 8, 785–786 (2011).
371. Aramaki, T., Blanc-Mathieu, R., Endo, H., Ohkubo, K., Kanehisa, M., Goto, S. & Ogata, H.
KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score
threshold. Bioinformatics 36, 2251–2252 (2020).
372. Huerta-Cepas, J., Forslund, K., Coelho, L. P., Szklarczyk, D., Jensen, L. J., von Mering, C. &
Bork, P. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-
Mapper. Molecular Biology and Evolution 34, 2115–2122 (2017).
373. Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H.,
Mende, D. R., Letunic, I., Rattei, T., Jensen, L. J., von Mering, C. & Bork, P. eggNOG 5.0: a
hierarchical, functionally and phylogenetically annotated orthology resource based on 5090
organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2018).
374. Blin, K., Shaw, S., Steinke, K., Villebro, R., Ziemert, N., Lee, S. Y., Medema, M. H. & Weber,
T. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids
Res. 79, 629–7 (2019).
375. Boeuf, D., Audic, S., Brillet-Guéguen, L., Caron, C. & Jeanthon, C. MicRhoDE: a curated
database for the analysis of microbial rhodopsin diversity and evolution. Database 2015, bav080–
8 (2015).
376. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K. & Madden, T. L.
BLAST+: architecture and applications. BMC Bioinformatics 10, 421–9 (2009).
377. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND.
Nat Meth 12, 59–60 (2014).
378. Benedict, M. N., Henriksen, J. R., Metcalf, W. W., Whitaker, R. J. & Price, N. D. ITEP: An
integrated toolkit for exploration of microbial pan-genomes. BMC Genomics 15, (2014).
379. van Dongen, S. & Abreu-Goodger, C. in Metagenomics (ed. Anisimova, M.) 804, 281–295
(Springer New York, 2011).
380. Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High
throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature
Communications 9, 7200–8 (2018).
381. Tully, B. J. Metabolic diversity within the globally abundant Marine Group II Euryarchaea
offers insight into ecological patterns. Nature Communications 10, 1–12 (2019).
382. Biller, S. J., Berube, P. M., Dooley, K., Williams, M., Satinsky, B. M., Hackl, T., Hogle, S. L.,
Coe, A., Bergauer, K., Bouman, H. A., Browning, T. J., De Corte, D., Hassler, C., Hulston, D.,
Jacquot, J. E., Maas, E. W., Reinthaler, T., Sintes, E., Yokokawa, T. & Chisholm, S. W. Data
Descriptor: Marine microbial metagenomes sampled across space and time. Sci. Data 5, 1–7
(2018).
383. Sunagawa, S., Coelho, L. P., Chaffron, S., Kultima, J. R., Labadie, K., Salazar, G., Djahanschiri,
B., Zeller, G., Mende, D. R., Alberti, A., Cornejo-Castillo, F. M., Costea, P. I., Cruaud, C.,
d'Ovidio, F., Engelen, S., Ferrera, I., Gasol, J. M., Guidi, L., Hildebrand, F., Kokoszka, F.,
Lepoivre, C., Lima-Mendez, G., Poulain, J., Poulos, B. T., Royo-Llonch, M., Sarmento, H.,
Vieira-Silva, S., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans
Coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D.,
Jaillon, O., Not, F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M. B.,
Weissenbach, J., Wincker, P., Karsenti, E., Raes, J., Acinas, S. G. & Bork, P. Ocean plankton.
Structure and function of the global ocean microbiome. Science 348, 1261359–1261359 (2015).
384. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Meth 9, 357–
359 (2012).
385. Li, H., Handsaker, B., Fennell, T., Ruan, J. & Homer, N. The Sequence Alignment/Map format
and SAMtools. Bioinformatics 25, 2078–2079 (2009).
386. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for
assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
387. Graham, E. D., Heidelberg, J. F. & Tully, B. J. BinSanity: unsupervised
clustering of environmental microbial assemblies using coverage and affinity propagation. PeerJ
5, e3035–19 (2017).
388. Schlitzer, R., Anderson, R. F., Dodas, E. M., Lohan, M., Geibert, W., Tagliabue, A., Bowie, A.,
Jeandel, C., Maldonado, M. T., Landing, W. M., Cockwell, D., Abadie, C., Abouchami, W.,
Achterberg, E. P., Agather, A., Aguilar-Islas, A., van Aken, H. M., Andersen, M., Archer, C.,
Auro, M., de Baar, H. J., Baars, O., Baker, A. R., Bakker, K., Basak, C., Baskaran, M., Bates, N.
R., Bauch, D., van Beek, P., Behrens, M. K., Black, E., Bluhm, K., Bopp, L., Bouman, H.,
Bowman, K., Bown, J., Boyd, P., Boye, M., Boyle, E. A., Branellec, P., Bridgestock, L.,
Brissebrat, G., Browning, T., Bruland, K. W., Brumsack, H.-J., Brzezinski, M., Buck, C. S.,
Buck, K. N., Buesseler, K., Bull, A., Butler, E., Cai, P., Mor, P. C., Cardinal, D., Carlson, C.,
Carrasco, G., Casacuberta, N., Casciotti, K. L., Castrillejo, M., Chamizo, E., Chance, R.,
Charette, M. A., Chaves, J. E., Cheng, H., Chever, F., Christl, M., Church, T. M., Closset, I.,
Colman, A., Conway, T. M., Cossa, D., Croot, P., Cullen, J. T., Cutter, G. A., Daniels, C.,
Dehairs, F., Deng, F., Dieu, H. T., Duggan, B., Dulaquais, G., Dumousseaud, C., Echegoyen-
Sanz, Y., Edwards, R. L., Ellwood, M., Fahrbach, E., Fitzsimmons, J. N., Flegal, A. R., Fleisher,
M. Q., van de Flierdt, T., Frank, M., Friedrich, J., Fripiat, F., Fröllje, H., Galer, S. J. G., Gamo,
T., Ganeshram, R. S., Garcia-Orellana, J., Garcia-Solsona, E., Gault-Ringold, M., George, E.,
Gerringa, L. J. A., Gilbert, M., Godoy, J. M., Goldstein, S. L., Gonzalez, S. R., Grissom, K.,
Hammerschmidt, C., Hartman, A., Hassler, C. S., Hathorne, E. C., Hatta, M., Hawco, N., Hayes,
C. T., Heimbürger, L.-E., Helgoe, J., Heller, M., Henderson, G. M., Henderson, P. B., van
Heuven, S., Ho, P., Horner, T. J., Hsieh, Y.-T., Huang, K.-F., Humphreys, M. P., Isshiki, K.,
Jacquot, J. E., Janssen, D. J., Jenkins, W. J., John, S., Jones, E. M., Jones, J. L., Kadko, D. C.,
Kayser, R., Kenna, T. C., Khondoker, R., Kim, T., Kipp, L., Klar, J. K., Klunder, M., Kretschmer,
S., Kumamoto, Y., Laan, P., Labatut, M., Lacan, F., Lam, P. J., Lambelet, M., Lamborg, C. H.,
Le Moigne, F. A. C., Le Roy, E., Lechtenfeld, O. J., Lee, J.-M., Lherminier, P., Little, S., López-
Lora, M., Lu, Y., Masque, P., Mawji, E., Mcclain, C. R., Measures, C., Mehic, S., Barraqueta, J.-
L. M., van der Merwe, P., Middag, R., Mieruch, S., Milne, A., Minami, T., Moffett, J. W.,
Moncoiffe, G., Moore, W. S., Morris, P. J., Morton, P. L., Nakaguchi, Y., Nakayama, N.,
Niedermiller, J., Nishioka, J., Nishiuchi, A., Noble, A., Obata, H., Ober, S., Ohnemus, D. C., van
Ooijen, J., O'Sullivan, J., Owens, S., Pahnke, K., Paul, M., Pavia, F., Pena, L. D., Peters, B.,
Planchon, F., Planquette, H., Pradoux, C., Puigcorbé, V., Quay, P., Queroue, F., Radic, A.,
Rauschenberg, S., Rehkämper, M., Rember, R., Remenyi, T., Resing, J. A., Rickli, J., Rigaud, S.,
Rijkenberg, M. J. A., Rintoul, S., Robinson, L. F., Roca-Martí, M., Rodellas, V., Roeske, T.,
Rolison, J. M., Rosenberg, M., Roshan, S., van der Loeff, M. M. R., Ryabenko, E., Saito, M. A.,
Salt, L. A., Sanial, V., Sarthou, G., Schallenberg, C., Schauer, U., Scher, H., Schlosser, C.,
Schnetger, B., Scott, P., Sedwick, P. N., Semiletov, I., Shelley, R., Sherrell, R. M., Shiller, A. M.,
Sigman, D. M., Singh, S. K., Slagter, H. A., Slater, E., Smethie, W. M., Snaith, H., Sohrin, Y.,
Sohst, B., Sonke, J. E., Speich, S., Steinfeldt, R., Stewart, G., Stichel, T., Stirling, C. H.,
Stutsman, J., Swarr, G. J., Swift, J. H., Thomas, A., Thorne, K., Till, C. P., Till, R., Townsend, A.
T., Townsend, E., Tuerena, R., Twining, B. S., Vance, D., Velazquez, S., Venchiarutti, C., Villa-
Alfageme, M., Vivancos, S. M., Voelker, A. H. L., Wake, B., Warner, M. J., Watson, R., van
Weerlee, E., Weigand, M. A., Weinstein, Y., Weiss, D., Wisotzki, A., Woodward, E. M. S., Wu,
J., Wu, Y., Wuttig, K., Wyatt, N., Xiang, Y., Xie, R. C., Xue, Z., Yoshikawa, H., Zhang, J.,
Zhang, P., Zhao, Y., Zheng, L., Zheng, X.-Y., Zieringer, M., Zimmer, L. A., Ziveri, P., Zunino, P.
& Zurbrick, C. The GEOTRACES Intermediate Data Product 2017. Chemical Geology 493, 210–
223 (2018).
389. Hammer, Ø., Harper, D. & Ryan, P. D. PAST: Paleontological statistics software package for
education and data analysis. Palaeontologia Electronica 4, 9 (2001).
390. Grote, J., Thrash, J. C., Huggett, M. J., Landry, Z. C., Carini, P., Giovannoni, S. J. & Rappe, M.
S. Streamlining and Core Genome Conservation among Highly Divergent Members of the
SAR11 Clade. mBio 3, e00252–12–e00252–12 (2012).
391. Luo, H., Thompson, L. R., Stingl, U. & Hughes, A. L. Selection Maintains Low Genomic GC
Content in Marine SAR11 Lineages. Molecular Biology and Evolution 32, 2738–2748 (2015).
392. Martinez-Gutierrez, C. A. & Aylward, F. O. Strong Purifying Selection Is Associated with
Genome Streamlining in Epipelagic Marinimicrobia. Genome Biology and Evolution 11, 2887–
2894 (2019).
393. Swan, B. K., Tupper, B., Sczyrba, A., Lauro, F. M., Martinez-Garcia, M., González, J. M., Luo,
H., Wright, J. J., Landry, Z. C., Hanson, N. W., Thompson, B. P., Poulton, N. J., Schwientek, P.,
Acinas, S. G., Giovannoni, S. J., Moran, M. A., Hallam, S. J., Cavicchioli, R., Woyke, T. &
Stepanauskas, R. Prevalent genome streamlining and latitudinal divergence of planktonic bacteria
in the surface ocean. Proceedings of the National Academy of Sciences 110, 11463–11468
(2013).
394. Béjá, O., Aravind, L., Koonin, E. V., Suzuki, M. T., Hadd, A., Nguyen, L. P., Jovanovich, S.,
Gates, C. M., Feldman, R. A., Spudich, J. L., Spudich, E. N. & DeLong, E. F. Bacterial
rhodopsin: Evidence for a new type of phototrophy in the sea. Science 289, 1902–1906 (2000).
395. Béjá, O., Spudich, E. N., Spudich, J. L., Leclerc, M. & DeLong, E. F. Proteorhodopsin
phototrophy in the ocean. Nature 411, 786–789 (2001).
396. Man, D., Wang, W., Sabehi, G., Aravind, L., Post, A. F., Massana, R., Spudich, E. N., Spudich,
J. L. & Béjà, O. Diversification and spectral tuning in marine proteorhodopsins. EMBO J. 22,
1725–1731 (2003).
397. Gershenzon, J. & Dudareva, N. The function of terpene natural products in the natural world.
Nat Chem Biol 3, 408–414 (2007).
398. Wang, W.-W., Sineshchekov, O. A., Spudich, E. N. & Spudich, J. L. Spectroscopic and
Photochemical Characterization of a Deep Ocean Proteorhodopsin. Journal of Biological
Chemistry 278, 33985–33991 (2003).
399. Bork, P. et al. Tara Oceans. Tara Oceans studies plankton at planetary scale. Introduction.
Science (New York, N.Y.) 348, 873 (2015).
400. Vicuña, R. & González, B. The microbial world in a changing environment. Revista Chilena de
Historia Natural 2021 94:1 94, 1–5 (2021).
401. Xiao, T. & Zhou, W. The third generation sequencing: the advanced approach to genetic
diseases. Translational Pediatrics 9, 163 (2020).
Appendix
The following pages contain all supplemental information for the chapters presented in this dissertation.
Chapter 1 Supplemental Figures and Tables
Chapter 1 has previously been published in PeerJ, and both the article content and all supplementary
data can be found under the DOI: 10.7717/peerj.3035
Supplemental Figure 1: Clustering results for diverse-mixture-2 using BinSanity, BinSanity+refinement,
CONCOCT, MetaBat, MyCC, MaxBin, and GroopM across five in silico metagenomes (visualized via
Anvi’o). Black dashed boxes highlight bins in each method that contained contigs from two or more
reference organisms. White represents contigs that were left unclustered.
Supplemental Figure 2: Clustering results for the strain-mixture using BinSanity, BinSanity+refinement,
CONCOCT, MetaBat, MyCC, MaxBin, and GroopM across five in silico metagenomes (visualized via Anvi’o).
Black dashed boxes highlight bins in each method that contained contigs from two or more reference
organisms. White represents contigs that were left unclustered.
Supplemental Table S1: Reference genomes used to generate artificial metagenomes for benchmarking
diverse-mixture-1 organism | GenBank accession ID | diverse-mixture-2 organism | GenBank accession ID | strain-mixture organism | GenBank accession ID
Thermoplasma acidophilum | NC_002578 | Thermoplasma acidophilum DSM 1728 | NC_002578 | Escherichia_coli_str_K-12 | NC_000913
Thermoplasma volcanium | NC_002689 | Thermoplasma volcanium GSS1 DNA | NC_002689 | Thermoplasma_acidophilum_DSM_1728 | NC_002578
Geobacter sulfurreducens | NC_002939 | Geobacter sulfurreducens PCA chromosome | NC_002939 | Thermoplasma_volcanium_GSS1_DNA | NC_002689
Lactobacillus plantarum | NC_004567 | Lactobacillus plantarum WCFS1 | NC_004567 | Geobacter_sulfurreducens_PCA | NC_002939
Haloarcula marismortui | NC_006396 | Haloarcula marismortui ATCC 43049 | NC_006396 | Lactobacillus_plantarum_WCFS1 | NC_004567
Methanosarcina barkeri | NC_007355 | Methanosarcina barkeri str Fusaro | NC_007355 | Haloarcula_marismortui_ATCC_43049 | NC_006396
Geobacter metallireducens | NC_007517 | Geobacter metallireducens GS-15 | NC_007517 | Methanosarcina_barkeri_str_Fusaro | NC_007355
Synechococcus elongatus | NC_007604 | Synechococcus elongatus PCC 7942 | NC_007604 | Geobacter_metallireducens_GS-15 | NC_007517
Roseobacter denitrificans | NC_008209 | Roseobacter denitrificans OCh 114 | NC_008209 | Synechococcus_elongatus_PCC_7942 | NC_007604
Oenococcus oeni | NC_008528 | Oenococcus oeni PSU-1 | NC_008528 | Roseobacter_denitrificans_OCh_114 | NC_008209
Methanobrevibacter smithii | NC_009515 | Methanobrevibacter smithii ATCC 35061 | NC_009515 | Oenococcus_oeni_PSU-1 | NC_008528
Thermoanaerobacter pseudethanolicus | NC_010321 | Thermoanaerobacter pseudethanolicus ATCC 33223 | NC_010321 | Methanobrevibacter_smithii_ATCC_35061 | NC_009515
Geobacter lovleyi | NC_010814 | Geobacter lovleyi SZ | NC_010814 | Escherichia_fergusonii_ATCC | NC_011740
Tolumonas auensis | NC_012691 | Tolumonas auensis DSM 9187 | NC_012691 | Escherichia_coli_UMN026 | NC_011751
Lactobacillus rhamnosus | NC_013198 | Lactobacillus rhamnosus GG | NC_013198 | Escherichia_coli_O83H1 | NC_017634
Veillonella parvula | NC_013520 | Veillonella parvula DSM 2008 | NC_013520 | Desulfurococcus_fermentans_DSM_16532 | NC_018001
Acidaminococcus fermentans | NC_013740 | Acidaminococcus fermentans DSM 20731 | NC_013740 | Alistipes_finegoldii_DSM_17242 | NC_018011
Methanobrevibacter ruminantium | NC_013790 | Methanobrevibacter ruminantium M1 | NC_013790 | Clostridium_acidurici_9a_plasmid_pCuri3 | NC_018657
Thermoanaerobacter italicus | NC_013921 | Thermoanaerobacter italicus Ab9 | NC_013921 | Escherichia_coli_0104_H4 | NC_018658
Haloferax volcanii | NC_013967 | Haloferax volcanii DS2 | NC_013967 | Dehalobacter_sp_CF | NC_018867
Thermosphaera aggregans | NC_014160 | Thermosphaera aggregans DSM 11486 | NC_014160 | Escherichia_albertii_KF1 | NZ_CP007025
Desulfurococcus mucosus | NC_014961 | Desulfurococcus mucosus DSM 2162 | NC_014961 | Methanosarcina_siciliae_T4M | NZ_CP009506
Roseobacter litoralis | NC_015730 | Roseobacter litoralis Och 149 | NC_015730 | Haloferax_gibbonsii_strain_ARA6 | NZ_CP011947
Streptococcus pseudopneumoniae | NC_015876 | Streptococcus pseudopneumoniae IS7493 plasmid pDRPIS7493 | NC_015876 | 1_Halamonas_haungheensis_Strain_BJGMM_B45 | NZ_CP013106
Haloarcula hispanica | NC_015948 | Haloarcula hispanica ATCC 33960 | NC_015948 | Alteromonas_stellipolaris_LMG_21856 | NZ_CP013120
Acidaminococcus intestini | NC_016077 | Acidaminococcus intestini RyC-MR95 | NC_016077 | |
Tannerella forsythia | NC_016610 | Tannerella forsythia 92A2 | NC_016610 | |
Selenomonas ruminantium subsp lactilytica | NC_017072 | Selenomonas ruminantium subsp lactilytica TAM6421 plasmid pSRC9 DNA | NC_017072 | |
Streptococcus thermophilus ND03 | NC_017563 | Streptococcus thermophilus ND03 | NC_017563 | |
Streptococcus parasanguinis | NC_017905 | Streptococcus parasanguinis FW213 | NC_017905 | |
Haloferax mediterranei | NC_017941 | Haloferax mediterranei ATCC 33500 | NC_017941 | |
Thermogladius cellulolyticus | NC_017954 | Thermogladius cellulolyticus 1633 | NC_017954 | |
Desulfurococcus fermentans | NC_018001 | Desulfurococcus fermentans DSM 16532 | NC_018001 | |
Alistipes finegoldii | NC_018011 | Alistipes finegoldii DSM 17242 | NC_018011 | |
Clostridium acidurici | NC_018657 | Clostridium acidurici 9a plasmid pCuri3 | NC_018657 | |
Dehalobacter sp | NC_018867 | Dehalobacter sp CF | NC_018867 | |
Ruminococcus bromii | NC_021013 | Ruminococcus bromii L2-63 draft genome | NC_021013 | |
Ruminococcus torques | NC_021015 | Ruminococcus torques L2-14 draft genome | NC_021015 | |
Coprococcus sp | NC_021018 | Coprococcus sp ART551 draft genome | NC_021018 | |
Ruminococcus obeum | NC_021022 | Ruminococcus obeum A2-162 draft genome | NC_021022 | |
Alistipes shahii | NC_021030 | Alistipes shahii WAL 8301 draft genome | NC_021030 | |
Spiroplasma syrphidicola | NC_021284 | Spiroplasma syrphidicola EA-1 | NC_021284 | |
Erysipelothrix rhusiopathiae | NC_021354 | Erysipelothrix rhusiopathiae SY1027 | NC_021354 | |
Spiroplasma taiwanense | NC_021832 | Spiroplasma taiwanense CT-1 plasmid | NC_021832 | |
Spiroplasma diminutum | NC_021833 | Spiroplasma diminutum CUAS-1 | NC_021833 | |
Methanosarcina siciliae | NZ_CP00506 | Methanosarcina siciliae T4M | NZ_CP009506 | |
Haloferax gibbonsii | NZ_CP011947 | Nitrospira moscoviensis strain NSP M-1 | NZ_CP011847 | |
Halamonas haungheensis | NZ_CP013106 | Haloferax gibbonsii strain ARA6 | NZ_CP011901 | |
Alteromonas stellipolaris | NZ_CP013120 | Halamonas haungheensis Strain BJGMM B45 | NZ_CP01316 | |
Alteromonas addita | NZ_CP014322 | Alteromonas stellipolaris LMG 21856 | NZ_CP013120 | |
Table 2.1 CheckM results for BinSanity+refinement on the strain mixture
Bin Id | Marker lineage | No. of genomes | No. of markers | No. of marker sets | 0 | 1 | 2 | 3 | 4 | 5+ | Completeness | Contamination/Redundancy | Strain heterogeneity
metagenome-strain-edit-bin_7 | p__Euryarchaeota (UID49) | 95 | 228 | 153 | 0 | 225 | 3 | 0 | 0 | 0 | 100 | 1.96 | 0
metagenome-strain-edit-bin_4 | c__Deltaproteobacteria (UID3216) | 83 | 247 | 155 | 1 | 242 | 4 | 0 | 0 | 0 | 99.35 | 2.1 | 0
metagenome-strain-edit-bin_25 | p__Euryarchaeota (UID49) | 95 | 228 | 153 | 2 | 223 | 3 | 0 | 0 | 0 | 99.18 | 1.36 | 0
metagenome-strain-edit-bin_12 | p__Euryarchaeota (UID3) | 148 | 188 | 125 | 2 | 178 | 8 | 0 | 0 | 0 | 99.14 | 4.62 | 0
metagenome-strain-edit-bin_23 | c__Clostridia (UID1085) | 35 | 420 | 196 | 4 | 410 | 6 | 0 | 0 | 0 | 99.1 | 1.66 | 0
metagenome-strain-edit-bin_9 | p__Cyanobacteria (UID2143) | 129 | 472 | 368 | 4 | 440 | 28 | 0 | 0 | 0 | 99.05 | 6.49 | 0
metagenome-strain-edit-bin_28 | c__Gammaproteobacteria (UID4761) | 52 | 693 | 297 | 5 | 675 | 13 | 0 | 0 | 0 | 98.93 | 1.87 | 0
metagenome-strain-edit-bin_6 | f__Halobacteriaceae (UID85) | 51 | 395 | 250 | 5 | 384 | 6 | 0 | 0 | 0 | 98.8 | 2.07 | 0
metagenome-strain-edit-bin_5 | o__Lactobacillales (UID462) | 85 | 367 | 162 | 3 | 353 | 11 | 0 | 0 | 0 | 98.46 | 4.55 | 0
metagenome-strain-edit-bin_27 | c__Gammaproteobacteria (UID4444) | 263 | 506 | 232 | 6 | 489 | 11 | 0 | 0 | 0 | 98.37 | 2.07 | 0
metagenome-strain-edit-bin_10 | f__Rhodobacteraceae (UID3360) | 56 | 582 | 313 | 8 | 566 | 8 | 0 | 0 | 0 | 98.34 | 1.55 | 0
metagenome-strain-edit-bin_18 | c__Thermoprotei (UID148) | 41 | 245 | 158 | 4 | 234 | 7 | 0 | 0 | 0 | 98.34 | 3.27 | 0
pass1-refine-bin_3 | f__Enterobacteriaceae (UID5124) | 134 | 1173 | 336 | 36 | 1093 | 43 | 1 | 0 | 0 | 98.3 | 4.62 | 6.52
metagenome-strain-edit-bin_3 | p__Euryarchaeota (UID3) | 148 | 186 | 123 | 3 | 179 | 4 | 0 | 0 | 0 | 97.97 | 1.01 | 0
metagenome-strain-edit-bin_24 | f__Enterobacteriaceae (UID5124) | 134 | 1173 | 336 | 42 | 1119 | 12 | 0 | 0 | 0 | 97.96 | 1.13 | 0
metagenome-strain-edit-bin_19 | p__Bacteroidetes (UID2605) | 350 | 314 | 208 | 6 | 298 | 10 | 0 | 0 | 0 | 97.36 | 2.7 | 0
metagenome-strain-edit-bin_11 | o__Lactobacillales (UID374) | 471 | 341 | 187 | 8 | 320 | 13 | 0 | 0 | 0 | 96.88 | 2.92 | 0
metagenome-strain-edit-bin_26 | f__Halobacteriaceae (UID96) | 40 | 417 | 263 | 11 | 390 | 16 | 0 | 0 | 0 | 96.84 | 3.27 | 0
metagenome-strain-edit-bin_8 | c__Deltaproteobacteria (UID3216) | 83 | 247 | 155 | 10 | 231 | 6 | 0 | 0 | 0 | 96.79 | 2.78 | 0
pass1-refine-bin_0 | f__Enterobacteriaceae (UID5124) | 134 | 1168 | 336 | 90 | 1062 | 16 | 0 | 0 | 0 | 96.35 | 1.39 | 0
metagenome-strain-edit-bin_2 | p__Euryarchaeota (UID3) | 148 | 186 | 123 | 7 | 171 | 8 | 0 | 0 | 0 | 95.89 | 5.35 | 0
metagenome-strain-edit-bin_20 | o__Clostridiales (UID1120) | 304 | 249 | 142 | 10 | 234 | 5 | 0 | 0 | 0 | 95.48 | 2.02 | 0
pass1-refine-bin_4 | k__Bacteria (UID203) | 5449 | 104 | 58 | 7 | 38 | 20 | 11 | 1 | 27 | 91.38 | 68.39 | 99.5
pass1-refine-bin_2 | k__Bacteria (UID203) | 5449 | 104 | 58 | 30 | 50 | 24 | 0 | 0 | 0 | 84.64 | 13.79 | 95.83
pass1-refine-bin_1 | k__Bacteria (UID203) | 5449 | 104 | 58 | 57 | 40 | 7 | 0 | 0 | 0 | 68.97 | 8.62 | 100
unbinned | root (UID1) | 5656 | 56 | 24 | 56 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
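As a reading aid for Table 2.1, the following minimal Python sketch shows how the marker copy-count columns (0, 1, 2, 3, 4, 5+) relate to the "No. of markers" column: for each bin the six counts should sum to the number of markers, markers counted under 0 are missing, and markers counted under 2 or more are present in multiple copies. The class and function names (`CheckMRow`, `counts_are_consistent`, `missing_and_multicopy`) are hypothetical and introduced here for illustration only; they are not part of CheckM or BinSanity. The three rows are copied from Table 2.1. The sketch does not attempt to reproduce the Completeness or Contamination columns, which CheckM derives from marker sets rather than from raw marker counts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CheckMRow:
    """One row of Table 2.1 (hypothetical container, not a CheckM type)."""
    bin_id: str
    n_markers: int
    # Number of markers observed 0, 1, 2, 3, 4, and 5+ times in the bin.
    copy_counts: Tuple[int, int, int, int, int, int]
    completeness: float
    contamination: float


def counts_are_consistent(row: CheckMRow) -> bool:
    """True if the six copy-count columns sum to the marker total."""
    return sum(row.copy_counts) == row.n_markers


def missing_and_multicopy(row: CheckMRow) -> Tuple[int, int]:
    """Return (markers never found, markers found two or more times)."""
    missing = row.copy_counts[0]
    multicopy = sum(row.copy_counts[2:])
    return missing, multicopy


# Values transcribed from Table 2.1.
ROWS = [
    CheckMRow("metagenome-strain-edit-bin_7", 228, (0, 225, 3, 0, 0, 0), 100.0, 1.96),
    CheckMRow("pass1-refine-bin_3", 1173, (36, 1093, 43, 1, 0, 0), 98.3, 4.62),
    CheckMRow("pass1-refine-bin_4", 104, (7, 38, 20, 11, 1, 27), 91.38, 68.39),
]

for row in ROWS:
    missing, multicopy = missing_and_multicopy(row)
    print(row.bin_id, counts_are_consistent(row), missing, multicopy)
```

In this reading, a bin such as pass1-refine-bin_4, where more markers occur in multiple copies than in a single copy, is the kind of bin that the high Contamination/Redundancy value in the table flags as a probable mixture of closely related genomes.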
Supplemental table 2.2 Raw percentage of contigs in a MAG belonging to a specific reference genome as binned by BinSanity
100.0 % of contigs in metagenome-strain-edit-bin_26.fna are from NZ_CP011947_Haloferax_gibbonsii_strain_ARA6_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_24.fna are from NZ_CP007025_Escherichia_albertii_KF1_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_23.fna are from NC_018867_1_Dehalobacter_sp_CF_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_4.fna are from NC_002939_5_Geobacter_sulfurreducens_PCA_chromosome_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_27.fna are from NZ_CP013106_1_Halamonas_haungheensis_Strain_BJGMM_B45_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_12.fna are from NC_009515_Methanobrevibacter_smithii_ATCC_35061_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_3.fna are from NC_002689_Thermoplasma_volcanium_GSS1_DNA_edit.fna
98.275862069 % of contigs in pass1-refine-bin_2.fna are from NC_017634_Escherichia_coli_O83H1_edit.fna
1.72413793103 % of contigs in pass1-refine-bin_2.fna are from NC_011740_Escherichia_fergusonii_ATCC_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_7.fna are from NC_007355_Methanosarcina_barkeri_str_Fusaro_edit.fna
0.516795865633 % of contigs in metagenome-strain-edit-bin_9.fna are from NC_017634_Escherichia_coli_O83H1_edit.fna
99.4832041344 % of contigs in metagenome-strain-edit-bin_9.fna are from NC_007604_Synechococcus_elongatus_PCC_7942_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_11.fna are from NC_008528_Oenococcus_oeni_PSU-1_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_18.fna are from NC_018001_Desulfurococcus_fermentans_DSM_16532_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_8.fna are from NC_007517_Geobacter_metallireducens_GS-15_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_19.fna are from NC_018011_1_Alistipes_finegoldii_DSM_17242_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_28.fna are from NZ_CP013120_Alteromonas_stellipolaris_LMG_21856_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_20.fna are from NC_018657_1_Clostridium_acidurici_9a_plasmid_pCuri3_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_6.fna are from NC_006396_Haloarcula_marismortui_ATCC_43049_edit.fna
81.4903846154 % of contigs in pass1-refine-bin_4.fna are from NC_018658_Escherichia_coli_0104_H4_edit.fna
0.240384615385 % of contigs in pass1-refine-bin_4.fna are from NZ_CP007025_Escherichia_albertii_KF1_edit.fna
4.08653846154 % of contigs in pass1-refine-bin_4.fna are from NC_017634_Escherichia_coli_O83H1_edit.fna
0.721153846154 % of contigs in pass1-refine-bin_4.fna are from NC_000913_Escherichia_coli_str_K-12_edit.fna
0.961538461538 % of contigs in pass1-refine-bin_4.fna are from NC_011740_Escherichia_fergusonii_ATCC_edit.fna
0.240384615385 % of contigs in pass1-refine-bin_4.fna are from NC_008528_Oenococcus_oeni_PSU-1_edit.fna
12.2596153846 % of contigs in pass1-refine-bin_4.fna are from NC_011751_Escherichia_coli_UMN026_edit.fna
100.0 % of contigs in metagenome-strain-edit-bin_10.fna are from NC_008209_Roseobacter_denitrificans_OCh_114_edit.fna
10.781671159 % of contigs in metagenome-strain-edit-bin_5.fna are from NC_000913_Escherichia_coli_str_K-12_edit.fna
89.218328841 % of contigs in metagenome-strain-edit-bin_5.fna are from NC_004567_Lactobacillus_plantarum_WCFS1_edit.fna
0.337268128162 % of contigs in metagenome-strain-edit-bin_25.fna are from NC_007517_Geobacter_metallireducens_GS-15_edit.fna
99.6627318718 % of contigs in metagenome-strain-edit-bin_25.fna are from NZ_CP009506_Methanosarcina_siciliae_T4M_edit.fna
100.0 % of contigs in pass1-refine-bin_0.fna are from NC_011740_Escherichia_fergusonii_ATCC_edit.fna
93.085106383 % of contigs in metagenome-strain-edit-bin_2.fna are from NC_002578_Thermoplasma_acidophilum_DSM_1728_edit.fna
6.91489361702 % of contigs in metagenome-strain-edit-bin_2.fna are from NC_011751_Escherichia_coli_UMN026_edit.fna
99.4388327722 % of contigs in pass1-refine-bin_3.fna are from NC_000913_Escherichia_coli_str_K-12_edit.fna
0.561167227834 % of contigs in pass1-refine-bin_3.fna are from NC_011751_Escherichia_coli_UMN026_edit.fna
28.3707865169 % of contigs in pass1-refine-bin_1.fna are from NC_018658_Escherichia_coli_0104_H4_edit.fna
0.842696629213 % of contigs in pass1-refine-bin_1.fna are from NC_017634_Escherichia_coli_O83H1_edit.fna
70.7865168539 % of contigs in pass1-refine-bin_1.fna are from NC_011751_Escherichia_coli_UMN026_edit.fna
Supplemental table 2.3 Raw percentage of contigs in a specific reference genome in a MAG as binned by BinSanity
100.0 % of contigs from NZ_CP011947_Haloferax_gibbonsii_strain_ARA6_edit.fna are in metagenome-strain-edit-bin_26.fna
77.0454545455 % of contigs from NC_018658_Escherichia_coli_0104_H4_edit.fna are in pass1-refine-bin_4.fna
22.9545454545 % of contigs from NC_018658_Escherichia_coli_0104_H4_edit.fna are in pass1-refine-bin_1.fna
99.5762711864 % of contigs from NZ_CP007025_Escherichia_albertii_KF1_edit.fna are in metagenome-strain-edit-bin_24.fna
0.423728813559 % of contigs from NZ_CP007025_Escherichia_albertii_KF1_edit.fna are in pass1-refine-bin_4.fna
100.0 % of contigs from NC_018001_Desulfurococcus_fermentans_DSM_16532_edit.fna are in metagenome-strain-edit-bin_18.fna
99.5 % of contigs from NC_007517_Geobacter_metallireducens_GS-15_edit.fna are in metagenome-strain-edit-bin_8.fna
0.5 % of contigs from NC_007517_Geobacter_metallireducens_GS-15_edit.fna are in metagenome-strain-edit-bin_25.fna
95.9595959596 % of contigs from NC_017634_Escherichia_coli_O83H1_edit.fna are in pass1-refine-bin_2.fna
0.673400673401 % of contigs from NC_017634_Escherichia_coli_O83H1_edit.fna are in metagenome-strain-edit-bin_9.fna
2.86195286195 % of contigs from NC_017634_Escherichia_coli_O83H1_edit.fna are in pass1-refine-bin_4.fna
0.505050505051 % of contigs from NC_017634_Escherichia_coli_O83H1_edit.fna are in pass1-refine-bin_1.fna
100.0 % of contigs from NC_018011_1_Alistipes_finegoldii_DSM_17242_edit.fna are in metagenome-strain-edit-bin_19.fna
100.0 % of contigs from NC_018657_1_Clostridium_acidurici_9a_plasmid_pCuri3_edit.fna are in metagenome-strain-edit-bin_20.fna
0.32292787944 % of contigs from NC_000913_Escherichia_coli_str_K-12_edit.fna are in pass1-refine-bin_4.fna
4.3057050592 % of contigs from NC_000913_Escherichia_coli_str_K-12_edit.fna are in metagenome-strain-edit-bin_5.fna
95.3713670614 % of contigs from NC_000913_Escherichia_coli_str_K-12_edit.fna are in pass1-refine-bin_3.fna
100.0 % of contigs from NZ_CP013106_1_Halamonas_haungheensis_Strain_BJGMM_B45_edit.fna are in metagenome-strain-edit-bin_27.fna
100.0 % of contigs from NZ_CP013120_Alteromonas_stellipolaris_LMG_21856_edit.fna are in metagenome-strain-edit-bin_28.fna
2.17864923747 % of contigs from NC_011740_Escherichia_fergusonii_ATCC_edit.fna are in pass1-refine-bin_2.fna
0.871459694989 % of contigs from NC_011740_Escherichia_fergusonii_ATCC_edit.fna are in pass1-refine-bin_4.fna
96.9498910675 % of contigs from NC_011740_Escherichia_fergusonii_ATCC_edit.fna are in pass1-refine-bin_0.fna
99.8324958124 % of contigs from NC_008528_Oenococcus_oeni_PSU-1_edit.fna are in metagenome-strain-edit-bin_11.fna
0.167504187605 % of contigs from NC_008528_Oenococcus_oeni_PSU-1_edit.fna are in pass1-refine-bin_4.fna
100.0 % of contigs from NC_008209_Roseobacter_denitrificans_OCh_114_edit.fna are in metagenome-strain-edit-bin_10.fna
100.0 % of contigs from NZ_CP009506_Methanosarcina_siciliae_T4M_edit.fna are in metagenome-strain-edit-bin_25.fna
100.0 % of contigs from NC_007355_Methanosarcina_barkeri_str_Fusaro_edit.fna are in metagenome-strain-edit-bin_7.fna
100.0 % of contigs from NC_002689_Thermoplasma_volcanium_GSS1_DNA_edit.fna are in metagenome-strain-edit-bin_3.fna
100.0 % of contigs from NC_002578_Thermoplasma_acidophilum_DSM_1728_edit.fna are in metagenome-strain-edit-bin_2.fna
100.0 % of contigs from NC_004567_Lactobacillus_plantarum_WCFS1_edit.fna are in metagenome-strain-edit-bin_5.fna
100.0 % of contigs from NC_007604_Synechococcus_elongatus_PCC_7942_edit.fna are in metagenome-strain-edit-bin_9.fna
100.0 % of contigs from NC_002939_5_Geobacter_sulfurreducens_PCA_chromosome_edit.fna are in metagenome-strain-edit-bin_4.fna
100.0 % of contigs from NC_018867_1_Dehalobacter_sp_CF_edit.fna are in metagenome-strain-edit-bin_23.fna
100.0 % of contigs from NC_006396_Haloarcula_marismortui_ATCC_43049_edit.fna are in metagenome-strain-edit-bin_6.fna
100.0 % of contigs from NC_009515_Methanobrevibacter_smithii_ATCC_35061_edit.fna are in metagenome-strain-edit-bin_12.fna
14.6974063401 % of contigs from NC_011751_Escherichia_coli_UMN026_edit.fna are in pass1-refine-bin_4.fna
11.2391930836 % of contigs from NC_011751_Escherichia_coli_UMN026_edit.fna are in metagenome-strain-edit-bin_2.fna
1.4409221902 % of contigs from NC_011751_Escherichia_coli_UMN026_edit.fna are in pass1-refine-bin_3.fna
72.6224783862 % of contigs from NC_011751_Escherichia_coli_UMN026_edit.fna are in pass1-refine-bin_1.fna
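The two tables above report the same contig assignments from opposite directions: the share of each MAG's contigs that come from a given reference genome (Supplemental table 2.2) and the share of each reference genome's contigs that land in a given MAG (Supplemental table 2.3). A minimal sketch of how such percentages could be computed is shown below. It is not the script used in this work; the function name and the toy bin and reference labels are hypothetical, and it counts contigs rather than bases.

```python
from collections import Counter, defaultdict


def percent_by_group(pairs):
    """For (group, member) pairs, return {group: {member: percent of the group's contigs}}."""
    counts = defaultdict(Counter)
    for group, member in pairs:
        counts[group][member] += 1
    return {
        group: {m: 100.0 * n / sum(c.values()) for m, n in c.items()}
        for group, c in counts.items()
    }


# Toy data: one (bin, reference genome) pair per contig. Labels are made up.
assignments = [
    ("bin_A", "ref_1"), ("bin_A", "ref_1"), ("bin_A", "ref_1"), ("bin_A", "ref_2"),
    ("bin_B", "ref_2"),
]

# Direction of Supplemental table 2.2: composition of each bin by reference genome.
per_bin = percent_by_group(assignments)

# Direction of Supplemental table 2.3: distribution of each reference genome across bins.
per_ref = percent_by_group((ref, bin_id) for bin_id, ref in assignments)

print(per_bin["bin_A"])
print(per_ref["ref_2"])
```

Read together, the two directions separate a bin that is pure but incomplete (high percentage in the first table, low in the second) from a bin that captures a whole genome along with contaminating contigs.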
Supplemental Table 3: Contig assignments for all binning methods (table inserted below as PDF, but is
also accessible at doi: 10.7717/peerj.3035/supp-5)
Chapter 2 Supplemental Figures and Tables
Supplemental Figure 1. MAG MED800 visualized in Anvi’o. Layers show coverage in all samples from the Mediterranean (colored by depth), GC content, and contig length. These features, paired with tetranucleotide frequencies (e.g., the dendrogram order), were used to manually refine the genome and remove contigs with significantly divergent signals.
Supplemental Figure 2. MAG EACE638 visualized in Anvi’o. Layers show coverage in all samples from the Mediterranean (colored by depth), GC content, and contig length. These features, paired with tetranucleotide frequencies (e.g., the dendrogram order), were used to manually refine the genome and remove contigs with significantly divergent signals.
Supplemental Figure 3. MAG SAT68 visualized in Anvi’o. Layers show coverage in all samples from the Mediterranean (colored by depth), GC content, and contig length. These features, paired with tetranucleotide frequencies (e.g., the dendrogram order), were used to manually refine the genome and remove contigs with significantly divergent signals.
Supplemental Figure 4. MAG NP970 visualized in Anvi’o. Layers show coverage in all samples from the Mediterranean (colored by depth), GC content, and contig length. These features, paired with tetranucleotide frequencies (e.g., the dendrogram order), were used to manually refine the genome and remove contigs with significantly divergent signals.
Supplemental Figure 5. A phylogenetic tree of PufM sequences from this study together with sequences collected from previously described lineages [67,68], bacterial artificial chromosomes [69], genomes in Integrated Microbial Genomes [70], and metagenomes from the San Pedro Ocean Time Series, based on their KEGG Ontology annotation (K08929; Supplemental Information 5 and 6). Global Ocean Survey (GOS) assemblies [8] with predicted CDS (as above) were searched using the collected PufM sequences with a DIAMOND BLASTP search [71] (v.0.8.36.98; default settings; Supplemental Information 2 and 3).
Chapter 3 Supplemental Figures and Tables
Chapter 3 has previously been published in The ISME Journal; the article and all supplementary data are available at https://doi.org/10.1038/s41396-020-00834-5
Chapter 4 Supplemental Figures and Tables
Supplementary Figure 1. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG1 (top left), BG2 (top right), BG5 (bottom left), and BG7 (bottom right) along the GEOTRACES transect GA03. The cruise track for GA03 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 2. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG1 (top left), BG2 (top right), BG5 (bottom left), and BG7 (bottom right) along the GEOTRACES transect GA10. The cruise track for GA10 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 3. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG1 (top left), BG2 (top right), BG5 (bottom left), and BG7 (bottom right) along the GEOTRACES transect GP13. The cruise track for GP13 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 4. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG8 (top left), BG9 (top right), BG11 (bottom left), and BG12 (bottom right) along the GEOTRACES transect GA03. The cruise track for GA03 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 5. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG8 (top left), BG9 (top right), BG11 (bottom left), and BG12 (bottom right) along the GEOTRACES transect GA10. The cruise track for GA10 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 6. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG8 (top left), BG9 (top right), BG11 (bottom left), and BG12 (bottom right) along the GEOTRACES transect GP13. The cruise track for GP13 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 7. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG4 (top left), BG3 (top right), BG6 (bottom left), and BG10 (bottom right) along the GEOTRACES transect GA03. The cruise track for GA03 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 8. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG4 (top left), BG3 (top right), BG6 (bottom left), and BG10 (bottom right) along the GEOTRACES transect GA10. The cruise track for GA10 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Supplementary Figure 9. Ocean Data View plots of percent relative fraction for the Nitrosopelagicus groups BG4 (top left), BG3 (top right), BG6 (bottom left), and BG10 (bottom right) along the GEOTRACES transect GP13. The cruise track for GP13 is shown below the ODV maps, with the red circle denoting the start of the cruise track (0 km).
Abstract
Metagenomics, the direct sequencing of the genomic content of an entire community, is increasingly used to better understand the uncultivated majority in the environment. This dissertation develops a novel method to extract near-complete draft genomes from metagenomic datasets and demonstrates the usefulness of subsampling approaches for assembling large metagenomic datasets. Using these frameworks, I discovered a unique aerobic anoxygenic phototroph, named ‘Ca. Luxescamonaceae’, that possesses the genomic potential for anoxygenic phototrophy and carbon fixation. I uncovered the largest set of marine Dadabacteria genomes to date, showing phototrophy-driven depth partitioning and genome streamlining. Finally, I analyzed the largest set of ammonia-oxidizing Thaumarchaeota genomes, yielding insight into the distinctions between clades and potential partitioning due to substrate affinity.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Identifying functional metabolic guilds: a computational approach to classifying heterotrophic diversity in the marine system
Microbial ecology in the deep terrestrial biosphere: a geochemical, metagenomic and culture-based approach
Ecophysiology of important understudied bacterioplankton through an integrated research and education approach
Spatial and temporal dynamics of marine microbial communities and their diazotrophs in the Southern California Bight
Patterns of molecular microbial activity across time and biomes
Marine protistan diversity, spatiotemporal dynamics, and physiological responses to environmental cues
Microbe to microbe: monthly microbial community dynamics and interactions at the San Pedro Ocean Time-series
Genetic characterization of microbial eukaryotic diversity and metabolic potential
Developing statistical and algorithmic methods for shotgun metagenomics and time series analysis
Temporal variability of marine archaea across the water column at SPOT
Characterizing protistan diversity and quantifying protistan grazing in the North Pacific Subtropical Gyre
Changes in the community composition of marine microbial eukaryotes across multiple temporal scales of measurement
Iron-dependent response mechanisms of the nitrogen-fixing cyanobacterium Crocosphaera to climate change
Feature engineering and supervised learning on metagenomic sequence data
Unexplored microbial communities in marine sediment porewater
Deep learning in metagenomics: from metagenomic contigs sorting to phage-bacterial association prediction
Multi-dataset analysis of bacterial heterotrophic variability at the San Pedro Ocean Time-series (SPOT): an investigation into the necessity and feasibility of incorporating a dynamic bacterial c...
Molecular ecology of marine cyanobacteria: microbial assemblages as units of natural selection
Constructing metagenome-assembled genomes and mobile genetic element host interactions using metagenomic Hi-C
Dynamics of marine bacterial communities from surface to bottom and the factors controling them
Asset Metadata
Creator
Graham, Elaina D. (author)
Core Title
Enhancing recovery of understudied and uncultured lineages from metagenomes
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Biology (Marine Biology and Biological Oceanography)
Publication Date
12/06/2021
Defense Date
07/16/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
ammonia oxidizing archaea,anoxygenic phototroph,Dadabacteria,metagenomics,microbial ecology,OAI-PMH Harvest,Thaumarchaeota
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Fuhrman, Jed (committee member), John, Seth (committee member), Thrash, Cameron (committee member), Webb, Eric (committee member)
Creator Email
edgraham@usc.edu,egraham147@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC17895960
Unique identifier
UC17895960
Legacy Identifier
etd-GrahamElai-10280
Document Type
Dissertation
Rights
Graham, Elaina D.
Type
texts
Source
20211208-wayne-usctheses-batch-902-nissen (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu