PERIPHERAL BLOOD MONONUCLEAR CELL CAPTURE AND SEQUENCING:
OPTIMIZATION OF A DROPLET BASED CAPTURE METHOD AND ITS APPLICATIONS
by
Sonia Ter-Saakyan
A Thesis Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
MOLECULAR MICROBIOLOGY AND IMMUNOLOGY
May 2021
Copyright 2021 Sonia Ter-Saakyan
Dedication
To my father, Arshak Ter-Saakyan (1967-2003)
Acknowledgements
This work was carried out in the Department of Molecular Microbiology and Immunology at the
Keck School of Medicine, University of Southern California. I am indebted and grateful to many
people, without whom I would not have been able to complete this work.
Firstly, I would like to thank deeply my mentor and principal investigator, Dr. Ha Youn
Lee for her help and guidance during this process. She was aware of the tumultuous times in my
life and provided me with the time and support I needed to complete my research. Without her,
this project and thesis would not exist.
Next, thank you to the current and former members of the Lee Lab: Dr. Sung Yong Park,
who assisted in development of the protocol and the equipment used in the experiments; Gina
Faraci, whose support and guidance helped me to complete the experiments and my thesis;
Youpeng Zou, without whose collaboration the code and project would not come together; Sayan
Nanda, who conducted the post sequencing data analysis; and Carl Yu, who contributed to the
early optimization of the protocol.
I would also like to thank my committee members: Dr. Weiming Yuan, Dr. Hyungjin Eoh,
and Dr. Lucio Comai whose advice was integral to the completion of this thesis. A special thanks
also to Dr. Yuan and his lab for providing us with cells in culture to run our experiment.
Finally, I would like to thank my mother Armine Perchimyan and my closest friends Edgar
Gabriyelyan, Chase DiBenedetto, and Ahana Dhakal for their undying faith, love, support, and
belief in me. Without all of them, I would not have made it into or through this program.
TABLE OF CONTENTS
Dedication
Acknowledgements
List of Tables
List of Figures
Abbreviations
Abstract
Chapter 1: Introduction
1.1 Sequencing Technologies
1.1.1 Single-Cell RNA Sequencing
1.2 Applications of Single-Cell RNA Sequencing
1.2.1 Clinical Applications
1.2.2 Cell Classification
Chapter 2: Materials and Methods
2.1 Blood Processing
2.2 DropSeq Protocol
2.2.1 Cell Preparation
2.2.2 Cell Capture, Lysis, and Droplet Generation
2.2.3 Droplet Breakage
2.2.4 Reverse Transcription and Exonuclease Treatment
2.3 Quality Control of Droplets
2.3.1 Imaging of Droplets
2.3.2 Quality Check
2.3.3 Bead Recovery
2.4 Amplification and cDNA Cleanup
2.5 Nextera Library Preparation
2.6 Next Generation Sequencing of Library
Chapter 3: Results of DropSeq and Exploration of Other scRNA-seq Platforms
3.1 Results of DropSeq Experiment
3.1.1 Protocol Optimization
3.1.2 PBMC25
3.1.3 HEK293T/3T3
3.1.4 Sequencing Results
3.2 10X scRNA-seq Platform
3.2.1 Introduction
3.2.2 Advantages over DropSeq
Chapter 4: Algorithm Development
4.1 Introduction to scRNA-seq Data Analysis Methods
4.2 Parameter Selection: LM22
4.3 Pearson Correlation Based Algorithm
4.3.1 Results
Chapter 5: Practical Applications
5.1 HIV: Elite Controllers vs Rapid Progressors
5.2 COVID-19 Immune Profiling
Chapter 6: Summary and Future Directions
6.1 Summary
6.2 Future Directions
6.2.1 Further DropSeq Optimization
6.2.2 Future Projects
References
List of Tables
Table 2.1 Recipe for Lysis Buffer
Table 2.2 Recipe for Reverse Transcription Mix
Table 2.3 Recipe for Exonuclease Mix
Table 2.4 Recipe for PCR Mix
Table 2.5 PCR Conditions
Table 2.6 Recipe for Nextera Tagmentation
Table 2.7 Recipe for Nextera PCR
Table 2.8 Nextera PCR Program
Table 4.1 PBMC Subtypes Identifiable by LM22 Matrix
List of Figures
Figure 2.1 Structure of mRNA Capture Bead Used in DropSeq
Figure 2.2 Diagram of Needle Set Up in Autosampler Vial
Figure 2.3 Diagram of Microfluidic Chip Used in DropSeq Protocol
Figure 2.4 High-speed Camera Still of Microfluidic Chip During Droplet Generation
Figure 2.5 Microscope Images of Droplets After DropSeq Protocol
Figure 2.6 Structure of Amplification Ready cDNA
Figure 2.7 Library Construct for Next Generation Sequencing
Figure 3.1 Comparison Between 20 and 25 Cycle PCR on Samples and Control
Figure 3.2 Image of Stained Cells in Countess 3 Automated Cell Counter
Figure 3.3 Post-Nextera TapeStation Distribution of PBMC25
Figure 3.4 Pre-Nextera TapeStation Distribution of PBMC25
Figure 3.5 4,000 Bead TapeStation Curve for HEK293T/3T3 Cells
Figure 3.6 2,000 Bead TapeStation Curve for HEK293T/3T3 Cells
Figure 3.7 Post-Nextera TapeStation Distribution of HEK293T/3T3 Cells
Figure 3.8 UMI per Cell and Average Counts per UMI in PBMC25
Figure 3.9 UMI per Cell and Average Counts per UMI in HEK293T/3T3 Mixed Sample
Figure 3.10 Human vs Mouse Gene Mapping in 1,000 cells from HEK293T/3T3 Sample
Figure 4.1 Comparison of Literature and Algorithm PBMC Subtype Percentages
Abbreviations
AIDS: acquired immunodeficiency syndrome
ARDS: acute respiratory distress syndrome
BSA: bovine serum albumin
cDNA: complementary DNA
COVID-19: coronavirus disease-19
DMSO: dimethyl sulfoxide
DPBS: Dulbecco’s phosphate-buffered saline
EDTA: ethylene diamine tetraacetic acid
FACS: fluorescence-activated cell sorting
FBS: fetal bovine serum
GEM: gel bead-in emulsion
HEK293T: human embryonic kidney 293T cell
HIV: human immunodeficiency virus
ICU: intensive care unit
IMDM: Iscove’s modified Dulbecco’s medium
mRNA: messenger RNA
PBMC: peripheral blood mononuclear cell
PBS: phosphate-buffered saline
PCR: polymerase chain reaction
PFO: perfluorooctanol
qRT-PCR: quantitative real-time PCR
RT: reverse transcription
scRNA-seq: single cell RNA sequencing
SDS: sodium dodecyl sulfate
SSC: saline-sodium citrate
STAMP: single-cell transcriptome attached to microparticle
TE-SDS: tris-EDTA-SDS
TE-TW: tris-EDTA-Tween
TME: tumor microenvironment
UMI: unique molecular identifier
Abstract
We performed peripheral blood mononuclear cell isolation to obtain gene expression data at the
single cell resolution. We generated single cell RNA sequencing data of PBMCs beginning from
whole blood, focusing on a droplet based cellular isolation method which allows for transcript
capture and library generation. Specifically, we isolated single cells within droplets, captured
mRNA transcripts, transcribed them into cDNA, and generated a sequencing library. We optimized
the single cell RNA sequencing workflow by adjusting pump pressures, identifying ideal solution
concentrations, testing PCR conditions, and attempting a within-droplet variation of our protocol.
For our PBMC sample after optimization, we saw an average of 309 UMI per cell with 34 reads
per UMI. We also performed droplet capture on a mixture of HEK293T and 3T3 mouse cells to
test the resolution of our system. In the mixed cell sample, we saw no doublets of the human and
mouse cell types which demonstrates the capacity to obtain a single cell resolution using this
platform. The limitations of the generated data including the low UMI counts and the low reads
per UMI are discussed, along with potential reasons why this may have occurred. Our workflow
is also compared to another scRNA sequencing platform. The utilization of this pipeline in
bioinformatic cell classification is briefly explored. Finally, we discussed the potential applications
of this approach, including HIV patient immune profiling, potential development of
immunotherapeutics, and COVID-19 immune profiling.
Chapter 1: Introduction
1.1 Sequencing Technologies
1.1.1 Single-Cell RNA Sequencing
Transcriptome sequencing investigates messenger RNA (mRNA), which allows for in-depth
analysis of which genes are being actively transcribed and expressed by cells (Ozsolak &
Milos, 2010). The ability to measure a single cell’s gene expression has become one of the most
popular research tools in recent years. The ability to analyze individual cells has allowed
researchers to illuminate the widely diverse cell populations present in tissues and organs. While
knowledge of diversity among cell populations remains partial, investigative tools such as single-
cell RNA sequencing (scRNA-seq) can uncover the transcriptome of individual cells (Macosko,
et al, 2015), revealing heterogeneity even among the same cell type (Tang, et al, 2009). This
advancement has led to the ability to analyze specific cellular responses to a variety of changes
that may be imposed upon the cell, providing information on function and cell-to-cell
communication. These include changes in the extracellular environment, responses to drug
delivery, and changes in the cell’s transcriptome in a disease state. A highly adaptable technology
such as this can be critical in the advancement of drug discovery and in the understanding of
various diseases.
The basic principle of the method involves single-cell isolation, RNA extraction and
amplification, and sequencing of the single-cell transcriptome (Tang, et al, 2019). A variety of
isolation methods have been explored, the most commonly used being fluorescence-activated cell
sorting (FACS) (Saliba, et al, 2014). This method utilizes fluorescently labeled antibodies to
isolate cells based on specific cell-surface markers that are of interest (Saliba, et al, 2014). One
potential limitation of this method is the need for antibodies that are specific to the target
cell-surface protein; however, antibodies are being constantly produced with the assistance of projects
such as the Human Protein Atlas, thereby allowing for isolation with whatever marker one deems
fit (Saliba, et al, 2014). Other methods such as micromanipulation and the use of optical tweezers
require manual handling of single cells from a cell population; this can be problematic due to the
effort of manual handling and the time-consuming nature, resulting in a very low throughput
(Saliba, et al, 2014). Engineering advances have allowed for the use of microfluidic devices to
isolate cells, which can be beneficial when working with small volumes or in platforms that utilize
droplet generation for scRNA-sequencing (Saliba, et al, 2014).
The most popular method to study single-cell gene expression was quantitative real-time
polymerase chain reaction (qRT-PCR), but this has recently changed because its throughput is
limited and it is biased toward the genes analyzed, as specific primers are required to amplify the
transcripts (Saliba, et al, 2014). The most common method employed to capture and analyze the
single-cell transcriptome involves reverse transcription of RNA into complementary DNA
(cDNA), second-strand cDNA synthesis, and finally, cDNA amplification (Saliba, et al, 2014). In
order to obtain a significant amount of information regarding cell-cell heterogeneity, the
sequencing of hundreds, if not thousands, of cells from a sample is necessary, and therefore the
ability to maintain single-cell resolution requires cell and gene barcoding. For a platform such as
DropSeq, the utilization of mRNA capture beads allows for both cellular and molecular barcoding,
allowing high-throughput flow of cells while ensuring that individual cells and genes can be
identified downstream (Macosko, et al, 2015). The single cell RNA sequencing pipeline consists
of the following procedures: a high-throughput workflow to maximize capture, the lysis of cells,
the reverse transcription of mRNA, and the amplification of the resulting cDNA, all while allowing
for cellular and molecular barcoding while maintaining a single-cell resolution. Certain protocols,
such as the one used by Tang, et al (2009), involve manually picking up single cells under a microscope,
but more recent protocols have fully automated single-cell capture and cDNA generation within
a single droplet (Macosko, et al, 2015). These methods allow for robust analysis on a
massively wide scale, providing the user with large amounts of transcriptomic data in a relatively
sensitive assay.
1.2 Applications of Single-Cell RNA Sequencing
1.2.1 Clinical Applications
Single-cell sequencing has had applications in a variety of different clinical settings including
cancer, the nervous system, reproductive medicine, and immunology (Tang, et al, 2019). Single
cell sequencing in cancer can help to uncover the characteristics of the various cells residing within
tumor tissue. Tang, et al suggest that traditional methods of sequencing can mask the heterogeneity
that is present among tumor cells, causing an issue when it comes to understanding cell-cell and
cell-tumor microenvironment (TME) interactions (2019). In 2018, Zhang, et al utilized single-cell
sequencing in colorectal cancer cells to map T cell receptors, thereby revealing 20 individual
subsets of T cells with varying functions and different clonalities. Furthermore, single-cell
sequencing has been utilized to study brain cell types and how these cells interact with one another
during neural development (Tang, et al, 2019). Carter, et al performed single-cell transcriptomic
analysis on cerebellar cells in mice, uncovering specific cellular populations that contribute to
development in the cerebellum (2018). Additionally, single-cell sequencing technology has been
applied to the field of reproductive medicine, allowing for “the discovery of key regulators for
specific stages of male germ cell development” (Tang, et al, 2009). Particularly relevant are the
applications in immunology which can assist in uncovering the immune status of the body (Tang,
et al, 2019). One study, which performed single-cell sequencing on CD4+ T cells in both young
and old mice, uncovered that aging results in increased heterogeneity among these cells, where in
the younger mice, cell-cell variability is lower (Martinez-Jimenez, et al, 2017). Single-cell
sequencing has been crucial in biomarker discovery, prognosis determination, and disease
diagnosis and treatment.
1.2.2 Cell Classification
Single-cell RNA sequencing technology has been used to identify cell subtypes based on their
transcriptomes (Tasic, 2018). Most commonly, clustering algorithms are used to identify cell-cell
similarities to be able to group a widely diverse cell population into a few different cell types
(Tasic, 2018). Cell-type classification by gene expression mainly relies on marker genes that
are known to be expressed in that specific cell type. A multitude of reference gene expression profiles
have been designed in an effort to classify cells based on their transcriptomes (Schelker, et al,
2017).
Chapter 2: Materials and Methods
The following experimental workflow, apart from the blood processing step, is based on the
DropSeq paper published by the McCarroll Lab at Harvard University (Macosko, et al, 2015). It
was critical that we optimized the protocol to handle peripheral blood mononuclear cells (PBMC)
specifically since they differ in both mRNA content and size from the cells used in the original
protocol. The original protocol was developed for a mixture of human embryonic kidney 293T
(HEK293T) and 3T3 mouse cells; the mRNA content of PBMC is lower than these cells – about
half the amount found in HEK293T cells (Average RNA Yields, 2009). Further, the sizes of these
cells are reported as an average of 7µm for PBMC, 13µm for HEK293T cells, and 18µm for the
3T3 mouse cells (Kuse, et al, 1985; An Overview of HEK-239 Cell Line; Cell Data Sheet
NIH/3T3). Thus, we had to change the following conditions to optimize this workflow for our
needs: increasing the bead concentration from 120 beads/µL to 500 beads/µL, increasing the cell
solution concentration from 100 cells/µL to 300 cells/µL, and utilizing a different carrier oil
(Macosko, et al, 2015).
2.1 Blood Processing
A 58 mL whole blood specimen from a healthy individual was used. The blood was obtained
from the Gulf Coast Regional Blood Center, from where it was shipped overnight and processed
immediately the following morning. The blood was transferred into two separate 50mL conical
tubes. The blood was centrifuged at 400 x g for 10 minutes, and the plasma was removed to within
5mm of the buffy coat layer. A total of 11mL of plasma was moved into a separate 15mL tube and
spun at 1200 x g for 10 minutes. The plasma was then aliquoted into 11, 1mL cryovials and stored.
Enough Dulbecco’s phosphate-buffered saline (DPBS) was added to each original 50mL tube to
bring the blood back to its original volume. Enough diluted blood was removed to have a final
volume of 20mL in each 50mL conical tube. 20mL of Histopaque® was added to two 50mL conical
centrifuge tubes and 20mL of diluted whole blood was carefully layered on top of the
Histopaque®. The tubes were centrifuged at 400 x g for 30 minutes with the brake off (deceleration
set to 0). Clear separation was visible between layers; the top plasma-DPBS layer was removed
within 2cm of the buffy coat and discarded. The cloudy PBMC layer was removed (20mL) and
25mL of DPBS was added to the PBMC to reach a total volume of 45mL. The diluted cells were
centrifuged at 400 x g for 8 minutes and a large cell pellet was observed. The supernatant was
removed, and the pellet was resuspended in 45mL of DPBS and centrifuged at 400 x g for 10
minutes. The supernatant was removed, and the pellet resuspended in 45mL of DPBS and
centrifuged at 200 x g for 10 minutes. The supernatant was removed, and the cells were
resuspended in cryopreservation media consisting of 10% dimethyl sulfoxide (DMSO) and 90%
fetal bovine serum (FBS) so that 1mL aliquots each contained 1.96 x 10⁶ cells/mL with 90%
viability. Cells were stored in cryovials in liquid nitrogen.
2.2 DropSeq Protocol
2.2.1 Cell Preparation
Cells were thawed in a 37ºC water bath until there were a few ice crystals remaining in the vial.
Using a wide-bore pipet tip, the cells were transferred into a 15mL conical tube. 1 mL of thawing
media consisting of 10% FBS in Iscove’s modified Dulbecco’s medium (IMDM) was added to the
cryovial to slowly rinse it, then was transferred to the 15mL tube; 9 additional mL of thawing
media was added for a final volume of 11mL. The tube was inverted gently 3-5 times and
centrifuged for 10 minutes at 400 x g. The supernatant was removed, and the pellet resuspended
in 10mL of thawing media, inverted 15 times, and centrifuged at 400 x g for 10 minutes. The
supernatant was removed, and the pellet resuspended in 1mL of 0.08% bovine serum albumin/
phosphate-buffered saline (BSA/PBS) solution. This solution was filtered through a 30-micron
cell strainer into a new 15mL conical tube and centrifuged for 5 min at 300 x g. The cells were
washed again using the same solution and strained again using the 30-micron cell strainer. The
solution was diluted 1:4 and 10µL was mixed with 10µL of trypan blue dye. 10µL of the mixed
sample was loaded into the chambers of the Countess slide and the cell concentration and viability
were measured. Over 90% viability was ideal; cells were resuspended in 0.08% BSA/PBS to a
final concentration of 3.0x10⁵ cells/mL in 1mL for droplet generation using the formula:
$$C_1 V_1 = \left(3.0 \times 10^{5}\ \frac{\text{cells}}{\text{mL}}\right)(1\ \text{mL}) \qquad \text{Eq. (1)}$$
where C1 was the measured concentration of cells and V1 was the unknown volume. This formula
allowed us to determine what volume (V1) of a stock solution (C1) was needed to achieve the
desired concentration of 3.0x10⁵ cells/mL in 1mL. Since the concentration of the isolated PBMC
from the blood processing step was 1.96 x 10⁶ cells/mL, this solution needed to be diluted to
achieve the target concentration necessary for droplet generation. For example, if the measured
concentration (C1) was 1.215x10⁶ cells/mL, solving for V1 revealed that 247µL of the cell solution
should be added to 753µL of 0.08% BSA/PBS to reach the target volume and concentration. The sample
was transferred to a 2mL autosampler vial, sealed with an autosampler cap, and kept on ice.
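The dilution in Eq. (1) can be checked numerically. The following is a minimal Python sketch and not part of the original protocol; the function name `dilution_volumes` and its parameters are illustrative assumptions.

```python
def dilution_volumes(stock_conc, target_conc, final_volume_ul):
    """Solve C1 * V1 = C2 * V2 for V1 and return (stock_ul, diluent_ul).

    Concentrations are in cells/mL; volumes are in microliters.
    """
    if stock_conc < target_conc:
        # A stock more dilute than the target cannot be fixed by adding buffer.
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_conc * final_volume_ul / stock_conc
    return stock_ul, final_volume_ul - stock_ul


# Worked example from the text: C1 = 1.215e6 cells/mL, target 3.0e5 cells/mL in 1 mL.
stock_ul, diluent_ul = dilution_volumes(1.215e6, 3.0e5, 1000)
print(f"{stock_ul:.0f} uL cells + {diluent_ul:.0f} uL 0.08% BSA/PBS")
```

With the values quoted above, the sketch reproduces the 247µL of cell solution and 753µL of 0.08% BSA/PBS given in the text.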
2.2.2 Cell Capture, Lysis, and Droplet Generation
Lysis buffer was prepared using the following amounts and concentrations at a final volume of
400µL:
Table 2.1 Recipe for Lysis Buffer
Reagent Amount
H2O 200µL
20% Ficoll PM-400 120µL
20% Sarkosyl 4µL
0.5M EDTA 16µL
2M Tris pH 7.5 40µL
1M DTT 20µL
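As a cross-check on Table 2.1, the final concentration of each reagent follows from its stock concentration and its share of the 400µL total. The short Python sketch below is illustrative only; the dictionary layout and the function name `final_concentrations` are assumptions, while the stock concentrations and volumes are taken from the table.

```python
# Table 2.1 as (stock concentration, unit, volume in uL); water has no stock value.
LYSIS_BUFFER = {
    "H2O": (None, "", 200),
    "Ficoll PM-400": (20, "%", 120),
    "Sarkosyl": (20, "%", 4),
    "EDTA": (0.5, "M", 16),
    "Tris pH 7.5": (2, "M", 40),
    "DTT": (1, "M", 20),
}


def final_concentrations(recipe):
    """Return (total volume in uL, {reagent: (final concentration, unit)})."""
    total = sum(vol for _, _, vol in recipe.values())
    finals = {
        name: (stock * vol / total, unit)
        for name, (stock, unit, vol) in recipe.items()
        if stock is not None
    }
    return total, finals


total_ul, finals = final_concentrations(LYSIS_BUFFER)
print(f"total volume: {total_ul} uL")
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:g} {unit}")
```

Under this arithmetic the volumes sum to 400µL and the buffer works out to 6% Ficoll PM-400, 0.2% Sarkosyl, 20mM EDTA, 200mM Tris, and 50mM DTT.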
The capture beads were suspended into the lysis buffer at a concentration of 500 beads/µL and this
was calculated using the following formula:
$$\left(487.42\ \frac{\text{beads}}{\mu\text{L}}\right)(V_1) = \left(500\ \frac{\text{beads}}{\mu\text{L}}\right)(400\ \mu\text{L}) \qquad \text{Eq. (2)}$$
This formula allowed us to determine what volume (V1) of the stock bead solution at a
concentration of 487.42 beads/µL was needed to achieve the desired concentration of 500
beads/µL in 400µL. Solving for V1 showed that 410µL of the stock bead solution was necessary
and so this volume was aliquoted, spun down to isolate the beads, and resuspended in lysis buffer.
The stock concentration of the beads was measured by loading 1µL under a microscope, counting
the number of beads in triplicate, and averaging these results to give a concentration of 487.42
beads/µL. The mRNA capture beads (ChemGenes) have a surface coated in extending oligos that
allow for mRNA capture and labeling for downstream identification (see Figure 2.1).
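Because the stock bead suspension (487.42 beads/µL) is slightly more dilute than the 500 beads/µL target, the required stock volume exceeds the final 400µL, which is why the beads are pelleted and resuspended rather than diluted. The Python sketch below illustrates that calculation; the function name `bead_stock_volume` is a hypothetical helper and not part of the protocol.

```python
def bead_stock_volume(stock_beads_per_ul, target_beads_per_ul, final_volume_ul):
    """Return (stock volume to aliquot in uL, total number of beads needed)."""
    beads_needed = target_beads_per_ul * final_volume_ul
    return beads_needed / stock_beads_per_ul, beads_needed


FINAL_VOLUME_UL = 400
stock_ul, beads = bead_stock_volume(487.42, 500, FINAL_VOLUME_UL)

# If more stock is needed than the final volume, the beads must be concentrated
# (spun down and resuspended in lysis buffer) instead of diluted.
needs_pelleting = stock_ul > FINAL_VOLUME_UL
print(f"{beads:.0f} beads from {stock_ul:.0f} uL of stock; pellet first: {needs_pelleting}")
```

For the values in the text this gives roughly 410µL of stock carrying 200,000 beads, consistent with the volume reported above.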
Figure 2.1 Structure of mRNA Capture Bead Used in DropSeq. The bead’s surface is covered
in extending oligos that contain a PCR primer, a cell barcode unique to each bead which will
identify the cell, a unique molecular identifier (UMI) for each oligo that will identify each captured
molecule, and a poly-T tail which will bind to the poly-A tail on the mRNA.
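To illustrate how the cell barcode and UMI on each oligo are used downstream, the Python sketch below splits a barcode read into its two parts. The 12-base cell barcode and 8-base UMI lengths are assumptions taken from the published DropSeq bead design (Macosko, et al, 2015) and are not stated in this section; the function name and the example read are hypothetical.

```python
CELL_BARCODE_LEN = 12  # assumed length, from the published DropSeq design
UMI_LEN = 8            # assumed length, from the published DropSeq design


def split_barcode_read(read):
    """Split a barcode read into (cell_barcode, umi)."""
    needed = CELL_BARCODE_LEN + UMI_LEN
    if len(read) < needed:
        raise ValueError(f"read shorter than {needed} bases")
    return read[:CELL_BARCODE_LEN], read[CELL_BARCODE_LEN:needed]


# Hypothetical read, made up for illustration only.
cell_barcode, umi = split_barcode_read("ACGTACGTACGTTTGGCCAA")
print(cell_barcode, umi)
```

Reads sharing a cell barcode are attributed to the same bead (and ideally the same cell), while distinct UMIs within that barcode distinguish individual captured transcripts from PCR duplicates.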
The bead solution was transferred to a 2mL autosampler vial, sealed with an autosampler cap, and
kept on ice. A 4mL autosampler vial was filled with 1.2mL of PicoSurf oil and sealed with an
autosampler cap. The cell and bead solution vials were loaded onto a vortex adapter which allowed
for continuous shaking (on shake 1) of the solutions to maintain suspension of the cells and beads,
respectively. The oil vial was kept in a tube rack. A 22-gauge needle was used to puncture the
septa on all three of the autosampler caps, and it was inserted only far enough that it remained above
the surface of the liquid. This needle was connected to an air pump which provided continuous air
flow to allow for liquid uptake in the second needle, while a separate, blunt-ended 30-gauge needle
connected to Tygon tubing was used to puncture the septa (Figure 2.2). Each tube was connected
to its respective inlet on the microfluidic chip for the cells, bead solution, and oil (see Figure 2.3).
The 30-gauge needle was used for liquid uptake that was transported to the microfluidic chip. It
was inserted in a clockwise direction far enough so that it was submerged in the liquid, but not
quite touching the bottom of the vial.
Figure 2.2 Diagram of Needle Set Up in Autosampler Vial. The needle on the left is connected
to the air pump which delivers steady air pressure respective to each liquid. It is placed slightly
above the surface of the liquid to provide pressure, allowing the liquid to enter the uptake
needle. The uptake needle is inserted in a clockwise fashion to facilitate bead uptake (the same is
true for the cell solution).
Figure 2.3 Diagram of Microfluidic Chip Used in DropSeq Protocol. The microfluidic chip
design has specific inlets for the oil, the cell solution, and the bead solution along with an outflow
where assembled droplets exit. The black portion between “BEADS” and “OUTFLOW” is where
the droplets are generated. (Figure adapted from Macosko, et al, 2015)
A single piece of Tygon tubing was connected to the outflow port and the end was placed into a 1.5
mL Eppendorf tube for collection. All pieces of tubing were pre-cut to a length of 23cm, and this
length was used for all three inlets and the outflow port. The air pump was then turned on and the
pressures were set for each liquid: 280mmHg for the oil, and 150mmHg for both the cell and bead
solutions. These pressures were determined to be ideal for forming quality droplets at a steady
pace. Initially, the flow was monitored on camera to ensure proper flow of all components and to
visualize the characteristic “triangle” which suggests proper droplet formation and uniform flow
of beads and droplets (see Figure 2.4).
[Figure labels: oil flow, cell flow, bead flow]
Figure 2.4 High-speed Camera Still of Microfluidic Chip During Droplet Generation. On the
left, the mRNA capture bead (circled in red) is visible as it is about to flow through and combine
with the cell solution and oil to form a droplet. The characteristic “triangle” suggesting good flow
is marked in blue.
Droplets were collected until the total volume of droplets reached 500µL; this did not include
residual carrier oil that was present at the bottom of the tube. They were imaged by loading a slide
with 15µL of PicoSurf oil, 6.6µL of droplets, and another 5µL of oil. Microscope images were
taken to assess quality and to determine size, which will be discussed further in section 2.3:
Quality Control of Droplets.
2.2.3 Droplet Breakage
The droplets must then be broken in order to reverse transcribe the mRNA which has been captured
by the beads. Seven 50mL conical centrifuge tubes were prepared for the droplet breakage
protocol. The oil layer present below the generated droplets was removed. The 500µL of generated
droplets were added to the first tube, followed by 15mL of 6X saline-sodium citrate (SSC) and
500µL of perfluorooctanol (PFO). The tube was shaken forcefully in a vertical fashion ten
times to break the droplets. This tube was centrifuged for 1 minute at 100 x g. An oil interface was
formed at the bottom of the first tube with the beads staying above the oil; the supernatant was
removed and moved into tube 2. The oil interface in the first tube was washed with 15mL of 6X
SSC and the supernatant was then transferred to tube 3. All three tubes were centrifuged at 100 x
g for 1 minute. The oil interface of tube 1 was washed again with 15mL of 6X SSC while tubes 2
and 3 were kept on ice. All but 2mL of supernatant was removed from tube 1, placed into tube 4
and kept on ice. All but 2mL of supernatant from tubes 2 and 3 were removed and placed into
tubes 5 and 6, respectively. Tubes 2 through 6 were centrifuged for 1 minute at 100 x g. While
tubes 2 and 3 were kept on ice, the supernatant from tubes 4 through 6 was removed and
consolidated into tube 7. Tubes 4, 5 and 6 were spun down again for 1 minute at 100 x g then all
contents from tubes 2 through 6 were consolidated into tube 2; this tube now contained all the
beads. Tube 2 was spun down for 1 minute at 100 x g to form a pellet of beads. All but 1mL of
supernatant was removed and put into tube 7, and the bead pellet was transferred to a 1.5mL Eppendorf
tube. The contents of the Eppendorf tube were spun down and a visible pellet was seen, and the
pellet was washed twice with 1mL of 6X SSC, once with 1mL of tris-EDTA - sodium dodecyl
sulfate (TE-SDS), twice with 1mL TE-Tween (TE-TW), and once with 600µL of reverse
transcription (RT) buffer. The beads were ready for reverse transcription. Bead recovery was
calculated based on the number of beads that were used in droplet generation and the amount that
was recovered after; this will be further discussed in section 2.3: Quality Control of Droplets.
2.2.4 Reverse Transcription and Exonuclease Treatment
Reverse transcription allowed the mRNA captured on the beads to be converted into cDNA,
which in turn allowed for amplification of the templates. The reverse transcription mix was
prepared using the following recipe:
Table 2.2 Recipe for Reverse Transcription Mix
Reagent Amount
H2O 75µL
Maxima 5X RT Buffer 40µL
20% Ficoll PM-400 40µL
10mM dNTPs 20µL
RNase Out 5µL
Template Switch Oligo (50uM) 10µL
Maxima H-RTase 10µL (added immediately prior to incubation)
The beads were spun down and pelleted, the supernatant (RT buffer) was removed, 200µL of the
RT mix was added to the beads, and they were resuspended. The beads were incubated at room
temperature for 30 minutes with rotation at 1200rpm in a thermocycler. The temperature was then
increased to 42ºC and the beads were incubated for 90 minutes at 1200rpm. After the reaction was
complete, the beads were washed once with 1mL TE-SDS and twice with 1mL of TE-TW. They
were stored at 4ºC in TE-TW prior to the exonuclease treatment.
The exonuclease treatment was required to chew back excess bead primers that did not
capture an RNA molecule. The exonuclease treatment was done on both the sample beads
following droplet generation and breakage, and a set of control beads. The following recipe was
used for the exonuclease mix, prepared in duplicate:
Table 2.3 Recipe for Exonuclease Mix
Reagent Amount
H2O 170µL
10X Exo I Buffer 20µL
Exo I Enzyme 10µL
The sample beads were washed once with 1mL of 10mM Tris pH 8.0, pelleted and supernatant
removed, and 200µL of the exonuclease mix was added. For the control, 4,000 beads that did not
have any mRNA bound were prepared. This number was chosen since 4,000 beads were loaded
for the subsequent amplification reaction by polymerase chain reaction (PCR). The control beads
had their supernatant removed and were washed in 500µL of 10mM Tris pH 8.0. They were then
spun down again, had the supernatant removed, and were finally resuspended in 40µL of the
exonuclease mix. Both sets of beads were incubated at 37ºC for 45 minutes at 1200rpm in a
thermocycler. Following the completion of the exonuclease treatment, the beads were ready for
amplification.
2.3 Quality Control of Droplets
It is important to ensure that the droplets that were generated were of appropriate size and that
the bead positivity and bead multiplicity were reasonable. The parameters that were explored
included droplet size, bead positivity and multiplicity, bead recovery, and cell positivity.
2.3.1 Imaging of Droplets
In order to manually determine droplet size, bead positivity, and bead multiplicity a microscope
image was taken of the droplets following generation. In Figures 2.4 and 2.5, the droplets can be
seen as they were being imaged. Of note here was the number of droplets with a single bead within
them. Ideally these droplets also contained a single cell, allowing for the bead to capture templates
from that cell only, leading to the desired single-cell resolution. However, it was inevitable that
most droplets did not contain a bead and that some droplets had more than one bead (multiplicity).
To calculate these values from the microscope image, the image was loaded into the ImageJ
software and the droplets were counted with the following notations: “0” representing droplets
with no beads, “1” representing droplets with 1 bead, and “2” representing droplets with 2 or more
beads. Then, the number of droplets labelled “1” was divided by the total number of droplets to
give the bead positivity, while the number of droplets labelled “2” was divided by the total to give
the multiplicity. The average bead positivity across 9 different DropSeq runs was 20% while the
average bead multiplicity was 4.35%. In addition to bead positivity and multiplicity, the
microscope image allowed accurate scaling of the droplet size, the average of which across 9
runs was 120µm (see Figure 2.5).
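The manual calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example counts are hypothetical and are not data from this work; in practice the labels would come from the ImageJ counts.

```python
from collections import Counter


def bead_positivity_and_multiplicity(labels):
    """Return (bead positivity, bead multiplicity) as fractions.

    labels: one entry per counted droplet, using the notation in the text:
    0 = no bead, 1 = one bead, 2 = two or more beads.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no droplets were counted")
    positivity = counts[1] / total    # droplets labelled "1" / all droplets
    multiplicity = counts[2] / total  # droplets labelled "2" / all droplets
    return positivity, multiplicity


# Hypothetical counts, for illustration only
example_labels = [0] * 150 + [1] * 40 + [2] * 10
positivity, multiplicity = bead_positivity_and_multiplicity(example_labels)
print(f"bead positivity: {positivity:.1%}, bead multiplicity: {multiplicity:.1%}")
```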
Figure 2.5 Microscope Images of Droplets After DropSeq Protocol. (A) The variety of droplets
can be seen including empty, single positive, and double positive droplets. This image is used to
conduct quality control checks on the droplets prior to amplification. (B) Here the droplet size
(118µm) is measured to scale by the microscope. The droplet selected for measurement was chosen
based on the user’s best estimation of average droplet size. Also visible here are empty, single
positive, and double positive droplets.
2.3.2 Quality Check
In addition to the manual calculations of droplet size, bead positivity, and bead multiplicity, further
calculations were done based on the amounts of bead solution and oil used during droplet
generation. Also included in the calculation were the remaining volume of the bead solution, the
active droplet volume, the volume of droplets used for imaging, and the volume of droplets
generated as waste. The total bead solution volume taken up by the droplets was calculated
using the following formula:
$$\text{Bead Volume in Droplet} = \frac{(V_{\text{initial bead}} - V_{\text{remaining bead}}) \times V_{\text{active droplet}}}{V_{\text{active droplet}} + V_{\text{microscope}} + V_{\text{waste}}} \qquad \text{Eq. (3)}$$
where $V_{\text{initial bead}}$ was the initial volume of the bead solution (400µL), $V_{\text{remaining bead}}$ was the volume
of bead solution remaining after droplet generation, $V_{\text{active droplet}}$ was the volume of droplets
generated (500µL), $V_{\text{microscope}}$ was the amount used for imaging, and $V_{\text{waste}}$ was the volume of
droplets that were generated as waste. Similarly, the volume of cells used to generate droplets can
be calculated using the following formula:
$$\text{Cell Volume in Droplet} = \frac{(V_{\text{initial cell}} - V_{\text{remaining cell}}) \times V_{\text{active droplet}}}{V_{\text{active droplet}} + V_{\text{microscope}} + V_{\text{waste}}} \qquad \text{Eq. (4)}$$
where $V_{\text{initial cell}}$ was the initial volume of the cell solution (1mL) and $V_{\text{remaining cell}}$ was the volume
of cell solution remaining after droplet generation. Droplet volume was also calculated utilizing
the droplet size obtained from the microscope image using the following formula:
$$\text{droplet size } (\mu\text{L}) = \frac{\frac{4}{3}\pi\left(\frac{\text{droplet size}}{2}\right)^{3}}{1000^{3}} \qquad \text{Eq. (5)}$$
which is the formula for the volume of a sphere. The numerator was divided by $1000^{3}$ in
order to convert the resulting volume from µm³ to µL (1µL = 10⁹µm³). The input for droplet size
was simply the measured diameter of the droplet, in µm, from the microscope image. Furthermore, bead positivity and
cell positivity were estimated mathematically. This was used to determine the total number of
STAMPs (single-cell transcriptomes attached to microparticles) that were generated in the run.
Bead positivity was calculated using the following formula:
$$\text{bead positivity} = \frac{\text{droplet size } (\mu\text{L}) \times \text{bead volume in droplet}}{\text{bead volume in droplet} + \text{cell volume in droplet}} \times [\text{bead}]\ \frac{\text{beads}}{\mu\text{L}} \qquad \text{Eq. (6)}$$
This formula took into account the volume of the droplet and the volume of beads actually taken
up into the droplet and divided the product of the two by the sum of the total volumes (both bead
and cell) taken up into the droplets. It was finally multiplied by the concentration of the bead
solution (500 beads/µL) to determine the bead positivity within the sample of droplets. Similarly,
cell positivity was calculated by the following formula:
$$\text{cell positivity} = \frac{\text{droplet size } (\mu\text{L}) \times \text{cell volume in droplet}}{\text{bead volume in droplet} + \text{cell volume in droplet}} \times [\text{cells}]\ \frac{\text{cells}}{\mu\text{L}} \qquad \text{Eq. (7)}$$
This formula is identical to the bead positivity one, apart from the fact that it utilizes the cell
volume taken up in the numerator, and the fraction is multiplied by the concentration of the cell
solution (300 cells/µL). Finally, in order to calculate the total number of STAMPs, the following
formula was used:
$$\text{Total \# of STAMPs} = (\text{beads recovered from breakage}) \times (\text{cell positivity}) \qquad \text{Eq. (8)}$$
This formula uses the product of the total number of beads recovered and the cell positivity because
we assume that not all beads came into contact with a cell. As seen in the microscope image, not
all droplets contain a bead, and by this logic, not all droplets contain a cell. Therefore, by
calculating cell positivity, one can infer the number of droplets that have cells and by multiplying
this percentage by the number of beads recovered following breakage, one has an estimate of the
number of transcriptomes captured by the experiment.
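As a minimal sketch, Eqs. (3) through (8) can be chained together in Python as shown here. The function names are hypothetical, and the remaining, microscope, and waste volumes in the example are illustrative placeholders rather than measurements from this work; the concentrations (500 beads/µL, 300 cells/µL) and the 400µL, 1mL, and 500µL volumes are the values stated in the text.

```python
import math


def volume_taken_up(v_initial, v_remaining, v_active, v_microscope, v_waste):
    """Eq. (3) / Eq. (4): solution volume (µL) that ended up in the active droplets."""
    return (v_initial - v_remaining) * v_active / (v_active + v_microscope + v_waste)


def droplet_volume_ul(diameter_um):
    """Eq. (5): sphere volume from a diameter in µm, converted from µm^3 to µL."""
    return (4.0 / 3.0) * math.pi * (diameter_um / 2.0) ** 3 / 1000.0 ** 3


def positivity(droplet_ul, own_volume, other_volume, conc_per_ul):
    """Eq. (6) / Eq. (7): expected number of beads (or cells) per droplet."""
    return droplet_ul * own_volume / (own_volume + other_volume) * conc_per_ul


def total_stamps(beads_recovered, cell_positivity):
    """Eq. (8): estimated number of STAMPs."""
    return beads_recovered * cell_positivity


# Illustrative inputs: remaining, microscope, and waste volumes are assumed values
bead_vol = volume_taken_up(400.0, 150.0, 500.0, 20.0, 100.0)
cell_vol = volume_taken_up(1000.0, 600.0, 500.0, 20.0, 100.0)
drop_ul = droplet_volume_ul(120.0)

bead_pos = positivity(drop_ul, bead_vol, cell_vol, 500.0)  # 500 beads/µL
cell_pos = positivity(drop_ul, cell_vol, bead_vol, 300.0)  # 300 cells/µL
print(f"bead positivity {bead_pos:.1%}, cell positivity {cell_pos:.1%}")
print(f"estimated STAMPs from 50,000 recovered beads: {total_stamps(50000, cell_pos):.0f}")
```

Because each positivity is a droplet volume multiplied by a concentration diluted by the mixing ratio, it is strictly an expected count per droplet; it approximates a fraction of occupied droplets only while that count is well below 1.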
2.3.3 Bead Recovery
In order to determine how many beads were recovered, 1µL of beads was loaded, after the
exonuclease treatment, into a Kova slide, which contained a grid to ease counting. Since the beads
were suspended in 1mL for the exonuclease treatment, we simply multiplied the number of beads
counted in 1µL by 1,000 to get the total number of beads. To calculate the percent recovery, we
first calculated the total number of beads in the active droplet volume using the following formula:
$$\text{total beads in droplets} = [\text{bead}]\ \frac{\text{beads}}{\mu\text{L}} \times \text{bead volume in droplet} \qquad \text{Eq. (9)}$$
This formula simply multiplied the concentration of the bead solution (500 beads/µL) and the total
volume of beads taken up into the droplets to give a total number of beads that were taken up.
After this, we can calculate the percent recovery by simply using the following formula:
$$\text{percent recovery} = \frac{\text{beads recovered from breakage}}{\text{total beads in droplets}} \times 100\% \qquad \text{Eq. (10)}$$
The higher the bead recovery, the more STAMPs we will have in the end, as seen in Eq. (8), which
will ultimately give rise to more gene expression data since more transcriptomes can be sequenced.
The lower the bead recovery, the more information is lost regarding individual cells.
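The bead recovery calculation in Eqs. (9) and (10) can be sketched as follows. The function names and the example inputs (a slide count of 45 beads and a bead volume of 200µL taken up) are hypothetical and serve only to illustrate the arithmetic.

```python
def beads_from_slide_count(beads_counted_in_1ul, suspension_volume_ul=1000.0):
    """Scale a 1 µL slide count up to the full suspension volume (1 mL in the text)."""
    return beads_counted_in_1ul * suspension_volume_ul


def total_beads_in_droplets(bead_conc_per_ul, bead_volume_in_droplet_ul):
    """Eq. (9): beads taken up into the active droplets."""
    return bead_conc_per_ul * bead_volume_in_droplet_ul


def percent_recovery(beads_recovered, total_beads):
    """Eq. (10): percentage of the beads in the droplets that were recovered."""
    return beads_recovered / total_beads * 100.0


# Hypothetical example values
recovered = beads_from_slide_count(45)              # 45 beads counted in 1 µL
loaded = total_beads_in_droplets(500.0, 200.0)      # 500 beads/µL, 200 µL taken up
print(f"{recovered:.0f} of {loaded:.0f} beads recovered "
      f"({percent_recovery(recovered, loaded):.1f}%)")
```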
2.4 Amplification and cDNA Cleanup
Once reverse transcription, exonuclease treatment, and quality control of the droplets have been
completed, the resulting cDNA can then be amplified and run on a bioanalyzer to get a picture of
the variety of templates captured, in both length and amount. The cDNA contains PCR primers, a
cell barcode, a unique molecular identifier (UMI), and the template, and can be inputted directly
into PCR (Figure 2.6).
Figure 2.6 Structure of Amplification Ready cDNA. After reverse transcription, the cDNA
contains a PCR primer, the unique cell barcode, the UMI, the original captured mRNA now with its
complementary strand, and the flanking PCR primer. This will then be amplified to have more
copies prior to analysis and library preparation for sequencing.
Following exonuclease treatment and determination of the number of beads recovered, the beads
were washed twice in 1µL of H2O. Each PCR tube was apportioned with 4,000 beads, just like the
control sample. This yielded approximately 300 STAMPs per PCR tube, which was ideal so as not to
oversaturate the reaction. The PCR mix was made using the following recipe, and the replicates
were run using the following program:
Table 2.4 Recipe for PCR Mix
Reagent        Amount
H2O            23µL
20µM SMART     2µL
ToughMix       25µL
Table 2.5 PCR Conditions
Temperature    Time    Cycles
94ºC           3:00
94ºC           0:15    ┐
65ºC           0:45    ├ 25 Cycles
70ºC           3:00    ┘
70ºC           5:00
4ºC            ∞
The resulting cDNA was then purified and concentrated using AMPure XP magnetic beads. To
the PCR tubes, 30µL of 0.6X AMPure XP beads were added and quickly mixed 15 times using a
pipet. The sample was incubated for 5 minutes after being transferred to a 96 well plate. The plate was
placed on a magnet to separate the beads from solution, and the supernatant was removed. The beads
were washed with 70% ethanol twice, air-dried, and the DNA was eluted in 20µL of H2O.
2.5 Nextera Library Preparation
The final step prior to sequencing of the captured mRNA was library preparation. This step was
done using the Nextera XT kit and allowed for tagmentation of the cDNA and preparation for
sequencing. Tagmentation is the initial step in library preparation where unfragmented DNA (i.e.,
the full-length templates seen in Figure 2.6) is cleaved and prepared for analysis. Following
tagmentation, a second PCR step was performed to amplify the tagged DNA. This was due to the
fact that the ideal input for Nextera was only 600pg of DNA. Therefore, a series of dilutions and
pooling was done so that a representative sample of all the transcripts was sent to sequencing.
Before the tagmentation step, the post-PCR samples were diluted and pooled. To do so, the
concentration in pg/µL of each PCR replicate was determined using Qubit, a fluorometric analysis
method. Then, the raw DNA amount in ng was calculated using the following formula:
$$\text{DNA amount (ng)} = \frac{[\text{DNA}]\ \text{pg}/\mu\text{L} \times 18}{1000} \qquad \text{Eq. (11)}$$
It should be noted that the concentration is multiplied by 18 due to the fact that 2µL of the 20µL
that was eluted was used for Qubit analysis, therefore the final volume of DNA in solution was
18µL. After this, the number of templates in this amount of DNA was calculated using the
following formula:
$$\text{\# of templates} = \frac{\text{DNA amount (ng)} \times 6.022 \times 10^{23}}{1000 \times 10^{9} \times 650} \qquad \text{Eq. (12)}$$
This calculation is done on the assumption that the average weight of a base pair is 650 Daltons,
meaning that one mole of base pairs has a mass of 650g (Staroscik, 2004). Therefore, the molecular
weight of any double stranded DNA can be estimated by multiplying its length in bp by 650.
First, the amount of DNA in ng was multiplied by Avogadro’s number ($6.022 \times 10^{23}$ molecules per mole).
This numerator was then divided by the average length of template (1,000bp) multiplied by $10^{9}$
(to convert from g to ng) and 650, which together give the mass in ng of one mole of average-length
template, to give the formula above. Then, we determined the replicates per UMI present in the 18µL of DNA. This
was done using the following formula:
$$\text{replicates per UMI} = \frac{\text{\# of templates}}{4000 \times \text{cell positivity} \times 1000} \qquad \text{Eq. (13)}$$
This calculation was done on the assumption that each cell has about 1,000 UMI, or 1,000
templates captured on each bead. Therefore, the number of templates was divided by the product
of the number of beads inputted to the PCR (4,000), the cell positivity, and the number of UMI per
cell. This revealed how many copies of each UMI there should be within the amplified DNA
sample. Then, because Nextera requires a 600pg input, we calculated the amount in µL that
contained 600pg, which can be done using the following equation:
$$\text{Volume for 600pg } (\mu\text{L}) = \frac{300\ \frac{\text{pg}}{\mu\text{L}} \times 2\ \mu\text{L}}{\text{DNA concentration}} \qquad \text{Eq. (14)}$$
where 300pg/µL was the desired final concentration and 2µL was the desired final volume. Since
this was such a small amount of DNA, it also translated to a small volume and it was not feasible
to pipet as small as 0.06 µL of solution. Therefore, dilution factors were calculated so that the final
concentration would be 300pg/µL using the following formula:
$$\text{dilution factor} = \frac{2\ \mu\text{L}}{\text{volume for 600pg } (\mu\text{L})} \qquad \text{Eq. (15)}$$
where 2µL was the desired final volume and the denominator is the volume of the stock DNA
solution that contained 600pg of DNA. Therefore, if the dilution factor was calculated to be 11.73,
1µL of sample would be added to 10.73µL of water. Since 1µL was also a very small amount, the
dilutions were scaled to allow for 5µL of DNA input by simply using the formula:
$$\text{Water } (\mu\text{L}) = 5D - 5 \qquad \text{Eq. (16)}$$
where the amount of water was calculated to be 5 times the original dilution factor (D), minus 5µL
which would be the stock solution input. Following dilution of all replicates, 2µL of each replicate
were pooled to a total combined volume of 22µL. The concentration in pg/µL of this pooled
solution was measured using Qubit, and the resulting concentration should have remained at
300pg/µL. If the pooled concentration was higher or lower than 300pg/µL, it was adjusted for using
the same formula as the 600pg volume, simply substituting the DNA concentration for the actual
concentration of the pooled sample to get the correct volume with 600pg. Finally, the (ideally)
2µL was combined with 3µL of H2O to bring the final volume to 5µL with 600pg total, a final
concentration of 120pg/µL.
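The dilution arithmetic in Eqs. (11) and (14) through (16) can be sketched in Python as shown here. The function names are hypothetical; the 3,519pg/µL stock concentration is an illustrative value chosen because it reproduces the dilution factor of 11.73 used as the example in the text.

```python
TARGET_CONC_PG_PER_UL = 300.0  # desired concentration after dilution
TARGET_VOLUME_UL = 2.0         # desired volume carrying 600 pg


def dna_amount_ng(conc_pg_per_ul, volume_ul=18.0):
    """Eq. (11): total DNA (ng) in the 18 µL left after Qubit."""
    return conc_pg_per_ul * volume_ul / 1000.0


def volume_for_600pg_ul(conc_pg_per_ul):
    """Eq. (14): volume of a solution at the given concentration that holds 600 pg."""
    return TARGET_CONC_PG_PER_UL * TARGET_VOLUME_UL / conc_pg_per_ul


def dilution_factor(conc_pg_per_ul):
    """Eq. (15): fold dilution that brings the stock to 300 pg/µL."""
    return TARGET_VOLUME_UL / volume_for_600pg_ul(conc_pg_per_ul)


def water_for_dilution_ul(factor, stock_input_ul=5.0):
    """Eq. (16), written for any stock input volume: water = input * D - input."""
    return stock_input_ul * factor - stock_input_ul


stock = 3519.0  # pg/µL, illustrative value
d = dilution_factor(stock)
print(f"DNA amount: {dna_amount_ng(stock):.1f} ng")
print(f"dilution factor: {d:.2f}")
print(f"water for 1 µL of stock: {water_for_dilution_ul(d, 1.0):.2f} µL")
print(f"water for 5 µL of stock: {water_for_dilution_ul(d):.2f} µL")
```

The same `volume_for_600pg_ul` function covers the adjustment of the pooled sample: passing the measured pooled concentration instead of the stock concentration returns the volume that contains 600pg.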
After the replicates were diluted, pooled, and adjusted to the correct final concentration
and volume, tagmentation was done. The following recipe was used for Nextera tagmentation:
Table 2.6 Recipe for Nextera Tagmentation
Reagent                        Amount
Nextera TD buffer (TD)         10 μL
Amplicon Tagment Mix (ATM)     5 μL
Neutralization Buffer (NT)     5 μL
Once all reagents had thawed, the tubes were inverted 3-5 times to ensure they were adequately
mixed, followed by a brief spin in a microcentrifuge. To each 5µL sample, the TD buffer and ATM
were added and mixed by pipetting up and down 15 times. The sample was incubated at 55ºC for
5 minutes and held at 10ºC. Once the sample reached 10ºC, the neutralization buffer was
immediately added, mixed 15 times by pipet, and spun down. The sample was incubated at room
temperature for 5 minutes. PCR followed using the following recipe, in this order:
Table 2.7 Recipe for Nextera PCR
Reagent                                  Amount
Nextera PCR mix (NPM)                    15 μL
H2O                                      8 μL
10 µM New-P5 SMART PCR hybrid oligo      1 μL
10 µM Nextera N70X oligo                 1 μL
The Nextera N70X oligo varied based on sample. If multiple samples from different patients or
different cell lines were to be run, each one would receive a different oligo, spanning from N701
to N705. PCR was run using the following program:
Table 2.8 Nextera PCR program
Temperature    Time    Cycles
95ºC           0:30
95ºC           0:10    ┐
55ºC           0:30    ├ 12
72ºC           0:30    ┘
72ºC           5:00
4ºC            ∞
Purification of amplified DNA was done using AMPure beads as described previously in section
2.4: Amplification and cDNA Cleanup.
Finally, the samples were run on the bioanalyzer and an ideal peak of 300-600bp was the
goal. This would signify that the DNA fragmented properly and could be submitted to next
generation sequencing. Additionally, the concentration was inputted into the previously described
formulas to determine replicates per UMI post-Nextera, which should increase compared to the
pre-tagmentation calculation.
2.6 Next Generation Sequencing of Library
The final experimental step involves sending the samples out to next generation sequencing. The
Nextera step prepares the samples by tagging them with the correct primers for sequencing. The
sequencing company (Novogene) suggested a 5nM concentration of the final Nextera library in
15µL. In order to determine molarity, the following equation was used:
$$\text{conc. in nM} = \frac{\text{conc. in ng}/\mu\text{L}}{660\ \frac{\text{g}}{\text{mol}} \times \text{avg. library size (bp)}} \times 10^{6} \qquad \text{Eq. (17)}$$
This formula converted the concentration of the final library into nanomolar by utilizing the
assumption that each base pair had a molecular weight of 660g/mol. Since the samples were being
run on the Illumina HiSeq sequencing format, the final construct of the sequencing library is
specifically designed to allow for proper sequencing (Figure 2.7).
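Eq. (17) can be checked numerically with a short Python sketch. The function name is hypothetical, and the inputs (1.88ng/µL and an average library size of 575bp) are used here purely as illustrative values.

```python
def library_molarity_nm(conc_ng_per_ul, avg_library_size_bp, bp_weight_g_per_mol=660.0):
    """Eq. (17): convert a library concentration in ng/µL to nM.

    Assumes an average molecular weight of 660 g/mol per base pair, as in the text.
    """
    return conc_ng_per_ul / (bp_weight_g_per_mol * avg_library_size_bp) * 1e6


# Illustrative inputs
molarity = library_molarity_nm(1.88, 575)
print(f"library concentration: {molarity:.2f} nM")
```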
Figure 2.7 Library Construct for Next Generation Sequencing. The final library consists of
P5 and P7 oligos flanking an index along with two reads and the insert DNA. The custom read
was provided by our lab.
The P5 and P7 oligos are flow cell binding sites and are required so that the fragments may attach to the
complementary oligos on the surface of the flow cell. The oligos serve as primers for amplification.
In NGS, 1,000 copies of each fragment are generated and the P5 region is cleaved, leaving only
fragments bound by P7, ensuring that all copies are sequenced in the same direction (source). Read
1 is sequenced first, followed by the index. The index read is the barcode of the sample, namely
the cell barcode that allows for cell-cell differentiation and the single-cell resolution. Finally, read
2 is sequenced following read 1 and the index; this occurs when the fragments are bound by P5
and they are sequenced in the opposite direction of read 1.
Chapter 3: Results of DropSeq and Exploration of Other scRNA-seq Platforms
3.1 Results of DropSeq Experiment
The aforementioned DropSeq protocol from blood processing through Nextera library preparation
was done on a sample of PBMCs that were harvested from fresh blood, hereafter referred to as
PBMC25. The DropSeq experiment was also done on a 1:1 mix of HEK293T and 3T3 mouse cells,
hereafter referred to as HEK/3T3. These two runs were sent for Next Generation Sequencing in an
effort to get gene expression data.
3.1.1 Protocol Optimization
Prior to the final experiments, a multitude of other runs were conducted in order to optimize
the experimental conditions to produce high quality droplets as discussed in the quality control
section; the quality control measures discussed in Chapter 2 all gave indications of how well
the optimization of the experiment was proceeding. Reasons to reject samples include high
multiplicity, low cell and bead positivity, or poor quality of droplets as seen in imaging.
One such parameter was the cell positivity, which allowed us to see how many of the
generated droplets contained cells. The original DropSeq protocol suggested a cell concentration
of 100 cells/µL, but when this was used, there was a cell positivity of below 10%. When the
concentration was increased to 200 cells/µL, the positivity increased to about 12%. Increasing the
cell concentration to 300 cells/µL yielded a cell positivity ranging from 17-20%, allowing us to
ensure that we would capture more STAMPs without running the risk of having greater than one
cell per droplet, thereby reducing the single cell resolution. Furthermore, previous runs using lower
oil pressures, but the same 300 cells/µL concentration, also resulted in much lower cell positivity,
about 7% when the oil pressure was set as 200mmHg. The average cell positivity across all of our
successful runs was about 17%, and so this indicated that the air pump pressures needed further
optimization to generate consistently favorable cell positivity in the droplets.
Initially, the pump pressures were set as 130mmHg, 120mmHg, and 120mmHg for the oil,
cells, and beads, respectively. This resulted in a positivity of about 10.4% with about a 10%
multiplicity rate. The critical issue with such a high multiplicity (nearly the same as overall
positivity) lies in the fact that it is almost impossible to obtain the single cell resolution needed.
This could skew sequencing results downstream since two beads with one cell would result in not
only uneven capture, but counting of two different barcodes when, in reality, the transcripts all
belonged to one cell. One approach to curbing this issue and reducing multiplicity was to increase
the oil pressure and adjust the cell and bead pressures accordingly. High speed camera recordings
were taken to monitor in real time the flow of all three components and to assess the necessity for
change. Multiple runs were conducted with varying oil pressures until a suitable number was
reached which would prevent the excess multiplicity seen previously. Ultimately, the final
pressures settled upon were 280mmHg, 150mmHg, and 150mmHg for the oil, cells, and beads,
respectively, which, over multiple runs, proved to consistently prevent the high proportion of
multiplicity seen in previous runs.
Another parameter considered in the optimization of this protocol was the bead solution
concentration. Too low a bead concentration would result in insufficient uptake and thus low
positivity overall. Too high a concentration would result in excess uptake and potentially
too high a multiplicity. The original DropSeq protocol dictated that the bead concentration should
be 120 beads/µL; previous runs in our lab using this concentration in conjunction with lower pump
pressures were resulting in about 14% bead positivity, but following optimization of the pump
pressures, this concentration needed to be increased to account for the more rapid flow of the bead
solution. Following finalization of the pump pressures, runs were conducted with a bead
concentration at 400 beads/µL, which was still resulting in low positivity. For example, one run at
the initial 400 beads/µL concentration resulted in a positivity of 11.5%, whereas the average
positivity across 9 runs using 500 beads/µL was 20%. The final value of 500 beads/µL was decided
upon after numerous runs showed that this concentration resulted in favorable positivity and
multiplicity across different samples.
Further testing was done prior to finalization of this protocol to determine whether 20 or
25 cycles was better suited for the PBMC experiment, specifically. It was determined that 25 cycles
were ideal due to the fact that the post-PCR concentration of DNA was higher (as expected), but
also because there was a wider breadth of template lengths at the peak: 300-600 bp for 20 cycles
vs. 1,000-1,500 bp for 25 cycles. Additionally, the 25 cycle PCR program yielded more long
fragments, up to 10,000bp, and was overall better suited to the next step, which was library
preparation (Figure 3.1).
Figure 3.1 Comparison Between 20 and 25 Cycle PCR on Sample and Controls. The green
line represents the 20 cycle PCR, whereas the red line represents the 25 cycle PCR. The orange
and blue represent the 20 and 25 cycle controls, respectively. The peak for the 20 cycle PCR is
considerably narrower and at a lower size (bp) than for the 25 cycle PCR. A wider breadth of
template sizes and a peak at a larger size (bp) is preferable.
One other approach that was explored was performing the reverse transcription step within
the droplet itself. This version of the protocol utilized a modified lysis buffer to suspend the beads
in, but the other steps through droplet generation remained the same. One major difference apart
from the composition of the lysis buffer lay in the fact that reverse transcription occurred prior to
droplet breakage. The hopes of this particular protocol were to maximize mRNA capture and
prevent its degradation. Multiple experiments were conducted utilizing empty droplets to test
proper rotation speeds without compromising their integrity. Also explored was the idea of placing
mineral oil above the droplets to prevent any evaporation since the rotation occurs under heat. This
proved to be problematic, and the shaking of the tube formed an emulsion with the oil and the
droplets, and no evaporation was actually seen when mineral oil was not used. Another parameter
explored was intermittent shaking of the droplets with moments of stasis. The ideal was determined
to be 1200rpm with 30 seconds of shaking and 2 minutes 30 seconds of stasis for a total of 135
minutes. However, these experiments failed to consider the presence of the capture bead within
the droplet. The beads that were used in our experiment were hard plastic beads, which was the
main limiting factor in our ability to perform reverse transcription within the droplets. Shaking the
droplets with beads within them resulted in rupture due to the hard surface of the bead coming into
contact with the fragile droplet walls, thus reducing capture and resulting in excess mRNA
degradation. Conversely, not subjecting the encapsulated beads to shaking would risk incomplete access of
the enzyme to all bound transcripts, thus reducing overall capture as the bound, but not reverse-
transcribed mRNA would be degraded. The ideal would be to use softer beads, and this concept
will be discussed later in this section. Overall, maximal optimization measures were taken to
ensure that the results of the experiments would be ideal for sequencing and gene expression data
generation.
3.1.2 PBMC25
The PBMC25 run began with cell processing and staining to check for viability. The viability was
recorded as 86%, but also noted was the fact that the trypan blue stain used for visualization
contained large amounts of precipitate, thereby skewing the calculations by the Countess
Automated Cell Counter machine; the machine will read the precipitate as a dead cell since it is
stained a dark, uniform blue, whereas live cells have bright centers with dark edges (Figure 3.2).
Therefore, we can infer that the viability was likely higher.
Figure 3.2 Image of Stained Cells in Countess 3 Automated Cell Counter. Here, the live cells
are clearly visible as they have bright centers and dark edges. The dead cell has been circled in red
and can be seen to be uniformly dark throughout.
The initial concentration of the cell suspension was $1.215 \times 10^{6}$ cells/mL, and it was appropriately
diluted down to $3.0 \times 10^{5}$ cells/mL in 1mL by combining 247µL of cell solution and 753µL of 0.08%
BSA/PBS. The bead solution was maintained at 500 beads/µL in 400µL of lysis buffer. In this
particular run, 1.8mL of PicoSurf oil was loaded instead of the usual 1.2mL described in the
protocol. Initially, a tubing issue had arisen due to a blockage in the flow tube for the bead solution,
and so it was replaced to allow for proper flow, resulting in a larger than normal waste volume of
400µL. Following droplet breakage, a total of 52,000 beads were recovered at a recovery rate of
47%. Imaging of the droplets revealed 30.5% bead positivity which is higher than average, and
also a 17% multiplicity, which is much higher than the 4.4% average; the average droplet size was
measured to be 118µm, which is very consistent with previous runs, the average being 119.5µm
(Figure 2.4). The discrepancy in bead multiplicity could have been due to the initial issue regarding
the tubing for the bead solution. However, the mathematically derived bead positivity was 17.4%,
which is more consistent with averages across previous runs. One issue with manual determination
of bead positivity is the potential for sampling bias. Since a very small number of droplets are
being looked at, it may not be representative of the entire sample. This can also lead to inaccurate
measurements of bead multiplicity. Cell positivity was calculated to be 15.4% based on the
volumes of solutions used to generate the droplets. These numbers ultimately led to the total
number of STAMPs that were estimated in this sample: 7998, which is consistent with previous
runs. Since the quality control determined the sample to be viable, the experiment proceeded on to
amplification.
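The cell dilution reported at the start of this run follows from the standard relation C1V1 = C2V2. A minimal Python sketch is given here; the function name is hypothetical, and the inputs are the stock and target concentrations and the final volume stated above.

```python
def dilution_volumes(stock_conc, target_conc, final_volume_ul):
    """Return (stock volume, diluent volume) in µL from C1*V1 = C2*V2.

    Concentrations may be in any unit as long as both use the same one.
    """
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_conc * final_volume_ul / stock_conc
    return stock_ul, final_volume_ul - stock_ul


# Stock at 1.215e6 cells/mL diluted to 3.0e5 cells/mL in a final volume of 1 mL
stock_ul, diluent_ul = dilution_volumes(1.215e6, 3.0e5, 1000.0)
print(f"cell solution: {stock_ul:.0f} µL, BSA/PBS: {diluent_ul:.0f} µL")
```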
The initial PCR step was run on 4,000 sample beads and 4,000 control beads to ensure that
there was enough RNA captured at both 20 and 25 cycles. Following PCR, the resulting cDNA
was purified by AMPure XP and eluted in 20µL. TapeStation analysis was done to visualize
templates that were captured (Figure 3.1). The concentrations of both the 20 and 25-cycle runs
were also measured using Qubit, and the 20-cycle run measured at 2980ng/mL while the 25-cycle run
had a concentration of 17,800ng/mL, a significant increase. After determining that 25 cycles was
optimal, the remaining 44,000 beads were amplified in 11 replicates of 4,000 beads each, purified
by AMPure and eluted into 20µL.
After AMPure, the samples were ready for Nextera tagmentation. Total DNA amount in
ng was calculated for each replicate, for a total of 12 replicates. Two replicates were excluded
because one had a very low DNA concentration (731 pg/µL) and one had an extremely high DNA
concentration (17,800 pg/µL) relative to the rest of the replicates, which averaged about
6500pg/µL of DNA. The necessary dilution factors were calculated, and the replicates were
subsequently diluted and pooled; 2µL of each diluted solution was combined into a total volume
of 20µL at a presumed concentration of 300pg/µL. Following pooling, the final concentration was
measured using Qubit and was recorded to be 266pg/µL, which is slightly lower than the 300 that
was the target. Thus, the amount to be aliquoted was adjusted to 2.25µL rather than 2µL to ensure
that the total amount of DNA was 600pg, as is required for Nextera input. Water was added to
bring the total volume up to 5µL, with a final concentration of 120pg/µL, and a total of 600pg in
5µL. The pooled sample was subjected to the tagmentation and post-tagmentation amplification
protocols. Following amplification, the pooled sample was purified, the DNA concentration was
measured by Qubit, and it was recorded to be 1880pg/µL, signifying adequate amplification of the
now fragmented templates. Additionally, TapeStation analysis was done to determine whether or
not fragmentation was successful. The goal of Nextera is to digest the DNA templates into 300-
600bp fragments, which is ideal for NGS. The fragmentation was successful and yielded a peak of
575bp, with the majority of the fragments lying between 400 and 900bp in length (Figure 3.3).
When compared to the pre-Nextera curve of PBMC25 (Figure 3.4), there is a significant narrowing
in the curve which is indicative of successful fragmentation. Additionally, this suggests that the
very long templates seen past 1500bp were also fragmented, which allows for a much more
representative sample of templates to be sequenced, thus enabling us to see a better picture of the
cell’s transcriptome.
Figure 3.3 Post-Nextera TapeStation Distribution of PBMC25. The curve presented in this
graph shows a peak intensity at 575bp, suggesting successful fragmentation of the DNA templates.
Compared to the pre-Nextera PBMC25 (Figure 3.4), the width of the peak significantly narrowed,
which is indicative of successful tagmentation.
Figure 3.4 Pre-Nextera TapeStation Distribution of PBMC25. The curve here is wider with a
much longer tail stretching into upwards of 10,000 bp. This is ideal for Nextera input as it will
allow for a wider breadth of transcripts to be fragmented. The comparison also confirms successful
completion of the fragmentation protocol.
Prior to Nextera, the number of replicates per UMI in the diluted samples was calculated; for
PBMC25, this was expected to be 904 replicates per UMI. That is, 904 copies of each unique
transcript captured by the beads were expected. Following Nextera, this was recalculated
since there was significant amplification of the pooled samples and was expected to be 5,090
replicates per UMI.
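The replicates-per-UMI estimate in Eqs. (12) and (13) can be sketched as follows. The function names are hypothetical. As an assumption made only for this illustration, the calculation is applied to the 600pg (0.6ng) Nextera input with the 15.4% cell positivity reported for this run; under that assumption the sketch returns roughly 900 replicates per UMI, which is close to the pre-Nextera figure quoted above.

```python
AVOGADRO = 6.022e23       # molecules per mole
BP_WEIGHT = 650.0         # g/mol per base pair, as assumed in the text
AVG_TEMPLATE_BP = 1000.0  # assumed average template length (bp)


def number_of_templates(dna_ng):
    """Eq. (12): number of template molecules in a given mass of DNA (ng)."""
    return dna_ng * AVOGADRO / (AVG_TEMPLATE_BP * 1e9 * BP_WEIGHT)


def replicates_per_umi(n_templates, cell_positivity, beads_per_pcr=4000, umi_per_cell=1000):
    """Eq. (13): expected copies of each UMI in the DNA sample."""
    return n_templates / (beads_per_pcr * cell_positivity * umi_per_cell)


# Assumed inputs: 0.6 ng of DNA and a cell positivity of 15.4%
templates = number_of_templates(0.6)
print(f"templates: {templates:.3e}")
print(f"replicates per UMI: {replicates_per_umi(templates, 0.154):.0f}")
```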
The samples were prepared according to the sequencing company’s (Novogene)
requirements and PBMC25 in particular was tagged with the N705 oligo. The company suggested
that the sample be at 5nM concentration in 15µL. After calculating the concentration for PBMC25,
it was determined to be 4.95nM, which is very close. The final concentration that was sent to be
sequenced was 4.133nM, as this was eventually combined in a 2:1 ratio with the HEK293T/3T3
sample, which will be discussed in the next section.
3.1.3 HEK293T/3T3
The initial concentrations for the cells were $1.285 \times 10^{6}$ cells/mL for the 3T3 cells and $1.21 \times 10^{6}$ cells/mL for the
HEK293T cells. The two cell lines were individually adjusted to the target $3.0 \times 10^{5}$ cells/mL in a
volume of 500µL each, then combined to a total volume of 1mL. The viability for these cells was
recorded to be 80% for both, although there was significant trypan blue precipitate present yet
again in the Countess slide. In this experiment, the air pressure for the oil channel was dropped
slightly to 270mmHg from the original 280mmHg. This is due to the fact that in a previous run of
this experiment using the same cells, it was noted that the droplets were generating too quickly
and were significantly smaller in size at the normal oil pressure. Even with the slight drop, the
droplet size was decreased to 110µm. Bead positivity was measured manually to be 12.92% and
the multiplicity was measured to be 1.14%, which is ideal and better than that of PBMC25.
Mathematically, however, the bead positivity was calculated to be 8.9% which is much lower than
expected and potentially problematic. The cell positivity was calculated to be 15.4% which is ideal.
Following droplet breakage, only 30,000 beads were recovered at a recovery rate of 20%. During
the final wash steps of the breakage protocol, the beads were not pelleting properly, undoubtedly
leading to some loss in overall bead recovery. Despite this, the sample continued on through
reverse transcription and exonuclease treatment, ultimately going on to be amplified.
For the amplification of this sample, the first aliquot was run with 4,000 beads at only 20
cycles (compared to the 25 for PBMC25) and yielded a reasonable curve on the TapeStation, with
an average peak at 900bp and a wide breadth (Figure 3.5). Following this, another sample was
amplified using only 2,000 beads, again at 20 cycles. This sample yielded an even better
TapeStation result with a peak at 683bp and a wider coverage of template lengths (Figure 3.6). This
was likely because too high an input can hinder the polymerase’s ability to amplify
all of the product available.
Figure 3.5 4,000 Bead TapeStation curve for HEK293T/3T3 cells. This post-amplification curve
of 4,000 bead input shows a peak at 899bp with a decent breadth of coverage in terms of base pair
length, but does not have as many templates larger than 2,500 bp.
Figure 3.6 2,000 bead TapeStation Curve for HEK293T/3T3 cells. This post-amplification
curve of an initial 2,000 bead load shows a better tail that encompasses more transcripts above
2,500bp, through 10,000bp, than did the 4,000 bead sample.
Based on these data, it was decided that the replicates for Nextera would be done using 2,000 beads
as the input rather than 4,000. This would ensure a greater variety of transcript lengths
as input to the fragmentation. Therefore, the remaining 24,000 beads were aliquoted into
2,000 bead fractions and amplified. They were purified by AMPure and eluted into 20µL.
Following purification, the 14 replicates were all quantified by Qubit and their
concentrations recorded. Three replicates were excluded from the pooling, two of which had very
low DNA concentrations of 837 and 190 pg/µL, and one which was the 4,000-bead sample with a
concentration of 12,200 pg/µL which was significantly higher than the average of 4,050 pg/µL.
Similar to PBMC25, the total amount of DNA in ng was calculated and used to determine proper
dilutions of the replicates down to the desired 300pg/µL. Following dilution, 2µL of each diluted
replicate was pooled to a total volume of 22µL at the target concentration. The final pool was
measured by Qubit for concentration and was recorded to be 374pg/µL, which is slightly higher
than the target. Accordingly, only 1.6µL of the pooled sample was combined with 3.4µL of H2O
for a final Nextera-ready product of 600pg in 5µL. The pooled sample underwent the same
tagmentation protocol as PBMC25, yielding a post-Nextera concentration of 617pg/µL as
measured by Qubit, which is significantly lower than the PBMC25 post-Nextera sample.
TapeStation analysis revealed that fragmentation was successful, as a peak was seen at 377bp, with
the curve being narrower than in the pre-Nextera analysis (Figure 3.7).
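The dilution to a Nextera-ready input described above can be checked with a short calculation. The following is a minimal sketch, not the procedure actually used in the lab; the function name and its parameters are hypothetical, and the only inputs are the values reported in the text (a 374pg/µL pool, a 600pg target, and a 5µL final volume).

```python
def nextera_input_volumes(pool_conc_pg_per_ul, target_mass_pg=600.0, final_volume_ul=5.0):
    """Return (sample_volume_ul, water_volume_ul) delivering target_mass_pg in final_volume_ul.

    Hypothetical helper for illustration; volumes are rounded to 0.1 µL,
    which is an assumed pipetting resolution.
    """
    sample_ul = target_mass_pg / pool_conc_pg_per_ul
    if sample_ul > final_volume_ul:
        raise ValueError("pool is too dilute to reach the target mass in this volume")
    water_ul = final_volume_ul - sample_ul
    return round(sample_ul, 1), round(water_ul, 1)


# Values reported in the text: pooled sample measured at 374 pg/µL by Qubit.
sample_ul, water_ul = nextera_input_volumes(374.0)
print(f"pooled sample: {sample_ul} µL, water: {water_ul} µL")
```

Under these assumptions the calculation reproduces the 1.6µL of pooled sample and 3.4µL of H2O stated in the text.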
Figure 3.7 Post-Nextera TapeStation Curve of HEK293T/3T3 cells. While the fragmentation
was clearly successful, as shown by the peak at 377bp and narrow curve, the concentration of the
sample was diminished, suggesting inadequate or unsuccessful amplification.
As with PBMC25, the post-Nextera concentration was measured by Qubit, and it was recorded to
be 617pg/µL. This contrasts starkly with PBMC25, whose concentration increased over 15-fold,
whereas the increase for this sample was only about 5-fold. This suggests inadequate or unsuccessful
amplification during the Nextera PCR program. While there was clearly some amplification, it
does not nearly compare to the amplification seen in PBMC25. Additionally, the number of
replicates for the HEK293T/3T3 cells after dilution was calculated and estimated to be 1,860
replicates per UMI. Following Nextera, the number of replicates was estimated to be 3,450, not
even doubling compared to PBMC25’s 5-fold increase in replicates.
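The fold changes quoted in this paragraph can be reproduced from the reported numbers. The sketch below is illustrative only; `fold_change` is a hypothetical helper, and the pre-Nextera concentration of the HEK293T/3T3 sample is assumed to be the 600pg input in 5µL (120pg/µL), which the text implies but does not state explicitly.

```python
def fold_change(before, after):
    """Ratio of a quantity after a step to its value before the step."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return after / before


# Replicates per UMI before and after Nextera, as reported in the text.
pbmc25_fold = fold_change(904, 5090)
hek_3t3_fold = fold_change(1860, 3450)

# Assumption: the pre-Nextera concentration is 600 pg in 5 µL, i.e. 120 pg/µL.
hek_3t3_conc_fold = fold_change(600 / 5, 617)

print(f"PBMC25 replicates per UMI: {pbmc25_fold:.2f}-fold")
print(f"HEK293T/3T3 replicates per UMI: {hek_3t3_fold:.2f}-fold")
print(f"HEK293T/3T3 concentration: {hek_3t3_conc_fold:.2f}-fold")
```

The replicate ratios come out above 5-fold for PBMC25 and below 2-fold for HEK293T/3T3, in line with the comparison made in the text.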
In the end, it was decided that this sample would still be sent for sequencing, and so it was
tagged with oligo N701. When calculating the molarity of this sample, it was found to be much
lower: only 2.5nM, half of what was recommended by the sequencing company. Thus, it was
decided that 16µL of PBMC25 would be combined with 8µL of HEK293T/3T3 at a 2:1 ratio,
resulting in a final molarity of 4.133nM in a total of 24µL.
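The final molarity of the combined sample is a volume-weighted average of the two libraries. The sketch below works through that arithmetic with the values given in the text; the function name and the `(volume, molarity)` tuple layout are assumptions made for illustration.

```python
def pooled_molarity(components):
    """Volume-weighted molarity of a pool.

    components: iterable of (volume_ul, molarity_nM) pairs.
    """
    components = list(components)
    total_volume = sum(volume for volume, _ in components)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    total_amount = sum(volume * molarity for volume, molarity in components)
    return total_amount / total_volume


# 16 µL of PBMC25 at 4.95 nM combined with 8 µL of HEK293T/3T3 at 2.5 nM.
mix_nM = pooled_molarity([(16.0, 4.95), (8.0, 2.5)])
print(f"pooled molarity: {mix_nM:.3f} nM")
```

With these inputs the result agrees with the 4.133nM reported for the 24µL pool.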
3.1.4 Sequencing Results
It was found that in PBMC25, an average of only 309 UMI were found per cell, at only 54 reads
per UMI (Figure 3.8). Discouragingly, the HEK293T/3T3 sample had an average of only 34 UMI per
cell with only 5 reads per UMI (Figure 3.9). These were decidedly not the results that we had
expected, considering the estimations of replicates per UMI and our assumption of 1,000 UMI per
cell. This was highly discouraging, as such low UMI numbers would not be wholly representative
of a cell’s transcriptome, and downstream applications such as cell classification would be
hindered by the low counts. However, one important result that can be deemed a success is the
clear separation between the HEK293T cells and the 3T3 mouse cells. The reason this mixed cell
experiment was done was to determine if the platform was truly capable of capturing cells at the
single-cell resolution. Analysis of the resulting sequencing data revealed that there was no mixing
between the two cell types, confirming the resolution of the pipeline (Figure 3.10). Cells that map
to both human and mouse genomes would suggest that more than one cell was captured within a
droplet with one bead, indicating poor single-cell separation.
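The logic of the species-mixing check can be sketched as a per-barcode classification on the fraction of transcripts mapping to each genome. This is a hedged illustration rather than the analysis code used for Figure 3.10; the function name and the 90% purity threshold are assumptions.

```python
def classify_species(human_counts, mouse_counts, purity=0.9):
    """Label a cell barcode as 'human', 'mouse', 'mixed', or 'empty'.

    purity is an assumed cutoff: a barcode is assigned to a species when at
    least this fraction of its transcripts map to that species' genome.
    """
    total = human_counts + mouse_counts
    if total == 0:
        return "empty"
    human_fraction = human_counts / total
    if human_fraction >= purity:
        return "human"
    if human_fraction <= 1 - purity:
        return "mouse"
    # Substantial counts from both genomes suggest two cells shared one bead.
    return "mixed"


# Hypothetical (human, mouse) transcript counts for three barcodes.
for human, mouse in [(95, 5), (3, 97), (50, 50)]:
    print(human, mouse, classify_species(human, mouse))
```

A barcode labeled "mixed" under such a rule would correspond to the multi-cell droplets described above.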
Figure 3.8 UMI per Cell and Average Counts per UMI in PBMC25. (A) The histogram shows
the number of unique UMI identified in the top 2,000 cells of the PBMC25 sample. The average
was calculated to be about 309 UMI per cell with the lowest being about 50 and the highest at
about 600. (B) Shows the average reads per UMI in the top 2,000 cells, with the average being
about 54.
Figure 3.9 UMI per Cell and Average Counts per UMI in HEK293T/3T3 Mixed Sample. (A)
The histogram shows the number of unique UMI identified in the top 1,000 cells of the
HEK293T/3T3 mixed cell sample. The average was calculated to be about 35 UMI per cell. (B)
This shows the average number of reads per UMI in the top 1,000 cells at an average of about 5
reads.
Figure 3.10 Human vs Mouse Gene Mapping in 1,000 Cells from HEK293T/3T3 sample.
1,000 cells were analyzed for gene expression and mapped to both human and mouse genomes.
The scatter plot shows the number of human and mouse transcripts associated to each
transcriptome. Only 2 cells (red) were determined to be mouse, while 959 (blue) were determined
to be human. No cells were detected to be mixed suggesting accurate single-cell isolation of the
cells.
While optimization and modification attempts had been made, including testing a protocol that
moved the reverse transcription step to within the droplet in an effort to maximize mRNA capture,
research on other scRNA-seq platforms concluded that our results were not comparable to the
resolution, UMI count, and gene reads that were possible when utilizing another platform. Single-
cell resolution was achieved, as seen in the distinct separation between mouse and human cells in
the HEK293T/3T3 cells, but this did not make up for the low UMI recovery. However, the
subsequently discussed cell-classification algorithm and the use of this
platform in other applications still rely on scRNA-seq data and could still be performed on data
obtained from DropSeq experiments; it was simply the quality of the data that hindered its usage.
Potential reasons for such low UMI recovery include low cell viability at the beginning of
the experiment, bead multiplicity, and low bead recovery. As stated, the viability recorded at the
beginning of the experiment was about 80% for both samples. Normally, a viability above 80% is
desirable for input into a droplet generation experiment, but as discussed, significant precipitation
of the dye may have skewed the numbers. Although this likely meant that the viability was higher
than what was listed on the instrument, it is still not an accurate prediction of the actual viability.
A large number of dead cells can lead to these low UMI counts due to the fact that they are no
longer producing transcripts, thereby reducing the overall average of UMI captured between all
the cells. Although PBMC generally have lower mRNA content than other cell types, the fact that
other single cell capture systems are able to recover high UMI counts suggests that this should not
affect the overall UMI recovered per cell (10X Genomics).
As mentioned, bead multiplicity for PBMC25 was manually counted to be 17%, which was
significantly higher than previous runs. This could also result in low UMI counts due to the fact
that if two beads were encapsulated with one cell, the transcripts of the cell would be split between
two beads. Therefore, since each bead has a unique cell barcode, it would appear as if each of those
cells has low UMI per cell, when in actuality, it was an artefact of the bead multiplicity.
Furthermore, some droplets contained more than 2 beads, which would reduce the UMI
per cell even further if a cell were encapsulated in one of those droplets. Even though such
droplets were not the majority, this could be a potential reason why the overall UMI counts were
lower. In the HEK293T/3T3 sample, however, the multiplicity was found to be much lower, meaning
this should not have been the case. The viability for these cells was also lower, suggesting that
this may have played a role.
Finally, low bead recovery following droplet breakage can also lead to loss of information
regarding UMI counts. In the HEK293T/3T3 run, the bead recovery was only about 23%, which
is significantly lower than the average of 55% across our previous runs. Loss of beads during this
step equates to loss of transcriptomes overall. If the lost beads carried transcripts from
high-quality, living cells while the recovered ones carried transcripts from dead or poor-quality
cells, this could also account for the significantly
low UMI counts we observed. In PBMC25, the recovery was about 47%, which is only slightly
lower than our average. However, in conjunction with the high multiplicity observed for this
sample, this could also contribute to the low UMI recovery.
3.2 10X scRNA-seq Platform
3.2.1 Introduction
The 10X Genomics Chromium Connect platform provides consistently reproducible, massively
parallel transcriptional profiling at the single-cell resolution while ensuring reduction in technical
variability. This is critical to ensure consistency across samples, all while reducing the potential of
user error. The system utilizes droplet generation similar to the DropSeq protocol, combining cells,
beads, and oil to produce single-cell droplets, referred to as gel bead-in emulsions (GEMs) (Zheng
et al, 2017). One major difference is the fact that reverse transcription of the bound mRNA
transcripts occurs within the droplets, prior to their breakage. This allows for maximal mRNA
capture and reduces the potential for degradation or loss following capture. The emulsion is
broken, the cDNA amplified, a library constructed, and sequenced using short-read next generation
sequencing (Zheng, et al, 2017). The final library construct is nearly identical to that in Figure 2.7,
containing P7 and P5 oligos, two reads, a sample index, and the insert DNA (Zheng, et al, 2017).
3.2.2 Advantages over DropSeq
One of the major advantages the 10X platform has over DropSeq is its ability to be automated,
with the only user dependent step being sample preparation. Additionally, up to 8 separate samples
can be loaded and processed at the same time, significantly reducing time and allowing for
massively parallel preparation of hundreds of thousands of cells at once. Additionally, data from
Zheng, et al shows that bead positivity is consistently high, while multiplicity remains very low,
80% and 5%, respectively (2017). This reproducibility eliminates the need for quality control
measures when using a validated instrument.
Another advantage is seen in the bead construct itself. As previously discussed, the beads
used in our experiments resulted in poor execution of the within droplet reverse transcription
protocol. The hydrogel construction of the 10X beads is significantly less abrasive and allows for
the reverse transcription to take place within the droplets, thereby maximizing capture and
minimizing degradation of bound transcripts (Zheng, et al, 2017).
Perhaps the most attractive benefit of the 10X system is the UMI and gene counts gathered
from each cell. As discussed, sequencing results from the PBMC25 and HEK293T/3T3 samples
yielded very low UMI counts and reads per cell. On average, 10X boasts upwards of 6,000 UMI
counts per cell, a massive difference when compared to the 309 UMI per cell recovered from our
PBMC experiment (10X Genomics). Further, comparison to manual workflows on the same cell
samples shows the automated 10X platform to be consistently higher in gene reads per cell as well,
always upwards of 1,800 versus the highly variable 1,300-1,800 seen with the manual workflow
(10X Genomics). In one dataset provided by 10X, a 5,000 PBMC sample was processed on their
Chromium platform (10X Genomics). In this sample, the median UMI detected per cell was 7,803
and the median genes detected per cell were 2,171; such a high UMI count suggests successful
capture of a wide variety of transcripts, allowing for greater gene expression information to be
extracted from this data. It is important to note, however, that since the library constructs at the
end of both the 10X and DropSeq workflows are identical, the data collected following next
generation sequencing is also identical, with the only major difference lying purely in the quality
of the data, where the 10X platform prevails.
Chapter 4: Bioinformatic Classification of PBMC Subtypes
4.1 Introduction to scRNA-seq Data Analysis Methods
A variety of methods have been developed to analyze the post-sequencing data from any single-
cell RNA sequencing experiment. As of March 2019, 385 different analysis tools exist for
analyzing scRNA-seq data, most of which are based in the programming languages R and Python
(Luecken & Theis, 2019). Raw data files generated following sequencing must be subject to
quality control and normalization measures in order to ensure valid downstream analysis for any
given application.
Quality control is critical prior to analysis of single-cell gene expression data, and the
measures commonly explore three variables: the count depth, or the number of counts per barcode,
the number of genes per barcode, and the number of counts that are derived from mitochondrial
mRNA (Luecken & Theis, 2019). Two different outliers can be detected: doublets and dying or
otherwise damaged cells. It is important to filter these outliers so as not to skew the data. Doublets
may manifest as cells with many detected genes and unusually high counts; conversely, dying or
damaged cells may present as having low counts, few genes, and a high proportion of genes that
are indicative of mitochondrial RNA (Luecken & Theis, 2019). It is also important that these three
variables be considered in unison, as each one individually may incorrectly filter out valid cells
(Luecken & Theis, 2019).
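The three quality control covariates described above can be computed per barcode and evaluated together. The following is a minimal sketch under stated assumptions: the function names are hypothetical, mitochondrial genes are assumed to be identified by an "MT-" gene-name prefix, and every threshold is a placeholder that would need to be chosen from the distribution of the actual dataset.

```python
def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Return (count depth, genes detected, mitochondrial fraction) for one barcode.

    counts: per-gene UMI counts for a single barcode, aligned with gene_names.
    """
    depth = sum(counts)
    n_genes = sum(1 for c in counts if c > 0)
    mito = sum(c for c, g in zip(counts, gene_names) if g.startswith(mito_prefix))
    mito_fraction = mito / depth if depth else 0.0
    return depth, n_genes, mito_fraction


def passes_qc(counts, gene_names, min_depth, max_depth, min_genes, max_mito_fraction):
    """Keep a barcode only if all three covariates look plausible together."""
    depth, n_genes, mito_fraction = qc_metrics(counts, gene_names)
    return (
        min_depth <= depth <= max_depth      # too low: damaged cell; too high: possible doublet
        and n_genes >= min_genes             # few genes: dying or empty droplet
        and mito_fraction <= max_mito_fraction  # high mitochondrial share: damaged cell
    )


# Toy example with placeholder thresholds (not recommendations).
genes = ["CD19", "CD3E", "MT-CO1"]
healthy = [40, 55, 5]
dying = [2, 1, 7]
limits = dict(min_depth=50, max_depth=1000, min_genes=2, max_mito_fraction=0.2)
print(passes_qc(healthy, genes, **limits), passes_qc(dying, genes, **limits))
```

In practice these thresholds are usually set after inspecting all three distributions jointly, so that valid cell populations with unusual profiles are not removed by a single cutoff.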
Following quality control measures, the counts must be normalized across the captured
cells. According to Luecken and Theis, the normalization method that is most commonly utilized
is known as counts per million normalization (2019). This method operates on the assumption that
all of the cells being analyzed began with identical amounts of mRNA, and that any difference
seen in the counts is a matter of sampling error (Bushel, et al, 2020). To do so, counts per gene are
divided “by the total number of mapped reads per sample and multipl[ied] by 1 × 10⁶” (Bushel, et
al, 2020). A variety of packages also exist to normalize gene counts, but the ultimate end goal
remains the same: to generate a baseline to compare gene expression changes to. This can be
critical in clinical settings, particularly if investigators wish to utilize single-cell RNA sequencing
to determine whether certain treatments or disease states affect the transcription of specific genes,
also known as differential expression.
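Counts per million normalization as described above reduces to one line of arithmetic per gene. The sketch below is illustrative; the function name is hypothetical and the input is a plain list of counts for a single cell rather than the matrix format produced by any particular package.

```python
def counts_per_million(gene_counts):
    """Scale one cell's gene counts so that they sum to one million."""
    total = sum(gene_counts)
    if total == 0:
        # A barcode with no counts cannot be normalized; return zeros.
        return [0.0 for _ in gene_counts]
    return [count * 1_000_000 / total for count in gene_counts]


# Toy cell with three genes and 100 total counts.
cpm = counts_per_million([10, 30, 60])
print(cpm)
```

After this scaling, two cells sequenced to different depths can be compared gene by gene, which is the baseline the paragraph above refers to.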
The final step lies in processing the data in order to obtain matrices that can then be inputted
into these analytical tools (Luecken & Theis, 2019). The Cell Ranger platform is provided by 10X
and can be used to transform the raw data into matrices that account for the aforementioned quality
control and normalization measures (Zheng, et al, 2017). Following processing, these matrices can
be converted into a single gene expression matrix which lists all gene and UMI information for
each individual cell.
4.2 Parameter Selection: LM22
In order to develop a bioinformatic classification algorithm, parameters must be selected that will
serve as a dictionary or key for classifying each cell subtype. One group has developed a signature
gene matrix, henceforth referred to as LM22 (Newman, et al, 2015). In constructing this matrix,
the group selected for only relevant genes that can identify a cell type, thereby reducing the need
for massively large numbers of reference genes for a given cell. They were able to do so by
processing public datasets for 22 specific leukocyte subtypes and implementing support vector
regression to filter out the irrelevant genes (Newman, et al, 2015). By using these genes, the matrix
is able to identify the following 22 leukocyte (PBMC) subpopulations:
Table 4.1 PBMC Subtypes Identifiable by LM22 Matrix
B cell naive                  B cell memory             Plasma cell
CD8 T Cell                    CD4 T Cell: naïve         CD4 T cell: memory resting
CD4 T Cell: memory activated  Follicular Helper T Cell  Regulatory T Cell (Treg)
Gamma Delta T Cell            NK Cell: resting          NK cell: activated
Monocyte                      M0 Macrophage             M1 Macrophage
M2 Macrophage                 Dendritic Cell: resting   Dendritic Cell: activated
Mast Cell: resting            Mast Cell: activated      Eosinophil
Neutrophil
Literature searches were also conducted in order to validate the presence of these genes within
their specific cell subtype. While not all 547 were checked and validated against published works,
a search was done on which genes are most commonly expressed by each cell type. For example,
naïve B cells have been reported to express CD19, CD38, and CD45R, while memory B cells
express CD27, CD20, CD40, CXCR5, and CXCR6; even this small sample of genes shows that
these two activation states of the same cell express different genes, thereby allowing them to be
uniquely identified (Baumgarth, 2004; Agematsu, et al, 2000). The results from the literature
search correlate with the data shown in the LM22 matrix, and thus these 547 genes were used as
the dictionary for the newly constructed algorithm that will be described in the next section.
4.3 Pearson Correlation Based Algorithm
The basis of this new algorithm relies solely on the Pearson correlation between the genes in the
unknown cell’s transcriptome and the LM22 matrix. The Pearson correlation coefficient is a linear
correlation measure between two sets of data, and can be solved for using the following formula:
$$ r = \frac{\sum (a \cdot b)}{\sqrt{\sum a^{2} \cdot \sum b^{2}}} \qquad \text{Eq. (18)} $$
$$ a = (x - \bar{x}); \quad b = (y - \bar{y}) $$
where x is each individual value for a gene in the unknown cell’s transcriptome, x̄ represents the
average for these values, y represents each individual value for a gene in a cell subtype in the
LM22 matrix, and ȳ represents the average for these values. The correlation coefficient is used to
identify which cell type most closely matches the unknown cell, for which the highest value for r
is selected. The higher the r value, the more highly correlated the two sets of data, and so the
greater the cell similarity.
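The classification rule described above, computing r against each reference subtype and keeping the highest value, can be sketched as follows. This is a simplified illustration and not the code developed for this project; the function names are hypothetical, and the three-subtype, four-gene signature is invented toy data, not values from the LM22 matrix.

```python
import math


def pearson_r(x, y):
    """Pearson correlation following Eq. (18): sum(a*b) / sqrt(sum(a^2) * sum(b^2))."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    a = [xi - mean_x for xi in x]
    b = [yi - mean_y for yi in y]
    numerator = sum(ai * bi for ai, bi in zip(a, b))
    denominator = math.sqrt(sum(ai * ai for ai in a) * sum(bi * bi for bi in b))
    if denominator == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return numerator / denominator


def classify_cell(cell_expression, signature_matrix):
    """Return (best-matching subtype, its r value) for one cell."""
    scores = {
        subtype: pearson_r(cell_expression, profile)
        for subtype, profile in signature_matrix.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy signature: expression of the same four genes in three subtypes (invented values).
toy_signature = {
    "B cell naive": [9.0, 1.0, 0.5, 0.2],
    "CD8 T Cell": [0.3, 8.0, 7.0, 0.4],
    "Monocyte": [0.2, 0.5, 1.0, 9.0],
}
cell = [0.0, 5.0, 6.0, 1.0]
label, r = classify_cell(cell, toy_signature)
print(label, round(r, 3))
```

The gene order of the cell vector must match the gene order of every reference profile, which is why the expression matrix is first restricted to the signature genes.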
4.4.1 Results
A sample single-cell gene expression matrix for 5,000 PBMC cells was obtained from 10X
Genomics’ publicly available datasets; the absolute number of cells in the dataset is 5,022. This
specific sample dataset contained 21,819 genes, and so it was reduced to reflect only those that are
listed in the LM22 matrix. For each cell, the code computes r against every LM22 cell subtype,
selects the highest r value, and classifies the cell as the subtype with which it correlates most.
When the data from
the classification is compared to literature values of PBMC composition in a healthy sample, the
results are comparable (Figure 4.1). Literature values are highly variable and normally presented
in a range of percentages for the major cell types (i.e., T cell, B cell, monocyte, NK cell, etc.). One
study isolated PBMC by density gradient centrifugation across 8 samples and yielded an average
composition of 77 ± 9% lymphocytes and 21 ± 12% monocytes (Ulmer, et al, 1984). The values
generated by the novel algorithm fall within these ranges. Another, more recent study which
compared different PBMC isolation methods from healthy donor blood found, on average, the
composition to be 65%–70% T cells, 5%–22% B cells, 5%–14% NK cells, and 4%–12% monocytes
(Grievink, et al, 2016). Once again, the values fall within or close to these ranges, but it is clear
to see in both these studies and in Figure 4.1 that the composition is never concrete and can vary
substantially.
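Comparing per-cell subtype calls against literature ranges requires collapsing the subtypes into the major cell types and converting counts to percentages. The sketch below shows one way this could be done; the function name is hypothetical, and the subtype-to-lineage mapping is a partial, assumed grouping for illustration, not the grouping used to produce Figure 4.1.

```python
from collections import Counter

# Assumed, partial mapping from LM22 subtype labels to major cell types.
MAJOR_LINEAGE = {
    "B cell naive": "B cell",
    "B cell memory": "B cell",
    "CD8 T Cell": "T cell",
    "CD4 T Cell: naïve": "T cell",
    "NK Cell: resting": "NK cell",
    "Monocyte": "Monocyte",
}


def composition_percentages(labels, lineage_map):
    """Percentage of cells in each major cell type; unmapped labels count as 'other'."""
    counts = Counter(lineage_map.get(label, "other") for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lineage: 100.0 * n / total for lineage, n in counts.items()}


# Ten hypothetical per-cell classifications.
calls = ["CD8 T Cell"] * 4 + ["CD4 T Cell: naïve"] * 3 + [
    "B cell naive", "NK Cell: resting", "Monocyte",
]
print(composition_percentages(calls, MAJOR_LINEAGE))
```

The resulting percentages can then be set beside published ranges such as those cited above.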
Figure 4.1 Comparison of Literature and Algorithm PBMC Subtype Percentages. (A)
Literature values from Bittersohl & Steimer (2016) show a rough breakdown of subtypes from
healthy humans’ peripheral blood mononuclear cells (PBMC) into T cells, B cells, monocytes, and
NK cells. (B) Values from Pearson correlation-based algorithm showing combined values of same
PBMC subtypes. Relatively similar breakdowns are seen suggesting accurate prediction of cell
types using the newly developed algorithm.
It is important to note that the experimental methods previously described are critical to the
development of such an algorithm. The ability to isolate and process PBMC, capture mRNA
transcripts, and generate sequencing data is the basis of the entire pipeline that ultimately leads to cell
classification. Mastery and optimization of the aforementioned techniques allow for the generation
of the data that is inputted into the algorithm, and although the specific data that was obtained from
the PBMC25 and HEK293T/3T3 experiments was not used, it could easily have been had the
quality been similar to the output of the 10X platform.
Chapter 5: Practical Applications
5.1 HIV: Elite Controllers vs Rapid Progressors
scRNA-seq, and in particular an algorithm that can identify cell subtypes,
can be utilized to investigate a variety of disease states. Since the algorithm that was developed
simply utilizes a dictionary of sorts for classification, it can be easily modified with a new set of
parameters to identify any sort of cell population based on gene markers. Human
Immunodeficiency Virus (HIV) is a small retrovirus which primarily affects CD4 T cells in the
periphery (Saag & Deeks, 2010). Following the discovery of HIV as the cause for acquired
immunodeficiency syndrome (AIDS), a small subpopulation of HIV-positive individuals was
found to not progress to AIDS, deemed “long-term non-progressors” (Saag & Deeks, 2010). They
were observed to maintain high CD4 T cell counts despite infection and in the absence of
antiretroviral therapy for many years. The description of this subpopulation has now become
more refined, defining those who have a viral load <50 copies of HIV RNA per mL of blood as
“elite controllers” and those with <2,000 copies of RNA/mL as “viremic controllers” (Saag &
Deeks, 2010). Conversely, the subpopulation known as “rapid progressors” progress to AIDS within
the first 3 years of HIV infection (Khanlou, et al, 1996). The ability to uncover the mechanisms
behind how these patients’ immune systems are naturally able to control infection would be
invaluable to the future of not only HIV treatment, but perhaps prevention, as well.
One application of the aforementioned pipeline could be used to identify and distinguish
the differences between the cells of elite controllers, normal progressors, and rapid progressors.
Since the discovery of these subpopulations, studies have begun attempting to characterize the gene expression
differences between these subpopulations of HIV-positive individuals. In fact, a variety of genes
have been found to be significantly differentially expressed between the elite controller and rapid
progressor subpopulations. One such example highlights the significant upregulation of perforin
and granzyme B in CD8+ T cells for elite controllers (Hersperger, 2011). Increased release of these
proteins by CD8+ T-cells allows for more effective killing of HIV-infected cells (Shi, et al, 1997).
This suggests that CD8+ cells in elite controller patients may somehow be at an advantage, as
increase in these proteins would allow for CD8+ cells to target and destroy more HIV-infected
cells. Further, the reported decrease of GrzB and perforin in senescent HIV specific cytotoxic T
lymphocytes (CTL) suggests that HIV infection inhibits the ability of these cells to be cytotoxic
and mediate apoptosis of other cells (Shi, et al, 1997). Another pathway involved in the elite
controller subpopulation is that of cytosolic DNA sensing, where SAMHD1 is upregulated in elite
controller CD4 cells (Riveira-Munoz, et al, 2014; Wu, et al, 2013). The presence of the SAMHD1
protein blocks the reverse transcription of HIV RNA, allowing for control of HIV infection at the
cellular level (Maelfait, et al, 2016). This may very well prevent infection of the CD4 T cell
altogether. A multitude of other genes have been identified as being up or downregulated in these
special HIV subpopulations, all of which can potentially affect the outcome of the patient’s health.
By sequencing PBMC and classifying them, one can begin identifying these populations early on
during infection. This could potentially open the door to understanding how these populations are
different at the single cell level and could assist in the development of immunotherapeutics that
could mimic the elite controller responses to curb HIV infection.
5.2 COVID-19 Immune Profiling
The recent coronavirus disease (COVID-19) pandemic has allowed for the exploration of single-
cell sequencing as a method to understand how the virus is modulating immune function within its
host. The disease has been shown to manifest very differently across individuals, some of whom remain
asymptomatic through infection, and some of whom succumb to severe disease resulting in acute
respiratory distress syndrome (ARDS) and death (Zhou, et al, 2020). Previous pandemics that
resulted from respiratory viruses such as SARS-CoV, Spanish influenza, and H1N1 revealed
distinct changes in the immune profiles of those infected, particularly the presence of a
characteristic “cytokine storm” preceding the progression to ARDS (Totura, et al, 2012). Patients
who had severe SARS exhibited aberrant interferon, interferon stimulated genes (ISGs), and
cytokine responses when compared to healthy individuals, and these cytokines were in patients at
the onset of ARDS in those who died compared to those who survived (Totura, et al, 2012; Rockx,
et al, 2009). Similarly, severe cases of COVID-19 had significantly higher levels of pro-
inflammatory cytokines such as IL-2R, IL-6, IL-10, and TNFa, when compared to moderate cases
(Chen, et al, 2020). Utilizing single-cell RNA sequencing would allow for the tracing of immune
cells’ gene expression changes among patients who become severely ill, and those who remain
asymptomatic or only become moderately ill.
Utilization of the aforementioned pipeline would allow for single cell capture and
identification of cell subtypes. Gene expression data can also be used to examine differentially
expressed genes within these cell subtypes in an attempt to understand what may cause certain
patients to transition to severe disease. A study published in 2020 utilized single cell transcriptomic
data from PBMC to characterize interferon responses in COVID-19 patients before, during, and
after being in the intensive care unit (ICU) (Wei, et al, 2020). In order to understand these gene
expression changes, the researchers identified the genes that were differentially expressed in the
ICU patient samples. Genes that were enriched were involved in the defense response to viral
infection and included interferon stimulated genes (ISGs), suggesting that the virus may infect the
PBMCs during this ICU stage of infection, and that type I interferon responses are dominant (Wei,
et al, 2020). Aberrant immune
responses may be involved in the progression of COVID-19 to a severe stage, and the utilization of
single cell sequencing on PBMC could allow for the discovery of prognostic biomarkers that may
dictate how a patient progresses (Prompetchara, et al, 2020; Chen, et al, 2020). Although
vaccination rates are increasing, the virus continues to ravage not only the United States, but also
low and middle-income countries that may not receive vaccines until late 2021 or well into 2022.
This potential use of the aforementioned pipeline could be critical in not only potential biomarker
discovery, but also in profiling immune cells following infection to understand the long-term
complications of COVID-19 infection.
Chapter 6: Summary and Future Directions
6.1 Summary
Sequencing of DNA and RNA had its beginnings in the mid-20th century and has grown to become
one of the most diverse tools in science. The ability to analyze gene expression data at the single-
cell level has allowed for the illumination of cellular heterogeneity and expanded to encompass a
variety of clinical applications. Furthermore, scRNA-seq has been used to accurately predict and
classify unknown cell types through a variety of methods, including statistical clustering methods
and matrix-based methods that utilize reference gene markers to assign cell types.
DropSeq as a single-cell sequencing platform was discussed in depth, and ideally, the
ability to use microfluidics to capture a single bead and a single cell within a micro-sized droplet
should enhance resolution and allow for maximum mRNA capture. Quality control measures are
implemented to ensure that the outcome of the experiment allows for the best possible downstream
analysis. Steps taken to optimize the protocol were outlined. However, as discussed, the resulting
data that was generated following next generation sequencing was of poor quality, and while the
data was unusable due to the low UMI counts and reads per cell, the development of the subsequent
algorithm for cell classification utilizes input data that is identical to what is generated following
a DropSeq run.
The advantages of the 10X single cell sequencing pipeline were discussed, as were the
benefits of a nearly fully automated experimental system. This allows for less user error and
superior reproducibility across experiments. The most important advantage that 10X holds over
DropSeq, at least in the data we generated, remains its ability to capture thousands of UMI
per cell compared to our roughly 300. Analysis that utilizes such a limited number of genes, such
as the algorithm that was developed, requires high quality data to get the most reliable and valid
results, which is why the dataset that was used for analysis was a sample set completed on the 10X
system for scRNA-seq library development.
A Pearson correlation-based algorithm was briefly discussed, with emphasis on its ability to
use data generated from the aforementioned DropSeq protocol. The algorithm's ability to
classify PBMCs into 22 cell types was also illustrated. The resulting composition was similar to
the literature values of PBMC composition in healthy individuals.
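As a rough illustration of how a Pearson correlation-based classifier can work, the sketch below assigns a cell to whichever reference profile its marker-gene expression correlates with most strongly. It is not the algorithm developed in this work: the function names, the four marker genes, and all expression values are hypothetical placeholders, and only three reference types are shown rather than 22.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def classify_cell(cell_expr, reference_profiles, genes):
    """Return (cell_type, r) for the reference profile best correlated with the cell."""
    cell_vec = [cell_expr.get(g, 0.0) for g in genes]
    best_type, best_r = None, -2.0  # r is always >= -1, so any profile beats this
    for cell_type, profile in reference_profiles.items():
        ref_vec = [profile.get(g, 0.0) for g in genes]
        r = pearson(cell_vec, ref_vec)
        if r > best_r:
            best_type, best_r = cell_type, r
    return best_type, best_r


# Hypothetical reference dictionary of marker expression (toy values).
MARKER_GENES = ["CD3E", "CD19", "CD14", "NKG7"]
REFERENCE = {
    "T cell": {"CD3E": 9.0, "CD19": 0.5, "CD14": 0.5, "NKG7": 2.0},
    "B cell": {"CD3E": 0.5, "CD19": 9.0, "CD14": 0.5, "NKG7": 0.5},
    "Monocyte": {"CD3E": 0.5, "CD19": 0.5, "CD14": 9.0, "NKG7": 0.5},
}

# One toy cell whose expression resembles the T cell profile.
cell = {"CD3E": 7.0, "CD19": 0.0, "CD14": 1.0, "NKG7": 1.0}
label, r = classify_cell(cell, REFERENCE, MARKER_GENES)
print(label, round(r, 3))
```

Because the correlation is computed over only a handful of genes, a few dropped-out transcripts can change the assignment, which is one reason low UMI counts per cell are so limiting for this kind of approach.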
Finally, potential applications of the pipeline were discussed, namely its use
in the identification of HIV-positive subpopulations such as elite controllers and rapid progressors.
These individuals have distinct gene expression profiles, as reported by a number of studies. The
algorithm can be used not only to identify cell types but can also be modified to include a dictionary
of gene markers that may identify one of the subpopulations. This could be invaluable for the early
detection of patients who may be rapid progressors, and it could be used to identify how elite
controllers differ from the remainder of the HIV-positive population, thus opening the door to the
potential development of immunotherapeutics. The use of this pipeline was also explored for
immune profiling of COVID-19. Potential biomarker discovery was examined, along with the use
of the pipeline to understand long-term complications caused by infection.
6.2 Future Directions
Despite the successful development of a PBMC processing and cell classification pipeline, this
project can be further optimized and expanded upon.
6.2.1 Further DropSeq Optimization
An attempt can be made to further optimize the DropSeq protocol. My suggestion would
be to use hydrogel beads rather than plastic ones and reattempt the reverse transcription within
the droplet. As mentioned, the hard surface of the plastic beads hindered our ability to
complete a within-droplet RT protocol, as the walls of the droplets were too delicate to be shaken
with the bead inside them. Substituting a hydrogel bead would eliminate that issue and could
result in better mRNA capture and thus better UMI recovery post-sequencing. Additionally,
droplet rotation testing can be done using faux beads that mimic the capture beads in an attempt to
identify the ideal rotation speed with beads in the droplets. This could remove the need for
hydrogel beads if it allowed the existing beads to be shaken without rupturing the
droplet.
6.2.2 Future Projects
The next project that I would like to see use this pipeline is the
HIV subpopulation analysis. I believe that accurately subtyping these cells and finding DEGs
between these populations could be extremely beneficial to the future of HIV treatment. I would
start by profiling known elite controllers and rapid progressors in an unblinded fashion to
determine which genes are preferentially upregulated or downregulated in these
populations. Once these genes are known, testing can be done on blinded samples to see whether these
genes can accurately predict which subtype a patient falls into. Should it work, as soon as a
person tests positive for HIV, further testing can be done to identify whether they fall into any of the
subpopulations. Beyond patient identification, the data generated could be used to learn how to
modulate the immune system to respond to HIV infection as the immune systems of elite
controllers do.
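The unblinded-then-blinded design described above could be prototyped along the following lines. This is only a sketch under stated assumptions: the gene names (`GENE_A`, `GENE_B`, `GENE_C`), the expression values, the log2 fold-change cutoff, and the nearest-centroid rule are all hypothetical. A real analysis would require proper statistical testing with multiple-testing correction and far larger cohorts.

```python
import math


def mean_log_expr(samples, gene):
    """Mean of log2(expression + 1) for one gene across a group of samples."""
    return sum(math.log2(s.get(gene, 0.0) + 1.0) for s in samples) / len(samples)


def select_degs(group_a, group_b, genes, min_abs_log2fc=1.0):
    """Unblinded step: keep genes whose mean log2 expression differs between groups.

    Returns {gene: difference in mean log2 expression (A minus B)}.
    """
    degs = {}
    for gene in genes:
        diff = mean_log_expr(group_a, gene) - mean_log_expr(group_b, gene)
        if abs(diff) >= min_abs_log2fc:
            degs[gene] = diff
    return degs


def predict_group(sample, group_a, group_b, degs):
    """Blinded step: nearest-centroid call ('A' or 'B') using only the selected genes."""
    dist_a = dist_b = 0.0
    for gene in degs:
        x = math.log2(sample.get(gene, 0.0) + 1.0)
        dist_a += (x - mean_log_expr(group_a, gene)) ** 2
        dist_b += (x - mean_log_expr(group_b, gene)) ** 2
    return "A" if dist_a <= dist_b else "B"


# Toy profiles: group A stands in for elite controllers, group B for rapid progressors.
GENES = ["GENE_A", "GENE_B", "GENE_C"]
elite = [{"GENE_A": 30, "GENE_B": 3, "GENE_C": 10},
         {"GENE_A": 34, "GENE_B": 1, "GENE_C": 12}]
rapid = [{"GENE_A": 6, "GENE_B": 15, "GENE_C": 11},
         {"GENE_A": 8, "GENE_B": 13, "GENE_C": 9}]

degs = select_degs(elite, rapid, GENES)

# A hypothetical blinded sample is then called using only the selected genes.
blinded = {"GENE_A": 28, "GENE_B": 2, "GENE_C": 10}
call = predict_group(blinded, elite, rapid, degs)
print(sorted(degs), call)
```

Keeping gene selection and blinded prediction in separate functions mirrors the two phases of the proposed study, so the blinded samples never influence which genes are chosen.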
References
Agematsu, K., Hokibara, S., Nagumo, H., & Komiyama, A. (2000). CD27: A memory B-cell
marker. Immunology Today, 21(5), 204-206. doi:10.1016/s0167-5699(00)01605-4
Aljanahi, A. A., Danielsen, M., & Dunbar, C. E. (2018). An Introduction to the Analysis of
Single-Cell RNA-Sequencing Data. Molecular Therapy - Methods & Clinical
Development, 10, 189-196. doi:10.1016/j.omtm.2018.07.003
An Overview of HEK-293 Cell Line. (n.d.). Retrieved from
https://www.beckman.com/resources/product-applications/lead-optimization/cell-line-development/human-embryonic-kidney-293
Aran, D., Looney, A. P., Liu, L., Wu, E., Fong, V., Hsu, A., . . . Bhattacharya, M. (2019).
Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic
macrophage. Nature Immunology, 20(2), 163-172. doi:10.1038/s41590-018-0276-y
Average RNA Yields [PDF]. (2009). Bergisch Gladbach: Miltenyi Biotec GmbH.
Baumgarth, N. (2004). B-Cell Immunophenotyping. Methods in Cell Biology Cytometry, 4th
Edition: New Developments, 643-662. doi:10.1016/s0091-679x(04)75027-x
Bittersohl, H., & Steimer, W. (2016). Intracellular concentrations of immunosuppressants. In
Personalized Immunosuppression in Transplantation (pp. 199-226). Waltham, MA:
Elsevier.
Bushel, P. R., Ferguson, S. S., Ramaiahgari, S. C., Paules, R. S., & Auerbach, S. S. (2020).
Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA
Sequencing Data. Frontiers in Genetics, 11. doi:10.3389/fgene.2020.00594
Cao, Z., Wei, L., Lu, S., Yang, D., & Gao, G. (2020). Searching large-scale scRNA-seq
databases via unbiased cell embedding with Cell BLAST. Nature Communications,
11(1). doi:10.1038/s41467-020-17281-7
Carter, R. A., Bihannic, L., Rosencrance, C., Hadley, J. L., Tong, Y., Phoenix, T. N., . . . Gawad,
C. (2018). A Single-Cell Transcriptional Atlas of the Developing Murine Cerebellum.
Current Biology, 28(18). doi:10.1016/j.cub.2018.07.062
Cell BLAST. (n.d.). Retrieved from https://cblast.gao-lab.org/
Cell Data Sheet NIH/3T3 [PDF]. (n.d.). Eugene: Invitrogen.
Chen, G., Wu, D., Guo, W., Cao, Y., Huang, D., Wang, H., . . . Ning, Q. (2020). Clinical and
immunological features of severe and moderate coronavirus disease 2019. Journal of
Clinical Investigation, 130(5), 2620-2629. doi:10.1172/jci137244
Chromium Single Cell Gene Expression - 10x Genomics. (n.d.). Retrieved from
https://www.10xgenomics.com/products/single-cell-gene-expression
Grievink, H. W., Luisman, T., Kluft, C., Moerland, M., & Malone, K. E. (2016). Comparison of
Three Isolation Techniques for Human Peripheral Blood Mononuclear Cells: Cell
Recovery and Viability, Population Composition, and Cell Functionality.
Biopreservation and Biobanking, 14(5), 410-415. doi:10.1089/bio.2015.0104
Heather, J. M., & Chain, B. (2016). The sequence of sequencers: The history of sequencing
DNA. Genomics, 107(1), 1-8. doi:10.1016/j.ygeno.2015.11.003
Hersperger, A. R., Martin, J. N., Shin, L. Y., Sheth, P. M., Kovacs, C. M., Cosma, G. L., . . .
Betts, M. R. (2011). Increased HIV-specific CD8 T-cell cytotoxic potential in HIV elite
controllers is associated with T-bet expression. Blood, 117(14), 3799-3808.
doi:10.1182/blood-2010-12-322727
Jablonski, K. A., Amici, S. A., Webb, L. M., Ruiz-Rosado, J. D., Popovich, P. G.,
Partida-Sanchez, S., & Guerau-de-Arellano, M. (2015). Novel Markers to Delineate Murine M1
and M2 Macrophages. PLoS ONE, 10(12). doi:10.1371/journal.pone.0145342
Khanlou, H., Salmon-Ceron, D., & Sicard, D. (1997). [Characteristics of rapid progressors in HIV
infection]. Annales de Medecine Interne, 148(2), 163-166.
Kuse, R., Schuster, S., Schübbe, H., Dix, S., & Hausmann, K. (1985). Blood lymphocyte
volumes and diameters in patients with chronic lymphocytic leukemia and normal
controls. Blut, 50(4), 243-248. doi:10.1007/bf00320301
Luecken, M. D., & Theis, F. J. (2019). Current best practices in single‐cell RNA‐seq analysis: A
tutorial. Molecular Systems Biology, 15(6). doi:10.15252/msb.20188746
Macosko, E., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., . . . McCarroll, S.
(2015). Highly Parallel Genome-wide Expression Profiling of Individual Cells Using
Nanoliter Droplets. Cell, 161(5), 1202-1214. doi:10.1016/j.cell.2015.05.002
Maelfait, J., Bridgeman, A., Benlahrech, A., Cursi, C., & Rehwinkel, J. (2016). Restriction by
SAMHD1 Limits cGAS/STING-Dependent Innate and Adaptive Immune Responses to
HIV-1. Cell Reports, 16(6), 1492-1501. doi:10.1016/j.celrep.2016.07.002
Martinez-Jimenez, C. P., Eling, N., Chen, H., Vallejos, C. A., Kolodziejczyk, A. A., Connor, F.,
. . . Odom, D. T. (2017). Aging increases cell-to-cell transcriptional variability upon
immune stimulation. Science, 355(6332), 1433-1436. doi:10.1126/science.aah4115
Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., . . . Alizadeh, A. A.
(2015). Robust enumeration of cell subsets from tissue expression profiles. Nature
Methods, 12(5), 453-457. doi:10.1038/nmeth.3337
Ozsolak, F., & Milos, P. M. (2010). RNA sequencing: Advances, challenges and opportunities.
Nature Reviews Genetics, 12(2), 87-98. doi:10.1038/nrg2934
Prompetchara, E., Ketloy, C., & Palaga, T. (2020). Immune responses in COVID-19 and
potential vaccines: Lessons learned from SARS and MERS epidemic. Asian Pacific
Journal of Allergy and Immunology. doi:10.12932/ap-200220-0772
Riveira-Munoz, E., Ruiz, A., Pauls, E., Permanyer, M., Badia, R., Mothe, B., . . . Este, J. A.
(2014). Increased expression of SAMHD1 in a subset of HIV-1 elite controllers. Journal
of Antimicrobial Chemotherapy, 69(11), 3057-3060. doi:10.1093/jac/dku276
Rockx, B., Baas, T., Zornetzer, G. A., Haagmans, B., Sheahan, T., Frieman, M., . . . Katze, M. G.
(2009). Early Upregulation of Acute Respiratory Distress Syndrome-Associated
Cytokines Promotes Lethal Disease in an Aged-Mouse Model of Severe Acute
Respiratory Syndrome Coronavirus Infection. Journal of Virology, 83(14), 7062-7074.
doi:10.1128/jvi.00127-09
Saag, M., & Deeks, S. (2010). How Do HIV Elite Controllers Do What They Do? Clinical
Infectious Diseases, 51(2), 239-241. doi:10.1086/653678
Saliba, A., Westermann, A. J., Gorski, S. A., & Vogel, J. (2014). Single-cell RNA-seq: Advances
and future challenges. Nucleic Acids Research, 42(14), 8845-8860.
doi:10.1093/nar/gku555
Schelker, M., Feau, S., Du, J., Ranu, N., Klipp, E., MacBeath, G., . . . Raue, A. (2017).
Estimation of immune cell content in tumour tissue using single-cell RNA-seq data.
Nature Communications, 8(1). doi:10.1038/s41467-017-02289-3
Shi, L., Mai, S., Israels, S., Browne, K., Trapani, J. A., & Greenberg, A. H. (1997). Granzyme B
(GraB) Autonomously Crosses the Cell Membrane and Perforin Initiates Apoptosis and
GraB Nuclear Localization. Journal of Experimental Medicine, 185(5), 855-866.
doi:10.1084/jem.185.5.855
Staroscik, A. (2004, January 29). Retrieved from http://cels.uri.edu/gsc/cndna.html
Tang, F., Barbacioru, C., Wang, Y., Nordman, E., Lee, C., Xu, N., . . . Surani, M. A. (2009).
mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods, 6(5),
377-382. doi:10.1038/nmeth.1315
Tang, X., Huang, Y., Lei, J., Luo, H., & Zhu, X. (2019). The single-cell sequencing: New
developments and medical applications. Cell & Bioscience, 9(1).
doi:10.1186/s13578-019-0314-y
Tasic, B. (2018). Single cell transcriptomics in neuroscience: Cell classification and beyond.
Current Opinion in Neurobiology, 50, 242-249. doi:10.1016/j.conb.2018.04.021
Totura, A. L., & Baric, R. S. (2012). SARS coronavirus pathogenesis: Host innate immune
responses and viral antagonism of interferon. Current Opinion in Virology, 2(3), 264-
275. doi:10.1016/j.coviro.2012.04.004
Ulmer, A., Scholz, W., Ernst, M., Brandt, E., & Flad, H. (1984). Isolation and Subfractionation
of Human Peripheral Blood Mononuclear Cells (PBMC) by Density Gradient
Centrifugation on Percoll. Immunobiology, 166(3), 238-250. doi:10.1016/s0171-
2985(84)80042-x
Wei, L., Ming, S., Zou, B., Wu, Y., Hong, Z., Li, Z., . . . Huang, X. (2020). Viral Invasion and
Type I Interferon Response Characterize the Immunophenotypes during COVID-19
Infection. SSRN Electronic Journal. doi:10.2139/ssrn.3555695
Wu, J., Sassé, T., Saksena, M., & Saksena, N. K. (2013). Transcriptome analysis of primary
monocytes from HIV-positive patients with differential responses to antiretroviral
therapy. Virology Journal, 10(1), 361. doi:10.1186/1743-422x-10-361
Zhang, J., Cao, J., Ma, S., Dong, R., Meng, W., Ying, M., . . . Yang, B. (2014). Tumor hypoxia
enhances non-small cell lung cancer metastasis by selectively promoting macrophage
M2 polarization through the activation of ERK signaling. Oncotarget, 5(20), 9664-9677.
doi:10.18632/oncotarget.1856
Zhang, L., Yu, X., Zheng, L., Zhang, Y., Li, Y., Fang, Q., . . . Zhang, Z. (2018). Lineage tracking
reveals dynamic relationships of T cells in colorectal cancer. Nature, 564(7735), 268-
272. doi:10.1038/s41586-018-0694-x
Zheng, G. X., Terry, J. M., Belgrader, P., Ryvkin, P., Bent, Z. W., Wilson, R., . . . Bielas, J. H.
(2017). Massively parallel digital transcriptional profiling of single cells. Nature
Communications, 8(1). doi:10.1038/ncomms14049
Zhou, F., Yu, T., Du, R., Fan, G., Liu, Y., Liu, Z., . . . Cao, B. (2020). Clinical course and risk
factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A
retrospective cohort study. The Lancet, 395(10229), 1054-1062. doi:10.1016/s0140-
6736(20)30566-3
Abstract
We performed peripheral blood mononuclear cell isolation to obtain gene expression data at single-cell resolution. We generated single-cell RNA sequencing data of PBMCs beginning from whole blood, focusing on a droplet-based cellular isolation method that allows for transcript capture and library generation. Specifically, we isolated single cells within droplets, captured mRNA transcripts, reverse transcribed them into cDNA, and generated a sequencing library. We optimized the single-cell RNA sequencing workflow by adjusting pump pressures, identifying ideal solution concentrations, testing PCR conditions, and attempting a within-droplet variation of our protocol. For our PBMC sample after optimization, we saw an average of 309 UMIs per cell with 34 reads per UMI. We also performed droplet capture on a mixture of human HEK293T and mouse 3T3 cells to test the resolution of our system. In the mixed cell sample, we saw no doublets of the human and mouse cell types, which demonstrates the capacity to obtain single-cell resolution using this platform. The limitations of the generated data, including the low UMI counts and the low reads per UMI, are discussed, along with potential reasons why this may have occurred. Our workflow is also compared to another scRNA sequencing platform. The utilization of this pipeline in bioinformatic cell classification is briefly explored. Finally, we discuss the potential applications of this approach, including HIV patient immune profiling, potential development of immunotherapeutics, and COVID-19 immune profiling.
Asset Metadata
Creator
Ter-Saakyan, Sonia
(author)
Core Title
Peripheral blood mononuclear cell capture and sequencing: optimization of a droplet based capture method and its applications
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Molecular Microbiology and Immunology
Publication Date
03/22/2021
Defense Date
03/04/2021
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
cell classification,COVID-19,DropSeq,HIV,OAI-PMH Harvest,PBMC,RNA,scRNA-seq,sequencing,single cell,UMI
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Lee, Hayoun (committee chair), Comai, Lucio (committee member), Eoh, Hyungjin (committee member), Yuan, Weiming (committee member)
Creator Email
stersaak@usc.edu,stersaakyan@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-430854
Unique identifier
UC11668597
Identifier
etd-TerSaakyan-9347.pdf (filename),usctheses-c89-430854 (legacy record id)
Legacy Identifier
etd-TerSaakyan-9347-1.pdf
Dmrecord
430854
Document Type
Thesis
Rights
Ter-Saakyan, Sonia
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA