USE OF CELL-FREE NUCLEIC ACIDS IN ASSOCIATING PD-L1
GENE EXPRESSION WITH PRESENCE OF DRIVER MUTATIONS IN
DNA AND DEMOGRAPHICS ACROSS DIFFERENT CANCERS
by
Joshua Usher
A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
BIOSTATISTICS
August 2016
TABLE OF CONTENTS
INTRODUCTION
METHODS
RESULTS
DISCUSSION
CONCLUSION
REFERENCES
APPENDIX
INTRODUCTION
Cell-Free DNA and RNA: The Emerging Plasma Biopsy
Analyzing tumor biomarkers in cancer patients prior to treatment is now a widely
accepted practice, both for evaluating prognosis and for guiding the selection of initial
therapy. This practice of “individualized therapy” has resulted in significant
improvement in cancer treatment. The standard procedure (until recently) has been to
take a tissue biopsy from the tumor site and analyze it for various relevant biomarkers
such as protein levels, gene mutations, and gene expression. The risks and
disadvantages attached to this pioneering method, the tissue biopsy, are numerous [1, 2].
At the forefront of these drawbacks is the reality that tumors evolve genetically over time
and with treatment [3, 4]. Consequently, in order to identify newly emerging biomarkers,
biopsies would need to be taken at regular intervals (every 4-6 weeks) to pinpoint when
such adverse events occur. However, acquiring fresh tumor biopsies in most cases
cannot be achieved at regular frequent intervals. In certain cancers, for instance,
accessing a tumor site (such as the pancreas) requires an invasive surgical procedure – a
process that further burdens the already compromised immune system of the patient.
Moreover, only weeks following such a surgery, enough time may have elapsed to allow
a tumor to mutate significantly, thus rendering the biopsy obsolete in terms of its
representation of the tumor’s biology. But even if repeated sampling of tumors were
possible, a different drawback of the tissue approach would quickly become apparent:
cost. The average cost of a biopsy in lung cancer, for instance, is $14,672 [5]. Moreover,
serious complications are encountered in approximately 20% of surgical biopsies, in
which case the cost exceeds $37,000 [5].
Beyond the injury caused to the patient’s tissues and the appreciable costs
associated with the procedure, a further problem arises in tumors with considerable
heterogeneity, such as those of the prostate [6], breast, and colon [7]: taking a needle
biopsy of a solid tumor does not guarantee that the area being sampled is representative
of the majority of the tissue. Therefore, surgical biopsies may miss a part of the tumor
containing genetic mutations of prognostic significance.
Isolating circulating cell-free DNA (cfDNA) from plasma to test for the presence of
mutations, and cell-free RNA (cfRNA) to analyze gene expression, offers a better
alternative to taking solid biopsies directly from tumor sites. A major part of the
burgeoning interest in circulating biomarkers is that they provide representative readouts
of the entire primary tumor while also capturing metastatic sites. This is because cfDNA
and cfRNA are exuded into the bloodstream from all tumor areas in the patient. Cell-free
material is, therefore, a composite of genetic information from the growing and dying
cells of a tumor’s complete makeup.
Two Approaches to Acquiring Circulating Nucleic Acids
The established methods for obtaining free-floating nucleic acids include (1) the
capture and lysing of circulating tumor cells (CTCs) to release the nucleic acids within,
and (2) fractionating plasma from blood and isolating cfDNA or cfRNA. CTCs have
important roles in tumor dissemination and progression, but tumors also release free
DNA and RNA into circulation [8]. The noninvasive nature of each method presents an
advantage over the aforementioned surgical sampling of tumor tissues. However,
variability in the techniques for isolating CTCs, and the resulting lack of standardization
in their assessment, constitute notable challenges for the implementation of these
technologies in clinical practice [8]. For instance, the vast majority of cells captured in a
blood draw are healthy red and white blood cells and just a few actual CTCs [9]. In order
to enrich CTCs while simultaneously diminishing the overabundance of normal cells,
intensive cell culturing with antibodies and incubators is generally required [10]. This
process is time-consuming, and therefore not a viable option for doctors, clinical trial
management, or patients seeking quick turnaround time. Additionally, the diagnostic
relevance of CTCs is questionable, based on the somewhat serendipitous characteristic of
CTCs to break off from solid tumors without consistently representing all tumor growth
sites or treated areas.
The alternate method of isolating cell-free nucleic acids from plasma
demonstrates more promising features than the CTC approach. A 2013 study comparing
cfDNA with CTCs observed that cfDNA demonstrated greater sensitivity and could
identify disease progression up to 5 months in advance of imaging [11]. With regard to
sample handling, no enrichment is required after procuring a blood sample to obtain cell-
free nucleic acids. The 10mL tubes used for drawing blood samples contain chemical
“fixatives”, which act as stabilizers, both preventing the lysing of nucleated (white) blood
cells and preserving cell-free nucleic acids from degradation, thus allowing the sample to
remain uncompromised at room temperature for a number of days [12]. This window of
stability is limited, however; blood tubes need to be centrifuged as soon as possible in
order to fractionate the plasma (containing the cell-free nucleic acids) away from the
white blood cells before they have a chance to lyse and release their normal-cell genomic
DNA (gDNA) into the plasma [13]. This is critical because less than 10% of the total
cfDNA in a cancer patient is of tumor cell origin to begin with [14], thus any release of
gDNA by a white blood cell could easily increase the gDNA background enough to mask
the true cfDNA targets [12, 13, 15]. Another substantial risk is the degradation of cfRNA
due to the high concentrations of RNases present in plasma, as documented in a study
measuring the steady decrease in plasma β-actin mRNA over time [16].
The process of fractionating plasma from the blood takes only about 20 minutes.
Upon centrifugation of blood tubes, the high radial acceleration causes denser materials
(such as intact blood cells) to descend to the bottom of the tube, while the more buoyant
substances (plasma containing cell-free nucleic acids) rise to the top. Once
separated, plasma can then either undergo isolation and purification of cfDNA/cfRNA, or
be safely stored for long periods (even as long as 10 years) by freezing at -80°C.
PD-L1: Emerging Target as a Multi-Tumor Biomarker
While analysis of DNA sequence detects mutations of nucleotide base pairs
within given genes, quantitative measurement of mRNA determines the expression levels
of specific genes. Because mRNA carries the code for protein synthesis obtained by
transcription of the genetic information in DNA, mutations detected in DNA are perforce
reflected in the mRNA, which is eventually to be translated into aberrant proteins. Thus,
cfRNA contains two levels of information: the genetic information contained in cfDNA
as well as the additional information on the expression of genes. Quantitating a gene’s
expression level in cfRNA may be just as important as detecting a point mutation in
cfDNA, especially with respect to genes whose expression in certain cancers serves as a
prognostic and therapeutic biomarker.
Programmed Cell Death Ligand 1 (PD-L1) is a protein that affects cellular
activation or inhibition by binding to its receptor, PD-1, found on the surface of T cells, B
cells, and myeloid cells [19–21]. This immune checkpoint regulator has received much
attention in the field of immuno-oncology, where it has been described as a promising
target across a broad array of tumor types [22]. Trials of therapies targeting PD-L1 are
continually underway [23]. High expression of the protein PD-L1 in non-small cell lung
cancer (NSCLC) is predictive of response to a monoclonal antibody (Nivolumab), shown
to cause a 3.2-month increase in median overall survival (OS) in patients with advanced
squamous cell NSCLC (CheckMate 017 trial) [17], and a 2.8-month increase in median
OS in advanced non-squamous patients (CheckMate 057 trial) [18].
With plasma-based screening of cfRNA, PD-L1 expression can be measured at
regular intervals. Its assessment in excisional tissue biopsies is limited due to the
infeasibility of performing exhaustive analyses of isolated regions of tissue and of
continued monitoring through repeat biopsies. The possibility of tracking changes in PD-
L1 expression levels alongside the potential emergence of co-occurring genetic anomalies
may provide valuable insight concerning treatment response to certain agents. Thus,
associations, or lack thereof, between PD-L1 expression in RNA and presence/absence of
mutations in DNA are topics of considerable interest.
PD-L1 and BRAF Association
The v-Raf murine sarcoma viral oncogene homolog B (BRAF) gene encodes a
serine-threonine protein kinase, which, through dimerization of RAF molecules,
phosphorylates and activates MEK. A central mediator in the MAP kinase pathway, RAF plays a role
in various cell growth processes, including cellular division, differentiation, and
secretion. Active BRAF mutants commonly cause cancer by signaling cell growth
without restraint. Therapies targeting the BRAF gene (most effective in the presence of
V600 mutations) by blocking the gene’s kinase domain have been developed for the
treatment of melanoma [24], in which BRAF mutations are present in 40-50% of cases.
BRAF mutations (primarily V600E) also appear in 5-15% of colorectal cancers [25]. Yet
despite the success of BRAF-targeted therapies in melanoma, the impact of such
interventions was not as promising among colorectal cancer patients according to recent
studies focused on BRAF inhibition in V600-mutated colorectal cancers [26]. However,
because mutations in BRAF and EGFR have been found to be mutually exclusive of one
another, the presence of V600E in colorectal cancer can be used as a marker for resistance to
anti-EGFR monoclonal antibodies [28].
A lack of correspondence between the BRAF V600E mutation and PD-L1
expression was observed in a study that searched for associations between these
biomarkers in melanocytic lesions. Immunohistochemistry (IHC) was the methodology
employed for establishing expression levels of PD-L1 and V600E [27]. An evaluation
of potential associations gathered from cfDNA and cfRNA could lend further support to
these findings.
PD-L1 and EGFR Association
The epidermal growth factor receptor (EGFR) is a cell-surface receptor for
members of the epidermal growth factor family of extracellular protein ligands [29]. Over-expression
of EGFR is present in 40-80% of non-small-cell lung cancers (NSCLC) [31]. A recent
study observed that the presence of EGFR in post-surgical resection NSCLC correlated
with high expression of PD-L1 as measured by IHC [30]. This finding merits further
exploration in plasma-based screening of cfDNA and cfRNA, which could provide
greater accuracy and detail in examining the correspondence.
PD-L1 and KRAS Association
The V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) gene
functions as a molecular on/off switch. When switched on, KRAS engages and
activates various proteins necessary for propagating signals from growth factor and
other receptors. When the KRAS gene is mutated, normal tissue signaling
becomes disrupted [32]. KRAS mutations have been observed to be mutually
exclusive with mutations in EGFR, and are associated with poorer
treatment response to EGFR-directed therapies [33]. Research efforts continue to address
alternative approaches to treating patients in whom KRAS mutations are present, and
recent investigations have sought to determine the potential utility of interventions
based on immune checkpoint blockade. In a study published in August 2015, no
correspondence was found between PD-L1 positivity and mutational presence of EGFR or KRAS,
though limitations in the methodologies employed were noted and could well have
contributed to the lack of associations recorded [34]. Another study, published in January
2015, described significant correlations between KRAS and EGFR mutational status and
the presence of PD-L1, where a far greater prevalence of PD-L1 positivity was noted in
EGFR versus KRAS mutant expressers [35]. Clearly, PD-L1 positivity in connection
with oncogenes such as KRAS and EGFR deserves further study, and such
potentially beneficial or adverse relationships should be analyzed alongside the
continued development of immune checkpoint blockade therapies.
Purpose
The aim of this study was to determine both the frequency and relative levels of
PD-L1 gene expression in the cfRNA of patients with different cancer types. We expected
both the frequency and the levels of PD-L1 expression to differ across tumor
histologies, since each cancer type has its own genetic signature, with its own sets of
driver and passenger mutations in particular genes and pathways. A further aim
of this study was to determine the frequency of PD-L1 expression in the
presence (or absence) of specific point mutations common to the genes BRAF, EGFR,
and KRAS. In accordance with findings already reported across studies [27, 30, 35], we
expected to find an increased frequency of PD-L1 expression among EGFR-mutated
cancer patients, but not among those harboring either BRAF or KRAS
mutations. It was also expected that one or more secondary variables (such as age,
gender, total amount of cfRNA in samples, state/country samples were collected from,
and institution collected from) would have a significant influence on any associations
found between PD-L1 frequency and mutational status of a gene.
A practical area of focus to be addressed in this study pertains to the proprietary
stabilizing technology built into the tubes used for blood draws. Specifically, we wanted
to know how many days after sample collection (and before fractionation)
cfRNA and cfDNA begin to degrade, ultimately compromising the number of copies
expected at the end of the isolation pipeline. Addressing this question is important
for defining the true parameters of this technology’s clinical utility. Based on numerous
reports of the labile nature of RNA, especially in the bloodstream [36], we expected to see
at least some cfRNA degradation (whether or not statistically significant) after the first 24
hours following a blood draw. In contrast, we anticipated that cfDNA would remain stable
for longer periods, and even that cfDNA (unlike cfRNA) might show significant increases in
copy number over time, since the nuclei of white blood cells are heavily concentrated
with gDNA, which would be released into plasma as these cells continue to lyse.
METHODS
Description of Patient Sample Set
For this study, all patients consented to testing of their collected blood
samples. The patient study sample included patients with one of the following cancer
types: (1) colorectal, (2) non-small cell lung cancer (NSCLC), (3) breast, (4) gastric, and
(5) prostate. Within the NSCLC group, there was a random and unrecorded mixture of
the squamous cell carcinoma and adenocarcinoma histologic subtypes (reports of
significant variability in frequency of PD-L1 expression across these two subgroups will
be addressed in the Discussion). Blood samples were collected from a total of 191
patients (see Results for distribution by tumor type). For patients with multiple blood
draws, only the first sample was used in this analysis. All patients were in stage III or IV
of their disease progression. Therapeutic regimens (past and current) as well as survival
outcome were not obtained for most of the patients in the study.
During the process of patient recruitment and sample acquisition, other variables
of interest were recorded, including age, gender, date of blood draw, season of blood
draw, physician ID, and number of days between the date of a sample’s collection and the
date of its processing (hereafter called days_intube). Several variables across this study
were not captured for every patient; the number of missing values under each of these
variables is listed with the summary statistics for all of the demographic variables
(TABLE 12).
Specimen Collection
At the time of sample collection from each patient, blood was drawn (by
venipuncture) into two different 10mL tubes – one containing a proprietary stabilizer
designed for cfDNA preservation at room temperature; the other designed for cfRNA.
Each stabilizer consisted of approximately 1/8th of a teaspoon of liquid, located at the
base of the tube. Immediately upon sample collection, each tube was manually inverted
10 times to thoroughly mix the stabilizing component throughout. Collection instructions
were emphasized to participating clinicians, most of whom confirmed that tubes had been
inverted 10 times. Samples were kept at ambient temperature (18-20°C) and shipped
priority from the collecting institutions.
Fractionation, Isolation, Reverse Transcription
Upon receipt, blood tubes were inspected for any signs of compromise, i.e.
cracked tubes or insufficient sample quantity. Only tubes that passed quality control
were processed. Plasma fractionation was achieved by centrifuging tubes at 16,000 × g
for 20 minutes. Separated plasma layers were removed by pipetting them off the top,
being careful not to disturb the buffy coat. Nucleic acids were isolated and purified using
2mL of plasma input for each sample isolation (the isolation pipeline is proprietary to
Liquid Genomics, Inc. and so will not be described here). Isolated cfDNA and cfRNA
samples were eluted in a final volume of 60µL of salt buffer. cfDNA samples were
refrigerated at 4°C; cfRNA samples were aliquoted in halves, with one half stored at -80°C and
the other reverse transcribed into cDNA using random primers purchased from Invitrogen
(Foster City, CA). In each reverse transcription reaction, 28µL of cfRNA was combined
with 12µL of primers and reagents (diluting the cfRNA concentration by 30%); this
was taken into account when back-calculating total cfRNA.
Real-Time Quantitative PCR Assays
The DNA target assays tested in each patient sample varied depending on the
cancer type, such that only clinically relevant genes were tested. Specifically, BRAF
V600E was tested only in colon samples, the EGFR mutations (exon19del, L858R) were
tested only in NSCLC samples, and the selected KRAS panel (G12A, G12C, G12D,
G12S, G12V, G13D) was tested in both colon and NSCLC samples. No prostate, breast,
or gastric samples were tested for any targets in DNA. Reference (wild-type) assays for
BRAF, EGFR, and KRAS were run in parallel with the mutation assays across samples,
so that all mutant signals could be quantified relative to the wild-type signal (used to
distinguish false-positives). The primers and probes for all of these DNA-based
TaqMan® assays were purchased from Applied Biosystems (Foster City, CA).
The RNA assays used to measure gene expression were tested across all of the
cancer types. These assays included PD-L1, β-Actin, and ERCC1. β-Actin was used as
the denominator housekeeping gene to quantify total cfRNA across all samples. ERCC1
expression was quantified for its non-tumor specific status; significant variance in its
relative expression across samples could offer quality control related insight (i.e., error in
sample processing). All primers and probes for these TaqMan® expression assays were
purchased from Applied Biosystems (Foster City, CA), except for the β-Actin primers,
which were purchased from Integrated DNA Technologies (IDT) (Coralville, IA).
Limits of detection (LODs) for calling true-/false-positives and negatives in the
cycle thresholds (CTs) reported under each assay were established by performing a
method comparison validation across real-time quantitative polymerase chain reaction
(RT-qPCR) and droplet digital PCR (ddPCR) platforms (LODs are intellectual property
of Liquid Genomics, Inc. and so will not be disclosed here). In this study, the validated
LODs for each DNA variant tested and each gene expression measured were used to
assess detection status.
Copy Number and Relative Gene Expression Calculations
Universal Human Reference RNA (UHRR) (100µg, dry pellet), a compilation of
total RNA from 10 different human cell lines, was purchased from Agilent Technologies
(Santa Clara, CA). UHRR was diluted 1:1000 and reverse transcribed. RT-qPCR of β-
Actin was run on replicates of UHRR cDNA across 4 dilutions (1:5000, 1:10000,
1:20000, 1:40000), to create a standard curve that correlated β-Actin CTs with known
concentrations of RNA in copies/reaction (TABLE 1, Appendix). The standard curve
constructed was used to calculate total cfRNA in copies per 1mL of plasma (FIGURE 2,
Appendix). First, β-Actin CTs were plugged into the linear equation to obtain
copies/reaction. Because only 1µL of reverse-transcribed material was added to any given
TaqMan® expression assay, the copies obtained were then multiplied by 40 to get the
number of copies in the 40µL volume that had resulted from reverse transcription.
Because all of the cfRNA in that 40µL had come from the 28µL of cfRNA input, this copy
number represented the cfRNA copies present in 28µL of eluate. Lastly, to arrive at
the number of cfRNA copies per 1mL of plasma (i.e. copies in 30µL of eluted
volume, since 2mL of plasma was eluted in 60µL), the number of copies in 28µL was
added to 2 times the quotient of that same number and 28 (equivalent to multiplying
by 30/28).
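The back-calculation described above can be sketched in Python as follows. This is only an illustrative sketch: the fitted standard curve is reported in the Appendix and is not reproduced here, so the slope and intercept below are hypothetical placeholders, and the curve is assumed to take the usual form CT = intercept + slope × log10(copies/reaction). The function and constant names are likewise invented for the illustration.

```python
# Sketch of the cfRNA back-calculation; slope and intercept are PLACEHOLDERS,
# not the fitted values from the thesis's standard curve.
HYPOTHETICAL_SLOPE = -3.32      # assumed CT change per 10-fold change in copies
HYPOTHETICAL_INTERCEPT = 38.0   # assumed CT at 1 copy per reaction

RT_VOLUME_UL = 40.0     # reverse-transcription volume (28 uL cfRNA + 12 uL reagents)
CFRNA_INPUT_UL = 28.0   # cfRNA eluate placed into the RT reaction
ASSAY_INPUT_UL = 1.0    # RT product added to each expression assay


def copies_per_reaction(ct, slope=HYPOTHETICAL_SLOPE, intercept=HYPOTHETICAL_INTERCEPT):
    """Invert the assumed curve CT = intercept + slope * log10(copies)."""
    return 10 ** ((ct - intercept) / slope)


def cfrna_copies_per_ml(copies_in_reaction):
    """Scale copies/reaction to copies per 1 mL of plasma (30 uL of eluate)."""
    # x40: copies in the whole RT volume, all of which came from 28 uL of eluate.
    copies_in_28ul = copies_in_reaction * (RT_VOLUME_UL / ASSAY_INPUT_UL)
    # Add 2 uL worth of copies to go from 28 uL to 30 uL of eluate.
    return copies_in_28ul + 2 * (copies_in_28ul / CFRNA_INPUT_UL)
```

For example, 7 copies/reaction corresponds to 280 copies in 28µL and therefore 300 copies per 1mL of plasma.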
The most frequently tested assay in DNA that could be used for back-calculating
copies of cfDNA was the KRAS reference assay. A reference control of pure KRAS
wild-type (WT) gDNA (50ng/µL) was purchased from Horizon Discovery (Cambridge,
UK). RT-qPCR of KRAS reference was run on replicates of KRAS WT gDNA across 4
concentrations (0.625ng/µL, 1.25ng/µL, 2.5ng/µL, 5.0ng/µL), to create a standard curve
that correlated KRAS reference CTs with known concentrations of gDNA in
copies/reaction (TABLE 2, Appendix). The standard curve constructed was used to
calculate copies of KRAS WT cfDNA per 1mL of plasma (FIGURE 2, Appendix). First,
KRAS reference CTs were plugged into the linear equation to obtain copies/reaction, and
then multiplied by 15 to get the number of copies contained in 30µL (half) of the final
elution volume that resulted from isolation (2µL of material was inputted into all
DNA-based TaqMan® assays, which is where multiplying by 15 comes from).
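The origin of the multiplier of 15 can be made explicit with a short sketch. The volumes are those stated in the text (60µL elution from 2mL of plasma, 2µL per assay); the function and constant names are hypothetical.

```python
# Sketch: scale cfDNA copies per reaction to copies per 1 mL of plasma.
ELUTION_VOLUME_UL = 60.0   # final elution volume per isolation
PLASMA_INPUT_ML = 2.0      # plasma input per isolation
DNA_ASSAY_INPUT_UL = 2.0   # eluate added to each DNA-based assay


def cfdna_copies_per_ml(copies_in_reaction):
    """Multiply copies/reaction by (30 uL per mL plasma) / (2 uL per assay) = 15."""
    eluate_per_ml_plasma = ELUTION_VOLUME_UL / PLASMA_INPUT_ML   # 30 uL
    multiplier = eluate_per_ml_plasma / DNA_ASSAY_INPUT_UL       # 15
    return copies_in_reaction * multiplier
```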
Relative expression values for PD-L1 and ERCC1 were calculated according to
standard curves and multipliers that were collected comprehensively over time from PCR
plate runs until their averages and variance did not significantly differ with additional
data introduced. Levels of gene expression were “relative” to that measured in β-Actin,
the denominator gene representative of total nucleic acid expression. To make these
relative expression measurements (or ΔCTs), the β-Actin CTs for a given sample were
subtracted from the expressed gene’s CTs for that sample (in this case PD-L1 or
ERCC1).
In this study, relative PD-L1 expression was calculated using the standard curve:
PDL1 ΔCTs = 16.21 – 3.34*Log10(Relative Expression).
Relative ERCC1 expression was achieved by calculating 2^(-ΔCTs) and then multiplying each
by 320 (this multiplier was established over time as a way to normalize ERCC1 2^(-ΔCTs)
from UHRR cDNA samples across various dilutions).
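The two relative-expression calculations above can be sketched in Python as follows. The constants (16.21, 3.34, 320) are taken from the text; the function names are hypothetical, and the PD-L1 function simply inverts the stated standard curve to solve for relative expression.

```python
# Sketch of the relative-expression calculations described above.
PDL1_INTERCEPT = 16.21
PDL1_SLOPE = 3.34
ERCC1_MULTIPLIER = 320


def delta_ct(target_ct, beta_actin_ct):
    """Delta-CT: the target gene's CT minus the beta-Actin CT for the same sample."""
    return target_ct - beta_actin_ct


def pdl1_relative_expression(dct):
    """Solve dCT = 16.21 - 3.34 * log10(RE) for RE."""
    return 10 ** ((PDL1_INTERCEPT - dct) / PDL1_SLOPE)


def ercc1_relative_expression(dct):
    """2^(-dCT), scaled by the normalizing multiplier of 320."""
    return ERCC1_MULTIPLIER * 2 ** (-dct)
```

Under this curve, a PD-L1 ΔCT of 16.21 corresponds to a relative expression of 1, and each decrease of 3.34 in ΔCT corresponds to a 10-fold increase.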
Statistical Analysis
PD-L1 Detection Status: Logistic Regression
The majority of the intended tests for association with PD-L1 expression were in
the context of PD-L1’s detection status (detected, not detected), i.e. from a qualitative
viewpoint. For this type of analysis, the continuous variable of PD-L1 PCR CTs was
dichotomized (0 = not detected, 1 = detected). After univariate analysis of each variable,
bivariate associations were made between PD-L1 status and each independent variable
using JMP 8.0.2 statistical software (normality of continuous variables was also
assessed/achieved with this software). Any contingency tables constructed with cell
values of 10 or less were assessed using Fisher’s Exact tests rather than Pearson chi-
square tests. Variables with p-values of 0.25 or lower were considered for inclusion in a
subsequent analysis with PD-L1 detection status. Among the independent variables
that satisfied this liberal threshold for association with PD-L1 status, any variable missing
more than 50% of its values across the observations in the dataset was excluded from the
subsequent analysis, unless its missing values could be filled by alternative methods
(detailed later in this section) without changing either the distribution of
the variable or its association with the PD-L1 detection status variable.
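The bivariate screening rules above (Fisher’s Exact test when any cell is 10 or less, and a liberal p-value cutoff of 0.25) can be sketched as follows. The analysis itself was run in JMP; this standard-library sketch only illustrates the logic for a 2×2 table, and all function names are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


def choose_test(table, threshold=10):
    """Use Fisher's exact test when any cell count is at or below the threshold."""
    small = any(cell <= threshold for row in table for cell in row)
    return "fisher" if small else "chi-square"


def passes_screen(p_value, cutoff=0.25):
    """Liberal screen for carrying a variable into the multivariate model."""
    return p_value <= cutoff
```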
Independent variables meeting the criteria above were included in a
preliminary multivariate logistic regression model (constructed using STATA 14.1).
Under this type of regression, a dichotomous categorical variable (often in the form of
outcome to an exposure, i.e., response versus no response, survival versus mortality, etc.)
is used as the dependent variable, and multiple independent variables (including those in
the form of nominal categorical, ordinal categorical, and/or continuous numeric) are fit
against the dichotomous outcome. PD-L1 detection status (made into a dichotomous
categorical variable earlier) was used as the dependent variable of the logistic regression
model. Because most of the independent variables in this dataset were missing values
for some observations, copies of β-Actin cfRNA per 1mL of plasma was
chosen as the main effect of the model, since it had no missing values across
observations and was anticipated to be associated with PD-L1 detection outcome. Within
each categorical variable of the preliminary model, the level with the highest frequency
was made to be the reference group of the variable. Normality was assessed for all
continuous variables, and transformations were made if necessary.
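The choice of reference group can be illustrated with a short sketch that picks the most frequent level of a categorical variable and builds indicator columns for the remaining levels. The modeling itself was done in STATA; the function names here are hypothetical.

```python
from collections import Counter


def reference_level(values):
    """Return the most frequent non-missing level of a categorical variable."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0]


def indicator_columns(values):
    """Build 0/1 indicator columns for every level except the reference level.

    Missing entries (None) stay None so they are not mistaken for the reference.
    """
    ref = reference_level(values)
    levels = sorted({v for v in values if v is not None} - {ref})
    return {lvl: [None if v is None else int(v == lvl) for v in values]
            for lvl in levels}
```

Applied to tumor type in this study, the reference group would be colorectal, since it is the most frequent level (87 of 191 samples).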
Within the preliminary model, the independent variable with the least
significant (highest) p-value for its parameter estimate was removed, and
the model was then refit. In subsequent models, the variable with the largest
non-significant p-value was removed, one at a time, until the fitted model contained only
variables with significant p-values for their parameter estimates. If only
one level of a categorical variable was statistically significantly associated with PD-L1
detection status, the variable was not removed from the model until its influence had been
fully assessed from a biological standpoint. Confounders and effect modifiers were also
tested for. A covariate was considered to be a confounder if its presence in a model fit
caused the parameter estimate of the main effect to change by 10% or more compared
with that from the baseline model containing only the main effect. Only biologically
relevant effect modifiers were tested for (see Results).
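The backward-elimination loop and the 10% change-in-estimate rule for confounding can be sketched as follows. The model fitting was done in STATA, so `fit` here is a hypothetical caller-supplied function that refits the model on a list of variables and returns a p-value for each; the significance level of 0.05 is an assumption made for the illustration.

```python
def is_confounder(beta_main_only, beta_adjusted, threshold=0.10):
    """Flag a covariate whose inclusion shifts the main-effect estimate by >= 10%.

    beta_main_only must be non-zero for the relative change to be defined.
    """
    change = abs(beta_adjusted - beta_main_only) / abs(beta_main_only)
    return change >= threshold


def backward_eliminate(variables, fit, alpha=0.05, keep=()):
    """Drop the least significant variable, refit, and repeat.

    fit(variables) is a hypothetical callable returning {variable: p_value}.
    Variables listed in keep (e.g. the main effect) are never removed.
    """
    current = list(variables)
    while current:
        p_values = fit(current)
        removable = {v: p for v, p in p_values.items()
                     if v not in keep and p >= alpha}
        if not removable:
            break
        current.remove(max(removable, key=removable.get))
    return current
```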
As an automated feature of STATA’s logistic regression modeling platform,
collinearity was assessed between all pairs of independent variables in a given model, and
variables were removed accordingly. Once a final model was assembled, it was set aside
and another model was produced using forward selection in SAS. Results from the two
modeling approaches were compared.
Because of the abundance of missing values across different variables in this
analysis, an exploratory approach (imputation) was considered to test whether such
variables were influenced by their missing values with respect to the dependent variable
(PD-L1 detection status). To get a sense of the impact of these missing data, a rough
sensitivity analysis was performed by substituting the mean or median of a given variable
in for each missing value under that variable (means were used to impute values of
continuous variables, and medians were used to impute values of discrete variables). For
categorical variables, missing values were substituted with the word “missing”, thereby
creating a new level under the variable’s categories. The imputed versions of each
variable were fit by PD-L1 detection status (bivariate logistic regression) and checked for
a difference in outcome compared with the results initially obtained by ignoring missing
values.
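The simple imputation rules described above can be sketched as follows, assuming missing values are represented as `None`. The function names are hypothetical.

```python
from statistics import mean, median


def impute_numeric(values, discrete=False):
    """Fill missing values with the mean (continuous) or median (discrete)."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if discrete else mean(observed)
    return [fill if v is None else v for v in values]


def impute_categorical(values):
    """Recode missing values as their own level, labeled 'missing'."""
    return ["missing" if v is None else v for v in values]
```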
Since only strong trends with PD-L1 detection status were of interest in this study,
further measures were taken during the sensitivity analysis for any variables that were
initially found to be statistically significantly associated with PD-L1 detection status
(bivariate associations with p < 0.05 by Fisher’s Exact or Pearson chi-square tests). For
such variables, means or medians were calculated from the observations that were positive
for PD-L1 and then imputed for all missing values of the variable among PD-L1-detected
observations. Similarly, the mean/median of the values among observations not detected
for PD-L1 was used to impute all missing values of the variable among those non-detected
observations. After checking for a difference in the statistical significance of the
association between the imputed version of the variable and PD-L1 detection status
(compared with results obtained from ignoring all missing data), the mean/median imputed
for PD-L1-detected observations was swapped with the mean/median imputed for those not
detected for PD-L1. Thus, the mean/median calculated among observations positive for
PD-L1 was instead imputed for the missing values of observations negative for PD-L1, and
vice versa. If swapping these imputed values between the detected and non-detected groups
of a variable had no influence on the variable’s association with PD-L1 detection status,
any potential bias caused by the missing values was disregarded.
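The group-specific imputation and the subsequent swap can be sketched as follows. Means are used for simplicity (the text also allows medians), missing values are assumed to be `None`, and the function names are hypothetical.

```python
from statistics import mean


def group_fill_values(values, detected):
    """Mean of the observed values within the PD-L1 detected / not-detected groups."""
    return {
        status: mean([v for v, d in zip(values, detected)
                      if v is not None and d == status])
        for status in (True, False)
    }


def impute_by_group(values, detected, swap=False):
    """Fill each missing value with its own group's mean, or with the other
    group's mean when swap=True (the swap step of the sensitivity analysis)."""
    fills = group_fill_values(values, detected)
    return [
        (fills[(not d) if swap else d] if v is None else v)
        for v, d in zip(values, detected)
    ]
```

Refitting the bivariate association on both the unswapped and swapped versions then shows whether the conclusion depends on how the missing values are filled.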
Relative PD-L1 Expression by Tumor Type: ANOVA
Tumor type was the only variable across which the relative levels of PD-L1
expression (quantitative) were of interest. An ANOVA model was employed to assess if
mean levels of PD-L1 gene expression changed significantly across any of the five tumor
types tested. If a statistically significant difference was detected between the mean
relative expressions of at least two cancer groups, multiple comparison testing was
employed using Tukey-Kramer (for unequal group sizes) to establish where the
difference occurred.
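The quantities behind the one-way ANOVA and the Tukey-Kramer comparison can be sketched as follows. This sketch computes only the F statistic and the studentized range statistic q for a pair of groups; obtaining p-values additionally requires the F and studentized range distributions, which statistical software supplies. The function names are hypothetical.

```python
from math import sqrt


def one_way_anova(groups):
    """Return (F, df_between, df_within, MS_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n - k)
    f_stat = (ss_between / (k - 1)) / ms_within
    return f_stat, k - 1, n - k, ms_within


def tukey_kramer_q(group_i, group_j, ms_within):
    """Studentized range statistic for two groups of possibly unequal size."""
    mean_i = sum(group_i) / len(group_i)
    mean_j = sum(group_j) / len(group_j)
    # Tukey-Kramer standard error accommodates unequal group sizes.
    se = sqrt(ms_within / 2 * (1 / len(group_i) + 1 / len(group_j)))
    return abs(mean_i - mean_j) / se
```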
Quantity of cfDNA by Days in Blood Tube
PD-L1 expression is measured in cfRNA. Therefore, a trend between PD-L1
detection status and the number of days spent in a blood tube ultimately represents the
relationship between cfRNA yield (as measured in PD-L1) and days in blood tube. In
testing whether the number of days in blood tubes had an effect upon mean amounts of
cfDNA (measured in KRAS wild-type), the possibility of amplified KRAS copy numbers
due to mutational presence was addressed by adjusting for KRAS mutation status
in a generalized linear model (GLM) analysis.
RESULTS
SECTION 1: Description of the Patient Sample Set
Among the 191 patient samples collected, 87 (46%) were colorectal, 54 (28%)
were NSCLC, 11 (6%) gastric, 27 (14%) prostate, and 12 (6%) breast (FIGURE 3).
Geographic locations of the patients were distributed across 4 U.S. states
(CA, FL, MI, NJ), Singapore (SG), Germany (DE), and British Columbia (BC) (TABLE
3). Among these 7 locations, patient samples were collected from a total of 23
institutions: colorectal samples came from 5 different institutions, NSCLC from 18,
breast from 2, gastric from 1, and prostate from 8. Some samples of different tumor
types came from the same institutions, and more than half (108, 56.5%) of all the samples
in the study came from a single institution (TABLE 4).
FIGURE 3. Patient Distribution by Tumor Type
[Pie chart of patients by tumor type]
Tumor Type   Patients   Share
Breast       12         6%
Colon        87         46%
Gastric      11         6%
Lung         54         28%
Prostate     27         14%
Total        191

TABLE 3. Patient Distribution by Tumor Type & Geographic Area
                               Tumor Type
State / Country   Breast   CRC   Gastric   NSCLC   Prostate   Total
CA                0        83    11        5       16         115
DE                0        0     0         6       8          14
NJ                1        2     0         27      3          33
FL                11       2     0         11      0          24
SG                0        0     0         1       0          1
MI                0        0     0         3       0          3
BC                0        0     0         1       0          1
Total             12       87    11        54      27         191
TABLE 4. Patient Distribution by Tumor Type and Institution
                           Tumor Type
Institution   Breast   Colon   Gastric   Lung   Prostate   Total (%)
A             0        81      11        2      14         108 (56.5)
B             0        0       0         6      2          8 (4.2)
C             11       2       0         11     0          24 (12.6)
D             0        0       0         0      1          1 (0.5)
E             0        2       0         0      0          2 (1)
F             0        0       0         1      1          2 (1)
G             0        0       0         0      5          5 (2.6)
H             0        0       0         1      0          1 (0.5)
I             0        0       0         5      0          5 (2.6)
J             0        0       0         1      0          1 (0.5)
K             1        0       0         2      0          3 (1.6)
L             0        1       0         4      0          5 (2.6)
M             0        0       0         3      0          3 (1.6)
N             0        0       0         2      0          2 (1)
O             0        0       0         2      1          3 (1.6)
P             0        0       0         2      2          4 (2.1)
Q             0        0       0         8      0          8 (4.2)
R             0        0       0         1      0          1 (0.5)
S             0        0       0         1      0          1 (0.5)
T             0        1       0         0      0          1 (0.5)
U             0        0       0         0      1          1 (0.5)
V             0        0       0         1      0          1 (0.5)
W             0        0       0         1      0          1 (0.5)
Total         12       87      11        54     27         191

SECTION 2: Univariate Analysis & Bivariate Associations with PD-L1
Of the 191 observations in the dataset, 79 (41.36%) were positive (detected) for
PD-L1 expression. Although tables are provided for the individual bivariate model fits
between PD-L1 detection status and each independent variable (see Appendix), a
comprehensive table of these bivariate associations with summary statistics can be found
in TABLE 12 at the end of this section.
Patient age (missing 46 observations, 24.1%) was normally distributed across the
sample population by a Shapiro-Wilk test of normality (W = 0.9949, p = 0.8959), with a
mean and standard deviation of 62.5 and 12.72 years (95% CI: 37.9, 88.0) (FIGURE
4a,b). Age was not statistically significantly associated with PD-L1 detection status in
a logistic regression model (p = 0.8237, Wald test) (TABLE 5, Appendix). Categorizing age
by its quartiles did not change its relationship with PD-L1 detection status (p = 0.8615,
Pearson chi-square). Substituting the mean patient age for all the missing values of the
variable also did not change the lack of association with PD-L1 detection status
(p = 0.8237, Wald test). The lack of variability in PD-L1 outcome across age made sense:
the age variable was normally distributed with no influential outliers, displayed the same
mean and standard deviation across the two levels of PD-L1 status, and had its missing
values evenly dispersed between the two levels of PD-L1 status. These observations were
all compatible with the null hypothesis that age cannot predict PD-L1 detection status in
cancer patients.
FIGURE 4a. Distribution: Patient Age
[Histogram of patient age (axis 30 to 100) with fitted Normal(62.5034, 12.7165) curve]
Quantiles                       Moments
100.0%  maximum   95            Mean             62.503448
 99.5%            95            Std Dev          12.716527
 97.5%            88            Std Err Mean     1.0560501
 90.0%            79            Upper 95% Mean   64.590811
 75.0%  quartile  72            Lower 95% Mean   60.416086
 50.0%  median    63            N                145
 25.0%  quartile  54            Sum Wgt          145
 10.0%            45.2          Sum              9063
  2.5%            37.9          Variance         161.71006
  0.5%            26            Skewness         -0.128935
  0.0%  minimum   26            Kurtosis         0.0413296
                                CV               20.345321
FIGURE 4b. PD-L1 Detection Status by Patient Age
There was no association between gender (missing only 2 values) and PD-L1
status (p = 0.9812, Pearson chi-square); 85 (45%) of patients were female (TABLE 6,
Appendix). Again, the lack of association was reflected in the symmetry of the
variable across the two levels of PD-L1 status: approximately 41% of males and 41% of
females in the study had detectable levels of PD-L1 expression, and the missing
observations (although minuscule in number) were evenly distributed across the two levels
of PD-L1 status.
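The Pearson chi-square result for gender can be checked by hand from the 2 x 2 counts in TABLE 12. The sketch below is an illustration only: the function name is made up for the example, no continuity correction is assumed, and the p-value uses the identity that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p

# Rows: male, female; columns: not detected, detected (counts from TABLE 12).
chi2, p = pearson_chi2_2x2(61, 43, 50, 35)
print(f"chi2 = {chi2:.5f}, p = {p:.4f}")
```

Under these assumptions the p-value should land very close to the reported 0.9812, which is what near-identical detection proportions in the two rows would suggest.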
Using Fisher’s Exact tests, none of the geo-centric variables (institution, physician
ID, state/country collected from) were associated with PD-L1 detection status (p =
0.6729, p = 0.8636, and p = 0.7350 respectively) (TABLE 7, Appendix). The similarity
in results across these three variables (though not the lack of association itself) was
expected, however, due to collinearity caused by their overlap with one another.
Due to this collinearity, only one of these geo variables (state/country collected from)
was displayed in the summary statistics (TABLE 12), as its association with PD-L1 status
also served to represent the bivariate associations of both institution and physician ID
with PD-L1 status.
Within tumor type, 40.23% (35/87) of colon patients had detectable levels of PD-L1
expression, as did 42.59% (23/54) of NSCLC patients, 45.45% (5/11) of gastric,
37.04% (10/27) of prostate, and 50% (6/12) of breast (FIGURE 5). Accordingly, PD-L1
detection outcome did not vary significantly across the 5 levels of tumor type
(p = 0.9431, Fisher’s Exact) (TABLE 8, Appendix).
FIGURE 5. Patient Samples by Tumor Type and PD-L1 Detection Status
[Bar chart. x-axis: Tumor Type; y-axis: # of Patient Samples (0 to 70)]
Tumor Type   PD-L1 Detected   PD-L1 Not Detected
Breast       6                6
Colon        35               52
Gastric      5                6
Lung         23               31
Prostate     10               17
To assess PD-L1 detection status by collection date, the earliest collection date (10-
Jul-2014) was set to 1, and each subsequent date of collection was set equal to the
number of days between it and 10-Jul-2014. The distribution of the resulting continuous
variable proved to be bimodal (FIGURE 6a), which was also clearly visible in a
scatterplot of the variable by PD-L1 detection status (FIGURE 6b). Because of this, the
variable was dichotomized using its natural point of separation (the point separating the
two peaks, approx. 1 year after the initial collection date) as the cut-point. Neither the
continuous nor the categorical form of the variable was associated with PD-L1 status
(p = 0.2924, Wald and p = 0.4867, Pearson chi-square) (TABLE 9a,b, Appendix). However,
using fractional polynomials in Stata produced a transformation (Imean_2, but renamed
days_initial_fp) (FIGURE 6c, Appendix), which achieved linearity across PD-L1 status and
brought the association with PD-L1 within the preliminary cutoff for inclusion in a
preliminary logistic regression model (p = 0.2440, Wald) (TABLE 9c, Appendix).
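The date coding and dichotomization described above can be sketched as follows. This is one literal reading of the text (first date coded 1, later dates coded as the day difference), not the study's actual code. The function names and example dates are hypothetical, and the 300-day cut-point is an assumption taken from the "<= 300 days" split shown in TABLE 12.

```python
from datetime import date

FIRST_COLLECTION = date(2014, 7, 10)  # earliest collection date reported in the text
CUT_POINT_DAYS = 300                  # assumption: the "<= 300 days" split in TABLE 12

def days_since_initial(collected):
    """Code the first collection date as 1 and later dates as the day difference."""
    delta = (collected - FIRST_COLLECTION).days
    return max(delta, 1)

def late_collection(collected, cut_point=CUT_POINT_DAYS):
    """Dichotomize: True if the sample falls after the cut-point."""
    return days_since_initial(collected) > cut_point

# Hypothetical collection dates for illustration only.
for d in (date(2014, 7, 10), date(2014, 12, 4), date(2015, 10, 19)):
    print(d.isoformat(), days_since_initial(d), late_collection(d))
```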
FIGURE 6a. Distribution: Days Since Initial Sample Collection
[Histogram of days since initial sample collection (axis 0 to 600)]
Quantiles                       Moments
100.0%  maximum   568           Mean             382.20419
 99.5%            568           Std Dev          194.00945
 97.5%            561.8         Std Err Mean     14.038031
 90.0%            551.2         Upper 95% Mean   409.8946
 75.0%  quartile  517           Lower 95% Mean   354.51378
 50.0%  median    467           N                191
 25.0%  quartile  148           Sum Wgt          191
 10.0%            37.4          Sum              73001
  2.5%            1             Variance         37639.669
  0.5%            1             Skewness         -0.990914
  0.0%  minimum   1             Kurtosis         -0.743341
                                CV               50.760683
FIGURE 6b. PD-L1 Detection Status by Days Since Sample Collection
There were three seasonal levels during which samples were collected - Fall made
up 111 observations, 42.34% of which were detected for PD-L1 expression; Summer
made up 40 observations, 32.5% of which were detected for PD-L1 expression; and
Winter made up 40 observations, 47.5% of which were detected for PD-L1 expression
(the absence of samples collected during the months of Spring from either 2014 or 2015
was an arbitrary occurrence). Season had no significant effect on the PD-L1 detection
outcome (p = 0.3752, Pearson chi-square) (TABLE 10, Appendix).
Days_intube (i.e., number of days spent in a blood tube) was missing 51
observations (26.7%) and consisted of only 6 different outcomes (0, 1, 2, 3, 4, or 6 days).
Because of the variable’s limited number of possible outcomes, days_intube was initially
treated as an ordinal variable (instead of continuous), and was statistically significantly
associated with PD-L1 detection status (p = 0.0169, Fisher’s Exact) (TABLE 11,
Appendix). A mosaic plot (FIGURE 7) illustrates the rapid change in frequency of PD-L1
detection (up to 66.7% detection among samples that spent 3 days in blood tubes).
Among samples that spent 0 days in blood tubes before processing, those undetected for
PD-L1 expression numbered five times those that were detected (5 v. 1).
Among samples that spent 1 day in blood tubes before processing, those undetected for
PD-L1 expression were more than double the number of those that were detected (49 v.
22). In contrast, at day 3, the number of samples detected for PD-L1 expression was
double that of those not detected (10 v. 5). Also of note was the disparity in missing
observations across PD-L1 status: undetected samples were missing 32
observations while those with detectable levels of PD-L1 expression were missing 19.
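The detection frequencies quoted for each day count follow directly from the counts in TABLE 12. A small sketch, with a hypothetical function name, that recomputes them:

```python
# Counts per day in blood tube: (not detected, detected), from TABLE 12.
counts = {0: (5, 1), 1: (49, 22), 2: (11, 12), 3: (5, 10), 4: (7, 12), 6: (3, 3)}

def detection_rates(counts):
    """Percent of samples with detectable PD-L1 for each day count."""
    return {day: round(100 * det / (nd + det), 1)
            for day, (nd, det) in counts.items()}

for day, rate in detection_rates(counts).items():
    print(f"day {day}: {rate}% detected")
```

The values for days 1 through 4 should match the percentages labeled on the mosaic plot in FIGURE 7.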
FIGURE 7. Mosaic Plot: PD-L1 Detection Frequency Across Days in Blood Tube
[Mosaic plot. x-axis: # of Days in Blood Tube (0, 1, 2, 3, 4, 6); y-axis: Frequency (per
day #), 0.00 to 1.00. Key: PD-L1 Detected, PD-L1 Not Detected]
# of Days   Detected (%)   Not Detected (%)
1           31.0           69.0
2           52.2           47.8
3           66.7           33.3
4           63.2           36.8
Because of the strength of the trend in PD-L1 detection status by the number of
days samples spent in blood tubes, two sensitivity analyses were performed to
test it. First, the median number of days spent in a blood tube was calculated among
observations that were detected for PD-L1. This median (2 days, from the counts in TABLE
12) was imputed for all the missing values of days_intube that were detected for PD-L1.
Similarly, the median number of days calculated among the observations that were
undetected for PD-L1 (1 day) was imputed for all the missing values of days_intube in the
undetected group. As anticipated, this strengthened the already statistically significant
association between days_intube and PD-L1 detection status (p < 0.0001, Fisher’s Exact).
To test for an opposite effect under this sensitivity analysis, the medians imputed
for the two groups of missing values were swapped with one another, so that 1
day was substituted for the missing values detected for PD-L1 and 2 days was
substituted for the missing values that were not detected for PD-L1. Interestingly, even
this more severe approach for determining the impact of the missing data
maintained the statistically significant relationship between days_intube and the outcome
variable (p = 0.0010, Fisher’s Exact).
Although a rough means of testing sensitivity, this analysis showed that the choice of
imputed median (from either the detected or the undetected distribution of
days_intube) did not affect the association with PD-L1 detection status. For preliminary
modeling, the overall median number of days in a blood tube (1 day) was substituted for
the missing values of the variable.
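The group medians used for imputation can be recomputed from the observed days_intube counts in TABLE 12. The sketch below expands the frequency table back into one value per sample; the helper name `expand` is an assumption made for the example.

```python
from statistics import median

# Counts per day in blood tube: (not detected, detected), from TABLE 12.
counts = {0: (5, 1), 1: (49, 22), 2: (11, 12), 3: (5, 10), 4: (7, 12), 6: (3, 3)}

def expand(counts, column):
    """Turn the frequency table into one day value per observed sample."""
    return [day for day, pair in counts.items() for _ in range(pair[column])]

not_detected_days = expand(counts, 0)
detected_days = expand(counts, 1)

print("median, PD-L1 not detected:", median(not_detected_days))
print("median, PD-L1 detected:    ", median(detected_days))
print("median, all observed:      ", median(not_detected_days + detected_days))
```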
TABLE 12. Summary Statistics: PD-L1 Detection Outcome & Demographics
                                      PD-L1 Detection Outcome
Variable                          Not Detected   Detected     p-value
N (%)                             112 (58.6)     79 (41.3)
Patient Age
  N (%)                           86 (59.3)      59 (40.7)
  Mean                            62.7           62.22
  SD                              12.9           12.55
  Lower 95% CL                    30.7           40
  Upper 95% CL                    88             93.5
  Missing Values                  26             20
  chi2 (Wald)                                                 0.8237
Gender
  M (%)                           61 (58.65)     43 (41.35)
  F (%)                           50 (58.82)     35 (41.18)
  Missing Values                  1              1
  chi2 (Pearson)                                              0.9812
Tumor Type
  Breast (%)                      6 (50)         6 (50)
  Colon (%)                       52 (59.77)     35 (40.23)
  Gastric (%)                     6 (54.55)      5 (45.45)
  NSCLC (%)                       31 (57.41)     23 (42.59)
  Prostate (%)                    17 (62.96)     10 (37.04)
  chi2 (Fisher's Exact)                                       0.9431
Days Since Initial Collection
  <= 300 days                     32 (62.75)     19 (37.25)
  > 300 days                      80 (57.14)     60 (42.86)
  chi2 (Pearson)                                              0.4867
Geography of Collection
  CA (%)                          66 (58.41)     47 (41.59)
  DE (%)                          8 (57.14)      6 (42.86)
  NJ (%)                          20 (60.61)     13 (39.39)
  FL (%)                          13 (54.17)     11 (45.83)
  SG (%)                          1 (100)        0 (0)
  MI (%)                          3 (100)        0 (0)
  BC (%)                          0 (0)          1 (100)
  Missing Values                  1              1
  chi2 (Fisher's Exact)                                       0.7350
Season During Collection
  Fall (%)                        64 (57.66)     47 (42.34)
  Summer (%)                      27 (67.50)     13 (32.50)
  Winter (%)                      21 (52.50)     19 (47.50)
  chi2 (Pearson)                                              0.3752
Days in Blood Tube
  0                               5 (83.33)      1 (16.67)
  1                               49 (69.01)     22 (30.99)
  2                               11 (47.83)     12 (52.17)
  3                               5 (33.33)      10 (66.67)
  4                               7 (36.84)      12 (63.16)
  6                               3 (50.00)      3 (50.00)
  Missing Values                  32             19
  chi2 (Fisher's Exact)                                       0.0169
SECTION 3: Bivariate Associations Between PD-L1 Status and DNA Mutations
The BRAF, EGFR, and KRAS mutational status variables were dichotomous (0 =
not detected, 1 = detected). Within the BRAF mutation status variable, there were a total
of 53 observations (3 detected for the V600E mutation, 50 undetected), 138 missing
values (72.3%), and no statistically significant association with PD-L1 detection status (p
= 0.6286, Fisher’s Exact) (TABLE 13, Appendix). Imputing the median (0, undetected)
for the missing values of V600E mutation status moved the result even further from
significance (p = 1.000, Fisher’s Exact). Due to this lack of association with the outcome
variable, BRAF mutation status was not considered for preliminary modeling.
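For a 2 x 2 table, Fisher's Exact test reduces to hypergeometric tail sums, which can be computed with the standard library. The sketch below is an illustration with a hypothetical function name, using the counts from TABLE 16. Which tail the original software output reported is not stated in the text; recomputing from the counts, the one-sided tail in the direction of the observed difference comes out near the reported values, and that reading is an assumption here.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Hypergeometric tail probabilities for the table [[a, b], [c, d]].

    Returns (lower, upper, two_sided), where lower = P(X <= a),
    upper = P(X >= a), and X is the top-left cell with all margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    prob = {k: comb(row1, k) * comb(n - row1, col1 - k) / denom
            for k in range(lo, hi + 1)}
    lower = sum(p for k, p in prob.items() if k <= a)
    upper = sum(p for k, p in prob.items() if k >= a)
    # Two-sided: sum every table at least as unlikely as the observed one.
    two_sided = sum(p for p in prob.values() if p <= prob[a] * (1 + 1e-9))
    return lower, upper, two_sided

# Rows: mutation present, absent; columns: PD-L1 detected, not detected.
braf = fisher_exact_2x2(1, 2, 21, 29)
egfr = fisher_exact_2x2(4, 2, 17, 27)
print("BRAF lower/upper/two-sided:", [round(p, 4) for p in braf])
print("EGFR lower/upper/two-sided:", [round(p, 4) for p in egfr])
```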
Within the EGFR mutation status variable, there were a total of 50 observations (6
detected for a mutation, 44 undetected), 141 missing values (73.8%), and what was
construed as a marginal association with PD-L1 detection outcome given the preliminary
phase of modeling (p = 0.1940, Fisher’s Exact) (TABLE 14, Appendix). It was noted
that the reliability of EGFR mutational status as an estimator of PD-L1 outcome might be
compromised because the variable was missing 74% of the dataset’s observations. However,
of the samples positive for either an L858R mutation or an exon 19 deletion in
EGFR, 66.67% (4/6) were positive for PD-L1 gene expression, a proportion consistent
with that observed across numerous publications. Imputing the median (0, undetected)
for the missing values of EGFR mutation status weakened the marginal association with
PD-L1 detection status (p = 0.2333, Fisher’s Exact), but not enough to push it outside of
the preliminary cutoff, thereby affording EGFR mutational status enough reliability to be
considered for inclusion in preliminary modeling. Both versions of the EGFR
variable (with values missing and with values imputed) were tried in multivariate logistic
models.
Within the KRAS mutation status variable, there were a total of 112 observations
(32 detected for mutations, and 80 undetected), 79 missing values (41.4%), and a
statistically significant association with PD-L1 detection outcome (p = 0.0466, Fisher’s
Exact) (TABLE 15). The reliability of KRAS mutation status as an estimator of PD-L1
outcome was compromised due to its missing 41% of the observations in the dataset.
Imputing the median (0, undetected) in for the missing values of KRAS mutation status
caused its association with PD-L1 detection status to become marginal (p = 0.1166,
Fisher’s Exact), but not enough to push it outside of the preliminary cutoff set forth.
Moreover, of the observations positive for a KRAS mutation, 71.88% (23/32) were
undetectable for PD-L1 expression, a proportion similar to that observed in other studies,
thereby affording reliability enough to consider KRAS mutation status for preliminary
modeling. Both versions of the KRAS variable (with values missing and with values imputed)
were tried in preliminary modeling.
TABLE 16. Summary Statistics: PD-L1 Outcome & Mutational Gene Status
                                               PD-L1 Expression Outcome
Gene   Variant(s) Tested     Values           Not Detected (0)   Detected (1)   Prob>ChiSq
BRAF   V600E                 Absent (%)       29 (54.72)         21 (39.62)     0.6286
                             Present (%)      2 (3.77)           1 (1.89)
                             Missing Values   81                 57
EGFR   Ex 19 del, L858R      Absent (%)       27 (54)            17 (34)        0.1935
                             Present (%)      2 (4)              4 (8)
                             Missing Values   83                 58
KRAS   G12A, G12C, G12D,     Absent (%)       42 (37.5)          38 (33.93)     0.0466
       G12S, G12V, G13D      Present (%)      23 (20.54)         9 (8.04)
                             Missing Values   47                 32
SECTION 4: PD-L1 Status and Copies of cfDNA and cfRNA in Wild-Type Genes
The number of copies of β-Actin cfRNA per 1mL of plasma (refer to Methods for
the method of calculation) was not normally distributed (FIGURE 9a), so it was
log-transformed to the base 10. Log10(copies) was statistically significantly associated
with PD-L1 detection status (p = 0.0192, Wald test) (TABLE 17a, Appendix). Upon plotting
PD-L1 status by log10(copies), the two left-most points appeared as potential outliers
(FIGURE 9c); these two points were removed and the model was refit. Log10(copies)
now passed a Shapiro-Wilk test of normality (W = 0.9859, p = 0.0557), and was still
statistically significantly associated with the outcome variable (p = 0.0394, Wald test)
(TABLE 17b, Appendix). Mean (SD) log10(copies) was 2.65 (0.40) (95% CI: 1.73,
3.38). To achieve linearity in log10(copies) across PD-L1 detection status (for inclusion
of the continuous variable in a preliminary logistic regression model), fractional
polynomials were tested in Stata and a transformation was produced (Imean_2, but
renamed copies_cfRNA_fp) (FIGURE 9e, Appendix).
FIGURE 9a. Distribution: Copies β-Actin cfRNA / 1mL Plasma
[Histogram (axis 0 to 4000) with fitted Normal(642.8, 589.577) curve]
Quantiles                       Moments
100.0%  maximum   4194.12       Mean             642.80021
 99.5%            4194.12       Std Dev          589.57732
 97.5%            2387.03       Std Err Mean     42.660317
 90.0%            1350.77       Upper 95% Mean   726.94889
 75.0%  quartile  854.146       Lower 95% Mean   558.65154
 50.0%  median    450.203       N                191
 25.0%  quartile  242.763       Sum Wgt          191
 10.0%            122.328       Sum              122774.84
  2.5%            48.5702       Variance         347601.41
  0.5%            5.013         Skewness         2.191507
  0.0%  minimum   5.013         Kurtosis         7.5373156
                                CV               91.720149
FIGURE 9b. Distribution: Log10(copies β-Actin cfRNA / 1mL plasma)
[Histogram (axis 0.5 to 3.5) with fitted Normal(2.63102, 0.43415) curve]
Quantiles                       Moments
100.0%  maximum   3.623         Mean             2.6310209
 99.5%            3.623         Std Dev          0.4341542
 97.5%            3.3778        Std Err Mean     0.0314143
 90.0%            3.1306        Upper 95% Mean   2.6929865
 75.0%  quartile  2.932         Lower 95% Mean   2.5690554
 50.0%  median    2.653         N                191
 25.0%  quartile  2.385         Sum Wgt          191
 10.0%            2.0876        Sum              502.525
  2.5%            1.6864        Variance         0.1884899
  0.5%            0.7           Skewness         -0.798952
  0.0%  minimum   0.7           Kurtosis         1.6679394
                                CV               16.501358

FIGURE 9c. PD-L1 Detection Status by Log10(copies β-Actin cfRNA/1mL plasma)
[Scatterplot. x-axis: Log10(copies β-Actin cfRNA / mL plasma), 0.5 to 4; y-axis: PD-L1
Detection Status (0, 1)]

Relative ERCC1 gene expression (missing 143 observations, 74.9%) was not
normally distributed (FIGURE 10a), so it was log-transformed to the base 10.
Log10(rel_ERCC1_expssn) was marginally associated with PD-L1 detection status
(p = 0.1702, Wald test) (TABLE 18a, Appendix). Upon plotting
PD-L1 status by log10(rel_ERCC1_expssn), the two left-most points appeared as
potential outliers (FIGURE 10c); these two points were removed and the model was
refit. Log10(rel_ERCC1_expssn) now passed a Shapiro-Wilk test of normality
(W = 0.9739, p = 0.3563), and was still marginally associated with PD-L1 status (p =
0.2427, Wald test) given the preliminary cutoff (TABLE 18b, Appendix). The
marginality of this association (if not a lack of association altogether) was expected,
however, due to the non-tumor-specific nature of ERCC1. The mean and standard
deviation of log10(rel_ERCC1_expssn) were 0.26 and 0.29 (95% CI: -0.32, 0.84).
Categorizing log10(rel_ERCC1_expssn) by its quartiles did not significantly change its
influence upon PD-L1 detection status (p = 0.2573, Pearson chi-square), although this
technically pushed it outside of the preliminary cutoff. Calculating the mean
value of log10(rel_ERCC1_expssn) from the data with the outliers excluded (0.265) and
imputing it for all missing values of the variable also caused the association with
PD-L1 detection status to fall outside of the preliminary cutoff (p = 0.2507, Wald test).
Because this variable was missing 75% of the dataset’s observations, and because multiple
tests exceeded the p = 0.25 preliminary cutoff, ERCC1 was deemed unlikely to be
a reliable estimator of PD-L1 outcome and was therefore excluded from preliminary logistic
regression modeling.
FIGURE 10a. Distribution: Relative ERCC1 Expression
[Histogram (axis 0 to 7) with fitted Normal(2.12884, 1.50322) curve]
Quantiles                       Moments
100.0%  maximum   7.081         Mean             2.1288431
 99.5%            7.081         Std Dev          1.503224
 97.5%            6.9304        Std Err Mean     0.2104935
 90.0%            3.7568        Upper 95% Mean   2.5516317
 75.0%  quartile  2.869         Lower 95% Mean   1.7060546
 50.0%  median    1.887         N                51
 25.0%  quartile  0.949         Sum Wgt          51
 10.0%            0.5274        Sum              108.571
  2.5%            0.0081        Variance         2.2596824
  0.5%            0             Skewness         1.2735229
  0.0%  minimum   0             Kurtosis         2.2655676

FIGURE 10b. Distribution: Log10(relative_ERCC1_expression)
[Histogram (axis -1.5 to 1) with fitted Normal(0.21256, 0.39874) curve]
Quantiles                       Moments
100.0%  maximum   0.85009       Mean             0.2125599
 99.5%            0.85009       Std Dev          0.398738
 97.5%            0.84131       Std Err Mean     0.0563901
 90.0%            0.57642       Upper 95% Mean   0.32588
 75.0%  quartile  0.46705       Lower 95% Mean   0.0992398
 50.0%  median    0.27818       N                50
 25.0%  quartile  0.02654       Sum Wgt          50
 10.0%            -0.2545       Sum              10.627994
  2.5%            -1.2799       Variance         0.158992
  0.5%            -1.5686       Skewness         -1.913868
  0.0%  minimum   -1.5686       Kurtosis         6.9882789
                                CV               187.58856
FIGURE 10c. PD-L1 Detection Status by Log10(relative_ERCC1_expression)
[Scatterplot. x-axis: Log10(Relative ERCC1 Gene Expression), -1.5 to 1; y-axis: PD-L1
Detection Status (0, 1)]

Copies of KRAS wild-type cfDNA per 1mL of plasma (missing 81 observations,
42.4%) were not normally distributed (FIGURE 11a), so the variable was log-transformed
(FIGURE 11b). Log10(KRAS copies) passed a Shapiro-Wilk test of normality (W = 0.9912, p =
0.7074), but was not statistically significantly associated with PD-L1 detection status (p =
0.6485, Wald test) (TABLE 19, Appendix). The mean and standard deviation of
log10(KRAS copies) were 0.76 and 0.77 (95% CI: -0.96, 2.32). Categorizing
log10(KRAS copies) by its quartiles did not significantly change its influence upon PD-L1
detection status (p = 0.4405, Pearson chi-square). Calculating the mean value of
log10(KRAS copies) from the variable’s observations (0.756) and imputing it for all
missing values of the variable did not change the lack of association with
PD-L1 detection status (p = 0.6477, Wald test). Since this rough sensitivity analysis
showed no difference in the lack of association between log10(KRAS copies) and the
outcome variable, the missing values of log10(KRAS copies) were dismissed
as a source of potential bias. Due to the consistent lack of association between
log10(KRAS copies) and PD-L1 status across all methods employed, the variable was not
considered for inclusion in a preliminary logistic regression model. Summary statistics on
the measurements from the β-Actin, ERCC1, and KRAS wild-type genes and their
associations with PD-L1 detection status are all displayed in TABLE 20.
FIGURE 11a. Distribution: Copies KRAS WT cfDNA / 1mL Plasma
[Histogram (axis 0 to 300) with fitted Normal(22.1126, 49.3716) curve]
Quantiles                       Moments
100.0%  maximum   364.428       Mean             22.112636
 99.5%            364.428       Std Dev          49.371573
 97.5%            210.834       Std Err Mean     4.7073948
 90.0%            60.5348       Upper 95% Mean   31.44254
 75.0%  quartile  19.2988       Lower 95% Mean   12.782733
 50.0%  median    5.4685        N                110
 25.0%  quartile  1.77525       Sum Wgt          110
 10.0%            0.6225        Sum              2432.39
  2.5%            0.11363       Variance         2437.5522
  0.5%            0.042         Skewness         4.6164733
  0.0%  minimum   0.042         Kurtosis         25.370237
                                CV               223.27312

FIGURE 11b. Distribution: Log10(copies KRAS WT cfDNA / 1mL plasma)
[Histogram (axis -1 to 2) with fitted Normal(0.75632, 0.77099) curve]
Quantiles                       Moments
100.0%  maximum   2.562         Mean             0.7563182
 99.5%            2.562         Std Dev          0.7709912
 97.5%            2.32267       Std Err Mean     0.0735111
 90.0%            1.7804        Upper 95% Mean   0.9020148
 75.0%  quartile  1.28525       Lower 95% Mean   0.6106215
 50.0%  median    0.738         N                110
 25.0%  quartile  0.24975       Sum Wgt          110
 10.0%            -0.2059       Sum              83.195
  2.5%            -0.9561       Variance         0.5944274
  0.5%            -1.376        Skewness         -0.249131
  0.0%  minimum   -1.376        Kurtosis         0.1028189
                                CV               101.94006
FIGURE 11c. PD-L1 Status by Log10(copies KRAS WT cfDNA / 1mL plasma)
[Scatterplot. x-axis: Log10(Copies KRAS wild-type cfDNA), -1.5 to 3; y-axis: PD-L1
Detection Status (0, 1)]

TABLE 20. PD-L1 Expression Outcome & Mutational Gene Status
                                                        PD-L1 Expression Outcome
Reference   Measurement Form              Summary       Not Detected (0)   Detected (1)   Prob >
Gene                                      Statistics                                      ChiSq
ERCC1       Log10(relative expression)    N (%)         30 (62.5)          18 (37.5)
                                          Mean          0.3                0.2
                                          SD            0.3                0.27
                                          Lower 95 CL   -0.28              -0.33
                                          Upper 95 CL   0.85               0.58
                                          Missing       82                 61
                                          chi2                                            0.2427
β-Actin     Log10(total copies cfRNA /    N (%)         110 (58.2)         79 (41.8)
            1mL plasma)                   Mean          2.6                2.72
                                          SD            0.37               0.42
                                          Lower 95 CL   1.73               1.72
                                          Upper 95 CL   3.23               3.41
                                          Missing       2                  0
                                          chi2                                            0.0394
KRAS        Log10(copies KRAS wild-type   N (%)         64 (58.18)         46 (41.82)
            cfDNA / 1mL plasma)           Mean          0.78               0.72
                                          SD            0.78               0.76
                                          Lower 95 CL   -0.95              -1.29
                                          Upper 95 CL   2.45               2.23
                                          Missing       48                 33
                                          chi2                                            0.6485
SECTION 5: Multivariate Logistic Regression
The first logistic model assembled was done so using variables without
imputation of missing values. The independent variables considered for inclusion in a
preliminary logistic regression model with PD-L1 status as the outcome included (1)
Log10(copies β-Actin cfRNA/1mL plasma) (copies_cfRNA_fp), (2) collection date
(days_initial_fp), (3) KRAS mutational status (KRAS_af_cat), (4) EGFR mutational
status (EGFR_af_cat), and (5) number of days in blood tube (days_intube). With all 5
independent variables included in the model, the global likelihood ratio test was
significant (LR Chi2 = 11.8, p = 0.0377), but only the parameter estimate of days_intube
had a statistically significant association with the outcome (β = 0.705, p = 0.027, 95%CI:
0.079, 1.332) (FIGURE 12); and this was when days_intube was treated as a continuous
variable (instead of ordinal). Days_initial_fp’s parameter estimate had the highest p-
value from its Wald test (p = 0.881), so it was removed from the model; EGFR
mutational status was removed from the subsequent model (p = 0.674), followed by
KRAS mutational status (p = 0.953). The only effects ultimately kept in the model were
copies_cfRNA_fp (β = 0.270, 95% CI: 0.005, 0.536; p = 0.0460, Wald) and days_intube
(β = 0.344, 95% CI: 0.089, 0.599; p = 0.008, Wald) (FIGURE 13).
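The reported estimates, intervals, and Wald p-values can be cross-checked against one another. The sketch below assumes the intervals are symmetric 95% Wald intervals on the log-odds scale (the reported estimates do sit at the interval midpoints), recovers the standard error from the interval width, and derives the z statistic, two-sided p-value, and odds ratio. The function name is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_from_ci(beta, lower, upper):
    """Recover SE, z, two-sided p and odds ratio from a 95% Wald interval."""
    se = (upper - lower) / (2 * Z_975)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p, math.exp(beta)

# Estimates and intervals as reported in the text above.
reported = {
    "days_intube (full model)": (0.705, 0.079, 1.332),
    "copies_cfRNA_fp (final)": (0.270, 0.005, 0.536),
    "days_intube (final)": (0.344, 0.089, 0.599),
}
for name, (beta, lo, hi) in reported.items():
    se, z, p, odds_ratio = wald_from_ci(beta, lo, hi)
    print(f"{name}: SE={se:.3f}, z={z:.2f}, p={p:.4f}, OR={odds_ratio:.2f}")
```

Because copies_cfRNA_fp is a fractional-polynomial transformation, its odds ratio applies per unit of the transformed variable rather than per copy of cfRNA.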
The main effect (copies_cfRNA_fp) was screened against the other covariates for
their potential to confound its association with PD-L1 status. The same was not done
for days_intube because its large number of missing observations (48.2%) would
significantly bias any assessment of whether it was confounded by other covariates,
especially ones missing many of the same observations, as well as ones comprised primarily
of observations not present within days_intube. It was verified, however, that days_intube
was not a confounder of copies_cfRNA_fp (only a 1.7% change was observed in the
parameter estimate of copies_cfRNA_fp across models). In total, 5 covariates changed the
parameter estimate of copies_cfRNA_fp by 10% or more (TABLE 21).
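The 10% change-in-estimate screen can be sketched with the estimates listed in TABLE 21. The negative signs below are an interpretation of the extracted table (each estimate then falls inside its reported interval), and the function name is hypothetical.

```python
BASELINE = -0.299  # copies_cfRNA_fp estimate with no added covariate (TABLE 21)

# copies_cfRNA_fp estimate after adding each covariate, as read from TABLE 21.
adjusted = {
    "Age": -0.199,
    "Gender": -0.316,
    "Tumor Type": -0.323,
    "Collection Season": -0.285,
    "Physician_ID": -0.290,
    "State/Country": -0.353,
    "Log10(KRAS copies / 1mL)": -0.254,
    "BRAF Mutational Status": -0.218,
    "EGFR Mutational Status": -0.312,
    "KRAS Mutational Status": -0.191,
    "Days_initial_fp": -0.282,
    "Log10(rel ERCC1 expression)": -0.305,
}

def percent_change(adjusted_beta, baseline=BASELINE):
    """Absolute percent difference of an adjusted estimate from baseline."""
    return abs(adjusted_beta - baseline) / abs(baseline) * 100

confounders = [name for name, beta in adjusted.items()
               if percent_change(beta) >= 10]
for name, beta in adjusted.items():
    print(f"{name}: {percent_change(beta):.2f}%")
print("Flagged as confounders:", confounders)
```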
TABLE 21. Confounders of Copies_cfRNA_fp by PDL1 (logistic regression)
Covariate                      Copies_cfRNA_fp Parameter   % Difference      Confounder
                               Estimate (95% CI)           (from baseline)
copies_cfRNA_fp (baseline)     -0.299 (-0.534, -0.079)
Age                            -0.199 (-0.484, 0.080)      33.44             ✔
Gender                         -0.316 (-0.557, -0.090)     5.69
Tumor Type                     -0.323 (-0.573, -0.087)     8.03
Collection Season              -0.285 (-0.522, -0.062)     4.68
Physician_ID                   -0.290 (not estimable)      3.01
State/Country                  -0.353 (-0.611, -0.112)     18.06             ✔
Log10(KRAS copies / 1mL)       -0.254 (-0.542, 0.011)      15.05             ✔
BRAF Mutational Status         -0.218 (-0.763, 0.313)      27.09             ✔
EGFR Mutational Status         -0.312 (-0.718, 0.043)      4.35
KRAS Mutational Status         -0.191 (-0.476, 0.076)      36.12             ✔
Days_initial_fp                -0.282 (-0.519, -0.059)     5.69
Log10(rel ERCC1 expression)    -0.305 (-1.207, 0.440)      2.01

Of these confounders, BRAF_af_cat was excluded due to its lack of reliability as
an estimator of outcome (72.3% of observations missing), and neither KRAS_af_cat nor
Log10(KRAS_WT_copies_1mL) could be included in the model due to a large amount
of non-overlap in missing observations in relation to days_intube. Specifically, the 92
observations missing from days_intube caused the model to be truncated by deleting
those observations from its fit; 62 completely separate observations were missing from
both KRAS_af_cat and Log10(KRAS_WT_copies_1mL), thus not overlapping with any
of the 92 already removed. Therefore, inclusion of the two confounders would cause the
model to be truncated by 62 more observations. The loss of data would obscure the
parameter estimate of copies_cfRNA_fp more than adjusting for the two confounders would
clarify it. For this reason, the only confounders that were added to the model were age
and state/country of collection (FIGURE 14, Appendix).
When a fresh logistic regression model was automated in SAS using forward
selection, there was a difference in covariates included, although both models included
copies_cfRNA_fp, days_intube, and two additional variables (TABLE 22a,b, Appendix).
These additional model effects were expected to differ across the model types, since in
the manual model, inclusion of covariates was contingent upon their confounding of the
association between copies_cfRNA_fp and the outcome, whereas under the forward
selection method, covariate inclusion was based upon most significant parameter
estimates, and did not take confounding into account.
The last logistic model approach performed involved the imputed versions
of all variables (after verifying that each variable’s independent association with PD-L1
detection status did not significantly change after missing values were substituted with
mean/median values). Using the same independent variables from before, the global
likelihood ratio test was significant (LR Chi2 = 17.22, p = 0.0041). In this model, only
the parameter estimates for days_intube and copies_cfRNA_fp were statistically
significantly associated with the outcome (FIGURE 15, Appendix). Days_initial_fp’s
parameter estimate again had the highest p-value from its Wald test (p = 0.673), so it was
removed from the model. EGFR mutational status was removed from the subsequent
model (p = 0.423), followed by KRAS mutational status (p = 0.252). Once again, the
only effects ultimately kept in the model were copies_cfRNA_fp (β = 0.282, 95% CI:
0.053, 0.510; p = 0.016, Wald) and days_intube (β = 0.327, 95% CI: 0.091, 0.563; p =
0.007, Wald) (FIGURE 16, Appendix).
SECTION 6: Relative PD-L1 Expression by Tumor Type
As mentioned briefly in Methods, tumor type was the only variable across which
the relative levels of PD-L1 expression (quantitative) were of interest. Relative PD-L1
expression values were calculated as described in Methods. Whenever PD-L1 was not
detected by PCR for a given patient sample, it was excluded from the analysis. Relative
PD-L1 expression proved to not be normally distributed (FIGURE 17a), but a log
transformation to the base 10 along with the removal of a single outlier achieved
normality by a Shapiro-Wilk test (W = 0.9745, p = 0.1191) (FIGURE 17b).
FIGURE 17a. Distribution: Relative PD-L1 expression
[Histogram (axis 0 to 200) with fitted Normal(7.05949, 21.3972) curve]
Quantiles                       Moments
100.0%  maximum   207.51        Mean             7.0594921
 99.5%            207.51        Std Dev          21.397178
 97.5%            57.2294       Std Err Mean     1.5482455
 90.0%            17.9202       Upper 95% Mean   10.11345
 75.0%  quartile  6.364         Lower 95% Mean   4.0055343
 50.0%  median    0             N                191
 25.0%  quartile  0             Sum Wgt          191
 10.0%            0             Sum              1348.363
  2.5%            0             Variance         457.83925
  0.5%            0             Skewness         6.4083116
  0.0%  minimum   0             Kurtosis         49.395919
                                CV               303.09799
FIGURE 17b. Distribution: Log10(Relative PD-L1 expression), outlier removed
Log10(rel_PD-L1_express) was unevenly distributed among the levels of tumor
type, with 6 observations under breast, 34 under colon, 5 under gastric, 23 under lung,
and 10 under prostate (TABLE 23a, Appendix). Under an ANOVA model created using
log10(rel_PD-L1_express) as the response variable and tumor type as the independent
variable, the mean level of log10(rel_PD-L1_express) did not vary significantly across
the different levels of tumor type (F = 1.9041, p = 0.1189) (TABLE 23b, Appendix). The
model is plotted in FIGURE 17c.
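The one-way ANOVA F ratio can be approximated from the group summary statistics listed in TABLE 23a (Appendix). The following is a minimal sketch, assuming the rounded group sizes, means, and standard deviations from that table. Because the inputs are rounded to three decimals, the result only approximates the reported F = 1.9041. The function name anova_from_summary is hypothetical.

```python
# (n, mean, standard deviation) of log10(rel_PD-L1_express) per tumor type, from TABLE 23a
groups = {
    "Breast":   (6, 1.091, 0.645),
    "Colon":    (34, 0.998, 0.394),
    "Gastric":  (5, 1.114, 0.729),
    "Lung":     (23, 0.738, 0.363),
    "Prostate": (10, 0.888, 0.255),
}

def anova_from_summary(groups):
    """One-way ANOVA sums of squares and F ratio from per-group n, mean, and SD."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total
    # Between-group sum of squares: weighted squared deviations of group means
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-group sum of squares: pooled from the group variances
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "n_total": n_total,
        "grand_mean": grand_mean,
        "df_between": df_between,
        "df_within": df_within,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f_ratio": f_ratio,
    }

anova = anova_from_summary(groups)
print(anova)
```

The degrees of freedom (4 and 73) and the grand mean match TABLE 23b and the moments reported for FIGURE 17b. The p-value is not computed here because it requires the F distribution, which the Python standard library does not provide.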
Quantiles and moments for FIGURE 17b (Log10(relative PD-L1 expression), outlier removed; axis 0 to 2; fitted Normal(0.9218, 0.42893)):
Quantiles
100.0%  maximum   2.13223
99.5%             2.13223
97.5%             1.99063
90.0%             1.52918
75.0%   quartile  1.14674
50.0%   median    0.85686
25.0%   quartile  0.63959
10.0%             0.38622
2.5%              0.14448
0.5%              0.03181
0.0%    minimum   0.03181
Moments
Mean              0.9217977
Std Dev           0.4289305
Std Err Mean      0.0485668
Upper 95% Mean    1.0185066
Lower 95% Mean    0.8250888
N                 78
Sum Wgt           78
Sum               71.900223
Variance          0.1839814
Skewness          0.5280412
Kurtosis          0.3956788
CV                46.531959
FIGURE 17c. ANOVA: Log10(rel_PD-L1_express) by Tumor Type
[Plot of Log10(Relative PD-L1 Expression), axis 0 to 2, by Tumor Type (Breast, Colon, Gastric, Lung, Prostate)]
SECTION 7: Quantities of cfDNA by Days in Blood Tube
In this study, an assay targeting KRAS wild-type was used as a measure of
cfDNA in patient samples. Since we had already determined that a base-10 log
transformation achieved a normal distribution for copies of KRAS wild-type cfDNA (per
mL of plasma), the same transformation was applied, and the transformed value was used
as the dependent variable in a model fitted with days_intube as the independent (ordinal)
variable. As discussed, the potential amplification of KRAS wild-type in the presence of
a KRAS mutation needed to be taken into account, as it could confound the association
between log10(copies KRAS wild-type cfDNA) and days_intube. To adjust for
KRAS mutation status, a GLM was used to assess the association. Under this model,
there was a statistically significant association between the amount of cfDNA and the
number of days in blood tubes, adjusting for KRAS mutation status (LR chi-square =
27.6981, p < 0.0001) (TABLE 24a). The greatest differences in cfDNA yield between days
spent in tubes were observed between 2 and 3 days in tubes (p = 0.0129, LR chi-square)
and between 4 and 6 days in tubes (p = 0.0348, LR chi-square) (TABLE 24c).
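The likelihood ratio statistic above is twice the difference between the reduced-model and full-model negative log-likelihoods, referred to a chi-square distribution. The following is a minimal sketch using the values printed in TABLES 24a and 24b. The helper chi2_sf is a hypothetical name. It uses the standard closed-form chi-square tail probability for integer degrees of freedom so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df (closed form)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{r=0}^{m-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{m} x^(r - 1/2) / (2r - 1)!!
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

# Whole-model test (TABLE 24a): negative log-likelihoods of the reduced and full models
neg_loglik_reduced = 87.1158044
neg_loglik_full = 73.2667427
lr_stat = 2 * (neg_loglik_reduced - neg_loglik_full)
p_model = chi2_sf(lr_stat, 5)

# Effect tests (TABLE 24b): L-R chi-square values with their degrees of freedom
p_days_intube = chi2_sf(11.864054, 4)
p_kras = chi2_sf(12.678715, 1)

print(f"LR chi-square = {lr_stat:.4f}, p = {p_model:.2e}")
print(f"days_intube: p = {p_days_intube:.4f}; KRAS_af_cat: p = {p_kras:.4f}")
```

The computed statistic reproduces the reported 27.6981, and the tail probabilities agree with the reported p-values to the printed precision.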
TABLE 24a. GLM: Log10(cfDNA) by Days_intube and KRAS mutation status
Whole Model Test
Model       -LogLikelihood  L-R ChiSquare  DF  Prob>ChiSq
Difference  13.8490616      27.6981        5   <.0001*
Full        73.2667427
Reduced     87.1158044
TABLE 24b. GLM: Log10(cfDNA) by Days_intube and KRAS mutation status
Effect Tests (Observations (or Sum Wgts) = 78)
Source       DF  L-R ChiSquare  Prob>ChiSq
days_intube  4   11.864054      0.0184*
KRAS_af_cat  1   12.678715      0.0004*
TABLE 24c. GLM: Log10(cfDNA) by Days_intube and KRAS mutation status
Parameter Estimates (Observations (or Sum Wgts) = 78)
Term              Estimate   Std Error  L-R ChiSquare  Prob>ChiSq  Lower CL   Upper CL
Intercept         1.0356309  0.1213675  51.427581      <.0001*     0.794796   1.2764659
days_intube[2-1]  0.044827   0.1924178  0.0542548      0.8158      -0.336996  0.4266504
days_intube[3-2]  -0.670878  0.2646013  6.1772226      0.0129*     -1.195939  -0.145818
days_intube[4-3]  0.2646711  0.2693039  0.9599585      0.3272      -0.269721  0.799063
days_intube[6-4]  0.7648362  0.3573044  4.4525177      0.0348*     0.0558209  1.4738515
KRAS_af_cat[0]    -0.363174  0.097879   12.678715      0.0004*     -0.557399  -0.168948
DISCUSSION
Because the frequency of PD-L1 expression did not differ across the 5
cancer types sampled, the presence of participation bias was considered: many
patients who consented to the testing were likely doing so with the knowledge that
detection of the immune checkpoint regulator would result in a favorable prognostic
outcome (and hence new therapy options on their behalf). This possibility is
suggested by the higher-than-expected frequency of PD-L1 expression across the 5
groups collectively (41.36% detection). Here we can address an issue mentioned in
Methods: the true squamous-to-adenocarcinoma ratio among the NSCLC
samples in this study is unknown. It has been observed that frequencies of PD-L1
expression tend to be 2-fold higher in squamous carcinoma than in adenocarcinoma.
However, other studies have maintained that PD-L1 frequency in NSCLC is independent
of histology. One such study reported a frequency of 45-50% within both squamous and
adenocarcinoma histologic subtypes [38], in concordance with the 42.59% obtained in
this study. Further, the detection frequency in our gastric samples (45.45%) was
concordant with that observed across a 2,993-patient meta-analysis, which reported a
positive frequency of 49.5% [39]. Concerning prostate cancer, the 37.04% frequency
observed in our study was lower than that reported in numerous studies, the average
being about 50-55% in advanced prostate cancers, independent of therapeutic
regimen [40]. Our breast samples did exhibit a higher PD-L1 expression frequency, 50%
versus the 23% observed across 650 breast cancer samples [41]. However, we attributed
this inconsistency to our relatively small sample size for breast cancer (n = 12). Finally,
we observed a 40.23% frequency of PD-L1 expression among colorectal patient samples,
a frequency at least 2-fold higher than that observed across numerous studies.
Therefore (with the exception of colorectal cancer), the frequencies of PD-L1
expression in each of the cancer types sampled were consistent with the expected
frequencies reported by outside sources.
When testing for an association (or lack thereof) between PD-L1 detection status
and presence/absence of specific gene mutations in DNA, our findings (although
limited by sample size) were concordant with the findings of others. There were not
enough data on BRAF mutational status (3 V600E mutants, 47 BRAF wild-types) to make
any solid claims regarding a relationship between BRAF and PD-L1. For EGFR, of
the samples positive for either an L858R mutation or an exon 19 deletion, 66.67% (4/6)
were positive for PD-L1 gene expression, a proportion within the range previously
observed by numerous studies in patients harboring either of these two mutations
(40-80%) [31]. Of the observations positive for a KRAS mutation, 71.88% (23/32) had
no detectable PD-L1 expression; this negative majority is consistent with previous
studies.
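The Fisher's exact results behind these comparisons (TABLES 13 to 15, Appendix) can be recomputed from the cell counts alone. The following is a minimal sketch based on the hypergeometric distribution with fixed margins. The function name fisher_2x2 and the dictionary layout are hypothetical and are not taken from the software used for the analysis.

```python
from math import comb

def fisher_2x2(a, b, c, d):
    """Exact hypergeometric probabilities for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    # Probability of each possible top-left count x with all margins held fixed
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / denom
        for x in range(max(0, col1 - row2), min(row1, col1) + 1)
    }
    left = sum(p for x, p in probs.items() if x <= a)    # P(X <= observed)
    right = sum(p for x, p in probs.items() if x >= a)   # P(X >= observed)
    # Two-sided: all tables no more probable than the observed one
    two_sided = sum(p for p in probs.values() if p <= probs[a] * (1 + 1e-9))
    return {"table_p": probs[a], "left": left, "right": right, "two_sided": two_sided}

# Rows: mutation absent (0), mutation present (1); columns: PD-L1 not detected, detected
tables = {
    "BRAF": (29, 21, 2, 1),
    "EGFR": (27, 17, 2, 4),
    "KRAS": (42, 38, 23, 9),
}

fisher = {gene: fisher_2x2(*cells) for gene, cells in tables.items()}
for gene, res in fisher.items():
    one_sided = min(res["left"], res["right"])
    print(f"{gene}: table P = {res['table_p']:.4f}, one-sided = {one_sided:.4f}, "
          f"two-sided = {res['two_sided']:.4f}")
```

In this sketch, the table probabilities for BRAF and EGFR reproduce the Table (P) column. The probabilities reported for BRAF (0.6286) and EGFR (0.1935) coincide with the smaller one-tailed probability rather than with the two-tailed sum. The two-sided value is returned alongside for comparison, and the same call can be applied to the KRAS table.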
More data are needed to model a stronger trend between PD-L1 detection
status and tumor type, as multiple studies and publications have reported histology-
specific frequencies of PD-L1 gene expression that are not fixed across cancer types. The
proportions of each tumor type in subsequent data sampled from the cancer patient
population should be equal across all groups, to avoid over- and under-representation of
PD-L1 frequency in any cancer type. Further, collection bias must be factored out by
randomizing the geographic location, institution, and physician or collection source from
which patient samples are procured.
The amount of time (days) blood samples remained in collection tubes prior to
plasma fractionation and cfRNA extraction was associated with PD-L1 detection status.
Contrary to the initial hypothesis that RNA yield would decrease (via degradation) the
longer blood samples remained in tubes, a significant increase in RNA yield was
observed in patient samples. In revising the initial hypothesis, one possible explanation
is that the lysing of white blood cells (most frequent on the third day in a blood tube)
releases an influx of cytoplasmic RNA fragments into the blood sample. While more
testing is required in order to form a theory, this nascent hypothesis supports the
phenomenon of DNA yield not changing on the third day in a blood tube, since DNA is
only present in the nucleus of cells. It is suggested that at the time of cellular lysing,
while all free-floating cytoplasmic materials are released, various organelles may well
remain intact, thereby keeping their contents encapsulated (such as the nucleus).
Another revised hypothesis rests on a simpler explanation. Within the
nuclear envelope of any cell, there is substantially more DNA than there is RNA,
meaning that large amounts of cell-free DNA are already in the bloodstream of cancer
patients. Under this reasoning, the lysing of any white blood cells in a 10-mL vial of
blood cannot release nearly enough genomic DNA to compare with the amounts of
cfDNA already present in the blood. This is because the cfDNA already released comes
from the cells of advanced, metastatic tumors from all tumor sites in a given patient, thus
accruing an abundance of cfDNA that will be orders of magnitude greater than that
released from the potential lysing of any of the limited number of white blood cells
captured in a 10-mL blood draw.
CONCLUSION
As expected, detection of PD-L1 expression in cell-free RNA is
associated with the amount of RNA yielded from human plasma (measured in nanograms
per milliliter of plasma). The frequency of PD-L1 expression among cancer patients
harboring mutations within specific genes, including BRAF, KRAS, and EGFR, was
concordant with that found across numerous studies. PD-L1 frequency was minimal in
BRAF- and KRAS-mutated cancer samples, and notably frequent in EGFR-mutated
(L858R and exon 19 deletion) cancer samples. More data are needed to concretely
establish a difference in the frequency of PD-L1 expression across different tumor types.
The topic of cfDNA and cfRNA in blood draws, and the explanation for the
observed difference in their yields over the time spent in blood tubes (pre-
fractionation), remains a subject of debate. Ultimately, more research is needed to gain a
valid understanding of the technology behind the liquid biopsy, as it is still in its infancy.
REFERENCES
1. Brandén E, Wallgren S, Hogberg H, et al. Computer tomography-guided core
biopsies in a county hospital in Sweden: complication rate and diagnostic yield.
Ann Thorac med, vol. 9, no. 3, pp. 149-53, Jul 2014
2. Poulou LS, Tsagouli P, Ziakas PD, et al. Computer tomography-guided needle
aspiration and biopsy of pulmonary lesions: a single-center experience in 1000
patients. Acta Radiol, vol. 54, no. 6, pp. 640-5, Jul 2013
3. Fleck JL, Pavel AB, Cassandras CG. Integrating mutation and gene expression
cross-sectional data to infer cancer progression. BMC systems biology. 2016 Jan
25;10(1):1.
4. Khan MS, Kirkwood AA, Tsigani T, et al. Early changes in circulating tumor
cells are associated with response and survival following treatment of metastatic
neuroendocrine neoplasms. Clinical Cancer Research. 2016 Jan 1;22(1):79-85.
5. Lokhandwala T, Dann R, Johnson M, D'Souza AO. Costs of the Diagnostic
Workup for Lung Cancer: A Medicare Claims Analysis. International Journal of
Radiation Oncology• Biology• Physics. 2014 Nov 15;90(5):S9-10.
6. Krause FS, Feil G, Bichler KH, et al. Heterogeneity in prostate cancer: prostate
specific antigen (PSA) and DNA cytophotometry. Anticancer Res. 2005; 25:
1783-5.
7. Morris L, Riaz N, Desrichard A, et al. Pan-cancer analysis of intratumor
heterogeneity as a prognostic determinant of survival. Oncotarget [Online], 5
(2014): n. pag. Web. 18 Feb. 2016
8. Mattos-Arruda L, J. Cortes, L. Santarpia, et al. Circulating tumour cells and cell-
free DNA as tools for managing breast cancer. Nature Reviews Clinical Oncology
10, 377-389 (July 2013)
9. Williams SC. Circulating tumor cells. Proceedings of the National Academy of
Sciences. 2013 Mar 26;110(13):4861-.
10. Gabriel MT, Calleja LR, Chalopin A, et al. Circulating tumor cells: a review of
non–EpCAM-based approaches for cell enrichment and isolation. Clinical
chemistry. 2016 Feb 19:clinchem-2015.
11. Dawson SJ, Tsui DW, Murtaza M, et al. Analysis of circulating tumor DNA to
monitor metastatic breast cancer. N Engl J Med 2013, 368:1199-1209
12. Fernando MR, Chen K, Norton S, et al. A new methodology to preserve the
original proportion and integrity of cell-free fetal DNA in maternal plasma during
sample processing and storage. Prenatal diagnosis. 2010 May 1;30(5):418-24.
13. Norton SE, Lechner JM, Williams T, Fernando MR. A stabilizing reagent
prevents cell-free DNA contamination by cellular DNA in plasma during blood
sample storage and shipping as determined by digital PCR. Clinical biochemistry.
2013 Oct 31;46(15):1561-5.
14. Lo YD, Tein MS, Lau TK, et al. Quantitative analysis of fetal DNA in maternal
plasma and serum: implications for noninvasive prenatal diagnosis. The American
Journal of Human Genetics. 1998 Apr 30;62(4):768-75.
15. Fernando MR, Norton SE, Luna KK, Lechner JM, Qin J. Stabilization of cell-free
RNA in blood samples using a new collection device. Clinical biochemistry. 2012
Nov 30;45(16):1497-502.
16. Holford NC, Sandhu HS, Thakkar H, Butt AN, Swaminathan R. Stability of β-
Actin mRNA in Plasma. Annals of the New York Academy of Sciences. 2008
Aug 1;1137(1):108-11.
17. Brahmer J, Reckamp KL, Baas P, et al. Nivolumab versus docetaxel in advanced
squamous-cell non-small-cell lung cancer. N Engl J Med. 2015;373:123-135.
18. Paz-Ares L, Horn L, Borghaei H, et al. Phase III, randomized trial (CheckMate
057) of nivolumab (NIVO) versus docetaxel (DOC) in advanced non-squamous
cell (non-SQ) non-small cell lung cancer (NSCLC). J Clin Oncol. 2015;33.
Abstract LBA109.
19. Kiyasu J, Miyoshi H, Hirata A, et al. Expression of programmed cell death ligand
1 is associated with poor overall survival in patients with diffuse large B-cell
lymphoma. Blood. 2015;126(19):2193-2201. doi:10.1182/blood-2015-02-629600.
20. Sabatier R, Finetti P, Mamessier E, et al. Prognostic and predictive value of PDL1
expression in breast cancer. Oncotarget. 2015;6(7):5449-5464.
21. Butte MJ, Pena-Cruz V, Kim MJ, Freeman GJ, Sharpe AH. Interaction of human
PD-L1 and B7-1. Molecular immunology. 2008 Aug 31;45(13):3567-72.
22. Kim JW, Eder JP. Prospects for targeting PD-1 and PD-L1 in various tumor types.
Oncology (Williston Park, NY). 2014 Nov;28:15-28.
23. Mullard A. Pfizer expands hunt for immuno-oncology biomarkers. Nature
Reviews Drug Discovery. 2016 Feb 1;15(2):77-.
24. Nagaraja AK, Bass AJ. Hitting the Target in BRAF-Mutant Colorectal Cancer.
Journal of Clinical Oncology. 2015 Oct 12:JCO-2015.
25. Bollag G, Tsai J, Zhang J, Zhang C, Ibrahim P, Nolop K, Hirth P. Vemurafenib:
the first drug approved for BRAF-mutant cancer. Nature reviews Drug discovery.
2012 Nov 1;11(11):873-86.
26. Corcoran RB, Atreya CE, Falchook GS, et al. Combined BRAF and MEK
inhibition with dabrafenib and trametinib in BRAF V600–mutant colorectal
cancer. Journal of Clinical Oncology. 2015 Sep 21:JCO-2015.
27. Rodić N, Anders RA, Eshleman JR, et al. PD-L1 expression in melanocytic
lesions does not correlate with the BRAF V600E mutation. Cancer immunology
research. 2015 Feb 1;3(2):110-5.
28. Mao C, Liao RY, Qiu LX, et al. BRAF V600E mutation and resistance to anti-
EGFR monoclonal antibodies in patients with metastatic colorectal cancer: a
meta-analysis. Molecular biology reports. 2011 Apr 1;38(4):2219-23.
29. Herbst RS. Review of epidermal growth factor receptor biology. International
Journal of Radiation Oncology* Biology* Physics. 2004 Jun 30;59(2):S21-6.
30. Azuma K, Ota K, Kawahara A, Hattori S, et al. Association of PD-L1
overexpression with activating EGFR mutations in surgically resected non–small
cell lung cancer. Annals of Oncology. 2014 Jul 9:mdu242.
31. Suzuki M, Shigematsu H, Hiroshima K, et al. Epidermal growth factor receptor
expression status in lung cancer correlates with its mutation. Human pathology.
2005 Oct 31;36(10):1127-34.
32. Kranenburg O. The KRAS oncogene: past, present, and future. Biochimica et
Biophysica Acta (BBA)-Reviews on Cancer. 2005 Nov 25;1756(2):81-2.
33. Valtorta E, Misale S, Sartore-Bianchi A, et al. KRAS gene amplification in
colorectal cancer and impact on response to EGFR-targeted therapy. International
Journal of Cancer. 2013 Sep 1;133(5):1259-65.
34. Cooper WA, Tran T, Vilain RE, et al. PD-L1 expression is a favorable prognostic
factor in early stage non-small cell carcinoma. Lung Cancer. 2015 Aug
31;89(2):181-8.
35. D'incecco A, Andreozzi M, Ludovini V, et al. PD-1 and PD-L1 expression in
molecularly selected non-small-cell lung cancer patients. British journal of
cancer. 2015 Jan 6;112(1):95-102.
36. Tsui NB, Ng EK, Lo YD. Stability of endogenous and added RNA in blood
specimens, serum, and plasma. Clinical chemistry. 2002 Oct 1;48(10):1647-53.
37. Wu TL, Zhang D, Chia JH, et al. Cell-free DNA: measurement in various
carcinomas and establishment of normal reference range. Clinica Chimica Acta.
2002 Jul 31;321(1):77-87.
38. Grosso J, Horak CE, Inzunza D, et al. (2013) Association of tumor PD-L1
expression and immune biomarkers with clinical activity in patients (pts) with
advanced solid tumors treated with nivolumab (anti-PD-1; BMS-936558; ONO-
4538). J Clin Oncol 31(suppl): abstract 3016.
39. Huang B, Chen L, Bao C, Sun C, Li J, Wang L, Zhang X. The expression status
and prognostic significance of programmed cell death 1 ligand 1 in
gastrointestinal tract cancer: a systematic review and meta-analysis. OncoTargets
and therapy. 2015;8:2617.
40. Gevensleben H, Dietrich D, Golletz C, et al. The immune checkpoint regulator
PD-L1 is highly expressed in aggressive primary prostate cancer. Clinical Cancer
Research. 2015 Nov 16:clincanres-2042.
41. Muenst S, Schaerli AR, Gao F, et al. Expression of programmed death ligand 1
(PD-L1) is associated with poor prognosis in human breast cancer. Breast cancer
research and treatment. 2014 Jul 1;146(1):15-24.
APPENDIX
TABLES
TABLE 1. β-Actin CTs of UHRR cDNA Replicates Across 4 Dilutions
Rep  UHR cDNA Dilution  ng/rxn  copies/rxn  Log10(copies/rxn)  β-Actin CTs
1    1:5000             0.14    42.42       1.63               21.04
2    1:5000             0.14    42.42       1.63               20.67
1    1:10000            0.07    21.21       1.33               22.16
2    1:10000            0.07    21.21       1.33               21.95
1    1:20000            0.035   10.61       1.03               23.52
2    1:20000            0.035   10.61       1.03               23.55
1    1:40000            0.018   5.30        0.72               24.66
2    1:40000            0.018   5.30        0.72               24.70
TABLE 2. KRAS Reference CTs of KRAS WT gDNA Across 4 Dilutions
Rep  KRAS WT gDNA Concentration  ng/rxn  copies/rxn  Log10(copies/rxn)  KRAS Reference CTs
1    0.625 ng/μL                 1.25    2.5         0.40               22.59
2    0.625 ng/μL                 1.25    2.5         0.40               22.58
1    1.25 ng/μL                  2.5     5           0.70               21.61
2    1.25 ng/μL                  2.5     5           0.70               21.39
1    2.5 ng/μL                   5       10          1.00               20.58
2    2.5 ng/μL                   5       10          1.00               20.58
1    5 ng/μL                     10      20          1.30               19.53
2    5 ng/μL                     10      20          1.30               19.32
TABLE 5. Univariate Logistic Regression: PD-L1 Status by Age
Parameter Estimates (for log odds of 0/1)
Term       Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept  0.1911862   0.8496416  0.05       0.8220
age        0.00297188  0.0133405  0.05       0.8237
TABLE 6. Contingency Table: PD-L1 Detection Status by Gender
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
Gender  0           1           Total
Female  50 (58.82)  35 (41.18)  85
Male    61 (58.65)  43 (41.35)  104
Total   111         78          189
Test: Pearson chi-square = 0.001, Prob>ChiSq = 0.9812
TABLE 7. Contingency: PD-L1 Status by Geographic Area of Sample Collection
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
Geographic Location of Collection  0           1           Total
BC                                 0 (0)       1 (100)     1
CA                                 66 (58.41)  47 (41.59)  113
DE                                 8 (57.14)   6 (42.86)   14
FL                                 13 (54.17)  11 (45.83)  24
MI                                 3 (100)     0 (0)       3
NJ                                 20 (60.61)  13 (39.39)  33
SG                                 1 (100)     0 (0)       1
Total                              111         78          189
Test: Fisher's Exact Test, Table (P) = 0.0003, Prob>ChiSq = 0.7350
TABLE 8. Contingency Table: PD-L1 Detection Status by Tumor Type
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
Tumor Type  0           1           Total
Breast      6 (50)      6 (50)      12
Colon       52 (59.77)  35 (40.23)  87
Gastric     6 (54.55)   5 (45.45)   11
NSCLC       31 (57.41)  23 (42.59)  54
Prostate    17 (62.96)  10 (37.04)  27
Total       112         79          191
Test: Fisher's Exact Test, Table (P) = 0.0009, Prob>ChiSq = 0.9431
TABLE 9a. Logistic Regression: PD-L1 by Days Since Collection (continuous)
Parameter Estimates (for log odds of 0/1)
Term                                Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept                           0.66077929  0.3351497  3.89       0.0487*
Days Since Initial Collection Date  -0.0008093  0.0007749  1.09       0.2963
TABLE 9b. PD-L1 Detection Status by Days Since Collection (dichotomous)
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
Days Since Initial Collection  0           1           Total
<= 300 days                    32 (62.75)  19 (37.25)  51
> 300 days                     80 (57.14)  60 (42.86)  140
Total                          112         79          191
Test: Pearson chi-square = 0.484, Prob>ChiSq = 0.4867
TABLE 9c. Logistic Regression: PD-L1 by Days Since Collection (frac. poly)
Parameter Estimates (for log odds of 0/1)
Term             Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept        0.31603425  0.1489149  4.50       0.0338*
days_initial_fp  -2.7887e-5  2.3938e-5  1.36       0.2440
TABLE 10. PD-L1 Detection Status by Season During Sample Collection
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
Season  0           1           Total
Fall    64 (57.66)  47 (42.34)  111
Summer  27 (67.50)  13 (32.50)  40
Winter  21 (52.50)  19 (47.50)  40
Total   112         79          191
Test: Pearson chi-square = 1.961, Prob>ChiSq = 0.3752
TABLE 11. PD-L1 Detection Status by Number of Days in Blood Tube
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
Days in Blood Tube  0           1           Total
0                   5 (83.33)   1 (16.67)   6
1                   49 (69.01)  22 (30.99)  71
2                   11 (47.83)  12 (52.17)  23
3                   5 (33.33)   10 (66.67)  15
4                   7 (36.84)   12 (63.16)  19
6                   3 (50)      3 (50)      6
Total               80          60          140
Test: Fisher's Exact Test, Table (P) < 0.0001, Prob>ChiSq = 0.0169
TABLE 13. PD-L1 Detection Status by BRAF V600E Mutation Status
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
BRAF   0           1           Total
0      29 (58)     21 (42)     50
1      2 (66.67)   1 (33.33)   3
Total  31          22          53
Test: Fisher's Exact Test, Table (P) = 0.4367, Prob>ChiSq = 0.6286
TABLE 14. PD-L1 Detection Status by EGFR Mutation Status
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
EGFR   0           1           Total
0      27 (61.36)  17 (38.64)  44
1      2 (33.33)   4 (66.67)   6
Total  29          21          50
Test: Fisher's Exact Test, Table (P) = 0.1529, Prob>ChiSq = 0.1935
TABLE 15. PD-L1 Detection Status by KRAS Mutation Status
Cells show count (row %); columns are PD-L1 Expression Detection 0 and 1.
KRAS   0           1           Total
0      42 (52.5)   38 (47.5)   80
1      23 (71.88)  9 (28.13)   32
Total  65          47          112
Test: Fisher's Exact Test, Table (P) = 0.0295, Prob>ChiSq = 0.0466
TABLE 17a. Logistic Regression: PD-L1 by Log10(copies β-Actin RNA/mL plasma)
Parameter Estimates (for log odds of 0/1)
Term                              Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept                         2.63864378  0.9954509  7.03       0.0080*
Log10(copies cfRNA / 1mL plasma)  -0.8648462  0.3693353  5.48       0.0192*
TABLE 17b. Logistic Regression: PD-L1 by Log10(copies β-Actin RNA/mL plasma)
(outliers removed)
Parameter Estimates (for log odds of 0/1)
Term                              Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept                         2.44211073  1.0400438  5.51       0.0189*
Log10(copies cfRNA / 1mL plasma)  -0.7937072  0.3853028  4.24       0.0394*
TABLE 18a. Logistic Regression: PD-L1 Status by Log10(rel_ERCC1_expression)
Parameter Estimates (for log odds of 0/1)
Term                     Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept                0.26224163  0.3405094  0.59       0.4412
Log10(rel_ERCC1_expssn)  1.12109518  0.8173017  1.88       0.1702
TABLE 18b. Logistic Regression: PD-L1 Status by Log10(rel_ERCC1_expression)
(outliers removed)
Parameter Estimates (for log odds of 0/1)
Term                     Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept                0.19591465  0.3976423  0.24       0.6222
Log10(rel_ERCC1_expssn)  1.24491041  1.0656463  1.36       0.2427
TABLE 19. Logistic Regression: PD-L1 Status by Log10(copies KRAS WT cfDNA)
Parameter Estimates (for log odds of 0/1)
Term                       Estimate    Std Error  ChiSquare  Prob>ChiSq
Intercept                  0.24393533  0.2697771  0.82       0.3659
Log10(KRAS WT copies/1mL)  0.11493452  0.2521422  0.21       0.6485
TABLE 22a. Multivariate Logistic Regression Model by Forward Selection
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            5   2.67315         0.53463      2.32     0.0501*
Error            88  20.30557        0.23075
Corrected Total  93  22.97872
TABLE 22b. Parameter Estimates of Logistic Regression by Forward Selection
Forward Selection: Parameter Estimates
Parameter        DF  Estimate   Standard Error  t Value  Pr > |t|
Intercept        1   0.246327   0.329563        0.75     0.4568
age              1   -0.002411  0.004191        -0.58    0.5666
days_intube      1   0.088601   0.033084        2.68     0.0088*
copies_cfRNA_fp  1   0.038899   0.040449        0.96     0.3388
days_initial_fp  1   -4.421336  5.735078        -0.77    0.4428
gender: female   1   0.042279   0.101052        0.42     0.6767
TABLE 23a. Summary Statistics: Log10(relative PD-L1 expression) by Tumor Type
Level     Number  Mean   Std Dev  Std Err Mean  Lower 95%  Upper 95%
Breast    6       1.091  0.645    0.263         0.414      1.768
Colon     34      0.998  0.394    0.068         0.860      1.135
Gastric   5       1.114  0.729    0.326         0.209      2.020
Lung      23      0.738  0.363    0.076         0.581      0.895
Prostate  10      0.888  0.255    0.081         0.706      1.071
TABLE 23b. ANOVA: Log10(rel_PD-L1_express) by Tumor Type
Source      DF  Sum of Squares  Mean Square  F Ratio  Prob > F
Tumor Type  4   1.338401        0.3346       1.9041   0.1189
Error       73  12.82817        0.175728
C. Total    77  14.16657
FIGURES
FIGURE 1. Standard Curve for Calculating cfRNA Copies per 1mL Plasma
FIGURE 2. Standard Curve for KRAS WT cfDNA Copies per 1mL Plasma
Linear Fit (β-Actin standard curve, FIGURE 1): Beta-Actin CTs = 27.814 - 4.274*Log10(copies/rxn); R-Square: 0.992
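The fitted β-actin line can be inverted to estimate copies per reaction from a measured CT. The following is a minimal sketch, assuming the fit above is the standard curve built from the TABLE 1 dilution series. The function names are hypothetical. Converting copies per reaction to copies per 1 mL of plasma requires the volume factors described in Methods, which are not reproduced here.

```python
import math

INTERCEPT = 27.814  # fitted CT at Log10(copies/rxn) = 0
SLOPE = -4.274      # fitted change in CT per 10-fold increase in copies/rxn

def predicted_ct(copies_per_rxn):
    """CT predicted by the fitted standard curve for a given copy number."""
    return INTERCEPT + SLOPE * math.log10(copies_per_rxn)

def copies_from_ct(ct):
    """Invert the standard curve to estimate copies per reaction from a CT."""
    return 10 ** ((ct - INTERCEPT) / SLOPE)

# Compare the curve with the mean replicate CTs of the TABLE 1 dilution series
table1 = {42.42: (21.04, 20.67), 21.21: (22.16, 21.95),
          10.61: (23.52, 23.55), 5.30: (24.66, 24.70)}
for copies, cts in table1.items():
    mean_ct = sum(cts) / len(cts)
    print(f"{copies:6.2f} copies/rxn: observed mean CT {mean_ct:.2f}, "
          f"curve CT {predicted_ct(copies):.2f}, "
          f"back-calculated copies {copies_from_ct(mean_ct):.2f}")
```

The predicted CTs fall close to the replicate means in TABLE 1, which is consistent with the line having been fitted to that dilution series.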
FIGURE 6c. Fractional Polynomial Transformation of Days Since Initial Collection
FIGURE 9e. Fractional Polynomial of Log10(Copies cfRNA / 1mL plasma)
FIGURE 12. Preliminary Logistic Regression Model: PD-L1 Detection Status
FIGURE 13. Logistic Regression: PD-L1 Status by Copies cfRNA & Days_intube
FIGURE 14. Logistic Regression: PD-L1 Detection Adjusted for Confounders
FIGURE 15. Preliminary Logistic Regression Model: Imputed Independent Variables
FIGURE 16. Logistic Regression: PD-L1 by cfRNA & Days_intube_impute
Abstract
Introduction: Allele frequencies of somatic gene mutations in DNA and levels of relative gene expression in RNA can now be measured using cell-free nucleic acids isolated from liquid biopsies (plasma fractionated from blood samples). Liquid biopsies may replace the traditional pioneer method of monitoring genetic mutations and levels of gene expression from the tissue biopsies of cancer patients.
Methods: Blood samples were collected from cancer patients with 5 different tumor types (colorectal, lung, prostate, gastric, and breast). Plasma was fractionated from blood samples and nucleic acids were extracted. RNA was reverse-transcribed into cDNA using random primers, and then analyzed by quantitative RT-PCR using appropriate gene-specific primers. The cDNA of PD-L1 was quantitated. Mutations in cfDNA were then measured for the same patients and compared with their individual PD-L1 gene expression detection frequencies. β-actin expression was used as the denominator gene representing total RNA.
Results: The presence of relative PD-L1 gene expression in cfRNA was not statistically associated with the somatic cfDNA mutation measured in the BRAF gene (V600E) (p = 0.6286, Fisher’s Exact). The presence of relative PD-L1 gene expression in cfRNA was (preliminarily) statistically associated with the somatic cfDNA mutations measured in the EGFR gene (L858R and exon 19 deletions) (p = 0.1940, Fisher’s Exact). The presence of relative PD-L1 gene expression in cfRNA was (preliminarily) statistically associated with the somatic cfDNA mutations measured in the KRAS gene (G12A, G12C, G12D, G12S, G12V, G13D) (p = 0.0466, Fisher’s Exact).
Conclusion: The frequency of PD-L1 expression among cancer patients harboring mutations within specific genes, including BRAF, KRAS, and EGFR, was concordant with that found across numerous studies using nucleic acids isolated from tumor tissue biopsies. Ultimately, more research is needed to gain a valid understanding of the technology behind the liquid biopsy, as it is still in its infancy.
Asset Metadata
Creator: Usher, Joshua (author)
Core Title: Use of cell-free nucleic acids in associating PD-L1 gene expression with presence of driver mutations in DNA and demographics across different cancers
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Biostatistics
Publication Date: 08/09/2016
Defense Date: 08/09/2016
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: BRAF, cell-free, DNA, EGFR, KRAS, OAI-PMH Harvest, PD-L1, RNA
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Goshen, Susan (committee chair)
Creator Email: josh@liquidgenomics.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-640124
Unique identifier: UC11305876
Identifier: etd-UsherJoshu-4769.pdf (filename), usctheses-c3-640124 (legacy record id)
Legacy Identifier: etd-UsherJoshu-4769-0.pdf
Dmrecord: 640124
Document Type: Thesis
Rights: Usher, Joshua
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA