MODEL DEVELOPMENT OF BREAST CANCER DETECTION AND STAGING VIA RARE
EVENT ENUMERATION FROM A LIQUID BIOPSY
A RETROSPECTIVE DESCRIPTIVE CLINICAL RESEARCH STUDY
by
Jeremy Mason
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(CLINICAL, BIOMEDICAL AND TRANSLATIONAL INVESTIGATIONS)
August 2023
Copyright 2023 Jeremy Mason
Table of Contents
List of Figures
Abstract
Chapter One: Introduction
Chapter Two: Methods
  Prediction Problem
  Study Design, Patient Information
  LBx Acquisition, Processing, and Cryobanking
  Slide Staining, Scanning, and Pre-Processing
  Rare Event Detection, Identification, and Enumeration
  Model Training and Testing
Chapter Three: Results
  Data
  Cancer vs. Not Cancer Model
  Early-stage vs. Late-stage Model
Chapter Four: Discussion
Bibliography
Appendix
List of Figures
Figure 1: Graphic of the High Definition Single Cell Assay (HDSCA3.0) workflow.
Figure 2: “Out of bag” error rates of Cancer vs. Not Cancer models.
Figure 3: Feature importance and event distributions for Cancer vs. Not Cancer model.
Figure 4: “Out of bag” error rates of Early-stage vs. Late-stage models.
Figure 5: Feature importance and event distributions for Early-stage vs. Late-stage model.
Abstract
Breast cancer (BC) is the leading cause of cancer death in women worldwide, with late-stage
disease having a 5-year survival rate of only 30%. Early detection is essential for optimal
treatment, response, and reducing mortality. However, mammography results, the current gold
standard, can be misinterpreted due to various technical limitations and biological complexities.
Recent studies have shown that a liquid biopsy (LBx) analysis differentiates cancer patients from
normal donors and stratifies early-stage from late-stage. In this study, we set out to investigate the
feasibility of detecting and staging BC via a fully automated approach of detecting, identifying,
and enumerating rare events from an LBx sample. Specifically, we employed an outlier clustering
method to separate rare from common events, followed by an unsupervised learning algorithm to
group the rare events hierarchically. We then used this hierarchy to cluster the events as inputs to a classification
model. We found that the Cancer vs. Not Cancer model differentiated with an accuracy of 97.8%, and
the Early-stage vs. Late-stage model stratified with an accuracy of 87.5%. Additionally, we observed
that 17 of the top 20 event clusters in the Cancer vs. Not Cancer model were cellular events, while
12 of the top 20 were cellular events in the Early-stage vs. Late-stage model. Overall, we were
able to reproducibly detect and cluster rare events based on morphometric features and to develop
models that stratify with high accuracy. Our findings illustrate the feasibility of an automated
approach to BC detection and stage stratification via an LBx analysis.
Chapter One: Introduction
It is estimated that there will be more than 300,000 newly diagnosed breast cancer (BC)
cases in the US in 2023, with males accounting for approximately 2,800 of these [1]. It is also
estimated that 66% of cases will be diagnosed as localized disease, 26% as regional spread, and
6% as distant spread, with respective 5-year survival rates of 99%, 86%, and 30% [1]. With this
disease being the worldwide leading cause of cancer death in women in 2020 [2], early detection
is essential for optimal management and reduced mortality [3-5]. Male BC, although rare, is
typically diagnosed at later stages than female BC and results in worse survival rates [6]. The
current gold standard for BC screening and identification is the mammogram [7, 8], which is
recommended yearly for average-risk women aged 45-54 and biennially for those aged 55 and
above [7]. In contrast, those at higher-than-average risk are recommended to start screening as
early as 30 [9]. Despite the utility of the mammogram, various technological limitations (e.g.,
imaging resolution) and biological complexities (e.g., breast density) can lead to false positives
(i.e., mammogram positive, cancer negative) and false negatives (i.e., mammogram negative,
cancer positive). Dabbous et al. examined 741,150 screening mammograms from 261,767 women
who did not develop BC during the study period and determined that 12.3% (n=90,918) of
screening results were false positives [10]. Another study by Lehman et al. examined 1,682,504
screening mammograms and calculated a sensitivity of 86.9% and specificity of 88.9% [11].
Additionally, this screening approach can be invasive and only yield localized information. A
minimally invasive, systemic procedure has the potential to identify subclinical disease that may
have begun to spread throughout the body.
This study had two main goals: 1) to define a liquid biopsy (LBx) BC signature that
accurately screens for or diagnoses cancer and 2) to stratify diagnosed BC patients into early- and
late-stage. In the former, accurate identification of BC via routine blood draws can lead to early
cancer detection (screening) and potentially reduce the need for invasive solid biopsies (diagnostic
workup). In the latter case, accurate stage stratification can identify the presence of
micrometastatic disease, which can contribute to optimized treatment decisions. Both scenarios can
transform BC identification, care, and management, reflecting an emerging trend in cancer care
within the clinic [12-19].
A recent study from our lab analyzed LBx analytes from BC patients
and compared manual enumerations of these events based on biomarker positivity to those of
normal donors (NDs) [20]. We found that the levels of these rare events are distinct across groups
(ND, early-, and late-stage BC), and the resulting random forest models built to predict Cancer vs.
Normal and Early-stage vs. Late-stage yielded AUC values of 0.99 and 0.91, respectively. That is
to say, given a randomly chosen cancer sample and a randomly chosen non-cancerous one, the first model
scored the cancer sample as more likely to be cancer 99% of the time, and the second model correctly
ordered a randomly chosen early-stage and late-stage pair 91% of the time. While these results represent a manual
approach that is supported by computation, we hypothesized that on an individual patient basis,
diagnosis (Cancer vs. Not Cancer) and staging (Early-stage vs. Late-stage) of BC could be
predicted using LBx analytes found in peripheral blood (PB) through a fully automated data
science approach.
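Because AUC and accuracy are easy to conflate, the following minimal sketch contrasts the two. It assumes the AUC values cited above are areas under the receiver operating characteristic curve, which equal the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The scores, the function names, and the 0.5 decision threshold are hypothetical and are not taken from the cited study.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy_at_threshold(pos_scores, neg_scores, threshold=0.5):
    """Fraction of samples classified correctly when scores >= threshold are called positive."""
    correct = sum(s >= threshold for s in pos_scores)
    correct += sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


# Hypothetical model scores (illustration only, not study data).
cancer_scores = [0.9, 0.8, 0.45, 0.4]
normal_scores = [0.35, 0.3, 0.2, 0.1]

auc = auc_from_scores(cancer_scores, normal_scores)        # every cancer score outranks every normal score
acc = accuracy_at_threshold(cancer_scores, normal_scores)  # two cancer samples fall below the threshold
print(auc, acc)
```

In this toy case the ranking is perfect (AUC of 1.0) while the accuracy at the chosen threshold is only 0.75, which is why the two quantities should be reported and interpreted separately.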
Chapter Two: Methods
The set of mathematical models described in this study is intended for use in individuals at
risk of BC and/or those with a confirmed diagnosis of BC. We used data generated from LBx
samples supplied to our lab to construct these models for investigational purposes.
Prediction Problem
We hypothesized that on an individual patient basis, diagnosis and staging of BC can be
predicted using LBx analytes found in PB. We retrospectively analyzed data to construct predictive
models for both BC detection and stage stratification. Given the stark differences between cancer
patients and healthy individuals, we expected the diagnostic model to perform fairly well with
minimal false positives and false negatives. However, we expected less concordance for the
staging model due to the variations associated with either an early- or late-stage classification.
That is, the stage does not consider tumor sizes, specific locations, or the timings associated with
progression. Both models' accuracy was measured via the overall misclassification rate and
visualized with confusion matrices.
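As an illustration of the evaluation described above, the sketch below builds a confusion matrix from paired actual and predicted labels and derives the overall misclassification rate from it. The label lists and function names are hypothetical and chosen only for demonstration; they are not study data or the code used in this work.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in the order of `labels`."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def misclassification_rate(matrix):
    """Share of samples that fall off the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


# Hypothetical labels for eight samples (illustration only).
labels = ["Cancer", "Not Cancer"]
actual = ["Cancer"] * 5 + ["Not Cancer"] * 3
predicted = ["Cancer"] * 4 + ["Not Cancer"] + ["Not Cancer"] * 2 + ["Cancer"]

matrix = confusion_matrix(actual, predicted, labels)
rate = misclassification_rate(matrix)
print(matrix, rate)
```

Here one cancer sample is predicted as not cancer (a false negative) and one non-cancer sample is predicted as cancer (a false positive), so the off-diagonal cells hold one sample each.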
Study Design, Patient Information
Cancer patients were recruited prospectively as part of the Physical Sciences in Oncology
Center Study (PSOC-0068) entitled OPTImization of blood COLLection (OPTICOLL) [21] and
retrospectively analyzed. Multiple clinical sites provided patient samples, including Billings Clinic
(Billings, MT), Duke University Cancer Institute (Durham, NC), City of Hope Comprehensive
Cancer Center (Duarte, CA), and the University of Southern California Norris Comprehensive
Cancer Center (Los Angeles, CA). Recruitment occurred according to Institutional Review Board-
approved protocols, with all participants providing written informed consent. All patients were
enrolled between April 2013 and January 17, 2017.
A total of 74 patients diagnosed with non-metastatic, treatment-naive BC (Early-stage BC)
were included in this study, with PB samples collected prior to any treatment. A total of 26 patients
with metastatic disease (Late-stage BC) had 51 PB samples collected prior to the start of a new
therapy, either as a first-line or as post-progression treatment. Given that the start of a new therapy
indicates progressive disease, each time point for an individual can be viewed as a distinctly
different disease that must be treated uniquely and thus is categorized as a unique patient. A total
of 53 ND samples were collected and provided by Epic Sciences (San Diego, CA, USA). These
individuals had no known pathology and were matched for age (45-82 years old, median of 57)
and gender (female).
LBx Acquisition, Processing, and Cryobanking
The third-generation High Definition Single Cell Assay (HDSCA3.0) workflow (Figure
1), designed to identify, characterize, and isolate analytes from LBx samples, is described in Chai
et al. [22]. In short, approximately 7.5 mL of PB was collected in 10-mL collection tubes (Cell-
free DNA BCT, Streck) at the clinical site and subsequently shipped to our laboratory for
processing within 48 hrs. The LBx sample underwent red blood cell lysis, and the remaining
nucleated cells were attached as a monolayer on custom-made glass slides (Marienfeld, Lauda,
Baden-Württemberg, Germany). Next, the slide was cover-slipped to protect the cells, and
subsequently cryobanked in a -80°C freezer for preservation and future analyses. This process
typically yields 14 glass slides with a seeding target of 3 million cells on each.
Figure 1: Graphic of the High Definition Single Cell Assay (HDSCA3.0) workflow. Liquid biopsy (LBx) samples
are taken from whole peripheral blood (PB) or bone marrow aspirate and subsequently plated onto custom glass
slides before storage at -80°C. When necessary, slides are thawed, stained with immunofluorescent (IF) antibodies,
and scanned at 100x magnification. If desired, single cells can be subjected to higher-quality imaging (i.e., 400x
magnification) and either genomics or targeted proteomics. Additionally, solid tumor touch preps from either
primary or metastatic lesions can be subjected to the same workflow.
Slide Staining, Scanning, and Pre-Processing
Slides were thawed at room temperature for 1 hour, fixed with paraformaldehyde, and
subsequently stained with immunofluorescent (IF) antibodies for downstream identification and
characterization. The staining panel includes markers for nuclear identification (DAPI), epithelial
cells (cytokeratin [CK]), white blood cells (CD45), endothelial cells (CD31), and mesenchymal
cells (Vimentin [VIM]). The CD45 and CD31 antibodies are included in the same IF channel.
After staining is complete, the slides are scanned at 100x magnification in each of the four IF
channels, yielding a total of 9216 frames of view (4 IF channels x 2304 frames). During this
process, the R package EBImage (version 4.22.1) is used to generate masks around each event
(cellular [DAPI+] and acellular [DAPI-]) to extract morphological features. Using each of the four
IF channels and paired composites of each (e.g., DAPI + CK), 761 features are extracted for each
event. These features include area, perimeter, mean radius, eccentricity, major axis, and channel
intensity.
Rare Event Detection, Identification, and Enumeration
Given the 761 morphometric features extracted, a principal component analysis (PCA) was
performed to reduce the feature space to 350 variables while still maintaining 99.95% of the
original variance. Using these principal components, event-to-event distances are calculated for a
given frame and used to define hierarchical clusters. From these metrics, we can define ~30
clusters of cells on a frame and then use these to separate the identified events into common and
rare categories. A rare event is defined as either belonging to a small-sized cluster (< 1.5% of all
events on the frame) or being far away from the frame’s median event (PCA distance > 12). After
this frame-based identification, the rare events are compared with those found in the other frames
across the slide for further discrimination. Due to edge effects, a 2-frame border
around the outside of the slide is omitted from the analysis, leaving 1840 frames in total. The
final set of rare events is then mapped onto a pre-constructed t-SNE of previously identified events
and assigned a unique identifier based on the nearest representative event. The integer counts of
each identifier are converted to count/mL to ensure comparable data across slides.
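The two-part rare event rule and the frame count after border removal can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes that cluster labels and each event's PCA distance to the frame's median event have already been computed, and the function names are hypothetical. The 24 x 96 frame grid is likewise an assumption, chosen because it is consistent with the stated counts of 2304 frames per channel and 1840 frames after omitting a 2-frame border.

```python
from collections import Counter

SMALL_CLUSTER_FRACTION = 0.015  # rule 1: cluster holds < 1.5% of all events on the frame
PCA_DISTANCE_CUTOFF = 12.0      # rule 2: PCA distance to the frame's median event > 12


def flag_rare_events(cluster_labels, distance_to_median):
    """Return one boolean per event: True if the event is rare under either rule."""
    n_events = len(cluster_labels)
    cluster_sizes = Counter(cluster_labels)
    flags = []
    for label, dist in zip(cluster_labels, distance_to_median):
        in_small_cluster = cluster_sizes[label] / n_events < SMALL_CLUSTER_FRACTION
        far_from_median = dist > PCA_DISTANCE_CUTOFF
        flags.append(in_small_cluster or far_from_median)
    return flags


def interior_frame_count(rows, cols, border=2):
    """Frames remaining after dropping a border of the given width on every side."""
    return (rows - 2 * border) * (cols - 2 * border)


# Hypothetical frame with 200 events: cluster "A" has 198 events, cluster "B" has 2 (1%).
labels = ["A"] * 198 + ["B"] * 2
distances = [1.0] * 197 + [15.0] + [3.0, 3.0]

flags = flag_rare_events(labels, distances)
print(sum(flags))                    # one distant event in "A" plus both events in "B"
print(interior_frame_count(24, 96))  # assumed 24 x 96 grid
```

Treating the two rules as a logical OR mirrors the "either ... or" wording of the definition: an event in a large cluster is still rare if it lies far from the median, and an event near the median is still rare if its cluster is small.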
Model Training and Testing
Rare event counts/mL from paired sister slides were aggregated for the Early- and Late-
stage BC patients and the set of age-matched NDs. Based on the hierarchical clustering used to
determine the t-SNE, events are iteratively grouped into increasing numbers of clusters up until a
predefined maximum amount. At each increment, a random forest (RF) classification model
consisting of 1000 decision trees is constructed for the given prediction problem (i.e., Cancer vs.
Not Cancer or Early-stage BC vs. Late-stage BC). For each model, “out of bag” sampling
determines the error rate on the training set (75%), with the minimum value corresponding to the
optimal number of event clusters. At each iteration, the corresponding model is applied to the test
set (25%) to calculate the model's accuracy. To track the contributions of cellular
(DAPI+) and acellular (DAPI-) events to correct classification more accurately, the data processing and modeling
steps outlined above are completed in parallel for each group, and the two groups are subsequently merged
to construct a combined model that takes all rare events into consideration. The feature importance
of the combined model is then used to identify the event clusters that contribute the most to
correctly predicting the classifications. Beginning with the two most important features, RF
models are iteratively constructed using the top N clusters to determine the best pared-down model
with the lowest error rate. To ensure a level of interpretability in the final result, the minimum
number of clusters is set to 25. Again, the final model is applied to the test set to calculate the
accuracy.
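The two selection steps described above can be sketched independently of the random forest itself. In this minimal illustration, the "out of bag" error is supplied either as a precomputed table or as a callable, both of which are hypothetical stand-ins for fitting a 1000-tree random forest; the function names and the toy error values are assumptions made for demonstration and are not results from this study.

```python
def best_cluster_count(oob_error_by_k):
    """Pick the number of event clusters with the lowest OOB error (smallest k on ties)."""
    return min(sorted(oob_error_by_k), key=oob_error_by_k.get)


def pare_down(ranked_clusters, oob_error_fn, min_clusters=25):
    """Evaluate the top-N clusters for N = 2, 3, ... and keep the best N >= min_clusters."""
    best_subset, best_error = None, float("inf")
    for n in range(2, len(ranked_clusters) + 1):
        subset = ranked_clusters[:n]
        error = oob_error_fn(subset)  # stand-in for fitting a forest on these clusters
        if n >= min_clusters and error < best_error:
            best_subset, best_error = subset, error
    return best_subset, best_error


# Hypothetical OOB error rates at three cluster counts (illustration only).
toy_errors = {20: 0.10, 40: 0.04, 60: 0.07}
optimal_k = best_cluster_count(toy_errors)

# Hypothetical importance-ranked clusters and a toy error curve with its minimum at 30.
ranked = [f"cluster_{i}" for i in range(1, 61)]
toy_oob_error = lambda subset: 0.05 + abs(len(subset) - 30) / 100

final_clusters, final_error = pare_down(ranked, toy_oob_error)
print(optimal_k, len(final_clusters), final_error)
```

Passing the error computation in as a callable keeps the selection logic separate from the choice of classifier, so the same loop would apply if a different model architecture were substituted.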
Chapter Three: Results
Data
A total of 354 slides from 178 PB samples were acquired from the 53 NDs, 74 early-stage
BC patients, and 26 late-stage BC patients. This corresponded to an average of 2 slides/sample.
All but one of the individuals were female. Patient demographic and clinical features were not
used for constructing the models other than the knowledge of disease status and staging.
Cancer vs. Not Cancer Model
This model was built using 125 BC patient samples and 53 ND samples. The training set
contained 93 and 39 of these, respectively, while the test set contained 32 and 14. The
iterative clustering for the Cancer vs. Not Cancer cellular and acellular models was stopped at 100
clusters, presenting optimal cut points at 48 and 63 with “out of bag” error rates of 2.3% and 9.1%,
respectively (Figure 2a and 2b). The resulting accuracy of the test set was 98.4% for cellular and
92.9% for acellular. With both the cellular and acellular events merged, the resulting combined
model yielded an “out of bag” error rate of 3.8% and a test set accuracy of 97.8%. The paring
down process resulted in 37 final event clusters, producing a final model with an “out of bag” error
rate of 3.0% and a test set accuracy of 97.8%, corresponding to 1 cancer patient being predicted as
not having cancer.
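The sample counts and final test accuracy reported above can be checked with a few lines of arithmetic. The sketch below assumes the 75%/25% split was applied within each class and rounded down; this rounding rule is an assumption, adopted because it reproduces the reported set sizes. The function name is hypothetical.

```python
import math


def split_sizes(n_samples, train_fraction=0.75):
    """Training and test sizes when the training share is rounded down (assumed rule)."""
    n_train = math.floor(n_samples * train_fraction)
    return n_train, n_samples - n_train


bc_train, bc_test = split_sizes(125)  # BC patient samples
nd_train, nd_test = split_sizes(53)   # ND samples

n_test = bc_test + nd_test
accuracy = (n_test - 1) / n_test      # one cancer sample predicted as not cancer
print(bc_train, bc_test, nd_train, nd_test, round(100 * accuracy, 1))
```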
Figure 2: “Out of bag” error rates of Cancer vs. Not Cancer models. (a) cellular events and (b) acellular events. The
optimal cluster value is depicted by the vertical blue line and the corresponding error rate is depicted by the horizontal
red line.
Of the final clusters, 28 were cellular events, accounting for 17 of the top 20 (Figure 3a).
When looking at the distributions of these events in the two populations, 13 of the top 20 were
found in higher proportions within the NDs as compared to the BC patients (Figure 3b). The most
important cluster, Cellular #1, had median counts of 0 cells/mL and 40.8 cells/mL for the
BC patients and NDs, respectively. Most of the events in this cluster displayed little to no
expression in CK and VIM. Additionally, the DAPI and CD45/CD31 signals are fairly weak,
potentially indicating that these are artifacts of the staining and imaging processes. The second
most important cluster, Cellular #2, had median counts of 1067.1 cells/mL and 293.9
cells/mL for the BC patients and NDs, respectively. Most of the cells in this cluster displayed more
CK and CD45/CD31 expression than the previous cluster but still little to no expression in VIM.
Upon further inspection, however, the CK signal appeared mostly as small, isolated instances
that did not encompass the entire cell. The third most important cluster, Cellular #3, had median
counts of 0 cells/mL and 5.0 cells/mL for the BC patients and NDs, respectively. Most of
the cells in this cluster displayed CD45/CD31 expression similar to the previous cluster but still
little to no expression in CK and VIM. Gaining a deeper understanding of what these events are,
what makes them rare, and why there are distribution differences between these two groups is
necessary for downstream analyses (e.g., single-cell genomics, targeted proteomics) and future
investigations (e.g., clinical trial design). Images of these clusters for further analysis can be found
in the Appendix.
Figure 3: Feature importance and event distributions for Cancer vs. Not Cancer model. (a) feature importance scores
of the top 20 event clusters shown as a bar graph and (b) their corresponding event distributions shown as box and
whisker plots. Red values indicate BC patient samples and blue values indicate ND samples.
Early-stage vs. Late-stage Model
This model was built using 74 early-stage BC patient samples and 51 late-stage BC patient
samples. The training set contained 55 and 38 of these, respectively, while the test set contained
19 and 13. The iterative clustering for the Early-stage vs. Late-stage cellular and acellular
models was stopped at 100 clusters, presenting optimal cut points at 77 and 79 with “out of bag”
error rates of 19.4% and 17.2%, respectively (Figure 4a and 4b). The resulting accuracy of the test
set was 79.4% for cellular and 76.7% for acellular. With both cellular and acellular events merged,
the resulting combined model yielded an “out of bag” error rate of 21.5% and a test set accuracy
of 87.5%. The paring-down process resulted in 35 final event clusters, producing a final model
with an “out of bag” error rate of 15.1% and a test set accuracy of 87.5%, corresponding to 1 early-stage
patient and 3 late-stage patients being predicted incorrectly.
Figure 4: “Out of bag” error rates of Early-stage vs. Late-stage models. (a) cellular events and (b) acellular events.
The optimal cluster value is depicted by the vertical blue line, and the corresponding error rate is depicted by the
horizontal red line.
Of the final clusters, 20 are cellular, accounting for 12 of the top 20 (Figure 5a). When
looking at the distributions of these events in the two populations, only 6 of the top 20 are found
in higher proportions within the late-stage patients as compared to the early-stage patients (Figure
5b). The most important cluster, Acellular #1, had median counts of 127.3 events/mL and
595.2 events/mL for the early-stage and late-stage patients, respectively. Most of the events in this
cluster displayed a significant amount of CK expression with little to no expression in VIM. A
number of these events exhibit the typical features of what can be described as an oncosome, the
most important feature for separation identified in our previous, fully manual study [20]. The
second most important cluster, Acellular #2, had median counts of 47.4 events/mL and 11.3
events/mL for the early-stage and late-stage patients, respectively. Nearly all the events in this
cluster displayed little to no CK and VIM expression and presented almost exclusively with
CD45/CD31 signal. Additionally, several of these events appear to lie on the border of a frame
(i.e., a straight edge cuts through the event). The third most important cluster, Cellular #1,
had median counts of 25.9 cells/mL and 6.9 cells/mL for early- and late-stage patients,
respectively. Most of the events in this cluster appeared to be artifacts of the staining and imaging
processes and not true cellular signals. Images of these clusters for further analysis can be found
in the Appendix.
Figure 5: Feature importance and event distributions for Early-stage vs. Late-stage model. (a) feature importance
scores of the top 20 event clusters are shown as a bar graph, and (b) their corresponding event distributions are shown
as box and whisker plots. Red values indicate early-stage patient samples, and blue values indicate late-stage patient
samples.
Chapter Four: Discussion
We developed prediction models that identified a potential BC early detection signature
via LBx analytes and stratified patients into early- and late-stage disease. These models used an
automated enumeration-based approach of rare events identified via the HDSCA3.0 workflow. We
found that cellular events were the most important differentiator between cancer and normal, while
acellular events were the most important for stage stratification. With further investigation and
validation, this approach could prove to be a powerful companion to mammography in the setting
of BC screening, as well as a means of longitudinal monitoring.
Our results suggest that the data from a single blood draw could identify a malignant tumor without needing a solid
tumor biopsy. By comparison, the approach outlined here is both minimally invasive and cost-efficient,
potentially enabling more frequent screening and, in turn, earlier disease detection across
the population. Additionally, this approach provides a potential avenue for detecting widespread
systemic BC before metastatic lesions become clinically detectable. This type of information can
help guide treatment decisions (e.g., surgical resection vs. high-dose neoadjuvant chemotherapy)
that can be tailored best to the individual patient. Similarly, downstream genomic and/or proteomic
analysis of these rare events can identify disease subsets (e.g., HR+, HER2- BC) that can motivate
targeted therapies. Ultimately, these diagnostic and prognostic predictions, paired with additional
LBx, clinical, and demographic data, can inform the clinical conversation, ideally leading to a
treatment plan aligned with the patient's best interests. However, it is important to remember that
the results presented here are informed by pre-clinical research data that was not collected for this
purpose. As such, additional properly designed studies are needed prior to wide-scale
dissemination and clinical implementation.
This automated approach on a robust, reliable, and reproducible platform allows for a “no
information left behind” analysis that can produce comparative results quickly. The models are
given all rare cellular and acellular events to make a prediction, allowing the architectures to rank
the importance of each feature. Since the pipeline preserves all of the data in its original state, these
models can be updated if upstream computational algorithms are changed (e.g., new event
masking; reconstructed t-SNE map) and/or revised when data from new patients is generated.
Additionally, the non-specific methodology used here allows for easy adaptation to targeted cancer
types (e.g., HER2+ BC vs. HER2- BC; SCLC vs. NSCLC) and other disease settings (e.g., normal
vs. post-acute COVID-19 syndrome; normal vs. diabetic).
As with all model-building efforts, there are inherent limitations to the approach. The
models built here only apply to BC when comparing Cancer vs. Not Cancer and are only useful
for stratifying BC patients into early- or late-stage. Additional data is needed to ensure properly
powered models to expand to specific disease subsets. Furthermore, while the utility of such
techniques sounds promising, real-time results are currently infeasible in a clinical setting because
the HDSCA3.0 platform is not widely deployed (even when considering Epic Sciences’
commercial adaptation). Lastly, only one model architecture was explored, primarily due to the
interpretability of results (i.e., the feature importance). Other architectures like k-nearest
neighbors, naive Bayes, or support vector machines may yield better results.
All things considered, an unbiased, data-driven approach is powerful and informative but
has its drawbacks. While models like the ones shown here can modestly separate disease from
non-disease and stratify by severity, they are limited by the data provided, the time needed to
generate the predictions, and the interpretation of the results within the context of the individual.
However, clinical adaptation and usage of such methods can and will save lives and impact the
quality of life of many patients and their families. Accurate and precise prediction of the onset and
spread of cancer via a simple blood draw will allow more targeted clinical decision-making that
can potentially transform this disease into a manageable, chronic condition. Regardless of whether
an LBx would replace or supplement the existing diagnostic methods, widespread expansion and
validation are needed before these tools can be considered for use within routine care.
As previously mentioned, it is important to highlight that these models are only intended
for use within individuals with suspicion of BC and/or those with a confirmed diagnosis of BC. A
clear next step would be to investigate these approaches in women who are mammogram
positive, cancer negative (i.e., those who have a benign lesion). We would expect these individuals
to have an LBx signature somewhere in between that of the BC patients and the NDs; however, it is
unclear whether that signature will be distinguishable from either with respect to predictive
modeling. Another next step would be an expansion to hormonal subtypes of BC as well as the
inclusion of more specific staging information (e.g., numeric staging; TNM staging). In the former
case, these models could help identify patients that have acquired resistance to a targeted therapy
while the latter could provide greater resolution around the extent of metastatic spread.
Incorporation of patient demographic and clinical features would increase the
interpretability of the results. While the data derived from the LBx can provide snapshots in time
for an individual, they do not fully capture the history and identity of that person. For example,
knowing whether a patient is a 65-year-old Asian female with no family history of BC or a 35-year-old
Black female with a BMI of 19.9 provides a wealth of information that can impact predictions and
drive clinical decision-making. Likewise, utilizing a dataset that has contrasting therapy options
with long-term survival outcomes provides an opportunity to make predictions on therapy efficacy
based on LBx signatures. Recent work highlights the potential of predicting therapy efficacy yet
falls short because the high-resolution data it requires are not yet available [23]. Similar to the early
identification of disease spread, models such as this can identify therapeutic resistance at its earliest
possible point, allowing for treatment modification to maintain overall efficacy. Simply put, a
minimally invasive, cost-efficient, and systemic procedure, backed by automated and streamlined
methods, can inform pivotal clinical conversations and revolutionize the way patients with BC are
cared for.
Bibliography
1. Siegel, R.L., Miller, K.D., Wagle, N.S., et al., Cancer statistics, 2023. CA Cancer J Clin,
2023. 73(1): p. 17.
2. Sung, H., Ferlay, J., Siegel, R.L., et al., Global Cancer Statistics 2020: GLOBOCAN
Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA
Cancer J Clin, 2021. 71(3): p. 209.
3. Tabár, L., Vitak, B., Chen, T.H., et al., Swedish two-county trial: impact of
mammographic screening on breast cancer mortality during 3 decades. Radiology, 2011.
260(3): p. 658.
4. Tonelli, M., Connor Gorber, S., Joffres, M., et al., Recommendations on screening for
breast cancer in average-risk women aged 40-74 years. CMAJ, 2011. 183(17): p. 1991.
5. Marmot, M.G., Altman, D.G., Cameron, D.A., et al., The benefits and harms of breast
cancer screening: an independent review. Br J Cancer, 2013. 108(11): p. 2205.
6. Wang, F., Shu, X., Meszoely, I., et al., Overall Mortality After Diagnosis of Breast
Cancer in Men vs Women. JAMA Oncol, 2019. 5(11): p. 1589.
7. Oeffinger, K.C., Fontham, E.T., Etzioni, R., et al., Breast Cancer Screening for Women at
Average Risk: 2015 Guideline Update From the American Cancer Society. JAMA, 2015.
314(15): p. 1599.
8. DeAngelis, C.D. and Fontanarosa, P.B., US Preventive Services Task Force and breast
cancer screening. JAMA, 2010. 303(2): p. 172.
9. Lee, C.S., Monticciolo, D.L., and Moy, L., Screening Guidelines Update for Average-
Risk and High-Risk Women. AJR Am J Roentgenol, 2020. 214(2): p. 316.
10. Dabbous, F.M., Dolecek, T.A., Berbaum, M.L., et al., Impact of a False-Positive
Screening Mammogram on Subsequent Screening Behavior and Stage at Breast Cancer
Diagnosis. Cancer Epidemiol Biomarkers Prev, 2017. 26(3): p. 397.
11. Lehman, C.D., Arao, R.F., Sprague, B.L., et al., National Performance Benchmarks for
Modern Screening Digital Mammography: Update from the Breast Cancer Surveillance
Consortium. Radiology, 2017. 283(1): p. 49.
12. Michela, B., Liquid Biopsy: A Family of Possible Diagnostic Tools. Diagnostics (Basel),
2021. 11(8).
13. Mader, S. and Pantel, K., Liquid Biopsy: Current Status and Future Perspectives. Oncol
Res Treat, 2017. 40(7-8): p. 404.
14. Kilgour, E., Rothwell, D.G., Brady, G., et al., Liquid Biopsy-Based Biomarkers of
Treatment Response and Resistance. Cancer Cell, 2020. 37(4): p. 485.
15. Chen, M. and Zhao, H., Next-generation sequencing in liquid biopsy: cancer screening
and early detection. Hum Genomics, 2019. 13(1): p. 34.
16. Xue, V.W., Wong, C.S.C., and Cho, W.C.S., Early detection and monitoring of cancer in
liquid biopsy: advances and challenges. Expert Rev Mol Diagn, 2019. 19(4): p. 273.
17. Nazarenko, I., Extracellular Vesicles: Recent Developments in Technology and
Perspectives for Cancer Liquid Biopsy. Recent Results Cancer Res, 2020. 215: p. 319.
18. Ozawa, P.M.M., Jucoski, T.S., Vieira, E., et al., Liquid biopsy for breast cancer using
extracellular vesicles and cell-free microRNAs as biomarkers. Transl Res, 2020. 223: p.
40.
19. Roy, D., Pascher, A., Juratli, M.A., et al., The Potential of Aptamer-Mediated Liquid
Biopsy for Early Detection of Cancer. Int J Mol Sci, 2021. 22(11).
20. Setayesh, S.M., Hart, O., Naghdloo, A., et al., Multianalyte liquid biopsy to aid the
diagnostic workup of breast cancer. NPJ Breast Cancer, 2022. 8(1): p. 112.
21. Shishido, S.N., Welter, L., Rodriguez-Lee, M., et al., Preanalytical Variables for the
Genomic Assessment of the Cellular and Acellular Fractions of the Liquid Biopsy in a
Cohort of Breast Cancer Patients. J Mol Diagn, 2020. 22(3): p. 319.
22. Chai, S., Matsumoto, N., Storgard, R., et al., Platelet-Coated Circulating Tumor Cells Are
a Predictive Biomarker in Patients with Metastatic Castrate-Resistant Prostate Cancer.
Mol Cancer Res, 2021. 19(12): p. 2036.
23. Mason, J., Gong, Y., Amiri-Kordestani, L., et al., Model Development of CDK4/6
Predicted Efficacy in Patients With Hormone Receptor-Positive, Human Epidermal
Growth Factor Receptor 2-Negative Advanced or Metastatic Breast Cancer. JCO Clin
Cancer Inform, 2021. 5: p. 758.
Appendix
Appendix Figure 1: Images of the events found in Cellular #1 for the Cancer vs Not Cancer model separated by (a)
training set and (b) test set. The left half represents those found in the cancer population and the right half represents
those found in the not cancer population. For visualization purposes, the number of images displayed is limited to 10
per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to the distributions
shown in Figure 3b.
Appendix Figure 2: Images of the events found in Cellular #2 for the Cancer vs Not Cancer model separated by (a)
training set and (b) test set. The left half represents those found in the cancer population and the right half represents
those found in the not cancer population. For visualization purposes, the number of images displayed is limited to 10
per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to the distributions
shown in Figure 3b.
Appendix Figure 3: Images of the events found in Cellular #3 for the Cancer vs Not Cancer model separated by (a)
training set and (b) test set. The left half represents those found in the cancer population and the right half represents
those found in the not cancer population. For visualization purposes, the number of images displayed is limited to 10
per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to the distributions
shown in Figure 3b.
Appendix Figure 4: Images of the events found in Cellular #4 for the Cancer vs Not Cancer model separated by (a)
training set and (b) test set. The left half represents those found in the cancer population and the right half represents
those found in the not cancer population. For visualization purposes, the number of images displayed is limited to 10
per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to the distributions
shown in Figure 3b.
Appendix Figure 5: Images of the events found in Cellular #5 for the Cancer vs Not Cancer model separated by (a)
training set and (b) test set. The left half represents those found in the cancer population and the right half represents
those found in the not cancer population. For visualization purposes, the number of images displayed is limited to 10
per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to the distributions
shown in Figure 3b.
Appendix Figure 6: Images of the events found in Acellular #1 for the Early-stage vs Late-stage model separated by
(a) training set and (b) test set. The left half represents those found in the early-stage population and the right half
represents those found in the late-stage population. For visualization purposes, the number of images displayed is
limited to 10 per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to
the distributions shown in Figure 5b.
Appendix Figure 7: Images of the events found in Acellular #2 for the Early-stage vs Late-stage model separated by
(a) training set and (b) test set. The left half represents those found in the early-stage population and the right half
represents those found in the late-stage population. For visualization purposes, the number of images displayed is
limited to 10 per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to
the distributions shown in Figure 5b.
Appendix Figure 8: Images of the events found in Cellular #1 for the Early-stage vs Late-stage model separated by
(a) training set and (b) test set. The left half represents those found in the early-stage population and the right half
represents those found in the late-stage population. For visualization purposes, the number of images displayed is
limited to 10 per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to
the distributions shown in Figure 5b.
Appendix Figure 9: Images of the events found in Acellular #3 for the Early-stage vs Late-stage model separated by
(a) training set and (b) test set. The left half represents those found in the early-stage population and the right half
represents those found in the late-stage population. For visualization purposes, the number of images displayed is
limited to 10 per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to
the distributions shown in Figure 5b.
Appendix Figure 10: Images of the events found in Cellular #2 for the Early-stage vs Late-stage model separated by
(a) training set and (b) test set. The left half represents those found in the early-stage population and the right half
represents those found in the late-stage population. For visualization purposes, the number of images displayed is
limited to 10 per sample and ~5,000 overall. As such, the number of images shown may not be directly correlated to
the distributions shown in Figure 5b.
Asset Metadata
Creator: Mason, Jeremy (author)
Core Title: Model development of breast cancer detection and staging via rare event enumeration from a liquid biopsy: a retrospective descriptive clinical research study
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Clinical, Biomedical and Translational Investigations
Degree Conferral Date: 2023-08
Publication Date: 05/18/2023
Defense Date: 05/17/2023
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: breast cancer, liquid biopsy, model, OAI-PMH Harvest, rare event
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Patino-Sutton, Cecilia (committee chair), Hicks, James (committee member), Kuhn, Peter (committee member)
Creator Email: masonj@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113131579
Unique identifier: UC113131579
Identifier: etd-MasonJerem-11868.pdf (filename)
Legacy Identifier: etd-MasonJerem-11868
Document Type: Thesis
Rights: Mason, Jeremy
Internet Media Type: application/pdf
Type: texts
Source: 20230518-usctheses-batch-1046 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu