Multimodal Single-cell Biology and Machine Learning to Characterize Plasma Cell Neoplasms

By
Libère Jensen Ndacayisaba

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MEDICAL BIOPHYSICS)

December 2022

Copyright 2022 Libère Jensen Ndacayisaba

Dedication

To the memory of my dad, for being the sacred anchor through life. To my mom, for the chromosomes, for the sacrifices, and for teaching me that grit and valor render human endeavor limitless.

Acknowledgments

The doctoral work in this thesis is a culmination of many years of preparation and the outcome of countless individuals who have supported my pursuits in work and life. Words are not sufficient to express my gratitude towards every person who played a part in helping me reach this milestone. I do want to highlight important contributors.

I thank Professors Peter Kuhn and James Hicks for taking me into the lab and providing a stimulating, supportive, and challenging environment for me to learn and become an independent scientist. I’m grateful for your guidance throughout my PhD. I’m grateful to my committee members, Professors Jerry Lee, Assad Oberai, and Kevin Kelly, for their intellectual guidance and patience during a critical period of my professional development. I’m immensely grateful to the entire PIBBS staff and administration for the massive support at every step since the first day at Keck. To all my mentors over the years, primarily Drs. Misha Leong, Brad Balukjian, Nilesh Banavali, Hongmin Li, Michael Grabe, and Yuan Wang, thank you for keeping the fire ablaze, for your time, and for the lessons that remain the guiding light in my career.
I immensely thank the core CSI-Multiple Myeloma team members, Kate Rappard, Carlisle Maney, Eric Yang, Dean Tessone, Sean Solomon, Carli Kaleta, Jena Tran, Sonia Setayesh, and Carmen Ruiz, for their enormous contribution to the myeloma research work. Without them, this dissertation would not have been possible, and for that I’m forever indebted. Thanks to Amin Naghdloo and Riya Verma, along with the technical and administrative members of the lab, for additional contributions to the research work in this thesis. To my undergraduate student mentees, thank you for trusting me with an important part of your career path, for being teachable, and, moreover, for teaching me how to be a better leader and scientist. To me you were the best part of my graduate school experience.

To all colleagues, friends, and family near and far who have supported me through the doctoral training, your love and support mean the world. I owe a great deal of gratitude to my uncle Jean-Berchmans Nsabiyumva for being a foundational role model since my formative years and to my brother Isidore Niyongabo for sharing the hardest days of my immigrant life in America. To the collaborators at every point of my PhD, you taught me so much and for that, I thank you!

I am grateful to the Dr. Miriam and Sheldon G. Adelson Medical Research Foundation, Novartis Pharmaceuticals, and the Peter Schlegel Family for providing financial support to enable the research work in this dissertation.

Finally, I’d like to thank all the patients whose samples and data contributed to the research in this PhD thesis. With the hope that my contribution to biomedicine improves cancer care in the future, the benevolence of all the patients is what will save many lives.

Table of Contents

Dedication
Acknowledgments
List of Figures
List of Tables
Abstract
Chapter 1 General Introduction
Chapter 2 Single-cell Morphogenomics for the Characterization of Plasma Cell Neoplasms
  2.1. Introduction
  2.2. Enrichment-Free Single-Cell Detection and Morphogenomic Profiling of Myeloma Patient Samples to Delineate Circulating Rare Plasma Cell Clones
    2.2.2. Materials and Methods
    2.2.3. Results
    2.2.4. Discussion
  2.3. Characterization of BCMA Expression in Circulating Rare Single Cells of Patients with Plasma Cell Neoplasms
    2.3.1. Introduction
    2.3.2. Materials and Methods
    2.3.3. Results
    2.3.4. Discussion
  2.4. Conclusion
Chapter 3 Molecular Subtyping of NDMM using Non-negative Matrix Factorization
  3.1. Introduction
  3.2. Materials and methods
    3.2.1. The Multiple Myeloma Research Foundation (MMRF) CoMMpass study
    3.2.2. Selection and pre-processing of RNA sequencing data
    3.2.3. Non-negative Matrix factorization for transcriptomic subtyping
    3.2.4. Survival analysis in NMF subtypes
    3.2.5. Gene Set Enrichment Analysis (GSEA) for biological pathway determination
    3.2.6. Analysis of the Tumor Necrosis Factor (TNF) gene family in NMF candidate groups
    3.2.7. Copy number variation (CNV) profiling of NMF groups
  3.3. Results
    3.3.1. Identification of six transcriptomic subtypes in NDMM
    3.3.2. Overall survival (OS) and progression-free survival (PFS) in NMF groups
  3.4. Discussion
Chapter 4 Mathematical Oncology to Integrate Multimodal Clinical and Liquid Biopsy Data for the Prediction of Survival
  4.1. Introduction
  4.2. Materials and Methods
    4.2.1. Integration of Multi-Center Demographic and Clinical Data
    4.2.2. Sparsity, Scale, and Dimensionality of Liquid Biopsy Data in Comprehensive Profiling
    4.2.3. Integration and Augmentation of Multimodal and Multiscale Liquid Biopsy
    4.2.4. Advances in Machine and Deep Learning Methodologies for the Prediction of Survival
  4.3. Results
    4.3.1. Survival Prediction on Multi-Center Demographic and Clinical Data in Two Breast Cancer Cohorts
    4.3.2. Survival Prediction on Integrated Demographic, Clinical, and Liquid Biopsy Data in a Prostate Cancer Cohort
  4.4. Conclusion
Chapter 5 Ongoing and future work in PC neoplasia characterization: from discovery to clinical implementation
  5.1 Leveraging the versatility of HDSCA for custom 4-plex assays in PC neoplasia
  5.2 Detection and Profiling of VIM, CK, and CD31 Expression in PC Neoplasms
  5.3 Single-cell multiplex proteomics to deconvolute myeloma progression
  5.4 Predicting disease progression from precursor conditions to malignant states
  5.5 Therapy response prediction in plasma cell neoplasia
Chapter 6 Broad Conclusions and Perspectives
Bibliography

List of Figures

Figure 2.1 HDSCA-based 4-plex Immunofluorescence for Rare Plasma Cell Detection
Figure 2.2 Assay Validation of Markers in Spiked NBD Samples
Figure 2.3 Rare Single-cell Classification and Enumeration
Figure 2.4 Morphological Characterization of MM CTCs and BMPCs
Figure 2.5 Morphogenomic Validation of Detected Candidate MM CTCs and BMPCs
Figure 2.6 scCNV in PB and BMA samples in the cohort
Figure 2.7 Correlation of scCNV events with diagnostic cytogenetic aberrations
Figure 2.8 Assay rationale and 4-plex immunofluorescence targeting
Figure 2.9 Assay development and validation in cell lines and spiked normal blood
Figure 2.10 Morphological characterization of BCMA+ cells in PB samples
Figure 2.11 Representative images of observed BCMAp staining patterns
Figure 2.12 Quantitative enumeration of circulating rare and BCMA+ cells
Figure 2.13 scCNV analysis for genomic validation of candidate aberrant cells
Figure 3.1 NMF candidate clusters and corresponding survival analysis
Figure 3.2 GSEA pathways and survival outcomes across NMF groups
Figure 3.3 Distribution of TNF receptors and ligands in NMF groups
Figure 3.4 DNA CNV profiles of patients in NMF groups
Figure 4.1 Multimodal liquid biopsy data from the HDSCA workflow
Figure 4.2 Missingness in MSKCC and MDACC integrated breast cancer dataset
Figure 4.3 Survival predictions and variable importance
Figure 4.4 Missingness in demographic and clinical prostate cancer data
Figure 4.5 Predicted survival and variable importance in multimodal BC and PC data
Figure 5.1 The versatility of HDSCA for custom 4-plex assays
Figure 5.2 GPRC5D IF assay staining in NBD samples spiked with MCF cell lines
Figure 5.3 Morphology and enumeration of detected MM TiME cells expressing CK, Vim, and CD31

List of Tables

Table 2.1 Patient Demographics and Clinical Characteristics
Table 2.2 BCMA Assay Staining Protocol
Table 2.3 Patient demographics and clinical characteristics
Table 4.1 Demographic and clinical variables in the integrated breast cancer dataset
Table 4.2 Demographic and clinical variables in the integrated prostate cancer dataset
Table 5.1 40-plex IMC panel for targeted proteomics of the PC neoplasm TiME

Abstract

Plasma cell neoplasms are a network of benign and malignant disorders that are clinically and biologically distinct. Molecular profiling in overt and precursor conditions reveals additional subtypes based on the associated clonal immunoglobulins, and thus significant heterogeneity in disease risk, progression rates, treatment response, prognosis, and survival outlook. While the progression of plasma cell cancers is clinically defined, the biological mechanisms and sequential events leading to progression between states remain poorly understood, hindering accurate prediction of state change. Further, the intrinsic heterogeneity in the multi-state spatiotemporal progression makes it difficult to predict which patients will or will not progress. Additionally, standard monitoring for disease management in both pre-malignant and overt myeloma relies on repetitive, highly invasive, and costly bone marrow biopsies. These unmet needs motivated the research in this dissertation, which applies a convergent oncology approach integrating liquid biopsy via single-cell multiparametric profiling with machine and deep learning for disease subtyping and outcomes prediction in cancer patients.
Single-cell morphoproteogenomic assays were developed for rare cell detection and characterization in the bone marrow and peripheral blood of patients with lymphoproliferative cancers and validated as potential blood-based liquid biopsy technologies for clinical applications. Machine learning methods to delineate subtypes and predict therapy response in newly diagnosed multiple myeloma are presented in this work as a quantitative framework to delineate the heterogeneity in myeloma. Finally, work to integrate multimodal liquid biopsy data with predictive models is discussed as a promising quantitative approach towards personalized medicine and improved patient outcomes.

Chapter 1 General Introduction

Nearly 10 billion B cells are made daily in the capillaries of the adult human bone marrow. Upon maturation, they patrol the body’s lymphatic system through the bloodstream as part of the antibody-recognition arsenal of the immune system. Affinity maturation is a specialized differentiation process that commits 2-3% of the B cells to becoming plasma cells, the antibody-producing machinery in the perpetual fight against pathogens that keeps us healthy. Mistakes in the genomic and antigen-specificity assignment during affinity maturation lead to plasma cell disorders. Multiple myeloma, first documented by Samuel Solly in 1844 [1], is the most common plasma cell dyscrasia and represents the second most common hematologic malignancy. Since the late 1800s, methods for diagnostic detection and characterization of myeloma have been developed. Notable advances in pathological definition and clinical staging by Otto Kahler, Bence Jones, Edelman, Bayrd, Tiselius, and Waldenstrom advanced our understanding of plasma cell neoplasms and significantly improved detection and patient outcomes throughout the past 200 years [2].
Enabled by the genomic revolution and digital computing transformations of the last decades, contemporary advances in cancer profiling have focused on developing high-throughput and highly targeted multiparametric approaches for molecular profiling of cancer. Among these cutting-edge methods, liquid biopsies have been at the forefront of precision oncology for plasma cell disorders. In parallel, machine learning advances have enabled computational quantification and predictive analysis in oncology.

Methodological advances also enabled the discovery of additional complexities in the cellular and molecular architecture driving the pathogenesis and progression of plasma cell malignancies. Monoclonal protein (M protein) detection, in conjunction with serum immunoglobulin subtyping and accurate quantification of bone marrow plasma cells, provided evidence for the existence of clinically well-defined subtypes of plasma cell neoplasia. Modern karyotyping and cytogenetics, in conjunction with next-generation sequencing, uncovered extended intra- and inter-tumor heterogeneity in myeloma and across plasma cell cancers. Recent data support that plasma cell dyscrasias almost always start as monoclonal gammopathy of undetermined significance (MGUS), which is usually detected incidentally during routine check-ups.

The focus of this dissertation is myeloma-associated disease states, particularly the progressive evolution path of multiple myeloma (MM). MGUS and smoldering MM (SMM) are the asymptomatic monoclonal gammopathies that precede MM [3]. Newly diagnosed MM (NDMM) is the disease state at diagnosis and before treatment. It is clinically and biologically different from relapsed/refractory MM (RRMM), the condition occurring in response to treatment resistance.
Plasma cell leukemia (PCL) is a rare and aggressive variant from non-IgM myeloma characterized by the excessive presence of circulating plasma cells in the blood and an extremely grim prognosis, with overall survival of only 4-5 months [4,5]. Disease risk factors, clinical stratification, and staging are defined by the International Myeloma Working Group (IMWG) [6] and the International Staging System (ISS) based on standard FISH and cytogenetics assays by bone marrow biopsy, supported by clinical blood and urine tests [7]. Treatment is initiated only when myeloma-defining events (MDEs), namely hyperCalcemia, Renal failure, Anemia, and Bone lesions (collectively referred to as CRAB), are found [8].

The risk of having MGUS is 3% in individuals over 50 years of age (6% at 70 years old). Among MGUS patients, there is an 11% lifetime risk of progression; 20% will progress to MM, while the remaining cases progress to other disorders, including cardiovascular disease. Depending on the immunoglobulin type in monoclonal gammopathy, these other disorders include Waldenström macroglobulinemia, primary amyloidosis, B-cell lymphoma, and chronic lymphocytic leukemia [9]. SMM comprises high-risk individuals with a 70-80% risk of advancing to MM within 5 years; the prevalence is higher in men than in women, and the incidence in Black individuals is double that in white individuals [10–12], adding heterogeneity to the disease pathogenesis.

While the clinical progression from MGUS to MM is well-defined, the biology of this spatiotemporal multistate process remains poorly characterized. Since the events and mechanisms driving disease state transitions are unknown, the primary unmet need in myeloma is the ability to accurately predict which individual MGUS patients will progress to MM and which will evolve into other disorders [13].
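As a back-of-the-envelope illustration of the SMM figure quoted above (a 70-80% risk of advancing to MM within 5 years), the following sketch converts a cumulative risk into an equivalent constant annual risk and back. The constant-annual-risk model and the function names are illustrative assumptions; they are not taken from the cited studies or from the analyses in this dissertation, and real progression risk need not be constant over time.

```python
def annual_risk_from_cumulative(cumulative: float, years: float) -> float:
    """Constant annual risk r such that 1 - (1 - r)**years == cumulative.

    Assumes (hypothetically) that the risk is identical and independent
    in every year of follow-up.
    """
    if not 0.0 <= cumulative < 1.0:
        raise ValueError("cumulative risk must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - (1.0 - cumulative) ** (1.0 / years)


def cumulative_risk(annual: float, years: float) -> float:
    """Cumulative risk after `years` under a constant annual risk."""
    if not 0.0 <= annual <= 1.0:
        raise ValueError("annual risk must be in [0, 1]")
    return 1.0 - (1.0 - annual) ** years


if __name__ == "__main__":
    # 5-year cumulative risks quoted in the text for SMM
    for five_year in (0.70, 0.80):
        r = annual_risk_from_cumulative(five_year, 5)
        print(f"{five_year:.0%} over 5 years ~ {r:.1%} per year (constant-risk model)")
```

Under this simplified model, a 70-80% five-year risk corresponds to roughly 21% to 28% per year, which gives a sense of scale for how much faster high-risk SMM advances than MGUS.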
With a progression rate of 1-2% per year for MGUS to MM, individuals with an MGUS diagnosis are put on lifetime surveillance through bone marrow biopsies every 3-6 months to monitor disease progression. This method also remains the standard of care for diagnosis, therapy response monitoring, and disease surveillance for SMM, NDMM, RRMM, and PCL. In addition, the recently expanded understanding of the local and systemic tumor immune microenvironment (TiME) [14] and its support for the growth, survival, and proliferation of myeloma cells [15] calls for methodological development to characterize not only the cells of the bone marrow but also those in the circulatory system.

The above challenges motivated the rise of multiparametric single-cell biology for deep profiling of the morphological phenotypes and functional characteristics of the many cell types across the primary disease cell lineages and the cells of the TiME. Witzig et al. described the first slide-based single-cell method for quantitative profiling of circulating plasma cells in 1993 [16], and further methodological developments included CellSearch-DEPArray [17] (Silicon Biosystems), microfluidics-based methods [18], and others, with multiparametric flow cytometry [19,20] reaching clinical utility and becoming integral to standard-of-care diagnosis and disease monitoring in bone marrow biopsy tumor analysis. The primary challenge of the current methods is their reliance on an enrichment step for specific target cells, which leaves them unable to analyze the full spectrum of cell types contributing to disease. Furthermore, needle-based BM biopsies are invasive and uncomfortable, carry a high risk of complications, and oftentimes require sedation, which significantly increases healthcare costs. Minimally invasive blood-based technologies, with clinical-grade sensitivity and specificity, to detect, quantify, and characterize disease progression through rare cell biology and mathematical modeling could improve care in myeloma.

The work in this Ph.D. dissertation innovates on current methodological advances by applying a convergent oncology approach that integrates enrichment-free multiparametric single-cell biology, to probe and characterize the molecular and cellular phenotypes driving disease, with machine learning methods, to map and delineate the heterogeneity inherent to the hierarchical multimodal data. Chapter 1 introduces the broad background and motivation for the dissertation work. Chapter 2 takes a deep dive into the development and initial validation of single-cell liquid biopsy technologies for the detection and comprehensive profiling of circulating rare cells in plasma cell neoplasia. Chapter 3 presents computational work applying unsupervised machine learning in NDMM multimodal clinical data to delineate molecular subtypes in myeloma. Chapter 4 focuses on the integration of multimodal liquid biopsy and clinical data for machine learning prediction of survival, using breast and prostate cancers to demonstrate proof-of-concept. Chapter 5 concludes with a forward-looking discussion on how future advances in integrating liquid biopsy data with predictive mathematical modeling will improve patients’ outcomes.

Chapter 2 Single-cell Morphogenomics for the Characterization of Plasma Cell Neoplasms

2.1. Introduction

This chapter focuses on the development and validation of enrichment-free multiparametric single-cell assays to detect and characterize cells driving the pathogenesis and progression of hematologic malignancies. The work was enabled by the flexible architecture and capabilities of the third-generation High-Definition Single Cell Assay (HDSCA) workflow for morphoproteogenomic profiling of rare single cells, as described in recent publications [21–24].
Section 2.2 presents the development and preliminary validation of a methodological approach to detect and characterize malignant plasma cells and delineate rare morphogenomic clones in MGUS and NDMM PB and BMA samples, leveraging CD138 as the marker for plasma cells and using CD56 as the differentiation marker for malignant plasma cells [24] (work published in Current Oncology). Section 2.3 focuses on BCMA as a marker for detecting precursor cells in MGUS, SMM, NDMM, RRMM, and PCL (work submitted and under review in Blood Advances). Section 2.4 is a concluding summary.

2.2. Enrichment-Free Single-Cell Detection and Morphogenomic Profiling of Myeloma Patient Samples to Delineate Circulating Rare Plasma Cell Clones

This section has been adapted from the work published as: Ndacayisaba, Libere J., Kate E. Rappard, Stephanie N. Shishido, Carmen Ruiz Velasco, Nicholas Matsumoto, Rafael Navarez, Guilin Tang, Pei Lin, Sonia M. Setayesh, Amin Naghdloo, Ching-Ju Hsu, Carlisle Maney, David Symer, Kelly Bethel, Kevin Kelly, Akil Merchant, Robert Orlowski, James Hicks, Jeremy Mason, Elisabeth E. Manasanch, and Peter Kuhn. 2022. "Enrichment-Free Single-Cell Detection and Morphogenomic Profiling of Myeloma Patient Samples to Delineate Circulating Rare Plasma Cell Clones" Current Oncology 29, no. 5: 2954-2972. https://doi.org/10.3390/curroncol29050242

2.2.1. Introduction

Multiple myeloma (MM) is a plasma cell (PC) neoplasm that is the second leading hematologic malignancy worldwide, accounting for 34,920 new cases and 12,410 deaths annually in the US alone [25]. Nearly all MM cases are preceded by monoclonal gammopathy of undetermined significance (MGUS) [3]. MM initiates when germinal center B-cells undergoing normal affinity maturation encounter an error either in somatic hypermutation of the immunoglobulin heavy chain (IGH) locus or in class-switch recombination [26].
These distinct initial genetic errors give rise, respectively, to a set of characteristic and mutually exclusive chromosome 14-based translocations involving the IGH locus (14q32), primarily t(11;14), t(4;14), and t(14;16) [27], and to characteristic hyperdiploidy of odd-numbered chromosomes. Together, these large chromosomal structural events define two genomic categories that comprise the hallmarks of myeloma genetics [26,28]. Over time, monoclonal PCs acquire secondary mutational changes and evade the immune system to form multiple focal bony lesions, which can eventually spread beyond the bony medullary space into soft tissues and organs [9,28–30]. Despite advances in therapeutic modalities, nearly all MM cases relapse, and the disease remains mostly incurable, in large part due to the emergence of resistant genomic clones arising both from the genomic heterogeneity within lesions [31,32] and from the anatomic spatial variability between lesions [33].

Sampling and analysis of common/hallmark myeloma cytogenetic events from bone marrow PCs is key to MM clinical diagnosis, staging, and disease monitoring [6,34,35]. MM diagnosis and stratification are accomplished by genomic profiling of CD138+ cells isolated from bone marrow aspirates (BMAs) using clinical karyotyping and fluorescence in situ hybridization (FISH) assays to identify canonical trisomies in odd-numbered chromosomes (hyperdiploid disease) and translocations (non-hyperdiploid disease), which together allow for patient stratification and therapy selection, particularly for high-risk disease [7,34]. As aberrant myeloma cells evolve, they are able to relocate from their bone marrow niche into the peripheral blood circulation, which offers a unique opportunity to evaluate the myeloma tumor in a non-invasive manner.
In this respect, a robust, unbiased method for the detection, morphogenomic description, and quantification of MM circulating tumor cells (MM CTCs) of interest would be enabling, particularly in discriminating abnormal from normal PCs. While peripheral blood (PB) is routinely sampled to measure the myeloma monoclonal (M) protein and other soluble molecules as part of the standard diagnostic workup, genomic profiling of MM CTCs is not yet routinely used in clinical care.

Recent works characterizing MM CTCs have focused on the surface marker CD138 (syndecan-1) for identification and isolation and have provided significant insights into the nature and role of these cells in myeloma. MM CTCs have been found to correlate with disease progression and survival and provide a new avenue for disease risk stratification in precursor myeloma and for disease monitoring (minimal residual disease [MRD]) in relapsed/refractory (RRMM) patients [36–40]. Current bulk and single-cell technologies to detect and isolate CD138+ MM CTCs include flow cytometry (FC) [37,41], the combined CellSearch-DEPArray method (Silicon Biosystems) [42,43], and custom microfluidics [18]. While multiplex flow cytometry enables quantitative separation of cell types by marker-based gating and further characterization of disease, an enrichment step is required to find and isolate single cells of interest, thus limiting its application in contexts where target cells are in low abundance, where unbiased slide-based methods are optimal for single-cell profiling. Similarly, the CellSearch-DEPArray method also relies on initial PC enrichment by CD138 and CD38 positivity and may likewise lack the sensitivity and accuracy needed for some MGUS and RRMM patient populations. Furthermore, the high degree of heterogeneity in marker expression by PCs requires an unbiased approach capable of detecting diverse PC populations based on their morphological characteristics.
To resolve the challenges in enrichment-based methods for PC detection, Zhang et al. developed an enrichment-free single-cell detection assay to characterize MM CTCs in PB draws of newly diagnosed (NDMM) patients using CD138, CD45, and pS6 (ribosomal protein S6) on the Epic Sciences Platform [44], the commercial version of the High-Definition Single Cell Assay (HDSCA) workflow, an identification method originally developed for CTC detection and characterization in epithelial cancers [45–49]. The Epic Sciences team applied the “no-cell-left-behind” approach and performed extensive validation, linearity, qualification, and reproducibility experiments for an HDSCA-based MM CTC assay to detect and characterize circulating rare cells in myeloma. While the morphological analysis and enumeration of HDSCA MM CTCs provided insights into the morphological phenotypes and distribution of CD138- and pS6-expressing rare cells in NDMM PB, genomic profiling to validate the technology’s capability to distinguish normal from abnormal PCs has not been performed. Furthermore, morphogenomic validation to evaluate clonal and subclonal variability and to correlate single-cell copy number alteration profiles with clinical diagnostics and disease monitoring remains to be performed for this enrichment-free methodology.

In this study, we modified the Epic Sciences MM CTC methodology to include CD56 rather than pS6 in a 4-plex (CD138, CD56, CD45, and DAPI) immunofluorescence assay adapted to the 3rd generation HDSCA [21] to detect, sequence, and describe the morphogenomic profiles of PCs in NDMM and MGUS PB and BMA samples. We define MM CTC and bone marrow PC (BMPC) candidates as CD138+ nucleated cells, with the expression of CD56 further subtyping normal and abnormal PCs.
PC malignancy was validated via next-generation single-cell sequencing, in which single-cell copy number variation (scCNV) chromosomal events were correlated to results from clinical FISH and cytogenetic analysis toward clinical validation. Consistent with prior work, we detected candidate MM CTCs and BMPCs that possess the canonical morphological immunophenotype, namely CD138+ cells that are predominantly larger than other white blood cells (WBCs) and present eccentric nuclei. scCNV analysis validated that candidate MM CTCs harbor chromosomal alterations detected with standard clinical diagnostic cytogenetic methodologies and with additional single-cell subclones with genomic profiles not detected by clinical methods. The reported data provide additional technical validation of an unbiased single-cell liquid biopsy approach for the detection and genomic characterization of MM CTCs with promising applications in early disease (MGUS) and in other myeloma patient subgroups where clonal malignant cells can be in low abundance.

2.2.2. Materials and Methods

2.2.2.1. Patient Enrollment and Sample Acquisition

All patients enrolled in this study provided informed consent following an Institutional Review Board (IRB)-approved protocol (PA18-1073) and were accrued at the University of Texas MD Anderson Cancer Center. Five patients (four NDMM and one MGUS) enrolled prospectively between 5 April 2019 and 29 May 2019 provided paired blood and bone marrow samples. To obtain an accurate diagnosis, all patients underwent a bone marrow biopsy, blood work, 24-hour urine collection, and advanced whole-body imaging. The same samples from each patient were analyzed via FC by MD Anderson Cancer Center as part of the standard MM diagnostic workup. Four normal blood donor (NBD) samples from individuals with no previously known pathologies were procured from the Scripps Clinic Normal Blood Donor Service in La Jolla, CA.
BMA and PB specimens were drawn and collected using anti-coagulated preservative tubes (Cell-Free DNA blood collection tube, Streck) and shipped to the USC Michelson Convergent Science Institute in Cancer (CSI-Cancer) via FedEx overnight and processed as previously described using the established and validated HDSCA workflow 45,49,50 . In short, samples underwent red blood cell lysis using an isotonic ammonium chloride buffer and all nucleated cells were plated as a monolayer onto custom-made glass slides (Marienfeld, Germany) at approximately 3 million cells per slide prior to cryopreservation (Figure 2.1A). Slides were retrospectively thawed for immunofluorescence staining (Figure 2.1B).

Figure 2.1 HDSCA-based 4-plex Immunofluorescence for Rare Plasma Cell Detection. (A) Paired sample acquisition, processing, and cryo-banking. (B) Slides are stained with a cocktail of antibody markers targeting cells of interest. (C) Immuno-targeting for normal PC (CD138+ only), candidate malignant PCs (CD138+CD56+), and common WBCs (CD45+ only). Gt = Goat, Ms = Mouse, Rb = Rabbit, Ig = immunoglobulin, DAPI = 4′,6-diamidino-2-phenylindole, A555 = Alexa Fluor® 555, A488 = Alexa FluorPLUS® 488, A647 = Alexa647.

2.2.2.2. Marker Selection for 4-plex Immunofluorescence Assay

The goal of this assay was to detect PCs using well-established myeloma and immune markers with CD56 expression used for further discrimination of normal from malignant cells. The 4-plex staining panel consists of CD138, CD56, CD45, and DAPI (4′,6-diamidino-2-phenylindole) for targeting cells of interest (Figure 2.1C). For PC detection in PB and BMA samples, CD138 was selected for its unique expression on PCs within the immune system 51 and its role in MM disease progression 52 . CD56 (Neural Cell Adhesion Molecule 1, N-CAM-1) is a well-established MM biomarker expressed by malignant PCs in over 70% of MM patients as studied by FC 53,54 .
CD45 was chosen as a WBC marker and MM PC exclusionary marker, as over 80% of patients have CD45− MM BMPCs 55 . DAPI is used to identify all nucleated cells, including PCs and surrounding common WBCs. While pS6 (used in Zhang et al. 44 ) drives PI3K/AKT activation in malignant PCs, this pathway-driven process may also be observed in other cells depending on their activation states. As such, it is not used in clinical FC assays, where surface markers are prioritized for immunophenotyping of normal and malignant PCs for clinical utility 40,56–58 . We therefore chose CD56 as the substitute for pS6 in this assay. For this assay, we procured mouse anti-human IgG1 CD138 (Exbio, clone A-38, cat# 10-520-C100, Czech Republic), rabbit anti-human IgG CD56 (Invitrogen, cat# 701379), and directly conjugated mouse anti-human CD45-Alexa647 (AbD Serotec, cat# MCA87A647, Raleigh, NC, USA) along with goat anti-mouse Alexa Fluor® 555 (Invitrogen, Waltham, MA, USA) and goat anti-rabbit Alexa FluorPLUS® 488 (Invitrogen, Waltham, MA, USA) secondary antibodies targeting CD138 and CD56, respectively.

2.2.2.3. Assay Staining and Validation in Cell Lines and Spiked NBD Samples

Immunoglobulin E lambda myeloma-derived U266 cell line (gift from Dr. Akil Merchant; U266B1 TIB-196™), immunoglobulin A lambda myeloma-derived MM.1S cell line (ATCC® CRL-2974™), and T-lymphocyte-derived Jurkat cell line (ATCC® TIB-152™) were cultured according to the manufacturer’s specifications. Slides for assay development were generated with pure cell lines (U266, MM.1S, and Jurkat) to test the expression of CD138 in myeloma and control cell lines. To compare the expression of CD138 in cell lines with that in normal WBCs, the U266 cell line was spiked into NBD blood at a 1:100 dilution.
To establish the optimal concentration for anti-CD138, anti-CD56, and anti-CD45 antibodies, we performed antibody titrations from concentrations of 0 μg/mL to 10 μg/mL using contrived samples with the U266, MM.1S, and Jurkat cell lines. For specificity of the anti-CD138 antibody, we used Jurkat cells as a CD138− control. For specificity of the goat anti-mouse Alexa Fluor® 555 secondary antibody, we used a no primary antibody control in which the anti-CD138 antibody is omitted from staining. Additional experiments for assay sensitivity and reproducibility were previously performed and demonstrated by Zhang et al., with an accuracy of 97.5% and the capability of detecting one spiked CD138+ cell in 3 million cells 44 .

2.2.2.4. Assay Staining and Validation in Patient PB and BMA

Given the final reagents and conditions above, matched PB and BMA slides were assayed for MM CTCs and BMPCs in patient samples. Following previously established protocols 45 , slides were thawed after −80 °C storage, fixed using 2% paraformaldehyde for 20 min, and washed using TBS buffer. The slides were incubated with 10% goat serum for 30 min to block non-specific binding sites for secondary antibodies. Next, the slides were incubated with the primary antibody mix consisting of 2 µg/mL of mouse anti-human CD138, 5 µg/mL of rabbit anti-human CD56, and 1.6 µg/mL of mouse anti-human CD45-Alexa647 for 1 h at room temperature. Following TBS washes, secondary antibodies goat anti-mouse Alexa Fluor® 555 (1:500) and goat anti-rabbit Alexa FluorPLUS® 488 (1:500) were added for CD138 and CD56 staining, respectively, with DAPI for 40 min at room temperature. As previously described 45,49,50,59 , the slides were mounted with live-cell media, coverslipped, and sealed for downstream microscopy imaging and technical analysis.

2.2.2.5. Imaging and Technical Analysis for Rare Cell Detection and Cell Classification

Each slide was imaged at 100× magnification using a custom-made fluorescent scanning microscope, generating 2304 frames per slide with a total of approximately 9200 images across four fluorescent channels 45 . The image data set was analyzed via a custom rare event detection algorithm known as OCULAR (Outlier Clustering Unsupervised Learning Automated Report) 21 . OCULAR utilizes the principles of image processing for single-cell segmentation and feature extraction, dimensionality reduction, and unsupervised clustering approaches to distill morphologically distinct rare cells from common cells. Briefly, 761 features per single cell are extracted using EBImage 60 . K-Nearest Neighbor is applied to classify cells based on marker signal intensity, cell shape, and geometry and subsequently used to filter all distinct cell populations (i.e., MM CTCs and BMPC candidates) from surrounding common cells (CD138−CD45+ WBCs). For final classification and enumeration, a trained technician confirmed the automated classification by visually inspecting the expression of each marker and the morphological integrity of the cell and assigning it the correct candidate morphological subtype. All cells manually classified were re-confirmed by two additional individuals for cross-validation and reproducibility.
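The rare-event idea described above, separating morphologically distinct cells from the bulk of common WBCs before manual review, can be illustrated with a small outlier-ranking sketch. This is not OCULAR itself: it swaps in a plain k-nearest-neighbor distance score on two made-up features instead of the 761 EBImage features, and the function names, the synthetic data, the value of k, and the number of reported candidates are all assumptions chosen for illustration.

```python
import math
import random


def knn_outlier_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_candidates(points, n, k=3):
    """Return the indices of the n highest-scoring points, for manual review."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n])


# Synthetic, hypothetical data: 60 "common" cells in a tight feature cloud
# plus 3 isolated "rare" cells appended at indices 60, 61, and 62.
rng = random.Random(0)
common = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(60)]
rare = [(10.0, 10.0), (12.0, -9.0), (-11.0, 8.0)]
cells = common + rare

candidates = top_candidates(cells, n=3)
print(candidates)
```

Ranking by score and passing only the top candidates to a reviewer mirrors the workflow in which automated detection is followed by technician confirmation; a real feature space would need scaling and dimensionality reduction before any distance is meaningful.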
MM CTC and BMPC candidates are defined by eight subtypes, as expanded from 44 :
• CD138+
• CD138+CD56+
• CD138+CD45+
• CD138+CD56+CD45+
• CD138−: any cells that are larger than surrounding WBCs and have eccentric nuclei
• Apoptotic: any CD138+ cells with condensed DAPI pattern and/or blebbing as previously described 44,45
• PC clusters: cluster consisting of two or more CD138+ cells
• Binucleated PC: CD138+ cells presenting two morphologically distinguishable nuclei

Additional rare cells of interest not categorized as MM CTCs or BMPCs were tracked, characterized, and enumerated using the OCULAR software previously described 21 and were not included in this study. Each cell category is reported as total count of cells per mL of blood.

2.2.2.6. Single-Cell Sequencing and CNV Analysis

To generate single-cell genomic data for morphogenomic characterization, candidate CD138+ cells (and morphological subsets thereof with additional representative cells from groups of interest) were isolated and sequenced using protocols previously developed and described by our laboratory 47,61,62 . In short, candidate single cells are isolated using a robotic micromanipulator system and subjected to whole-genome amplification (WGA; Sigma Aldrich, St. Louis, MO, USA) and library construction as described 47,61,62 . Amplified DNA was purified using the QIAquick PCR purification kit (Qiagen, Hilden, Germany) and resulting DNA was quantified using the Qubit Fluorometer (Thermo Fisher). Indexed Illumina sequencing libraries were constructed and barcoded using the NEBNext Ultra DNA Library Preparation Kit (New England Biolabs, Ipswich, MA, USA). The amplified DNA fragments with target size were sequenced at either the USC Dornsife Sequencing Core or Fulgent Genomics (Temple City, CA, USA) to generate approximately 500,000 mapped reads.
Sequenced reads were analyzed using our previously described CNV pipeline for reference genome mapping, read alignment, and single-cell ploidy determination 21,59,63 . For each sequenced and mapped cell, a copy number of 2 was considered normal, and any cell whose profile deviated from it was considered altered. Chromosomal alterations were evaluated across the cells to establish the clonal relationship between cells from the same slide.

2.2.2.7. Karyotyping and Fluorescent in Situ Hybridization (FISH) for Clinical Diagnosis

Standard FISH was performed for all patients as part of the diagnostic workup for monoclonal gammopathies. Bone marrow aspirates were collected at study enrollment and CD138+ cells were enriched using RoboSep-S™ (Stem Cell Technologies, Vancouver, BC, Canada) plasma cell enrichment. This increases sensitivity for detecting cytogenetic abnormalities associated with plasma cell neoplasms. For successful FISH analysis for detecting myeloma-associated abnormalities, the clinical laboratory requires that neoplastic plasma cells detected by flow cytometry make up more than 0.05% of the total cells analyzed in the bone marrow and that more than 25% of plasma cells are aberrant neoplastic plasma cells. The tests were developed, and their performance characteristics were determined, by the M.D. Anderson Cancer Center Cytogenetics Laboratory as required by the CLIA’88 regulations. Chromosomal analysis (karyotyping) was performed on metaphase cells prepared from bone marrow (BM) aspirate specimens cultured for 24 h without mitogens or for 72 h with lipopolysaccharide (LPS), using standard techniques. Twenty Giemsa-banded metaphases were analyzed, and the results were reported using the International System for Human Cytogenetic Nomenclature (ISCN 2020).
FISH for common abnormalities associated with plasma cell myeloma was performed on interphase nuclei obtained from cultured BM cells using dual-color FISH probe sets designed to detect the rearrangements of t(4;14)/IGH::FGFR3, t(11;14)/IGH::CCND1, and copy number changes of CDKN2C/CKS1B, RB1/13q34, and TP53/CEP17, according to the manufacturer’s instructions (Abbott Molecular, Abbott Park, IL, USA). Two hundred nuclei were analyzed for each probe set, and the results were reported based on the cut-off values established by our clinical cytogenetics laboratory.

2.2.2.8. Correlating scCNV to Clinical Cytogenetics

To map scCNV chromosomal alterations detected by our 4-plex assay to the results from the clinical diagnostic workup, we focused on the 12 common cytogenetic alteration events probed in the MD Anderson Cancer Center targeted clinical FISH workup protocol. For every single cell, we assessed the presence or absence of each cytogenetic event and quantified how many cells harbor genomic alterations observed using standard clinical methods.

2.2.2.9. Statistical Analysis

All data analysis was performed in the R programming language (R version 3.6.3). The intersection analysis in rare cell count and cytogenetic-based validation was performed using the ComplexUpSetR package. Comparison of total MM CTC count between NDMM and NBD samples was computed using the Wilcoxon signed-rank test with a 95% confidence interval.

2.2.3. Results

2.2.3.1. Expression of CD138 and CD56 in U266, MM.1S, and Jurkat Cells Spiked in Normal Blood

From antibody validation experiments on U266, MM.1S, and Jurkat (negative control) cells, the final optimal concentration was found to be 2 µg/mL for anti-CD138 and 5 µg/mL for anti-CD56. In comparative stains, Jurkat cells show no expression of CD138, while both MM.1S and U266 are positive for CD138 (Figure 2.2A) with a statistically significant difference in CD138 intensity (Figure 2.2B, C).
Omission of the primary antibody (0 µg/mL of CD138) yielded no CD138+ cells in either the positive or the negative cell lines in titration experiments. In NBD spike-in experiments with U266, CD138+CD56−CD45− and double-positive CD138+CD56+ U266 cells were detected with morphology that is distinct from surrounding CD138−CD56−CD45+ common WBCs (Figure 2.2D). Consistent with the histological description of MM PCs, the U266 cells were CD138+CD56+CD45− and larger than surrounding WBCs with the canonical eccentric nucleus (Figure 2.2D). A subset of CD138+CD45+ U266 cells was also detected, consistent with cell line marker expression variability.

Figure 2.2 Assay Validation of Markers in Spiked NBD Samples. A. Representative images at 100x magnification of Jurkat, U266, and MM.1S pure cell lines stained with CD138 (yellow) and DAPI (blue). B. Cell density distribution by CD138 intensity across cell lines; Jurkat (dark red), U266 (green), MM.1S (gold). C. Comparative quantification of CD138 intensity in cell lines. D. Representative images of U266 spiked in NBD cells stained with CD138 (red), CD45 (green). All nucleated cells are DAPI positive (blue).

Together, these data confirm the specificity of the antibodies consistent with and supplementing sensitivity, reproducibility, specificity, and analytical validation work for HDSCA technology, as previously reported 21,44 .

2.2.3.2. Patients and Study Cohort

The patient cohort comprises one male MGUS (age 78), one female NDMM (age 63), and three male NDMM (ages 80, 54, and 66) with NBD controls consisting of two males (ages 48 and 65) and two females (ages 54 and 59). By clinical BMA histopathology for NDMM and MGUS, all BMPCs are CD138+ and, with the exception of patient MM03, also CD56+. MM01 and MGUS presented CD45+ histopathology, and both cases have the lowest percentages of BMPCs. Detailed characteristics of patients in this study are shown in Table 2.1.

Table 2.1 Patient Demographics and Clinical Characteristics.
sFLC: serum-free light chain, NDMM: newly diagnosed multiple myeloma, MGUS: monoclonal gammopathy of undetermined significance, PC: plasma cell, BMA: bone marrow aspirate, dim: dim marker expression as defined by flow cytometry.

Characteristic | MGUS | MM01 | MM02 | MM03 | MM04
Age | 78 | 80 | 63 | 54 | 66
Sex | Male | Male | Female | Male | Male
Diagnosis | MGUS | NDMM | NDMM | NDMM | NDMM
Ig Isotype | IgGk | IgGk | IgGk | IgGk | IgAk
Percent BMPC in the aspirate | 1 | 14 | 15 | 30 | 8
Percent Aberrant PC from the total PC BM compartment | 92 | 64.5 | 95.2 | 98.8 | 98.2
Flow CD138 | Positive | Positive | Positive | Positive | Positive
Flow CD56 | Positive | Positive | Positive | Negative | Positive
Flow CD45 | Positive | Positive | Negative | Positive (dim) | Negative
M-Spike (g/dL) | 0.7 | 1.6 | 0.4 | 2.9 | 4.3
sFLC ratio | 8.14 | 93.43 | 186.84 | 17.28 | 6.17
Karyotype | Normal | NA | Normal | Hypodiploid | Normal
FISH (Positive) | Three copies of CCND1 | Three copies of CCND1; Monosomy 13 | Three copies of EGFR3 and CCND1; trisomies 1 and 17; monosomy 13 | Monosomies 1, 13, and 17; loss of one copy of IGH | Three copies of CCND1
Clinical Presentation | Low-risk MGUS for progression to MM by PETHEMA 13 criteria | Patient with standard-risk myeloma achieved complete remission after initial therapy with carfilzomib, lenalidomide, dexamethasone | Patient with standard-risk myeloma achieved a partial response after initial therapy with carfilzomib, lenalidomide, dexamethasone | Patient with high-risk myeloma achieved complete remission after therapy with carfilzomib, lenalidomide, dexamethasone but passed away with myeloma progressive disease 21 months after diagnosis | Patient with standard-risk myeloma achieved complete remission after initial therapy with carfilzomib, lenalidomide, dexamethasone

2.2.3.3. Morphological Characterization, Classification, and Enumeration of MM CTCs and BMPCs

As an unbiased, enrichment-free immunofluorescence assay supported by a previously published rare cell detection computational algorithm 21 , the plasma cell assay detects a heterogeneous mixture of morphologically distinct cells from the BMA and PB samples. Beyond CD138+ cells, additional rare cells are identified by a combination of cellular and physical morphology and their CD56 and CD45 expression. A decision tree representation (Figure 2.3A) and UMAP projection (Figure 2.3B) separating distinct cells based on their geodesic distance in the Riemannian manifold present cell groups of all the circulating rare cells detected in PB of all the samples (MGUS, NDMM, and NBD) in this study. The CD138+ rare cell groups clustered closely together and separately from the CD56+ and CD45+ rare cells.

Figure 2.3 Rare Single-cell Classification and Enumeration. (A) Decision tree structure classification of DAPI+ cells showing all candidate cell groups detected, with respect to immunofluorescence expression, along with representative images (400× magnification). (B) UMAP projection of all detected circulating rare cells colored by their respective classification groups. Using the 761 features extracted with EBImage, Uniform Manifold Approximation and Projection (UMAP) 64,65 , a non-linear dimensionality reduction method, was used to represent circulating rare cells and their corresponding classification groups in two-dimensional space for analysis. (C) Enumeration and (D) proportional distribution of circulating rare cells (cells/mL) for each cell group across all samples. (E) Distribution of circulating rare cell counts across NDMM (N = 4), MGUS (N = 1), and NBD (N = 4) grouped by rare event group based on marker expression. Color scheme for cell classification groups is consistently preserved.
Quantitative enumeration of detected circulating rare cells found that all 258 CD138+CD56+CD45− candidate MM CTCs are exclusively found in NDMM and MGUS patients, with MM01 accounting for 186 cells, followed by MM02 with 51 cells; MM03 does not have any such double-positive cells (Figure 2.3C–E). The CD138−CD56+CD45− cells seen across samples including NBD are likely normal CD56+ NK T cells. Intersection analysis (Figure 2.3E) shows the distribution of detected cells for each channel and combination thereof when compared between individual patients and across disease states versus NBD. CD138+CD56−CD45− cells are exclusively found in myeloma patients. While CD138+CD56+CD45+ cells are predominantly in myeloma patients, they are also found in NBD (Figure 2.3C–E), possibly being normal PCs that are not fully differentiated. Since MM bone marrow is not a rare event space, the analysis focused on cell characterization instead of enumeration. Focusing on CD138+ cells to further characterize MM CTCs and BMPCs, the immunophenotypes and morphology of MM CTCs (CD138+ circulating rare cells) and BMPCs include subsets with CD56 and CD45 signals with BMPCs showing PC clusters, binucleated PCs, apoptotic PCs, and CD138− candidate PCs, as previously observed 44 (Figure 2.4A). Looking at the CD138 signal in circulating rare cells of different patients, the median intensity across rare cells was compared between patients. The intensity of CD138 and CD56 is consistently higher in cells from patient samples as compared to NBDs, confirming the specificity of the assay for rare PC detection in PB of patients (Figure 2.4B, C).
Manual classification for CD138 and CD56 (red = positive, black = negative) is also consistent with signal expression, showing that, except for NBD04, which had a PC candidate with a CD138 signal significantly above the maximum intensity, patient blood samples constitute the only specimens with more than three cells that the technical analysis found to be positive (Figure 2.4B).

Figure 2.4 Morphological Characterization of MM CTCs and BMPCs. (A) Microscopy images of representative cells from morphological subtypes of PB and BMA cells. (B) CD138 and (C) CD56: signal intensity in circulating rare cells across samples, colored by positive or negative marker expression based on manual classification. (D) Statistical comparison of MM CTC count between NDMM and NBD. (E) Statistical comparison of non-plasma cell CD56 positive cells (sum of CD56+ and CD56+CD45+ cells).

While the quantification of circulating CD138+ cells in the blood of NBDs has not been extensively investigated or well established, two studies have reported 0–5 CD138+ cells/mL in circulation 55,66,67 . Among the four controls in this study, NBD04 had three CD138+ cells and NBD02 had one, with zero cells detected in the other two donors (Figure 2.3C–E). All patient samples exhibited CD138+ cells (MM CTCs) with a minimum of four cells/mL in MM03 and a high count of 196 cells in MM01. Accordingly, the total number of MM CTC candidate cells is significantly higher in NDMM compared to NBD samples (p = 0.029) (Figure 2.4D). Despite higher counts in NDMM compared to NBD for circulating non-PC CD56+ (candidate NK T cells), the difference in total cell count between the two groups is not statistically significant (p = 0.15) (Figure 2.4E). Notably, the CD138+CD56+CD45− cells, which constitute the candidate circulating malignant PCs, were found exclusively in NDMM and MGUS samples, supporting that the detected MM CTCs are the primary candidate malignant PCs and further confirming assay specificity.
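As a worked check on the reported p = 0.029, the sketch below computes a two-sided exact p-value for a comparison of two groups of four samples by enumerating every possible rank assignment. It assumes an unpaired rank-sum comparison with no ties across groups; the Methods name the Wilcoxon signed-rank test, so treating the groups as unpaired is an assumption of this sketch, as is the helper name `exact_rank_sum_p`. Only ranks are used: the text states that every NDMM count (minimum four cells/mL) exceeds every NBD count (maximum three), which places the NDMM samples in the top four ranks.

```python
from itertools import combinations


def exact_rank_sum_p(n_a, n_b, observed_ranks_a):
    """Two-sided exact p-value for the rank sum of group A (no ties assumed)."""
    all_ranks = range(1, n_a + n_b + 1)
    expected = n_a * (n_a + n_b + 1) / 2  # mean rank sum under the null
    observed_dev = abs(sum(observed_ranks_a) - expected)
    # Enumerate every way of assigning n_a of the ranks to group A.
    rank_sums = [sum(c) for c in combinations(all_ranks, n_a)]
    extreme = sum(1 for s in rank_sums if abs(s - expected) >= observed_dev)
    return extreme / len(rank_sums)


# Complete separation: NDMM samples hold ranks 5-8, NBD samples ranks 1-4.
p_value = exact_rank_sum_p(4, 4, observed_ranks_a=[5, 6, 7, 8])
print(round(p_value, 3))
```

With four samples per group there are only 70 possible rank assignments, so this is the smallest two-sided p-value such a comparison can produce, which is worth keeping in mind when interpreting significance in a cohort of this size.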
2.2.3.4. scCNV for Morphogenomic Validation of Malignant Phenotypes in Detected MM CTCs and BMPCs

To validate that the candidate malignant PCs harbor genomic aberrations, single-cell next-generation sequencing analysis was performed on target cells of interest across morphological groups. In total, from paired PB and BMA samples of the four NDMM patients, 165 cells were isolated and successfully underwent WGA, sequencing, and analysis, with an additional 30 cells from paired PB and BMA samples of the one MGUS case. For the NBD controls, a total of 12 cells from four donors were sequenced and analyzed. For controls within a sample, 1 to 2 WBCs (CD45+ only) from each patient sample are also included in the pool of single cells selected for sequencing. Furthermore, data from clinical diagnostic FISH cytogenetic results (Figure 2.5A and Table 2.1) were mapped to single-cell CNV profiles of altered cells. The sequenced cell counts across study subjects and samples with the corresponding scCNV profile status are shown in Figure 2.5B. Representative CD138+ cells and their respective scCNV profiles (Figure 2.5C) from PB (left) and BMA (right) validate that the detected candidate MM CTCs and BMPCs (CD138+CD45− cells) are malignant PCs. Representative positive cytogenetic events from the clinical FISH analysis of the BM plasma cells (Figure 2.5A) are marked with a hashed rectangle (blue for deletion and red for gain). To join morphological observations with CNV results, quantification of normal and altered cells across morphological types is presented in Figure 2.5D for PB and Figure 2.5E for BM. Except for MM03 PB, all of the BMA and PB samples for the four NDMM patients and the MGUS case have CD138+ cells with genomic alterations consistent with the characteristic myeloma hyperdiploid karyotypes exhibiting copy number alterations among the odd-numbered chromosomes 35,68 .

Figure 2.5 Morphogenomic Validation of Detected Candidate MM CTCs and BMPCs. (A) Clinical observations of the status of 12 common cytogenetic events detected by FISH karyotyping across patients’ bone marrow samples. (B) Distribution of scCNV profiles across patients and sample types. (C) Representative morphological phenotype with corresponding scCNV profiles of subclones containing clinically determined key cytogenetic events. The scCNV profile is for the single CD138+ cell in the corresponding IF image. The red and blue hashed rectangles indicate an alteration event where the patient is also positive by clinical cytogenetics detection. There were no altered cells in MM03 blood. (D, E) Single-cell count between normal and altered genomic profiles across morphological groups. A cell is considered altered if the CNV profile contains at least one discernible chromosomal aberration or change in ploidy.

Consistent with the malignant PC phenotype, in both sample types, all altered cells are CD138+, except for three CD138−CD56+ cells in PB. Additionally, CD138+CD56+CD45− is the predominant morphotype with the most genomically altered cells, with 36/44 (81.8%) of cells in PB and 44/54 (81.5%) of cells in the BMA being altered (Figure 2.5D, E). As expected in NBD controls, all sequenced CD138+ cells from the NBDs have normal CNV profiles, confirming that they are normal PCs in circulation. Furthermore, at least one WBC (CD45+ only) per sample was sequenced and presented no alterations. Figure 2.6 contains the detailed chromosomal scCNV profiles of all sequenced cells across the study cohort.

Figure 2.6 scCNV in PB and BMA samples in the cohort. Heatmap representation of all sequenced cells and associated chromosomal alterations across patients and sample types.

On a patient-by-patient basis, clonally altered cells were found in MM01 PB (14/17 cells sequenced; 82.4%) and BMA (17/28 cells sequenced; 60.7%).
The same five subclones could be discerned in both PB and BMA, with the dominant subclone exhibiting a canonical hyperdiploid trisomy profile with gains in chromosomes 3, 5, 7, 9, 11, 15, 19, and 10, and with other subclones containing additional losses in 1p and 13q. In MM02, 22/28 (78.6%) PB cells and 32/40 (80%) BMA cells were altered with five subclones in both samples. The dominant subclone harbors copy number gains in chromosomes 5, 7, 15, and 21 and loss of chromosomes 8, 12, 13, 14, 16, 20, and 22. All 22 altered cells in MM02 PB are polyploid, with 19/22 (86.4%) cells being triploid, 2/22 (9.1%) cells tetraploid, and 1/22 (4.5%) cells pentaploid. Of the 32 altered cells in the BM, 3/32 (9.4%) are diploid, 26/32 (81.2%) are triploid, and 3/32 (9.4%) are tetraploid. In MM03, all four sequenced PB cells have a normal CNV profile, and 5/12 (41.7%) BMA cells are altered with a unique clonal profile for the altered cells carrying gains in 3, 5, 7, 9, 11, 15, 18, and 19. In MM04, 4/9 (44.4%) PB and 16/27 (59.3%) BMA cells showed clonal alterations. The dominant subclone was found in both specimens and, consistent with trisomic hyperdiploidy, harbors gains in 3, 5, 7, 9, 11, 15, 19, and 21 and 16q loss, with a secondary subclone that carries losses in 1p and 4q. In the MGUS PB, CNV analysis revealed one (1/11 cells sequenced, 9.1%) circulating aberrant MM CTC with a pentaploid profile and a loss of chromosome 8, while 12/19 (63.2%) malignant BMPCs exhibited a CNV profile with the classic gains of odd-numbered chromosomes (3, 5, 9, 11, 15, 19, and 21) as well as chromosome 18. No subclones were found in the PB of this MGUS case and none of the BMA altered cells carry the chromosome 8 deletion found in the PB sample (Figure 2.6).
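The per-sample fractions above can be recomputed directly from the altered and sequenced cell counts given in the text. The sketch below is only a bookkeeping check: the counts are transcribed from the preceding paragraphs, while the dictionary layout and the helper name `percent_altered` are illustrative choices. Percentages are rounded to one decimal place, so the last digit can differ from a truncated figure.

```python
# (altered, sequenced) single-cell counts per patient and sample type,
# transcribed from the text above.
counts = {
    ("MM01", "PB"): (14, 17), ("MM01", "BMA"): (17, 28),
    ("MM02", "PB"): (22, 28), ("MM02", "BMA"): (32, 40),
    ("MM03", "PB"): (0, 4),   ("MM03", "BMA"): (5, 12),
    ("MM04", "PB"): (4, 9),   ("MM04", "BMA"): (16, 27),
    ("MGUS", "PB"): (1, 11),  ("MGUS", "BMA"): (12, 19),
}


def percent_altered(altered, sequenced):
    """Percentage of sequenced cells with an altered CNV profile."""
    return round(100 * altered / sequenced, 1)


fractions = {key: percent_altered(*value) for key, value in counts.items()}

# Totals of sequenced cells, to compare with the cohort-level numbers.
ndmm_total = sum(seq for (patient, _), (_, seq) in counts.items() if patient != "MGUS")
mgus_total = sum(seq for (patient, _), (_, seq) in counts.items() if patient == "MGUS")

for (patient, sample), pct in fractions.items():
    print(f"{patient} {sample}: {pct}%")
print("NDMM cells sequenced:", ndmm_total, "| MGUS cells sequenced:", mgus_total)
```

The two totals can be compared with the 165 NDMM cells and 30 MGUS cells reported as successfully sequenced.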
Taken together, these data provide a sample of the genomic landscape of detected MM CTCs and BMPCs with alterations consistent with genomics observed in the previous myeloma single-cell studies using various CTC detection methods and sequencing approaches 26,35,38,39,69,70 .

2.2.3.5. Mapping scCNV Events to FISH Cytogenetics for Clinical Validation

Towards validating the clinical application of the single-cell morphogenomic profiling method, we looked to see whether the same cytogenetic profiles detected by clinical FISH diagnostic assays are found among scCNV profiles of MM CTCs and malignant BMPCs. From clinical diagnostic FISH and karyotyping results (Table 2.1 and Figure 2.5A), all five patients in this cohort were negative for 6/12 (50%) of the common cytogenetic aberrations, namely cyclin D1, all three common translocation-based events, FGFR3 gain (via Trisomy 14), and 1q32 loss. MM01 was positive for CCND1 (11q) gain and 13q loss, and MM02 was positive for gains of 1q21, 11q, 14q, 17p, and 13q loss. MM03 was positive for 17p and 13q losses while MM04 and MGUS were positive for only 11q gain (Figure 2.5A). To evaluate how well the clinically positive cytogenetic events map to the single-cell profiles detected with the HDSCA-based morphogenomic approach, we quantified the total number of MM CTCs and BMPCs from paired patient samples (Figure 2.5B) with mapped cytogenetic aberrations (Figure 2.5A). As presented in Figure 2.7A, 75 cells harbor hemizygous loss of 13q (22 in MM01 and 53 in MM02), 67 cells harbor 11q gain (12 in MGUS, 30 in MM01, 1 in MM02, 5 in MM03, 19 in MM04), 17 cells harbor 17p loss (12 in MM01, 5 in MM02), while 1q21 and 17p gains are found in 8 cells (7 in MM01, 1 in MM02) and 2 cells (1 in MM02, 1 in MM04), respectively, with 14q gain being the only common cytogenetic abnormality not found in any of the scCNV profiles (Figure 2.7A).
Concordant with clinical cytogenetics, our CNV analysis found at least one cell carrying the common alterations identified by clinical FISH analysis, except for MM02, where no cells with 14q gain were found. Furthermore, when more than one cell harbors cytogenetic aberrations, they are present in both PB and BMA samples for that patient (Figure 2.7B), confirming that the detected MM CTCs have the same genomic clonality as tumorigenic BMPCs analyzed in diagnostic cytogenetics.

Figure 2.7 Correlation of scCNV events with diagnostic cytogenetic aberrations. (A) Intersect analysis mapping co-occurrence of FISH positive events in scCNV across PB and BMA samples. (B) Distribution of positive FISH events in scCNV across patients. (C) Enumeration of scCNV harboring indicated FISH cytogenetic aberrations observed across the patient samples. Blue = loss, (D) Chromosomal alterations across scCNV profiles of sequenced cells from all patient samples.

Co-occurrence analysis confirmed that the common cytogenetic events are mapped to the scCNV profiles (Figure 2.7B, C) and identified additional subclones harboring aberrations called negative in diagnostic cytogenetics (Figure 2.7D). While clinical cytogenetics found that MM01 is positive for only 11q gain and 13q loss, the patient also carries malignant PCs with 17p loss (12 cells) and 1q gain (7 cells). A total of 49 altered cells in MM02 harbor only 13q loss, and one cell in MM04 harbors 17p gain, which is negative in clinical cytogenetics. Loss in 4p (location of FGFR3) was found in two cells from MM04 PB and in one cell in MM01 BMA, suggesting the two patients carry this alteration that clinical FISH and cytogenetic analysis identified as negative.
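The mapping step described above, checking each single cell for the presence or absence of each cytogenetic event and then counting cells per event and per event combination, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the arm-level profile format, the `EVENTS` table, the function names, and the toy cells are all hypothetical, and only four of the twelve clinical events are represented.

```python
from collections import Counter

# Hypothetical event definitions: chromosome arm and direction of change
# relative to a baseline of 2 copies.
EVENTS = {
    "13q loss": ("13q", "loss"),
    "11q gain": ("11q", "gain"),
    "17p loss": ("17p", "loss"),
    "1q gain": ("1q", "gain"),
}


def events_in_cell(arm_copies, baseline=2):
    """Return the set of named events present in one arm-level CNV profile."""
    found = set()
    for name, (arm, direction) in EVENTS.items():
        copies = arm_copies.get(arm, baseline)  # unlisted arms count as normal
        if direction == "loss" and copies < baseline:
            found.add(name)
        elif direction == "gain" and copies > baseline:
            found.add(name)
    return found


def tally_events(cells):
    """Count cells per event and per exact combination of events."""
    per_event, combos = Counter(), Counter()
    for profile in cells:
        found = events_in_cell(profile)
        per_event.update(found)
        combos[frozenset(found)] += 1
    return per_event, combos


# Toy, made-up profiles (arm -> copy number); not data from this study.
toy_cells = [
    {"11q": 3, "13q": 1},
    {"11q": 3, "13q": 1, "17p": 1},
    {"11q": 3},
    {},  # an unaltered cell
]
per_event, combos = tally_events(toy_cells)
print(dict(per_event))
```

The per-combination counter corresponds to the bars of an intersection (UpSet-style) plot, while the per-event counter gives the marginal totals. For polyploid cells, the `baseline` argument would need to reflect the cell's own ploidy rather than a fixed value of 2.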
Together, the scCNV data presented reproducibly map the clinically observed chromosomal alterations to the single-cell CNV genomic events observed in MM CTCs and malignant BMPCs detected with our single-cell assay analysis and further delineate additional subclonal events not detectable by standard diagnostic cytogenetics.
2.2.4. Discussion
PC malignancies are highly heterogeneous in their genomics and patient risk, both in pre-malignant states (MGUS and smoldering MM [SMM]) and in malignant conditions (NDMM and RRMM), thus presenting significant challenges both in early disease detection and therapeutic intervention 69 . Consequently, myeloma remains an incurable disease despite advances in therapeutic modalities, as a vast majority of patients eventually relapse due in large part to persistent malignant clones that escape immune surveillance and therapeutic targeting 31–33 . Robust delineation of the complex clonal and subclonal heterogeneity in myeloma requires single-cell approaches capable of combined cell morphotype and genomic copy number analysis, providing the opportunity to detect rare cells in an unbiased manner with high sensitivity to identify rare subclones in the patient’s PB for longitudinal disease monitoring of slow-progressing PC neoplasms. Since current clinical standard practice relies on flow cytometry and enrichment-based methods, the identification of malignant cells in myeloma remains a challenge, particularly in the blood of patients with a low population of CD138+ cells that can be isolated and characterized. While flow cytometry is rapid, cheap, and has demonstrated clinical utility in bone marrow samples, its low sensitivity presents analytical limitations for utility in peripheral blood. There remains a significant unmet need for an enrichment-free, highly sensitive, blood-based single-cell method of detecting and characterizing rare myeloma clones.
This study reports the technical and clinical validation of a highly sensitive and specific enrichment-free 4-plex morpho-genomic methodology as a robust approach to detect and characterize circulating rare cells in the PB of patients with PC malignancies. Using paired PB and BMA samples from a cohort of four NDMM patients, one MGUS patient, and four control NBDs and applying HDSCA’s unbiased approach for rare single-cell detection and genomic validation via single-cell sequencing of candidate rare cells, we describe the morphotypes of cells detected using a four-plex immunofluorescence panel composed of DAPI, CD138, CD56, and CD45. We further characterize CD138+ candidate MM CTCs and BMPCs in the PB of NDMM and MGUS patients, relying on CD56 expression to morphologically discriminate malignant from normal PCs. Consistent with previous observations, the CD138+ compartment of PCs is heterogeneous in CD138, CD56, and CD45 expression with variable cell morphologies observed between PB and bone marrow 44 . Furthermore, enumeration of the MM CTCs can stratify myeloma patients from NBD using a threshold of >3 CD138+ cells/mL PB. Both MM CTCs and BMPCs in the CD138+ population include subsets of CD138+ only, CD138+CD56+, CD138+CD45+ cells with both small and large morphology (in comparison to common WBCs), and with the canonical characteristic of pericentric nuclei as observed in clinical pathology and confirmed previously 44 . We found candidate PCs in the PB with varying CD138 intensities including CD138− candidate PCs detected based on their large size and eccentric nuclei as compared to surrounding common WBCs, consistent with prior observations 44,55,57,67 . Next-generation single-cell sequencing for CNV profiling provided genomic validation of candidate aberrant cells. Whole-genome sequencing of MM CTC and BMPC candidates revealed aberrant CNV profiles in four NDMM patients and one MGUS case with four NBDs used as controls.
In NDMM patients, scCNV genomic analysis validated PC aberrancy in candidate MM CTCs harboring clonality consistent with the paired BMPCs. For clinical validation, we correlated clinical cytogenetic observations to our scCNV data and found concordance of genomic scCNV to clinical cytogenetic results, thus validating the technical capability for identification of rare clones that are informative for clinical practice in PC neoplasms. Notably, the identification of rare clones carrying genomic events not detectable by standard clinical methods is a key strength of this work for future clinical utility in rare-cell-event contexts such as MGUS PB and minimal residual disease detection. There are notable limitations in this study. The small patient cohort size limits the extent of biological interpretation and warrants a large-scale study to validate the results presented in this work. Further, the addition of SMM, RRMM, and PCL patient samples would strengthen the observations made here and expand the domain of clinical application for this four-color enrichment-free assay in the ability to detect and describe rare myeloma cells across the progression spectrum. Additionally, the low-pass single-cell sequencing conducted here was unable to detect IGH translocations; future studies will establish the capability to detect translocations as part of the presented morphogenomic profiling, allowing for the direct utility to the hyperdiploid and trisomy myeloma. Despite the noted limitations, the strengths of this study are in the genomic data that provided both technical and clinical validation for the four-color unbiased immunofluorescence assay built with markers that have been extensively studied and validated as part of clinical histopathology panels 71 . The data presented use patient samples showing direct clinical application. We sampled both bone marrow and PB to demonstrate the robustness in both morphological and genomic characterization of PCs.
Beyond NDMM, we show further evidence for use in ultra-rare event samples through the incorporation of MGUS PB and BMA to demonstrate potential in early detection and identified a genomically altered clone. We believe that future work in large cohorts will establish clinical utility for this technology as a blood-based liquid biopsy method to delineate rare genomic subclones in myeloma patients. Additional significant implications include early diagnosis and disease monitoring in contexts where patients are expected to have a low abundance of malignant cells. HDSCA-based morphogenomic analysis provides an alternative method to deconvolute the genomic heterogeneity of myeloma and has the potential to serve as a supportive tool for clinical decisions.
2.3. Characterization of BCMA Expression in Circulating Rare Single Cells of Patients with Plasma Cell Neoplasms
The content of this section is adapted as submitted for review as Libere J. Ndacayisaba, Kate Rappard, Stephanie Shishido, Sonia Setayesh, Guilin Tang, Pei Lin, Nicholas Matsumoto, Ching-Ju Hsu, Rafael Nevarez, Carmen Ruiz Velasco, Amin Naghdloo, Eric Yang, Kevin Kelly, James Hicks, Jeremy Mason, Robert Orlowski, Elisabet Manasanch, Peter Kuhn. “Single-Cell Characterization of BCMA Expression in Circulating Rare Cells Detected in Plasma Cell Neoplasia Patients”. Blood Advances. Manuscript tracking no: ADV-2022-008383
2.3.1. Introduction
Of the millions of B cells produced daily in the germinal centers of peripheral lymphoid organs, only a few will mature, survive, and commit to differentiation into plasma cells (PCs). The survival and proliferation of these post-germinal center B cells rely primarily on a set of survival and growth molecules that include the tumor necrosis factor (TNF) B-cell Maturation Antigen (BCMA, aka CD269 or TNFRSF17).
BCMA is a type III transmembrane receptor glycoprotein whose gene, located at the 16p13 chromosomal region, is primarily expressed in mature and terminally differentiated B lymphocytes 72–74 . Beyond surface expression, a perinuclear staining of BCMA (BCMAp) has also been described and suggested to have a putative role in antibody production 75,76 . BCMA functions as the primary TNF membrane receptor for the cognate ligands A Proliferation-inducing Ligand (APRIL) and B-cell Activation Factor of the TNF family (BAFF) in the core signaling pathways that regulate the growth, proliferation, and survival of differentiating B cells in both benign and malignant states 77 . In lymphoproliferative cancers, BCMA is progressively expressed across the B cell differentiation path starting from late-stage B cells, proliferating plasmablasts, and PCs 74,78,79 (Figure 2.8A). In PC malignancies, BCMA functions primarily as a membrane receptor in key signaling cascades via NF-kB, MAPK/ERK, p38, and JNK/Elk-1 for cell growth, proliferation, and survival of committed B lymphocytes and PCs 74,76 and further in the maintenance of an immunosuppressive tumor microenvironment in multiple myeloma (MM) 78,80 . BCMA has, consequently, emerged as a promising diagnostic and prognostic marker of disease and therapeutic target of interest in hematologic malignancies, particularly in MM and its precursor neoplasms (ref 81 and Phase 1 trial #: NCT05055063). As 80-100% of MM cell lines express BCMA 79,82 and nearly 100% of MM patients’ bone marrow (BM) express BCMA 78 , the specificity in robust expression has motivated the development of multiple novel anti-BCMA immunotherapies that include chimeric antigen receptor T-cell, bispecific antibodies, and antibody-drug conjugates among other therapeutic modalities, some of which have achieved 90-100% clinical responses 78,82 .
Since nearly all cases of MM relapse, quantifying BCMA expression would enable monitoring of therapy response and minimal residual disease. Additionally, detection and profiling of BCMA-expressing cells could be a method for monitoring minimal residual disease and finding therapy-resistant malignant B cells and PCs. Despite its critical role in disease pathogenesis and as a potent therapeutic target, surface BCMA has not previously been pursued and validated as a biomarker for routine clinical analysis outside of immunohistochemistry for core BM biopsy, which has been shown to be less sensitive than flow cytometry for BCMA quantification 83 . Previous research benchmarking BCMA against CD138, the current standard marker for PC identification, showed that BCMA is superior and more robust for isolating PCs in bone marrow aspirates (BMA) 84 . Further, there is limited literature on BCMA expression in circulating cells of patients with hematologic malignancies and, to the best of our knowledge, BCMA has not been used for detection or isolation of circulating malignant PCs. A single-cell liquid biopsy method to robustly detect, morphologically characterize, and quantify circulating BCMA-expressing cells could establish BCMA as a key prognostic biomarker in both pre-malignant conditions and overt B lymphoid cancers. In this study, we report the technical development and initial genomic and clinical validation of a new, slide-based enrichment-free 4-plex (BCMA, CD138, CD45, DAPI) immunofluorescence assay for circulating rare cell detection and morphogenomic profiling of BCMA+ cells in PC malignancies, from here referred to as the "BCMA assay".
The technical methodology is built on the established 'no-cell-left-behind' approach of the High-Definition Single Cell Assay (HDSCA) workflow, previously validated clinically in various pathologies including breast cancer 59 , myocardial infarction 46 , melanoma 46 , prostate cancer 21 , bladder cancer 22 , colorectal cancer 23 , and multiple myeloma 24,44 , and has been optimized here for BCMA detection. The BCMA assay was applied to samples collected from patients with MGUS (monoclonal gammopathy of undetermined significance), SMM (smoldering MM), NDMM (newly diagnosed MM), RRMM (relapsed/refractory MM), and LPL (lymphoplasmacytic leukemia). BCMA-expressing cells of varying size and morphology were detected, with BCMA and CD138 cell fractions characteristic of candidate normal and abnormal PCs, plasmablasts, and precursor myeloma cells. Downstream single-cell copy number variation (CNV) analysis confirmed that the candidate aberrant cells detected harbor chromosomal genomic alterations also found using standard diagnostic clinical cytogenetics. The 4-plex immunofluorescence liquid biopsy assay reported here for the identification and morphological delineation of circulating tumor cells in B lymphoid malignancies has potential utility in early detection of myeloma and therapy response monitoring for patients undergoing anti-BCMA treatment.
2.3.2. Materials and Methods
2.3.2.1. Patient Enrollment and Sample Acquisition
Following an Institutional Review Board (IRB)-approved protocol (PA18-1073), patients enrolled in this study provided informed consent and were accrued at The University of Texas MD Anderson Cancer Center (MDACC; Houston, TX). For standard diagnostic workup, patients underwent a BM core needle biopsy, blood work, 24-hour urine collection, and whole-body imaging.
For this study, paired peripheral blood (PB) and BMA specimens from the diagnostic draw were collected from nine patients (two MGUS, two SMM, three NDMM, one RRMM, and one LPL) who were prospectively enrolled between 04/10/2019 and 02/09/2021. For each patient’s draw, a second tube was analyzed via flow cytometry (FC) by MDACC as part of the standard MM diagnostic workup. Two normal blood donor (ND) samples from individuals with no previously known pathologies were procured from the Scripps Clinic Normal Blood Donor Service (La Jolla, CA). PB and BMA specimens were drawn and collected using anti-coagulated preservative tubes (Cell-Free DNA blood collection tube, Streck) and shipped to the USC Michelson Convergent Science Institute in Cancer (CSI-Cancer) via FedEx overnight and processed as previously described using the established and validated HDSCA workflow 49 . Briefly, samples underwent red blood cell lysis using an isotonic ammonium chloride buffer and all nucleated cells were plated as a monolayer onto custom-made glass slides (Marienfeld, Germany) at approximately 3 million cells per slide. Plated slides were cryopreserved for future staining experiments and analysis.
2.3.2.2. Fluorescence In Situ Hybridization (FISH) and Karyotyping for Clinical Diagnosis
Following the diagnostic workup protocol for monoclonal gammopathies, standard FISH was performed for patients in this cohort. To increase sensitivity for detection of cytogenetic abnormalities in plasma cell neoplasms, enrichment of CD138+ cells using RoboSep-S™ (Stem Cell Technologies) for plasma cell selection was performed on BMA collected at the time of study enrollment. The MDACC Cytogenetics Laboratory developed and evaluated the performance characteristics of the tests following the requirements established by the CLIA’88 regulations.
To successfully perform FISH analysis for myeloma-associated aberrations, the clinical laboratory requires that the proportion of neoplastic PCs be larger than 0.05% of total cells analyzed and that aberrant neoplastic PCs represent more than 25% of all PCs as analyzed by flow cytometry. For karyotyping, metaphase cells were prepared from BMA specimens cultured without mitogens for 24 hours or with lipopolysaccharide (LPS) for 72 hours and chromosomal analysis was performed following standard techniques. Results are reported following the International System for Human Cytogenetic Nomenclature (ISCN 2020) from twenty Giemsa-banded metaphases. Following manufacturer’s specifications (Abbott Molecular, Abbott Park, IL) for FISH detection of chromosomal rearrangements in t(4;14)/IGH::FGFR3, t(11;14)/IGH::CCND1, and copy number changes of CDKN2C/CKS1B, RB1/13q34, and TP53/CEP17 in PC myeloma, FISH analysis was performed on interphase nuclei acquired from BM cells utilizing dual-color FISH probes targeting the above common myeloma abnormalities. For each probe, analysis is performed on two hundred nuclei and results are reported following threshold values as previously established by the MDACC clinical cytogenetics laboratory.
2.3.2.3. Selection of Markers and Immunofluorescence Targeting for the Development
The rationale for developing the BCMA 4-plex assay is grounded in BCMA’s progressive expression across the B cell differentiation spectrum (Figure 2.8A) and the increasing interest in this marker as a therapeutic target as demonstrated by the rise in clinical studies over the past 9 years (Figure 2.8B). The selection criteria for CD138, CD45, and DAPI have been previously described in two validated myeloma assays for morphogenomic differentiation of malignant and normal PC and identification of rare cell clones 24,44 .
Here, we substituted CD56 with BCMA for a new 4-plex (CD138, BCMA, CD45, DAPI) immunofluorescence staining panel with the primary goal of targeting key cells of interest (Figure 2.8C) and characterizing circulating B lymphocytes expressing BCMA in the PB of patients with PC neoplasms. For assay development and validation, mouse anti-human IgG1 CD138 (Exbio, clone A-38, cat# 10-520-C100, Czech Republic), rabbit anti-human IgG BCMA (Abcam, cat # ab253242), and directly conjugated mouse anti-human CD45-Alexa647 (AbD Serotec, cat# MCA87A647, Raleigh, NC) were acquired. Additionally, goat anti-mouse Alexa Fluor® 555 (Invitrogen, Waltham, MA) and goat anti-rabbit Alexa FluorPLUS® 488 (Invitrogen, Waltham, MA) were procured as secondary antibodies targeting CD138 and BCMA, respectively.
Figure 2.8 Assay rationale and 4-plex immunofluorescence targeting (A) B cell differentiation and BCMA expression during B cell maturation spectrum from GC B to mature plasma cell 78,79 . (B) Clinical studies involving BCMA as a therapeutic target of interest. Data from ClinicalTrials.gov with “BCMA” as the search key term, as of April 2022. “Not Applicable” = study phase explicitly marked as “not applicable”. “Unknown” = no data entered for study phase. (C) Immunofluorescence targeting with different antibodies for detection of BCMA+ and CD138+ cells in the enrichment-free HDSCA workflow with histological features of plasma cells and B cells. Large CD138+ cells with eccentric nuclei represent the plasma cell compartment while small CD45+ and BCMA+CD138- cells represent the B cell fraction. Gt=Goat, Ms=Mouse, Rb=Rabbit, Ig=immunoglobulin, A555=Alexa Fluor 555 dye, A488=Alexa Fluor PLUS 488 dye, A647=Alexa Fluor 647.
2.3.2.4. BCMA staining and validation in cell lines and spiked NBD samples
Anti-CD138 and anti-CD45 antibody testing and validation was performed as previously described (Zhang 2016, Ndacayisaba et al. 2022). BCMA testing was performed on U266, MM.1S, and Jurkat cell lines.
Immunoglobulin E lambda myeloma-derived U266 cell line (gift from Dr. Akil Merchant; U266B1 TIB-196™), immunoglobulin A lambda myeloma-derived MM.1S cell line (ATCC® CRL-2974™), and T-lymphocyte-derived Jurkat cell line (ATCC® TIB-152™) were cultured according to manufacturer’s specification. Slides for assay development were generated with pure cell lines (U266, MM.1S, and Jurkat) to test the expression of BCMA in myeloma and control cell lines. To establish the optimal concentration for anti-BCMA, we performed antibody titrations from concentrations of 0 to 10 μg/mL using contrived samples with the U266, MM.1S, and Jurkat cell lines. For specificity of the anti-BCMA antibody, Jurkat cells were used as a BCMA-negative control. For specificity of the mouse anti-human Alexa Fluor® 555 secondary antibody, a no-primary-antibody control in which the anti-BCMA antibody is omitted from staining was used. To analyze the expression of BCMA in cell lines compared to normal white blood cells (WBCs), the U266 cell line was spiked into NBD blood at a 1:100 dilution. For assay sensitivity testing, NBD slides were spiked with U266 cells in serial dilutions of 0, 1, 10, and 100 cells per NBD slide. Spiked slides were stained using the final assay concentrations and conditions (Table 2.2) and the numbers of detected CD138+ and BCMA+ cells were correlated with the spiked cell line count for linearity analysis.
Table 2.2 BCMA Assay Staining Protocol. RT = room temperature. GS = Goat Serum
Staining Protocol for the BCMA assay
Fixation: 2% PFA for 20 min
Wash: 1X TBS, 2 x 3 min
Blocking: 10% filtered GS in TBS for 30 min
Primary Mix: CD138 mouse IgG1 (2 µg/mL), BCMA rabbit IgG (2.5 µg/mL), CD45-Alexa647 mouse IgG2a (1.6 µg/mL); incubated 1 hr RT
Wash: 1X TBS, 2 x 3 min
Blocking: 10% filtered GS in TBS for 30 min
Secondary Mix: Alexa555 (1:500) + AlexaPLUS488 (1:500) + DAPI; incubated 40 min RT
Wash and finish: 1X TBS, 2 x 3 min; dip ddH2O; coverslip with live cell media; seal
2.3.2.5. BCMA staining and validation in patient PB and BMA
For detection of BCMA-expressing cells in patients with PC neoplasia, PB slides from patients with MGUS, SMM, NDMM, RRMM, and LPL were stained with the developed assay. Slides were thawed after -80°C storage, fixed using 2% paraformaldehyde for 20 minutes, and washed using TBS. The slides were incubated with 10% goat serum for 30 minutes to block non-specific binding sites for secondary antibodies. Next, the slides were incubated with a primary antibody mix consisting of CD138 (2 µg/mL, Exbio), CD45-Alexa647 (1.6 µg/mL, AbD Serotec), and BCMA (2.5 µg/mL, Abcam) for 1 hour at room temperature. Slides were washed with TBS and subjected to an additional 1 hour 10% goat serum incubation. Secondary antibodies Alexa Fluor 555 goat anti-mouse (1:500, Invitrogen) and Alexa FluorPLUS 488 goat anti-rabbit (1:500, Invitrogen) were added for CD138 and BCMA visualization, respectively, with DAPI for 40 minutes at room temperature. Slides were washed with TBS and rinsed in water prior to being mounted with live-cell media, coverslipped, and sealed for downstream imaging and technical analysis.
2.3.2.6. Imaging and Technical Analysis for Rare Cell Detection and Classification
Image generation, technical analysis, and rare cell detection on the slide-based HDSCA 4-plex immunofluorescence technology have been described 21,23,24,49 . Briefly, stained slides are imaged at 100x magnification, and an unsupervised clustering algorithm identifies rare cells based on cellular morphology and marker expression across all 2-3 million cells per slide as described by 761 features per cell. Rare cells are segregated from common cells. The rare cell fraction from the automated algorithm is then classified based on CD138 and BCMA expression and the candidate subtypes are visually inspected for cellular and morphological integrity, classified by channel-type, and enumerated by a trained technician.
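The channel-type labels used in this assay can be thought of as a lookup on per-marker positivity. The sketch below is illustrative only: it assumes that a positive/negative call has already been made for each marker (the intensity thresholds behind those calls are not specified here), and the function names `channel_type` and `enumerate_by_channel_type` are hypothetical.

```python
from collections import Counter


def _sign(positive: bool) -> str:
    """Return '+' for a positive marker call and '-' for a negative one."""
    return "+" if positive else "-"


def channel_type(bcma: bool, cd138: bool, cd45: bool) -> str:
    """Build a label such as 'BCMA+CD138-CD45-' from marker positivity calls."""
    return f"BCMA{_sign(bcma)}CD138{_sign(cd138)}CD45{_sign(cd45)}"


def enumerate_by_channel_type(calls):
    """Count cells per channel-type label.

    `calls` is an iterable of (bcma, cd138, cd45) booleans, one tuple per cell.
    """
    return Counter(channel_type(*call) for call in calls)


if __name__ == "__main__":
    # Toy calls for four cells (hypothetical, not study data).
    toy_calls = [
        (True, True, False),
        (True, False, False),
        (True, True, False),
        (False, True, False),
    ]
    for label, count in enumerate_by_channel_type(toy_calls).items():
        print(label, count)
```

Counts produced this way would still need to be normalized by the blood volume represented on the slide before being reported per mL; that conversion is not shown.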
The classifications of all reported cells are re-confirmed by two additional researchers for reproducibility and cross-validation. The rare cell subtypes of interest detected in this assay fall into the following primary categories:
• BCMA+CD138+CD45-
• BCMA+CD138-CD45-
• BCMA+CD138+CD45+
• BCMA-CD138+CD45-
While additional cellular and acellular events are detected and tracked by the OCULAR algorithm 21,22 , this study focuses on the DAPI+ fraction indicated above. Enumeration results for each category are reported as the total count of cells per mL of blood.
2.3.2.7. Single Cell Sequencing and CNV Analysis for Genomic Validation
To investigate whether the different rare cell subtypes harbor myeloma genomic alterations and delineate which subset of BCMA+ cells are aberrant, single cell CNV (scCNV) analysis was performed following previously validated protocols 21,59 . In brief, a robotic micromanipulator is used to pick single cells off the glass slides and transfer cells into PCR tubes containing lysis buffer (200 mM KOH; 50 mM DTT). The individual tubes are stored at -80°C until the lysate is thawed and subjected to whole genome amplification (WGA; Sigma Aldrich) and library construction using paired indices (Illumina). WGA was performed using the WGA4 Genomeplex Single Cell Whole-Genome Amplification Kit (Sigma Aldrich). Amplified DNA was purified using the QIAquick PCR purification kit (Qiagen) and the resulting DNA was quantified using the Qubit Fluorometer (Thermo Fisher). Indexed Illumina sequencing libraries were constructed and barcoded using the NEBNext Ultra DNA Library Preparation Kit (New England Biolabs). The amplified DNA fragments with target size were sequenced at Fulgent Genomics (Temple City, CA) to generate approximately 500,000 mapped reads. Sequenced reads are analyzed using our previously described CNV pipeline for reference genome mapping, read alignment, and single cell ploidy determination 59,61,63 .
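To make the outcome of the ploidy-determination step concrete, the following minimal sketch flags a cell as altered when any genomic bin departs from the diploid state of two copies. The flat list of integer copy numbers per bin and the function names are assumptions for illustration; the cited pipeline is not reproduced here, and this simplification ignores sex chromosomes, where a single copy can be normal.

```python
NORMAL_COPIES = 2  # diploid baseline assumed for autosomal bins


def altered_bins(profile, normal=NORMAL_COPIES):
    """Return the indices of bins whose integer copy number differs from `normal`."""
    return [index for index, copies in enumerate(profile) if copies != normal]


def is_altered(profile, normal=NORMAL_COPIES):
    """A cell is called altered if at least one bin deviates from the baseline."""
    return bool(altered_bins(profile, normal))


if __name__ == "__main__":
    # Toy profiles (hypothetical): one fully diploid cell, one with a gain and a loss.
    diploid_cell = [2, 2, 2, 2, 2]
    aberrant_cell = [2, 3, 2, 1, 2]
    print(is_altered(diploid_cell), altered_bins(diploid_cell))
    print(is_altered(aberrant_cell), altered_bins(aberrant_cell))
```

In practice, noisy low-pass data would usually call for segment-level rather than bin-level decisions; the per-bin rule is kept here only because it is the simplest statement of the criterion.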
For each sequenced and mapped cell, a copy number of 2 is considered normal; any cell whose profile contains chromosomal locations with copy numbers that deviate from 2 is considered to harbor chromosomal alterations and is therefore classified as aberrant. Chromosomal alterations are evaluated across the cells to establish the clonal relationship between cells from the same slide. WBCs (CD45+BCMA-CD138-) from each patient sample along with cells of interest from NBD samples were included as negative controls and subjected to the same sequencing and analysis workflow. To evaluate whether BCMA-expressing cells are genomically altered, the morphological classification of sequenced cells was correlated with scCNV profile status of altered or normal. Further, BCMA expression was correlated with specific chromosomal events to see whether there were clonal cells in the early stages of B cell differentiation and maturation. For initial clinical validation, the scCNV profiles of cells detected by our 4-plex assay were compared to the patient-level cytogenetic alteration events probed in the MDACC targeted clinical FISH workup protocol as part of clinical diagnosis. For every single cell, we assessed the presence or absence of each cytogenetic event and quantified how many cells harbor genomic alterations observed using standard clinical methods.
2.3.2.8. Statistical Analysis
Data analysis was performed in the R programming language (version 3.6.3). Statistical comparison was computed using the Kruskal-Wallis test at a 95% confidence level. The correlation between marker intensities is calculated using the Pearson correlation, with the linear relationship reported using the Pearson correlation coefficient.
2.3.3. Results
2.3.3.1. Assay Validation and BCMA Expression in Cell Lines and Spiked Normal Blood Donor Samples
Technical validation for CD138, CD45, and DAPI in HDSCA has been previously reported 24,44 .
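As a minimal sketch of the Pearson correlation named in the Statistical Analysis section, the function below computes the coefficient from first principles; squaring it gives the R-squared of a simple linear fit, which is how linearity between spiked and detected cell counts can be summarized. The analysis in this work was performed in R, so this Python version is illustrative only, and the detected counts in the example are hypothetical placeholders.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    spiked = [0, 1, 10, 100]    # spike levels per slide, as in the dilution series
    detected = [0, 1, 9, 97]    # hypothetical detected counts, for illustration only
    r = pearson_r(spiked, detected)
    print(f"r = {r:.4f}, R-squared = {r * r:.4f}")
```

With only four dilution points, a single high-count slide dominates the fit, so a high R-squared in such a series mostly reflects agreement at the top of the range.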
In titration experiments with U266, MM.1S, and control Jurkat cells, BCMA expression is higher in U266 and MM.1S consistently across concentration conditions while no BCMA expression was detected in control Jurkat cells, as expected (Figure 2.9A). Through these titration experiments, the final optimal concentration for the anti-BCMA antibody was determined to be 2.5 µg/mL when multiplexed with 2 µg/mL of anti-CD138 and 1.6 µg/mL of anti-CD45. To test BCMA specificity in circulating cells of NBD controls, U266 cells were spiked into NBD slides and stained with the 4-color assay. In spiked-in stains, U266 cells were CD138+BCMA+CD45- and generally larger than surrounding WBCs (CD138-BCMA-CD45+) with the canonical eccentric nucleus (Figure 2.9B), as expected, providing validation for the specificity of the anti-BCMA antibody for the assay.
Figure 2.9 Assay development and validation in cell lines and spiked normal blood (A) Titration curve of BCMA at 0-10 µg/mL concentrations in U266, MM.1S, and Jurkat cell lines. (B) Representative images at 100x and 400x magnification of BCMA+ U266 cells with surrounding WBCs (green). (C) Linearity between cell counts in spiked-in experiments for CD138+ cells. (D) Linearity for BCMA+ cells in U266 spiked in NBD.
For assay sensitivity, linearity analysis on the serial dilution experiments shows the correlation between the number of U266 cells spiked in NBD and the total count of detected CD138+ (Figure 2.9C) and BCMA+ (Figure 2.9D) cells. A significant positive correlation was observed with R-squared values of 0.998 and 0.641, respectively. While only 0-5 CD138+ cells are expected to be found in unspiked NBDs, our experiments detect BCMA+ cells in NBD samples (range: 5-110 cells), which are normal BCMA-expressing mature B cells. All the BCMA+CD138- cells detected in the NBD are smaller than U266 cells and equal in size to BCMA-CD138-CD45+ WBCs.
2.3.3.2. Characteristics of Patients in the Study Cohort
The patient cohort consists of nine patients diagnosed at five different states on the PC neoplasia continuum from MGUS to RRMM with an additional patient with LPL and two age-matched NBD as controls. The cohort consists of 3 males and 6 females with a median age of 63 (range 38-84). Clinical bone marrow plasma cell profiling shows higher malignant plasma cell percentage consistent with myelomatous conditions except for MGUS2, who presented with 0% aberrant plasma cells. Accordingly, FC analysis found positive expression for disease markers (CD138, CD38, CD56) and negativity in normal/exclusionary markers (CD45, CD19). The LPL patient presents both PC and B cell aberration with FC positivity in CD19 and CD45 in bone marrow plasma cells, indicative of a histologically lymphoplasmacytic myeloma phenotype. Additional clinical features of the patients in this study are detailed in Table 2.3.
Table 2.3 Patient demographics and clinical characteristics FC: Flow Cytometry, sFLC: serum-free light chain, MGUS: monoclonal gammopathy of undetermined significance. SMM: Smoldering multiple myeloma. NDMM: newly diagnosed multiple myeloma. RRMM: Relapsed/Refractory multiple myeloma, LPL: Lymphoplasmacytic Leukemia. Patient IDs are reflective of the disease state at diagnosis.
VGPR: Very good partial response
Patient ID | MGUS1 | MGUS2 | SMM1 | SMM2 | NDMM1 | NDMM2 | NDMM3 | RRMM | LPL
Age | 78 | 38 | 72 | 55 | 63 | 67 | 58 | 50 | 84
Sex | Male | Female | Female | Female | Female | Female | Female | Male | Male
Diagnosis | MGUS | MGUS | SMM | SMM | NDMM | NDMM | NDMM | RRMM | LPL
Ig Isotype | IgGk | IgMλ | IgGk | IgGk | IgGk | IgAk | IgGk | IgDλ | IgGλ
PC in BM aspirate (%) | 1 | 4 | 10 | 10 | 15 | 76 | 7 | 27 | 2
Aberrant PC from the total PC BM compartment (%) | 92 | 0 | 78.9 | 98 | 95.2 | 100 | 99.6 | 99.9 | 94
FC CD138 | Positive | NA | Positive | Positive | Positive | Positive | Positive | Positive | Positive
FC CD38 | Positive | NA | Positive | Positive | Positive | Positive | Positive | Positive | Positive
FC CD56 | Positive | NA | Positive | Negative | Positive | Positive | Positive | Positive | Negative
FC CD45 | Negative | NA | Negative | Negative | Negative | Positive | Negative | Negative | Positive
FC CD19 | Negative | NA | Negative | Negative | Negative | Negative | Negative | Negative | Positive
FC CD27 | Positive | NA | Positive | Positive | Negative | Negative | Negative | Negative | Positive
M-Spike (g/dL) | 0.7 | 0.4 | 3.9 | 1.5 | 0.4 | 0.2 | 2.1 | 0.3 | 0.2
sFLC ratio | 8.14 | 1.06 | 37.58 | 8.45 | 186.84 | 1362.11 | 8.7 | 691.27 | 44.24
Karyotype | Normal | Normal | Normal | Normal | Normal | 43~45,X,del(X)(p22.1),+1,del(1)(p32),add(1)(p32),psu dic(3;1)(q25;p13),t(4;8)(q21;q24.3),-12,-13,der(14;18)(p10;q10),-22,+1~2mar[cp16]/47,XX,+5[1]/46,XX[3] | Normal | Normal | Normal
FISH | Three copies of CCND1 | Normal | Trisomy 11 | Negative | Three copies of CCND1 | t(4;14) and monosomy 13 | Normal | t(11;14) | Normal
Clinical Presentation:
MGUS1: Low-risk MGUS for progression to MM by PETHEMA 13 criteria
MGUS2: Low-risk MGUS by PETHEMA criteria
SMM1: High-risk SMM by PETHEMA criteria
SMM2: High-risk SMM by PETHEMA criteria
NDMM1: Standard-risk NDMM treated with carfilzomib, lenalidomide, dexamethasone (KRd) with CR. Remains in remission and on lenalidomide maintenance 3 years after diagnosis
NDMM2: High-risk NDMM, primary refractory to bortezomib, lenalidomide and dexamethasone. Deceased 7 months after initial diagnosis
NDMM3: Standard-risk NDMM treated with KRd 4 cycles followed by autologous stem cell transplant with melphalan 200 mg/m2 with VGPR, remains VGPR 3 years after diagnosis on maintenance with lenalidomide and daratumumab clinical trial
RRMM: Standard-risk RRMM. Patient refractory to lenalidomide and elotuzumab at time sample was taken
LPL: Newly diagnosed Waldenström’s macroglobulinemia (MYD88 L265P mutation positive) requiring treatment
2.3.3.3. Morphological Characterization and Enumeration of BCMA+ cells in Patient PB
Morphological analysis of cells in patient PB was performed to characterize the different detected rare cells and subsequent subtypes of BCMA+ cells. When comparing expressions of BCMA, CD138, and CD45 across all rare circulating cells, distinct patterns of expression were observed among rare cells of interest with varying degrees of BCMA and CD138 expression across cell size and shapes (Figure 2.10A).
Figure 2.10 Morphological characterization of BCMA+ cells in PB samples A. Representative candidate rare cells with comparative BCMA and CD138 expression across different cell sizes. B. UMAP projection of all candidate cells using all 761 single cell morphology and marker intensity features, colored by CD138 expression. C. UMAP projection, colored by BCMA expression. D. UMAP projection, colored by CD45 expression. E. Density plots showing channel intensities for each disease state.
We detect BCMA+CD138-CD45-, BCMA+CD138+CD45-, BCMA-CD138+CD45- and BCMA+CD138+CD45+ cells and characterize their varying cell size and morphology. The cellular subtypes represent candidate normal and abnormal PCs, plasmablasts, and precursor post-GC B cells.
A staining pattern consistent with perinuclear BCMA (BCMAp) was also observed in patient microscopy imaging data (Figure 2.11) and represents additional assay capabilities.

Figure 2.11 Representative images of observed BCMAp staining patterns (A) 100x magnification. (B) 400x magnification.

Dimensionality reduction analysis by Uniform Manifold Approximation and Projection (UMAP) identified distinct groups, with two main clusters of BCMA+ cells, one CD138+CD45- and the other CD138-CD45+ (Figure 2.10 B-D), and with cell populations mapped to distinct density distributions across CD138, BCMA, and CD45 intensities (Figure 2.10 E). The staining intensities are consistent with manual classification of circulating rare cells, showing distinct populations of cells expressing CD138, BCMA, and CD45. The biomarker-based groups represent candidate PCs, committed clonotypic B cells, and plasmablasts (Figure 2.12 A). Quantification of the different circulating rare cell groups across patients shows that CD138+ cells are elevated in neoplastic myeloma but not in NBD samples, with a higher proportion found in NDMM than in the precursor states (SMM and MGUS) (Figure 2.12B-D), consistent with our prior observations 24,44. BCMA+ cells were detected in all samples including NBDs, confirming that the expression of BCMA in early B cells, albeit suggested to be low, is detectable in our assay. Comparing the total BCMA+ count, BCMA+ cells are higher in disease states than in normal blood, with the highest total count observed in NDMM, the overt malignant state (Figure 2.12 E). Together, the morphological analysis and enumeration in the 4-plex assay show that BCMA expression in circulating rare cells is found across different cell sizes and with varying CD138 and CD45 patterns, with higher counts in NDMM compared to other conditions.

Figure 2.12 Quantitative enumeration of circulating rare and BCMA+ cells A. UMAP projection, colored by manual cell classification of rare cell groups. B. Total cell count per immunofluorescence intensity classification. C. Proportion of cells across patients. D. Enumeration by channel type. E. Total count of BCMA+ cells across disease state.

2.3.3.4. scCNV for BCMA-expressing cells and Correlation to FISH Cytogenetics

BCMA expression has been shown to be higher in malignant PCs than in normal PCs and is progressively expressed from post-GC B cells towards fully differentiated PCs 84. To investigate whether circulating BCMA+ precursor B cells harbor genomic alterations, and to validate the BCMA assay, we performed scCNV analysis of sequenced BCMA+ and BCMA- cells and correlated the results with FISH cytogenetic data. Candidate circulating rare cells of interest from 7 patients and 2 NBDs were sequenced. No cells were picked and sequenced from the MGUS2 and LPL patients. Representative scCNV profiles and the respective marker expression and morphology of the sequenced single cells are shown in Figure 2.13A. Of all sequenced cells (N = 108), 50 displayed altered and 58 non-altered CNV profiles. Quantitative cell counts for both normal and altered scCNV across patients (Figure 2.13B) and across disease states (Figure 2.13C) show that altered cells were identified in all disease states and in 6/7 patients (the four sequenced cells in SMM2 had a normal scCNV profile). For controls, all cells from NBDs are normal (Figure 2.13B-C). The distribution of marker-based classification across normal and altered scCNV profiles is shown in Figure 2.13D, with both normal and altered profiles covering the morphological classifications. The six BCMA+ cells detected have normal scCNV profiles. 3/24 (12.5%) BCMA+CD45+ cells are altered while 32/46 (69.6%) CD138+BCMA+ cells are altered. 9/14 (64.3%) CD138+ cells are altered, 4/12 (33.3%) BCMA+CD138+CD45+ cells are altered, and 1/3 (33.3%) DAPI+ only cells is altered (Figure 2.13D).
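The per-group proportions quoted for Figure 2.13D follow directly from the altered and total cell counts. As a minimal sketch of that arithmetic (the group labels used as dictionary keys, including "BCMA+ only" for the six BCMA+ cells, and the helper name `altered_percent` are illustrative assumptions, not names from the study), the percentages can be recomputed as:

```python
# (altered, total) sequenced cells per marker-based group, taken from the
# counts quoted in the text for Figure 2.13D. The key strings are labels
# chosen for this illustration only.
counts = {
    "BCMA+ only": (0, 6),
    "BCMA+CD45+": (3, 24),
    "CD138+BCMA+": (32, 46),
    "CD138+ only": (9, 14),
    "BCMA+CD138+CD45+": (4, 12),
    "DAPI+ only": (1, 3),
}


def altered_percent(altered, total):
    """Return the percentage of altered cells, rounded to one decimal place."""
    return round(100.0 * altered / total, 1)


percentages = {
    group: altered_percent(altered, total)
    for group, (altered, total) in counts.items()
}

for group, pct in percentages.items():
    altered, total = counts[group]
    print(f"{group}: {altered}/{total} = {pct}%")
```

Keeping the raw counts next to the derived percentages makes it easy to recheck a figure legend when cells are added to or removed from a group.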
For initial clinical validation, concordance analysis to cytogenetic events across scCNV profiles was performed. From clinical FISH cytogenetic results (Figure 2.13E), patients MGUS2, SMM2, NDMM3, and LPL are negative for the 12 key diagnostic cytogenetic targets, while MGUS1 and SMM1 are positive for only 11q (CCND1) gain. NDMM1 is positive for 17p (TP53) gain, 13 (RB1) deletion, 11q (CCND1) gain, 1q21 (CKS1B) gain, and 14q (IGH) gain. NDMM2 is positive for t(4;14)(p16;q32) translocation, 13 (RB1) and 1p32 (CDKN2C) deletions, and 1q21 (CKS1B) gain. RRMM is positive for t(11;14)(q13;q32) translocation and Cyclin D1. The positive FISH cytogenetic events, which are detected in CD138+ bone marrow plasma cells, were mapped onto scCNV profiles of circulating rare cells with morphological subtypes as classified by the BCMA assay detection. 48/50 (96%) of the altered cells, spanning 5 morphological groups, harbor at least one of the positive cytogenetic events (Figure 2.13F). The CD138+BCMA+CD45- morphotype constitutes the most altered group, with cells spanning 6 FISH cytogenetic events.

Figure 2.13 scCNV analysis for genomic validation of candidate aberrant cells A. Representative CNV profiles along with cellular morphology for that single cell. Dashed rectangles mark diagnostic cytogenetic events also detected in scCNV. Red=gain, Blue=loss. B. Distribution of sequenced cell counts grouped by normal and altered scCNV profile across patients, across disease states in C, and across morphological groups in D. E. Clinical cytogenetics results from FISH and karyotyping for 12 common genomic events used for myeloma diagnosis. F. Intersection plot showing total count of altered single cells harboring alterations detected by clinical cytogenetics and corresponding morphological classification.

Taken together, the genomic data point to heterogeneous BCMA expression across normal and altered PCs, and the precursor plasmablasts and post-GC B cells.
Altered CNV profiles were predominantly in the CD138+BCMA+ phenotype, consistent with prior observations that BCMA has higher expression specific to PCs 79. BCMA+CD45+ constitutes a primarily unaltered fraction of cells that are likely normal BCMA+ post-GC and memory B cells.

2.3.4. Discussion

BCMA is a TNF receptor that signals through the NFkB, MAPK/ERK, p38, and JNK/Elk-1 pathways to promote cell growth, proliferation, and survival, as well as maintenance of an immunosuppressive tumor microenvironment, in both benign and neoplastic myeloma and other lymphoproliferative malignancies 78. Detection and characterization of circulating PCs and their precursor plasmablasts and B cells early in disease progression and in relapsed patients remain a critical challenge due to the low abundance of the cell populations of interest (rare cell context), such as clonotypic B cells and plasmablasts in early MGUS and therapy-resistant clones in relapsed conditions 78,82. The exclusivity and robustness of BCMA expression in myeloma render it a key and reliable marker for identification of malignant plasma cells 84 and an exciting target for various therapeutic modalities 82. This study reports a new 4-plex (BCMA, CD138, CD45, DAPI) immunofluorescence assay for enrichment-free detection and characterization of BCMA-expressing cells in PC neoplasms, and initial validation for utilizing the technology to track these cells across the myeloma progression spectrum. While CD138 and CD45 are standard markers in PC characterization and clinical diagnosis, BCMA is a newly described clonal B cell and PC marker of great interest in hematologic malignancies 78,81, which confers on this assay a broader capability for detecting and characterizing terminal PCs, plasmablasts, and post germinal center (GC) B cells.
Further, the BCMA assay identifies cells expressing both the cell surface transmembrane BCMA and the perinuclear Golgi BCMAp 75, conferring on the assay extended capabilities for rare cell detection and monitoring in both pre-treated and treated conditions. With BCMA as the cornerstone marker and in combination with CD138, the 4-plex assay is an ideal rare cell technology capable of simultaneously detecting and characterizing clonotypic B cells; plasmablasts; normal, malignant, and dendritic PCs; and other non-canonical BCMA-expressing cells. The BCMA+CD45+ cells in NBD samples, confirmed to be genomically normal, are hypothesized to be normal activated post-GC B cells with a high proliferation index, as they are likely undergoing active maturation towards terminally differentiated PCs. Among the notable limitations of this work is the small cohort size (N=9) spanning different disease states. Future studies on a larger cohort will provide further insights into the pattern of BCMA expression in the PB of patients with hematologic malignancies. Further, myeloma patients present a serum soluble BCMA (sBCMA), resulting from the shedding of membranous BCMA cleaved from PCs by gamma-secretase 85. sBCMA predicts anti-BCMA therapy response and progression-free survival (PFS) in RRMM 86,87, and gamma-secretase inhibition improves anti-BCMA immunotherapy efficacy 88. It has recently been shown that increased sBCMA in serum correlates with poor prognosis in MGUS and SMM 87. While we observed nuclei-free BCMA+ events in our staining, further characterization of sBCMA is needed to understand the spectrum of BCMA but was beyond the scope of this initial analysis. Additionally, multiplexed proteomic profiling of candidate post-GC B cells, plasmablasts, and dendritic PCs is warranted to validate cell immunophenotypes beyond the four markers in this assay and to assign the distinct phenotypes that characterize these candidate precursor cells.
For therapy response monitoring, the BCMA assay has not yet been validated in patients receiving BCMA-targeted therapies with agents such as ADCs and bispecific antibodies, which may cause BCMA internalization or sterically hinder antibody binding in the assay. As such, additional validation in a cohort of anti-BCMA treated patients with longitudinal monitoring using the BCMA assay will elucidate any potential false negatives and validate the specificity and limit of detection. With its ultra-rare cell detection capabilities and clinical validation, this assay has multiple applications in basic research and clinical care. Beyond the detection and characterization of BCMA+ cells, another area of interest is the search for myeloma stem cells (MMSCs): a rare cell population thought to be quiescent myeloma cells that behave as tumor-initiating cells as a result of their interaction with the tumor microenvironment 89. MMSCs have been hypothesized to be a subgroup of plasma B cell precursors that are CD24+ 90 and/or have a plasmablast phenotype with CD24+ expression 90. We believe these cells are among the BCMA+ cells detected by this assay, and further molecular characterization will provide additional evidence. Prospective studies with large cohorts incorporating patients under anti-BCMA therapies will provide additional validation of the clinical utility of the assay for early disease detection and monitoring of MRD, toxicity, and therapy response to anti-BCMA treatment modalities. Finally, with the versatility of the HDSCA platform, newly identified targets of interest such as GPRC5D 91,92, FcRH5 93,94, CD73 (5’-Nucleotidase) 95, and others 96 can be adapted into the 4-plex immunofluorescence technology for custom targeted cell-based measurement of disease and therapy response for treatment agents targeting these proteins in both pre-clinical and clinical development.

2.4. Conclusion

The CD56 study provides technical and initial clinical validation of a slide-based and enrichment-free single-cell immunofluorescence assay for the morphogenomic detection and characterization of normal and aberrant PCs from blood and bone marrow samples of NDMM and MGUS patients. By morphological description, next-generation single-cell sequencing, and correlation with patient cytogenetic data, we demonstrate that our assay recapitulates the detection of common genetic aberrations currently used as the standard for diagnosis and disease monitoring. Concordance analysis further identifies additional rare genetic clones not detected by conventional clinical FISH and karyotyping methodologies. Additional large-scale validation studies are needed to demonstrate the clinical utility of this assay as an approach with the potential to support clinical decisions, particularly in conditions with rare and ultra-rare cells. The BCMA work reports development and validation of a slide-based 4-plex (CD138, BCMA, CD45, DAPI) immunofluorescence assay for enrichment-free detection, quantification, and morphogenomic characterization of BCMA-expressing cells in patients (N = 9) with plasma cell neoplasms. Varying morphological subtypes of circulating BCMA-expressing cells are detected across the CD138(+/-) and CD45(+/-) compartments and represent candidate clonotypic post-germinal center B cells, plasmablasts, and normal and malignant PCs. Genomic analysis by single-cell sequencing and correlation to clinical FISH cytogenetics provides validation, with data showing that patients across the different neoplastic states carry both normal and altered BCMA-expressing cells. Further, altered cells harbor cytogenetic events detected by clinical FISH.
The reported enrichment-free liquid biopsy approach has potential applications as a single-cell methodology for early detection of BCMA+ B lymphoid malignancies and in monitoring therapy response for patients undergoing anti-BCMA treatments.

Chapter 3 Molecular Subtyping of NDMM using Non-negative Matrix Factorization

This chapter is adapted from a manuscript in preparation for publication as Libere J Ndacayisaba, Dean Tessone, Amin Naghdloo, Jeremy Mason, James Hicks, Peter Kuhn. “Identification and Characterization of Molecular Subtypes in Myeloma by Non-negative Matrix Factorization”.

3.1. Introduction

This chapter focuses on the application of non-negative matrix factorization (NMF), an unsupervised machine learning algorithm, to subtype NDMM patients based on their RNA transcriptomic profiles. Beyond the clinical factors, MM is well understood from a genetic perspective. MM presents with genetic heterogeneity reflected in the vast number of genetic driver events that lead to disease. In efforts to deconvolute the intra- and inter-tumor heterogeneity in MM patients, the distribution of genomic events follows a pattern in which hyperdiploidy in odd-numbered chromosomes and IGH locus translocations dominate the MM mutational landscape, at 40-50% and 40%, respectively 32. While the two main genetic events are almost mutually exclusive, additional secondary events include deletion of 17p in 10%, deletion of 13q in 30-50%, and other single nucleotide polymorphisms 32. Chromothripsis and oncogene mutations constitute additional driver events in MM. The genomic complexity requires further deconvolution of the transcriptional programs, where differential expression, splicing, and gene regulation play key roles in further complicating disease biology. We hypothesized that there are underlying transcriptomic profiles that reflect disease subtypes in myeloma and whose delineation would be clinically enabling.
To explore the transcriptional subtypes of MM, we applied NMF to RNA sequencing data from 807 patients in the Multiple Myeloma Research Foundation’s (MMRF's) CoMMpass study 97. NMF models high-dimensional data as non-negative vector groups and has previously been shown to be highly effective at eliciting biological insights from complex genomic data 98. Further, the non-negative nature of biological data makes it amenable to NMF analysis. In prior transcriptional subtyping work, NMF groups have been shown to correlate with previously described tumor subtypes and to represent distinct structural variations and mutations within a single group of transcriptional data 99. In our work, this approach was used to identify six novel subtypes of disease within MM that describe potentially distinct biological mechanisms that we suggest could be clinically relevant and actionable.

3.2. Materials and methods

3.2.1. The Multiple Myeloma Research Foundation (MMRF) CoMMpass study

Datasets for this work were obtained under MMRF CoMMpass Study (V15) Network Institutional Review Board approved Informed Consent; Copernicus IRB (IRB # QUI1-11-217). MMRF CoMMpass RNA sequencing data was collected from 807 patients from four different countries, namely the United States, Canada, Italy, and Spain. The CoMMpass study is a longitudinal study that follows patients for eight years, with a mean age at diagnosis of 64 years 11,97. Each patient underwent treatment at the recommendation of the oncologist. The cohort data is systematically analyzed and released in Interim Analyses twice a year, and Interim Analysis 15 (IA15) was used for this work. RNA sequencing was conducted using bone marrow samples, and the protocol for RNA sequencing has been previously published 11,97.
In brief, bone marrow aspirates from each patient were subjected to immunomagnetic bead separation using the Miltenyi MACS Cell Separation System (Miltenyi, San Diego, CA, USA) to enrich for CD138-positive malignant MM PCs. Total RNA was extracted from CD138-positive PCs using the QiaAmp RNeasy Mini Kit (Qiagen, Hilden, Germany). Nucleic acids were quality assessed using the Qubit 2.0 (Thermo Fisher, Waltham, MA, USA) and Agilent Tape Station. RNA was converted to cDNA using random primers with Superscript II (Invitrogen, Waltham, MA). After second strand synthesis, the resulting molecules were used for library prep using the Illumina TruSeqRNA library kit. TopHat v2.0.11 was employed for alignment of RNA-seq reads, CuffDiff v2.2.1 for differential expression analysis, and Salmon 0.7.2 for isoform quantification.

3.2.2. Selection and pre-processing of RNA sequencing data

RNA data was processed prior to analysis. First, ENSEMBL identification numbers were matched to HGNC symbols utilizing the AnnotationDbi v1.56.2 R package 100. Genes that did not match any HGNC symbol were removed from the dataset. The gene set was then filtered for protein coding genes using the biomaRt v2.50.1 R package 101. Next, genes encoding immunoglobulins were filtered from the set. Finally, lowly expressed genes were filtered out at a standard threshold of 0.5 transcripts per million across a minimum of 25% of samples. The 2000 most highly variable genes were selected for use in the NMF through the scran v1.22.1 R package 102.

3.2.3. Non-negative Matrix factorization for transcriptomic subtyping

NMF was implemented to cluster RNA data (in transcripts per million) using the NMF R package v0.23.0 103. The NMF algorithm 104 and its application to computational biology tasks such as gene expression analysis have been previously described 105,106.
In short, NMF takes as input a non-negative matrix A with n rows, describing genes, and m columns, describing samples, and decomposes it into the product WH, where matrix W describes the gene groups (metagenes) while matrix H describes the meta-clusters of samples, based on the factorization rank k. We applied the non-smooth NMF 107, which adds a smoothing matrix to the NMF model to increase sparseness while maintaining faithfulness to the original data representation throughout the matrix decomposition calculations. This is because, in gene expression, not all genes contribute equally to the weight of meta-clusters (NMF groups). We used the cophenetic coefficient and dispersion to find the final NMF groups at which the NMF model converged.

3.2.4. Survival analysis in NMF subtypes

Survival analysis was performed using the survminer v0.4.9 R package 108 and the survival v3.2-13 R package 109. Kaplan-Meier curves were calculated for both overall survival (OS) and progression-free survival (PFS) for each of the NMF groups, and the statistical significance of the grouping was assessed by p-value.

3.2.5. Gene Set Enrichment Analysis (GSEA) for biological pathway determination

To determine which key pathways are enriched in the different NMF subtypes, we performed GSEA on the NMF groups. GSEA maps a phenotype of interest (the NMF groups) to gene sets with known biological knowledge to determine which gene sets have a statistically significant difference between target phenotypes 110,111. Gene sets describing fifty “hallmark” biological pathways were queried from the Molecular Signatures Database and mapped to the NMF subtypes using the msigdbr v7.4.1 R package 112. Each of the NMF groups was compared against the other five groups in GSEA utilizing differential expression through DESeq2 v1.34.0 113 and fgsea v1.16.0 114.
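To make the factorization in Section 3.2.3 concrete, the following is a minimal sketch of NMF on a toy genes-by-samples matrix. It uses the standard multiplicative update rules for the Frobenius-norm objective, which is an assumption made for illustration: it is not the non-smooth NMF variant or the NMF R package used in this work, and the function names (`nmf`, `matmul`, `frobenius_error`) and the toy matrix are hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Y_cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Y_cols] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def frobenius_error(A, W, H):
    """Frobenius norm of the residual A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix A into W (n x k) and H (k x m)
    using multiplicative updates (illustrative, not nsNMF)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Toy expression matrix: 6 genes (rows) x 4 samples (columns) with two blocks.
A = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [6.0, 6.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 7.0, 7.0],
    [0.0, 0.0, 2.0, 2.0],
]

W, H = nmf(A, k=2)

# Assign each sample to the row of H (meta-cluster) with the largest coefficient.
labels = [max(range(len(H)), key=lambda c: H[c][j]) for j in range(len(A[0]))]
error = frobenius_error(A, W, H)

print("sample labels:", labels)
print("reconstruction error:", round(error, 4))
```

Assigning each sample to its dominant row of H is one common way to turn the factorization into cluster labels; which component index ends up attached to which block depends on the random initialization.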
3.2.6. Analysis of the Tumor Necrosis Factor (TNF) gene family in NMF candidate groups

TNFs, among which is BCMA, are key regulators of B cell maturation and plasma cell proliferation and survival. The TNF family is heavily associated with malignancies of PCs, and its differential expression provides a mechanism by which to understand MM biology, as MM has a variety of shared TNFs that characterize the disease. To investigate the contribution of different gene families to the molecular subtyping, we performed hierarchical clustering analysis on TNF genes as an example of a gene family of interest in MM. The hierarchical clustering analysis was performed with the ComplexHeatmap R package v2.10.0 115. Using normalized and scaled transcripts per million data, we performed hierarchical clustering of all TNF ligands and receptors across the six NMF groups.

3.2.7. Copy number variation (CNV) profiling of NMF groups

Conventional cytogenetic and NGS genomic analysis established the major genomic subtypes, namely the hyperdiploid and translocation-driven MM groups, which constitute the current major molecular subtypes in clinical diagnosis and standard of care. We investigated whether the major genomic classes map to specific NMF groups or whether there are genomic signatures within each transcriptomic subtype. For this analysis, we correlated the RNA transcriptomic groups with DNA genomic data by mapping the chromosomal DNA CNV profiles of each patient to the NMF subtypes. Copy number, defined as a gain or loss of a broad region of a chromosome, was obtained from NGS results as performed according to the CoMMpass study protocol.

3.3. Results

3.3.1. Identification of six transcriptomic subtypes in NDMM

From the rank survey analysis using the cophenetic coefficient and dispersion to optimize for the most representative meta-clusters, six candidate NMF groups were identified as the optimal solution of the non-smooth function.
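A rank survey of this kind is usually based on a consensus matrix built from repeated NMF runs at each candidate rank. The sketch below illustrates the idea under stated assumptions: the consensus entry for a pair of samples is taken as the fraction of runs in which they share a cluster, and the dispersion coefficient is taken in its commonly used form, (1/n^2) * sum over i, j of 4 * (C_ij - 1/2)^2. The function names and toy label lists are hypothetical, and the exact computation performed in this work is not reproduced here.

```python
def consensus_matrix(runs):
    """Average the connectivity matrices of several clustering runs.

    runs: list of label lists (one per run), all of the same length.
    Entry (i, j) is the fraction of runs in which samples i and j
    were assigned to the same cluster.
    """
    n = len(runs[0])
    C = [[0.0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0
    return [[c / len(runs) for c in row] for row in C]


def dispersion(C):
    """Dispersion coefficient: 1.0 when every consensus entry is 0 or 1."""
    n = len(C)
    return sum(4.0 * (c - 0.5) ** 2 for row in C for c in row) / (n * n)


# Three runs that agree up to a relabeling of the clusters (stable solution).
stable_runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
# Two runs that disagree on most sample pairs (unstable solution).
unstable_runs = [[0, 0, 1, 1], [0, 1, 0, 1]]

stable_dispersion = dispersion(consensus_matrix(stable_runs))
unstable_dispersion = dispersion(consensus_matrix(unstable_runs))

print("stable:", stable_dispersion)
print("unstable:", unstable_dispersion)
```

Because the connectivity matrix only records whether two samples share a cluster, the measure is unaffected by the arbitrary numbering of clusters in each run; a rank whose consensus entries sit close to 0 or 1 scores near 1.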
The six-cluster solution is shown in the NMF consensus matrix (Figure 3.1A), where the clusters are well separated, with fewer than 10% of samples showing a clustering correlation of less than 0.8.

Figure 3.1 NMF candidate clusters and corresponding survival analysis A. NMF Consensus Matrix for NMF with six candidate clusters. A deep red color, corresponding to a score of 1, indicates complete agreement of samples with a group. A deep blue color, corresponding to a score of 0, indicates no agreement of samples with a group. B. Survival Kaplan-Meier curves for the six subtypes of NDMM. Curves are colored per group based on survival (green to red, from best to worst) with the cohort line in black. OS and PFS are shown as top and bottom graphs, respectively.

3.3.2. Overall survival (OS) and progression-free survival (PFS) in NMF groups

The survival analysis on the six candidate clusters shows distinct survival profiles in the NMF subtypes that deviate from the population survival profile (Figure 3.1B). Specifically, there is a significant difference in OS between the subtypes (p=0.015). PFS is not significantly different (p=0.21). On a per-group basis, group 6 has the best OS and PFS, while group 3 has the worst OS and group 4 the worst PFS. Since PFS can be confounded by the variation in treatments, we focus the analysis on OS as the target end point of interest. In order of best to worst OS, group 6 has the best 5-year OS (76.70% +/- 4.23%), followed by group 1 (70.16% +/- 7.29%), then group 4 (66.34% +/- 8.39%), group 2 (61.21% +/- 4.38%), group 5 (55.91% +/- 5.53%), and finally group 3 (51.08% +/- 7.57%) with the worst OS. In comparison to the entire cohort, the OS of group 2 resembles that of this population of NDMM patients. Taken together, the data suggest that NMF subtyping identified a transcriptomic signature that correlates with survival and may potentially be used to stratify patients.
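The Kaplan-Meier curves behind these OS and PFS estimates are built with the product-limit estimator. As a minimal sketch (the function name `kaplan_meier` and the toy follow-up data are assumptions for illustration and are unrelated to the CoMMpass cohort or the survival R package used in this work), the estimator can be written as:

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up durations (any consistent unit).
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= n_removed
    return curve


# Toy data: six hypothetical patients, two of them censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.4f}")
```

Censored patients lower the number at risk without producing a step in the curve, which is why differences in follow-up between groups do not by themselves bias the estimate.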
3.3.3. GSEA identified key pathways underlying the NMF groups

The proposed NMF groups could be used to further understand disease biology, particularly how dysregulated pathways correlate with patient outcomes. We ran the GSEA protocol across all NMF groups and ranked hallmark pathways by gene enrichment scores. Figure 3.2 shows the top ranked pathways for four groups, two with the best OS and two with the worst OS.

Figure 3.2 GSEA pathways and survival outcomes across NMF groups Groups are colored according to survival profiles (green for best and red for worst) and ordered side-by-side for worst vs best and second-best vs second-worst. A. Group 6. B. Group 1. C. Group 3. D. Group 5. Blue bars are those with an adjusted p value of <0.05 while red bars are those with an adjusted p value greater than 0.05.

In group 6 (Figure 3.2A), which has the best OS among the subtypes and the total patient population, the GSEA found that MYC targets and interferon alpha are significantly upregulated while TNF-alpha (via NFkB), EMT, angiogenesis, and inflammatory pathways are downregulated. Not only are the majority of downregulated pathways involved in tumor formation and maintenance, but the upregulation of interferon alpha has also been previously correlated with host anti-tumor effects and provides a potential explanation for the improved prognosis in this group 116. For group 1 (Figure 3.2B), with the second-best OS, all key pathways are downregulated as compared to the rest of the groups, with MYC targets, oxidative stress, and cell cycle checkpoints representing the most downregulated pathways. Group 3 (Figure 3.2C), with the worst survival, harbors the most significantly upregulated tumor driver pathways, specifically TNF-alpha (via NFkB), TGF-beta signaling, hypoxia, P53, apoptotic, and inflammatory pathways.
The co-enrichment and co-upregulation of these pathways is indicative of an aggressive tumor phenotype driven by highly proliferative and pro-survival plasma cells maintained via such factors as BCL2 and BCMA (TNFRSF17). In addition, the enrichment of inflammatory response and IL6 signaling indicates that apoptotic mechanisms are causing an inflammatory tumor microenvironment and thus contributing to the poor OS in this subtype of MM patients. In group 5 (Figure 3.2D), with the second-worst survival, the upregulated pathways include MYC and E2F targets followed by the unfolded protein response (UPR) and metabolic pathways. In this group, the TNFa, interferon, and inflammatory pathways are downregulated (Figure 3.2D). Not shown are the GSEA figures for group 2 and group 4, whose survival closely resembles that of the patient cohort. In group 2, upregulated pathways include E2F, cell cycle checkpoints (G2M), and UPR, with tumor-driving pathways such as TNFa, P53, and inflammatory drivers downregulated. In group 4, key upregulated pathways include inflammatory, DNA damage, and reactive oxygen species pathways, with the MYC targets and UPR being downregulated. While these groups harbor myeloma driving pathways, they are enriched in pathways that are currently targeted therapeutically with state-of-the-art agents in MM. More specifically, besides the cell cycle checkpoints, the UPR is part of the proteasome biology in MM and represents a class of therapeutic targets with several different MM treatments available, including the proteasome inhibitors bortezomib, carfilzomib, and ixazomib. Overall, gene set enrichment analysis has provided insight into potential mechanisms of both pro- and anti-tumor activity within the six NMF subtypes of MM and may explain the survival profiles across the groups.

3.3.4. TNF genes present distinct profiles in NMF groups

In the assessment of TNF gene expression across NMF groups, we find distinct patterns of enrichment (Figure 3.3).
There are two major clusters of TNF genes: one set consistently highly expressed in all the groups (left) and another set with consistently low expression across all groups. Across groups, group 4 has a particular profile, with one set of TNF genes showing low expression (dark blue) and a different subset of TNF genes highly expressed (yellow/orange) in this group and not in other groups.

Figure 3.3 Distribution of TNF receptors and ligands in NMF groups Hierarchical clustering of TNF genes across the cohort. The columns (x-axis) represent TNF genes and the rows (y-axis) represent the six NMF subtypes of MM (G for group followed by the group number).

There is no strong correlation between the TNF signature and survival, suggesting that this gene family by itself is not a driver of survival outcomes. Of particular note in this analysis, BCMA (TNFRSF17) is uniformly highly expressed across all six groups, consistent with current data showing that BCMA is highly expressed in myeloma plasma cells and has been a therapeutic target of interest in recent years 117. On the other hand, a set of TNF receptors and ligands shows higher expression in certain groups than in others and may point towards potential biomarkers for MM subtypes.

3.3.5. CNV profiles reveal genomic signatures in NMF groups

In the DNA CNV analysis, NMF groups broadly show genomic profiles consistent with myeloma genomics, where we find mutational events in odd-numbered chromosomes (Figure 3.4). However, differences in chromosomal events are observed when comparing survival between groups (Figure 3.4A). Groups 6, 1, and 4, with the best survival outcomes, have consistently similar driver CNV events with some minor patient-specific variations within each group. Groups 5 and 3, which have the worst survival outcomes, show specific chromosomal deviations from the groups with better outcomes.
Specifically, group 3 is primarily driven by deletions of chromosome 13, while group 6 is driven by trisomy of odd-numbered chromosomes, a well-described molecular signature. Further, groups 3 and 5 show minimal or no loss of 1p and chromosome 4 while also having minimal to no events on chromosome 2. Although the number of patients differs between NMF groups, the CNV data provide genomic subtypes that map to the transcriptomic groups and are consistent with survival outcomes.

Figure 3.4 DNA CNV profiles of patients in NMF groups CNV profiles are ordered by NMF group survival from best to worst. A. Mean CNV count in NMF group. B. CNV profiles per patient within groups; each row represents a patient, and each column represents a chromosome. White bars are odd-numbered chromosomes and grey bars are even-numbered chromosomes. Chromosomal events are colored in red (gain) and blue (loss).

3.4. Discussion

MM, an incurable PC malignancy, is characterized by expansive genetic heterogeneity between patients and subclonal variations within the tumor and spatially across bone marrow locations where myeloma fluid tumors reside. Despite significant progress in novel therapeutic agents that have led to increased survival, nearly 100% of patients relapse, partially due to therapy-resistant subclones that evolve new genetic alterations. While the genetic heterogeneity is well understood, the functional consequences at the gene expression and protein levels remain to be elucidated. We hypothesized that there are transcriptomic molecular subtypes in myeloma that can be delineated to elucidate biological variability and better stratify patients by survival. We applied NMF, an unsupervised machine learning approach to model gene expression data, to RNA sequencing data across a cohort of 807 patients from the MMRF’s CoMMpass study to develop transcriptomic subtypes in NDMM.
We identified six distinct transcriptomic groups of MM that describe specific biological signatures and survival profiles. DNA CNV analysis identified signatures that suggest MM subtypes may go beyond the two clinically defined groups. When we compared the top GSEA pathways in the four groups with the best and worst survival profiles, there was a noticeable inverse relationship between upregulated and downregulated pathways that aligns with the group survival probabilities. This provides additional evidence that the candidate NMF groups, identified by transcriptional profiling, can be used to better stratify patients. Further, based on the observations from our GSEA and TNF gene family analysis, it’s plausible that targeting the specific pathways underlying the biology in the NMF subtypes may offer better outcomes.

Chapter 4 Mathematical Oncology to Integrate Multimodal Clinical and Liquid Biopsy Data for the Prediction of Survival
The content of this chapter is adapted as submitted for publication. Ndacayisaba L.J., Mason J, Kuhn P. (2021). “Mathematical Oncology to Integrate Multimodal Clinical and Liquid Biopsy Data for the Prediction of Survival.” Chapter in Circulating Tumor Cells: Advances in Liquid Biopsy Technologies, Springer, 2nd Ed.

4.1. Introduction
This chapter describes the work carried out in developing deep learning survival prediction models using multiparametric liquid biopsy and clinical data. While the research was performed in breast and prostate cancers, it shows the feasibility of a framework that can be applied to plasma cell neoplasms. Remarkable breakthroughs in liquid biopsy and contemporary quantitative methods in multi-omics have afforded a deeper resolution of tumor biology and generated a wealth of multiparametric data. In parallel, theoretical advances in mathematical modeling, machine learning, and artificial intelligence (AI) have enhanced our capabilities to predict patient outcomes.
The implementation and application of these approaches to modern biomedicine will revolutionize cancer care and actualize the promise of precision oncology. The significance of these technological and scientific advances is reflected by the priority given to AI and liquid biopsy in the National Cancer Institute (NCI)’s strategic plan for research in cancer detection and diagnosis 118–120 and the US Food and Drug Administration (FDA)’s approval of a variety of blood-based molecular profiling 121 and AI technologies for clinical use 122,123 . AI is the theoretical development and practical application of intelligent agents enabled by learning algorithms that are capable of mimicking the logic and cognition of the human brain for a given task 124–126 . Since the conceptualization of learning machines by Alan Turing 127 and the further characterization of AI by John McCarthy 128 , the field has developed into an important area of research in mathematics, computer science, and engineering, with applications that expand into computational biomedicine and particularly into mathematical oncology. The majority of AI tools currently in clinical use are for histopathology and radiology, where machine and deep learning approaches have seen significant success enabled by theoretical advances in computer vision and image processing. In contrast, the success of predictive mathematical models in more complex spatiotemporal aspects of disease and patient outcomes, such as therapy response and survival, has lagged behind. Despite this, recent efforts have been made to leverage cohort-level data to identify trends in metastatic spread 129–131 as well as to predict overall and progression-free survival 132 .
Robustness in such models requires not only basic patient demographic and clinical variables, but also quantification of tumor biology features at the human, tissue, cellular, and molecular levels to account for the heterogeneity among tumors and the inherent variability in latent feature relationships within an individual patient. For example, how does intracellular protein expression in a circulating tumor cell (CTC) relate to the metabolic state of the tumor tissue? This expands greatly upon the limited set of patient variables that clinical models of diagnosis, prognosis, and prediction have historically relied on. Recent technological advancements in liquid biopsy-based tumor profiling have driven the generation of extended biological and disease data from cancer patients at various scales. They have consequently motivated the development of complex mathematical models to map key driving elements in events such as disease initiation and progression, in normal conditions as well as under treatment pressures, and have ushered in the development of machine and deep learning prediction models of survival 133–135 . A liquid biopsy, as defined by the NCI, is a method applied to a blood sample for the identification, isolation, testing, and analysis of analytes pertaining to an individual’s tumor 136 ; the definition also extends to other fluids where similar analytes are expected to be found. These novel, minimally invasive methods boast a short procedural collection time and low sample acquisition cost. They allow for extensive tumor molecular profiling without the need for an invasive tissue biopsy and are also compatible with longitudinal patient monitoring 137–141 . Additionally, these large multi-omics data can be generated from multiple specimen types, both common (e.g., peripheral blood) and those most pertinent to the disease being studied (e.g., bone marrow aspirate for prostate cancer and multiple myeloma, or aqueous humor for retinoblastoma).
Prominent analytes include CTCs, cell-free nucleic acids (DNA and RNA), extracellular vesicles, and metabolic molecules, among others 142 . CTCs have been shown both to reproducibly represent the heterogeneity of the initiating tumor (primary or metastatic) 137,138 and often to present unique genomic clonality as cancer evolves as a result of microenvironment pressure or therapeutic perturbation 59,61,143 , making them one of the fundamental analytes used in single cell biology to elucidate the underpinnings of the hallmarks of cancer 144 . Comprehensive profiling of CTCs spans diverse methodologies, including reverse transcription polymerase chain reaction (RT-PCR), fluorescence in situ hybridization (FISH), next generation sequencing (NGS), and array comparative genomic hybridization (aCGH), to name a few; however, the most commonly used is immunofluorescence 142 . For a single cell of interest, comprehensive molecular profiling can generate morphological data, genomic and proteomic landscapes, metabolomic states, transcriptomic states, as well as epigenetic maps, providing higher resolution into the biology of the tumor in space and time 145,146 . The rise of large multimodal liquid biopsy datasets adds multiscale resolution to patient descriptors and has motivated efforts from consortia of private, academic, and government organizations to create bespoke repositories of datasets for research. The Blood Profiling Atlas in Cancer (BloodPAC) 147,148 , pioneered by the team of then-Vice President Biden’s Cancer Moonshot Initiative 149 , is one such leading initiative for the standardization of liquid biopsy platform data elements, data curation, and storage across academia and commercial organizations 149,150 . These multi-center liquid biopsy datasets are rich in tumor biology that can be utilized, in combination with demographic and clinical data, to improve the accuracy of predicting individual patient outcomes.
This chapter aims to highlight how the convergence of AI and liquid biopsy holds promise for the future of cancer detection and the realization of precision oncology. We present advances in predictive mathematical oncology with a sharp focus on integrating multimodal and multiscale liquid biopsy data into machine and deep learning models for survival prediction. To guide the reader, this chapter starts off with a data-driven discussion about the elements and structure of demographic and clinical data and presents the challenges associated with integrating data emanating from different health care systems. Next, we discuss the recent emergence of multimodal liquid biopsy datasets and present the challenges of data sparsity, scale, and dimensionality as they pertain to developing robust predictive models of patient outcomes. Methodological approaches for handling missingness, sparsity, and high-dimensional data in the context of developing predictive mathematical models are subsequently presented, with a deep dive into generative adversarial neural networks. The next two sections discuss a proof of concept in building predictive machine and deep learning models, both on multi-center clinical data in breast cancer and on integrated clinical and single cell morpho-proteogenomic liquid biopsy data in metastatic prostate cancer. Finally, the chapter concludes with a forward-looking discussion on how future advances in integrating liquid biopsy data with predictive mathematical modeling will improve patients’ outcomes.

4.2. Materials and Methods
4.2.1. Integration of Multi-Center Demographic and Clinical Data
Considerations to address the challenge of multi-institutional data integration in oncology focus on the lack of standardization and interoperability across different clinical centers and healthcare systems, the resulting sporadic incompleteness in data, and suitable approaches to impute mixed multimodal data types.

4.2.1.1. Non-standardization and sporadic incompleteness in multi-center demographic and clinical data
Cancer patient data typically consist of demographic and clinical factors that describe the patient, the tumor, and the treatment at specific moments in time. The patient features are mostly commonplace descriptors such as age, gender, race, and ethnicity, and measurable values that are easily and routinely collected in the clinic, such as height, weight, and blood pressure. The disease- and treatment-specific features are often tailored to the primary cancer (e.g., breast, lung, prostate), but usually contain the stage at diagnosis, tumor size and grade, lymph node biopsy status, and specific histology (e.g., ductal [breast], adenocarcinoma [lung], urothelial [bladder]). Locations of metastatic progression, specific interventions, therapy types (e.g., chemo, radiation, targeted), and relevant corresponding dates (e.g., diagnosis, progression, death, last follow-up) are also included. These demographic and clinical variables contain prognostic and predictive markers that can be used for clinical decision making based on how well a patient is doing or will do given a specific treatment pathway. However, the patient, tumor, and treatment variables (data elements) are often collected with some level of sparsity and inconsistency in data entries because of a variety of factors, which can include the duration of the disease, people’s ability to change locations, evolving disease complexities and the need for subspecialties, as well as various comorbidities. Additionally, and due in part to health care disparities, most cancer patients are diagnosed in community health centers while a subset of patients are diagnosed at larger academic cancer centers 151–153 , which creates inconsistencies in collected data.
Additionally, as many patients with severe disease still prioritize follow-ups at local health centers for convenience, even after major decisions might be made at academic medical centers, the resulting challenges in adherence to standards and variation in data collection methods bring further disconnect, both at the individual patient level and in cohort studies 154 . Routine follow-up doctor visits can be missed, measurements forgotten or skipped (e.g., height, weight), or a patient can refuse to report certain information (e.g., ethnicity, drug use, alcohol use). Beyond missed follow-up visits for cancer surveillance and/or care, regular visits to non-oncologists (physician checkups) can also be missed, and these can be just as important for providing a more complete patient data matrix. Additionally, the inherent heterogeneity within the disease creates missing data due to the wide diversity in both disease progression (e.g., metastases, tumor response) and treatment patterns (e.g., drug classes, side effects), all of which are in the context of an ever-changing landscape of novel therapies. Over the past decade, many steps have been taken to increase the utility of electronic health records (EHRs) for improved patient outcomes, starting with the Centers for Medicare & Medicaid Services incentivizing their meaningful use 155,156 . Subsequently, the 21st Century Cures Act ensured patients have access to their own data and set the stage for interoperability between disparate EHR sources (e.g., general practitioner, urologist, cardiologist) 155,157,158 . Despite these efforts, however, the current systems of specialty and generic health records, as well as legacy databases, are not compatible with each other, leading to multi-level interoperability problems 159,160 .
With regard to data entry and recording methods, further intrinsic variability is inevitable due to the lack of machine readability, particularly in categorical data entries by physicians and clinical staff (e.g., free-text areas with no prescribed values, such as comment boxes). Additionally, various demographic and clinical terminologies can be used based on the reporting institution’s preferences (e.g., race: black vs African American; weight: pounds vs kilograms). Similarly, institutional preferences for diagnostic guidelines and tumor profiling technologies affect the consistency of data elements. The same biomarker, such as the HER2 status in breast cancer, can be measured via FISH or immunohistochemistry (IHC) 161 . This is illustrated at greater length in Section 6 within a multi-center breast cancer dataset where Memorial Sloan Kettering Cancer Center preferentially focuses on tumor anatomical data elements during the time period of data collection from 1975 to 2013, while MD Anderson Cancer Center focuses on patient data elements. One consequence of combining such datasets is that variables unique to a subset of the sources create missing data in the sources that do not contain those variables. While merging multiple datasets is a technically straightforward solution, it quickly becomes a non-trivial task due to the above inconsistencies and incompleteness, the need to consider the origin of these variations, and the implications for building robust clinical-grade predictive models. Traditionally, one would only use the variables shared across datasets, but as the number of sources increases, this would reduce the size and viability of the combined dataset, thus rendering any downstream analyses useless.
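The trade-off described above, between keeping only shared variables and keeping every variable at the cost of missing values, can be made concrete with a small sketch. Everything in it is hypothetical and chosen for illustration: the two sites, the field names (`er_status`, `weight_lb`, `tumor_size_cm`), the synonym table, and the helper functions are assumptions, not elements of the datasets used in this chapter.

```python
# Hypothetical records from two sites; field names and values are illustrative only.
site_a = [
    {"patient_id": "A1", "er_status": "Pos", "tumor_size_cm": 2.1},
    {"patient_id": "A2", "er_status": "-", "tumor_size_cm": 3.4},
]
site_b = [
    {"patient_id": "B1", "er_status": "Positive", "weight_lb": 150.0},
    {"patient_id": "B2", "er_status": "negative", "weight_lb": 180.0},
]

# Assumed synonym table mapping site-specific labels onto one vocabulary.
ER_SYNONYMS = {"pos": "Positive", "positive": "Positive", "+": "Positive",
               "neg": "Negative", "negative": "Negative", "-": "Negative"}
LB_PER_KG = 2.20462


def harmonize(record):
    """Map site-specific terms and units onto one shared vocabulary."""
    out = dict(record)
    if "er_status" in out:
        out["er_status"] = ER_SYNONYMS.get(str(out["er_status"]).strip().lower())
    if "weight_lb" in out:
        out["weight_kg"] = round(out.pop("weight_lb") / LB_PER_KG, 1)
    return out


def merge(datasets, how="union"):
    """Stack records; 'union' keeps every variable, 'intersection' only shared ones."""
    records = [harmonize(r) for ds in datasets for r in ds]
    key_sets = [set(r) for r in records]
    columns = set.union(*key_sets) if how == "union" else set.intersection(*key_sets)
    # None marks a value that the source site never collected.
    return [{c: r.get(c) for c in sorted(columns)} for r in records]


def missing_fraction(rows):
    cells = [v for r in rows for v in r.values()]
    return sum(v is None for v in cells) / len(cells)


union_rows = merge([site_a, site_b], how="union")
shared_rows = merge([site_a, site_b], how="intersection")
print(sorted(union_rows[0]))
print(missing_fraction(union_rows), missing_fraction(shared_rows))
```

In this toy case the union keeps four variables but a quarter of its cells are empty, while the intersection is complete but retains only two variables. Each additional site with its own variables shifts that balance further, which is why imputation becomes necessary.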
We therefore present and discuss some of the methodological advances in handling missingness in multi-center multimodal data; these approaches are further applicable to the challenge of integrating liquid biopsy datasets into mathematical modeling efforts.

4.2.1.2. Imputation in mixed data types
As a consequence of the previously discussed lack of interoperability and standardization, there is a critical need to develop a methodological workflow for handling missingness in datasets within and emanating from various sources. Further, since prediction models require consistency in datasets 162,163 , there are two primary options for handling missing data: 1) remove patients and/or variables with missing values or 2) impute the missing values from those available. Imputation is the means by which missing data instances are estimated or extrapolated from other data in the study. Given that demographic and clinical datasets are already small and limited, removing patients or variables would significantly affect the accuracy and clinical application of any models built from them 163,164 . Imputing the missing values can be done in several ways. Some common methods include imputation by mean (average of all available values), substitution (using a non-included patient’s value), hot deck (randomly chosen from similar patients), cold deck (systematically chosen from similar patients), regression (predicted from regression of available values), stochastic regression (predicted from regression of available values with a random addition), and interpolation or extrapolation (estimated from other observations of the same patient) 164,165 . The optimal method depends on the data, and many have applied multiple statistical and machine learning algorithms to this task 164,166 . A few widely used implementations include knnImpute 167 , MICE 168 , and missForest 169 .
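Two of the simpler strategies listed above, mean (or most-frequent-value) imputation and hot deck imputation, can be sketched in a few lines. This is a minimal illustration on a hypothetical four-patient cohort; the column names, the similarity rule (closest age), and the donor pool size are assumptions, and the sketch is not a stand-in for knnImpute, MICE, or missForest.

```python
import random
from statistics import mean, mode

# Hypothetical toy cohort (values are illustrative); None marks a missing entry.
patients = [
    {"age": 48, "bmi": 27.1, "er_status": "Positive"},
    {"age": 61, "bmi": None, "er_status": "Negative"},
    {"age": 52, "bmi": 31.0, "er_status": None},
    {"age": 45, "bmi": 24.3, "er_status": "Positive"},
]


def impute_central(rows, column):
    """Mean imputation for numeric columns, most frequent value for categorical ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    numeric = all(isinstance(v, (int, float)) for v in observed)
    fill = mean(observed) if numeric else mode(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def impute_hot_deck(rows, column, similar_on, rng, pool_size=2):
    """Hot deck: copy the value of a randomly chosen donor among similar patients.
    'Similar' here means closest on one numeric variable, an assumed rule."""
    donors = [r for r in rows if r[column] is not None]
    out = []
    for r in rows:
        if r[column] is None:
            pool = sorted(donors, key=lambda d: abs(d[similar_on] - r[similar_on]))[:pool_size]
            r = {**r, column: rng.choice(pool)[column]}
        out.append(r)
    return out


mean_filled = impute_central(patients, "bmi")
mode_filled = impute_central(patients, "er_status")
hot_deck_filled = impute_hot_deck(patients, "bmi", similar_on="age", rng=random.Random(0))
print(mean_filled[1]["bmi"], mode_filled[2]["er_status"], hot_deck_filled[1]["bmi"])
```

Mean imputation shrinks the variance of the filled column, whereas hot deck preserves observed values at the cost of randomness. Model-based methods such as those named above aim to use the relationships between variables instead.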
For our modeling approaches, we have benchmarked these methods and found that missForest is a reproducibly robust method for handling missingness in mixed data types, in both high- and low-dimensional data cases, and we utilize the results for building survival prediction models.

4.2.2. Sparsity, Scale, and Dimensionality of Liquid Biopsy Data in Comprehensive Profiling
Liquid biopsy data is inherently multimodal and multiscale. In order to profile the biology of the tumor with sufficient resolution using liquid biopsies, various methods are used to generate morphometric, genomic, proteomic, transcriptomic, epigenomic, and metabolomic data 144 . While limited datasets are available in the public sphere, for the purpose of this chapter, we focus on the liquid biopsy data generated by the High-Definition Single Cell Assay (HDSCA) workflow, our next-generation morpho-proteogenomics technology for comprehensive multianalyte liquid biopsy profiling, which integrates both microscopy-based image analysis and a computational methodology for rare cell detection and clustering. As part of a dedicated data science infrastructure for real-time analysis, the data are aggregated into a custom SQL database for streamlined downstream analysis and for integration into mathematical modeling and predictive machine and deep learning.

4.2.2.1. Comprehensive CTC profiling with HDSCA
HDSCA is a comprehensive workflow for cell-free and cell-based molecular profiling and has been described as a high-throughput CTC analysis workflow by Thiele et al. 62 and in Springer Tumor Liquid Biopsies 170 (Figure 4.1A). Briefly, HDSCA is founded on a “no cell left behind” approach and has been validated across multiple epithelial cancers 171 including breast 45,59,172 , prostate 45,173,174 , lung 175–178 , pancreatic 45 , colorectal 23 , melanoma 46 , retinoblastoma 179 , and in blood cancers, namely multiple myeloma 44 .

Figure 4.1 Multimodal liquid biopsy data from the HDSCA workflow
A. The HDSCA Workflow and Chain of Custody in Data Generation. B. CTC morphometric data include CTC enumeration as single events and clusters, cellular morphology with cell and nuclear size and shape along with marker expression. C. Genomics data include chromosome level amplification and deletions along with instability score and LST values for each sequenced cell. D. Imaging Mass Cytometry (IMC) proteomics data as measured by ion count for each pair of heavy metal isotope and conjugated antibody for a targeted multiplexed panel that includes membranous, cytoplasmic, and nuclear markers.

HDSCA has been comprehensively validated 45,50,49 and commercialized as the Epic Sciences platform. While not yet approved by the FDA as a 510(k) or premarket approved product, HDSCA is clinically validated for CLIA/CAP compliance, is approved by regulators at the state level as a laboratory developed test, and is reimbursed for the identification of AR-V7 in prostate cancer 174,180–184 . A core aspect of HDSCA is the recording of the complete chain of custody of every analyte from the moment the sample enters the laboratory through data analysis. This includes receiving the packaged sample, isolating the plasma for cell-free nucleic acids, plating the nucleated cells onto a glass slide, cryobanking if/when applicable, immunofluorescent staining with a study specific assay, and scanning via automated microscopy. The entire population of cells is individually distributed across the slide, which allows for downstream analyses at the single cell level. Additionally, the adaptability and flexibility of HDSCA make it a broadly enabling technology for liquid biopsy data generation across multiple disease areas and specimens. While our work has primarily focused on peripheral blood and bone marrow aspirate, other fluid specimens can be analyzed, as evidenced by recent expansions into aqueous humor 179 , cerebrospinal fluid, and peritoneal fluid.
This versatility means multianalyte characterization can be performed on CTCs, tumor microenvironment cells, extracellular vesicles (exosomes, oncosomes, etc.) 185 , cell-free nucleic acids, platelets, and other secreted molecules, thus providing extensive datasets at multiple scales of the tumor biology for added biological resolution in predictive modeling. To date, and of primary focus in the results presented in this chapter, the main categories of multimodal data generated via HDSCA are products of the CTCs identified, namely enumeration, single cell morphology, genomics, and proteomics.

4.2.2.2. Morphometrics: immunofluorescent detection, identification, and enumeration of CTCs
The morphometric features of each single cell are generated by EBImage 60 and are primarily based on fluorescent imaging of the cell. These features, which include marker expression as well as cell and nuclear shape and eccentricity, are used to describe the biophysical and cellular identity of the cell. A rare event detection and classification algorithm utilizes these cell morphology data to identify CTCs among other nucleated cells located on a slide, yielding an enumeration value (Figure 4.1B).

4.2.2.3. Genomics: copy number variation analysis in single cells and cell-free DNA
Given that the Cartesian coordinates of each individual cell are maintained for every slide, relocation, removal, and genetic barcoding for genomic analysis are possible. Target CTCs are isolated for sequencing following a standardized protocol: whole genome amplification (WGA) followed by library preparation for DNA sequencing as previously described 59,186 . Single cell sequencing data is run through a genomics pipeline to generate chromosomal DNA copy number variation (CNV) data that consists of chromosomal deletions and amplifications that further determine the clonality of CTCs.
The aggregate chromosomal data can also be measured and analyzed as instability scores and/or large-scale state transitions (LSTs) 187,188 . Cell-free DNA is processed following a standardized protocol and advanced through the same genomics pipeline as single cells, so that the mutational landscape of the tumor as captured in the liquid biopsy sample can be analyzed alongside that of the cells 189 (Figure 4.1C).

4.2.2.4. Proteomics: multiplex imaging mass cytometry for subcellular features of single cells
Targeted multiplex imaging mass cytometry (IMC) for single cell subcellular profiling applies a 40-plex Maxpar metal-labeled antibody panel to stain a 400 x 400 μm region of interest (ROI) on a glass slide 48,190 . A single ROI typically contains 200-300 cells that include target CTCs (cells of interest) and surrounding white blood cells. Laser ablation of ROIs is followed by plasma torch ionization, with time-of-flight measurement used to quantify metal ions by mass differential. Pixel-based image reconstruction is followed by segmentation to generate ion counts, the readout for protein expression. Automated gating is applied, and the resulting data contain single cell phenotypes for the identification of cell subpopulations that describe the type and state of single cells contributing to the tumor biology in circulation and in the tumor microenvironment 47,48,191 (Figure 4.1D).

4.2.3. Integration and Augmentation of Multimodal and Multiscale Liquid Biopsy
The heterogeneity in the dimensionality and scale of oncology data poses a data integration challenge, particularly for the prediction of patient-level outcomes. Multimodal measurements range from personal and demographic information to physiological tumor data, to histopathology imagery and subcellular characterization of tumor tissue-resident cells. Additional measurements of disease state include blood count, metabolic data, CTC count, and molecular biomarkers from targeted gene panel analysis and cancer antigen profiling.
The cellular and molecular characterization of fluid samples can be performed on any specimen in fluid form, such as blood, urine, bone marrow aspirate, cerebrospinal fluid, peritoneal fluid, and more. These measurements generate a truly comprehensive analysis of the patient’s disease at all scales and yield data spanning multiple dimensions.

4.2.3.1. The need for methods for multimodal data integration
The multiscale and high-dimensional nature of contemporary comprehensive profiling in oncology presents a challenge for modeling and prediction, and computational methods to combine the resulting multi-parametric data are needed. Additionally, data is collected across multiple time-points during a patient’s care. For patients with metastatic disease, samples are collected at different physiological and anatomical locations. This spatiotemporal aspect of cancer adds complexity to data integration. While efforts to develop multimodal data integration approaches are being explored and implemented in oncology, and are showing promising utility for predictive modeling 192–196 , such work has not been explored in the context of liquid biopsy for survival prediction.

4.2.3.2. The need to augment data to address sparsity in liquid biopsy
Mathematical oncology modeling requires large patient cohorts for predictive power. This is especially true for machine and deep learning efforts, where large amounts of data are needed for algorithmic learning of data representation to uncover the complex relationships within the data space, thereby enabling predictive capabilities that are robust, reproducible, and clinically useful. Therefore, one of the main challenges in modeling lies in integrating datasets from different studies to achieve sufficient patient numbers for training and testing.
Data sparsity arises when datasets collected across different studies and clinical sites, with variations in collection technologies and experimental design, report different data elements, leading to gaps or ‘missing data’ in the combined dataset. This is especially true for liquid biopsy studies where, up to now, the cohort sizes in research-level studies have been relatively small, requiring novel data integration methods for which data augmentation would be most useful. Data augmentation is the process by which data instances or elements are expanded by synthetic data generation or distribution sampling to increase the representation of the learning space while preserving the distribution of the data in a given cohort 197 .

4.2.3.3. Data augmentation using generative models and information geometry
Methods for data augmentation span the fields of deep learning, generative modeling, information geometry, and probability theory. The application of generative adversarial neural networks (GANs) 198 , developed by a team at the University of Montreal, is an example of recent successful development in data augmentation using deep learning. The fundamental architecture of GANs comprises two multilayer perceptrons, a generative model and a discriminative model, that play a minimax two-player adversarial game in which the generator learns to produce synthetic data whose distribution matches that of the real training data 198 . The adversarial game between the two neural networks can be likened to a situation where a counterfeiter is fabricating a bill while a law enforcement officer works to find the counterfeit note, a game in which success depends on maximizing the understanding of the finely detailed features of the notes.
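For reference, the two-player game between the counterfeiter (generator) and the officer (discriminator) is commonly written as the minimax objective below. The notation is the standard one from the GAN literature and is introduced here for illustration rather than taken from this chapter: G is the generator, D the discriminator, p_data the distribution of real data, and p_z the noise distribution fed to the generator.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is rewarded for assigning high probability to real samples and low probability to generated ones, while the generator is rewarded when its samples are scored as real.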
Beyond the theoretical framework by Goodfellow et al, NVIDIA’s computer vision teams published a practical implementation showing successful human face mimicry achieved by GANs of an optimal discriminator and popularized in the creation of ‘deepfakes’, convincingly realistic images and videos 199–201 . Applications of GANs in biology and healthcare have more recently emerged, particularly in computational histopathology for augmenting, cleaning, and enhancing of pathological tissue images for classification problems 123,202,203 . Unlike in classical digital pathology, liquid biopsy data is not entirely image-based and, therefore, the application of GANs to liquid biopsy data is still lacking. As such, we explore the feasible utility of this approach for data augmentation, particularly in small cohort settings. Further, the lack of interpretability of neural network-based models hinders their applicability to biomedical tasks and approaches from information geometry. Bayesian inference models are apt for these tasks as they are amenable to statistical interpretation and thus improve the utility of generative models 204 , which has motivated the advances in applying copulas and conditional generative models for data augmentation 205,206 . Copulas, originally theorized by the Sklar’s theorem 207 , are multivariate functions that model the probability distributions of data and are adaptable for synthetic data generation 208,209 . Copulas can be implemented for multivariate and topological Gaussian modeling 210,211 , as part of a generative deep neural network for out- of-sample data generation 212,213 . Using the Synthetic Data Vault 214 , we benchmarked six different synthetic data generation models including Gaussian Copula, Copula GAN, CTGAN, 215,216 and TVAE and found that Gaussian Copula performed the best in sampling a larger integrated dataset that preserves the distribution of the real liquid biopsy cohort data. 4.2.4. 
Advances in Machine and Deep Learning Methodologies for the Prediction of Survival Survival analysis and modeling is a statistical approach to measure and predict time-to-event (also known as survival time) based on underlying variables and their risk contribution to the target outcome 217 . In medicine and oncology, time-to-death is the prominent survival time and various linear and non-linear methods have been developed and optimized for survival data modeling. The Kaplan-Meier estimator, a univariate linear approach 218 , has been the primary method for survival analysis 219 . While linear survival methods work well in complete and small cohorts, survival modeling in high- dimensional, non-linear, and censored datasets requires specialized survival prediction techniques. Extensions to the Kaplan-Meier estimator such as the work of Sir David R. Cox 220 and others have advanced the field to account for relationships between covariates and account for the hazards functions especially in multivariate settings. 64 4.2.4.1. Applications of classical machine learning to survival prediction The application of classification and regression prediction methods to survival analysis continues to pave the way for development of new techniques for use in multivariate and multimodal censored datasets. Besides Leo Breiman’s random forest model’s 221 adaptation as an ensemble method for survival prediction 222–224 , support vector machines 225 , and Bayesian 226,227 among other regression and classification algorithms have also been adapted for survival prediction 228,229 . The advent of complex and large datasets in oncology and the need to utilize them for patient outcomes prediction presents a limitation for some of the classical machine learning models. 4.2.4.2. 
Advances in deep learning models for survival prediction The accelerated success of deep learning in the past three decades has spurred further developments in applying artificial neural networks for robust learning of non- linear representations in multimodal datasets and their flexibility has allowed for handling large, censored data and applying non-linear risk functions for survival prediction. Complex heterogeneous and multiparametric datasets require nonlinear survival prediction methods that capture the complex relationships between patient multimodal covariates. The Faragi-Simon network was the pioneering work in the application of deep learning methods for non-linear survival prediction 230 and multiple neural network architectures have been adapted since 135 . 4.2.4.3. DeepSurv as a case example: architecture and hyperparameters Introduced by Katzman and colleagues 231 , DeepSurv is a deep feed-forward neural network model consisting of alternating connected and dropout layers through which baseline patient data is propagated. The multilayer perceptron architecture implements a Cox proportional hazards (CPH) model 232 and fits both the survival and hazard functions as part of the output layer. Together, these developments highlight the increasing convergence between the fields of artificial intelligence and survival prediction in oncology and open doors for implementation into exploring the utility of liquid biopsy data. In our work, we sought to incorporate liquid biopsy to this exciting interdisciplinary approach towards optimizing the prediction of patient outcomes. In the following two sections, we benchmark machine and deep learning survival models against CPH models in the context of multi-institutional data integration and liquid biopsy and clinical data integration. 65 4.3. Results 4.3.1. 
Survival Prediction on Multi-Center Demographic and Clinical Data in Two Breast Cancer Cohorts

To illustrate the capabilities of multi-center data integration in the context of survival prediction, we utilize two distinct demographic and clinical datasets from the Memorial Sloan Kettering Cancer Center (MSK) [129] and The University of Texas MD Anderson Cancer Center (MDA) [131]. Both sets consist of early-stage breast cancer patients who are non-metastatic at diagnosis and eventually progress to metastatic disease.

4.3.1.1. Patient data, integration and imputation of MSK and MDA datasets

The MSK set contains 446 patients diagnosed and treated between 1975 and 2009 with 44 corresponding variables. The MDA set contains 3,735 patients diagnosed and treated between 1980 and 2016 with 23 corresponding variables. Both sets contained the patient's age, date of definitive surgery (i.e., lumpectomy, mastectomy), histology type, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, lymphovascular invasion, dates and sites of metastatic development, date of last known follow-up, and survival status (deceased vs alive). Additionally, both sets contained information on targeted therapy, chemotherapy, and hormonal therapy. The MSK set contained information about primary tumor lateral location (left vs right), size, histological grade, and surgical margins, while the MDA set contained information about race, menopausal status, and BMI.

Prior to combining these two sets, certain steps were taken to ensure compatibility. First, all dates provided in the MSK dataset were converted to "days from diagnosis" because the MDA dataset does not provide calendar dates, owing to patient privacy concerns. Second, some metastatic sites were reduced to a more general classification to enable grouping with equivalent terminologies (e.g., intrathoracic lymph nodes → distant lymph nodes).
Lastly, values for common variables such as ER, PR, and HER2 status were adjusted for consistent nomenclature (e.g., the variants Positive, Pos, and + were unified). Variables used within the model are listed in Table 4.1 and were selected based on their predetermined clinical significance as well as their inclusion in both datasets. Note that some variables were included in only one of the two datasets but were deemed too important to exclude.

Table 4.1 Demographic and clinical variables in the integrated breast cancer dataset
List of demographic and baseline clinical variables utilized in survival prediction models for the merged breast cancer dataset as well as MSK and MDA individually. Statistics represent data before missing value imputation.

Variable | Total | MSK | MDA
Age (years), Median (Min-Max) | 48 (19-96) | 48 (23-80) | 48 (19-96)
Race, N (%)
  White | 2611 (62.4) | --- | 2611 (62.4)
  Black | 533 (12.7) | --- | 533 (12.7)
  Hispanic | 427 (10.2) | --- | 427 (10.2)
  Asian | 120 (2.9) | --- | 120 (2.9)
  Other | 44 (1.1) | --- | 44 (1.1)
Menopause status, N (%)
  Premenopausal | 1823 (43.6) | --- | 1823 (43.6)
  Postmenopausal | 1847 (44.2) | --- | 1847 (44.2)
BMI (kg/m²), Median (Min-Max) | 27.1 (14.3-61.9) | --- | 27.1 (14.3-61.9)
Tumor Size (cm), Median (Min-Max) | 2 (0.1-12) | 2 (0.1-12) | ---
Histology Type, N (%)
  Ductal | 3174 (75.9) | 12 (0.3) | 3162 (75.6)
  Invasive Ductal | 432 (10.3) | 353 (8.4) | 79 (1.9)
  Lobular | 255 (6.1) | 0 (0.0) | 255 (6.1)
  Mixed | 179 (4.3) | 17 (0.4) | 162 (3.9)
  Other | 86 (2.1) | 9 (0.2) | 77 (1.8)
  Invasive Lobular | 38 (0.9) | 38 (0.9) | 0 (0.0)
Histology Grade, N (%)
  G1: Well differentiated | 5 (0.1) | 5 (0.1) | ---
  G2: Moderately differentiated | 83 (2.0) | 83 (2.0) | ---
  G3: Poorly differentiated | 263 (6.3) | 263 (6.3) | ---
Histology Location, N (%)
  Left | 216 (5.2) | 216 (5.2) | ---
  Right | 224 (5.4) | 224 (5.4) | ---
Clinical Stage, N (%)
  Stage I | 704 (16.8) | --- | 704 (16.8)
  Stage II | 1671 (40.0) | --- | 1671 (40.0)
  Stage III | 1360 (32.5) | --- | 1360 (32.5)
ER Status, N (%)
  Positive | 2381 (56.9) | 325 (7.8) | 2056 (49.2)
  Negative | 1606 (38.4) | 118 (2.8) | 1488 (35.6)
PR Status, N (%)
  Positive | 1803 (43.1) | 222 (5.3) | 1581 (37.8)
  Negative | 2123 (50.8) | 197 (4.7) | 1926 (46.1)
HER2 Status, N (%)
  Positive | 766 (18.3) | 82 (2.0) | 684 (16.4)
  Negative | 2646 (63.3) | 342 (8.2) | 2304 (55.1)
Nuclear Grade, N (%)
  I | 96 (2.3) | --- | 2.7 (0.1)
  II | 992 (23.7) | --- | 28.3 (0.7)
  III | 2413 (57.7) | --- | 68.9 (1.6)
Lymphovascular Invasion, N (%)
  Positive | 1765 (42.2) | 188 (4.5) | 1577 (37.7)
  Negative | 2239 (53.6) | 163 (3.9) | 2076 (49.7)
Inflammatory Breast Cancer, N (%)
  Yes | 307 (7.3) | 8 (0.2) | 299 (7.2)
  No | 3874 (92.7) | 438 (10.5) | 3436 (82.2)
Overall Survival Time (years), Median (Min-Max) | 5.2 (0.4-33.7) | 8.4 (1-33.7) | 4.9 (0.4-31.6)
Survival Status, N (%)
  Deceased | 2901 (69.4) | 273 (6.5) | 2628 (62.9)
  Censored | 1280 (30.6) | 173 (4.1) | 1107 (26.5)

This newly combined dataset contained 4,181 primary breast cancer patients with 17 variables and a total missingness of 21.8% (Figure 4.2). It is important to note that there was an added 13.6% missingness created by utilizing variables that were unique to one of the two datasets (MSK: 2 variables generating 10.5% missingness; MDA: 5 variables generating 3.1% missingness). Missing values were imputed with the missForest methodology, and the pre- and post-distributions can be seen in.

Figure 4.2 Missingness in MSKCC and MDACC integrated breast cancer dataset
Missing data plot for Memorial Sloan Kettering Cancer Center and MD Anderson Cancer Center integrated breast cancer dataset.

4.3.1.2. Survival prediction with a CPH, a random survival forest, and DeepSurv

To predict overall survival, we first split the data into three sets for training (64%), validation (16%), and testing (20%). We then use DeepSurv to build a feed-forward neural network to predict survival risk for each patient [231]. Initial models were trained and hyperparameters tuned for optimization. Model performance and prediction accuracy were measured by Harrell's concordance index [233] and negative log-likelihood (NLL).
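As a rough illustration of the two metrics named above, the following sketch computes Harrell's concordance index and an averaged Cox negative log partial likelihood in plain Python. It is not the evaluation code used in this work: the function names are hypothetical, tied event times are not handled specially, and averaging the loss over observed events is an assumption of this sketch.

```python
import math


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs ranked consistently by risk.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still under observation at a later time. Higher predicted
    risk is expected to go with the shorter survival time.
    Raises ZeroDivisionError if no comparable pair exists.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


def cox_negative_log_likelihood(times, events, log_risks):
    """Negative log partial likelihood of a Cox model, averaged over events.

    Assumes distinct event times (no tie correction).
    """
    # Visit patients from longest to shortest follow-up so that the running
    # sum always covers exactly the patients still at risk.
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    at_risk_sum = 0.0
    total = 0.0
    n_events = 0
    for i in order:
        at_risk_sum += math.exp(log_risks[i])
        if events[i]:
            total += log_risks[i] - math.log(at_risk_sum)
            n_events += 1
    return -total / n_events
```

Calling `harrell_c_index(times, events, risks)` with per-patient follow-up times, event indicators (1 = deceased, 0 = censored), and predicted risks returns a value between 0 and 1; the same inputs with log-risk scores can be passed to `cox_negative_log_likelihood`.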
The c-index measures how well a model ranks patients by risk and ranges from 0 to 1, where 1 indicates perfect concordance and 0.5 corresponds to random ranking. NLL is calculated by the loss function and decreases with increasing iterations as the model learns and makes fewer prediction errors. Subsequently, the input variables were ranked based on the importance of their contribution to the accuracy of the model. This was measured by mean decrease in accuracy within the baseline random survival forest (RSF) model.

Figure 4.3A shows the predicted risk of the DeepSurv model as compared to a CPH model, an RSF, and the ground truth (Kaplan-Meier of actual patient survival). As indicated, the CPH model (c-index = 0.712) lies closer to the ground truth and outperforms our deep learning model. Both models outperform the RSF. The deep learning model achieves a training c-index of 0.705 and an NLL of 6.97. The corresponding RSF achieves a c-index of 0.706. The most important variables for model prediction, as determined by the RSF, were the patient's clinical stage at diagnosis, ER status, PR status, nuclear grade, and tumor size (Figure 4.3B).

Figure 4.3 Survival predictions and variable importance
A. Predicted survival curves show survival probability over time as calculated by DeepSurv model, CPH model, and RSF built from breast cancer merged datasets. B. Feature importance generated by RSF indicating most important variables in predicting overall survival.

When taken separately, the models built from the individual datasets perform similarly to those built from the combined dataset. Using only the MSK data (446 patients; 12 unique variables), a DeepSurv model (c-index = 0.709) outperforms both its respective CPH (c-index = 0.702) and RSF (c-index = 0.662) models. Likewise, using only the MDA data (3,735 patients; 15 unique variables), the DeepSurv model achieves a c-index of 0.706 while the CPH and RSF models produce c-indexes of 0.704 and 0.697, respectively.

4.3.1.3.
DeepSurv outperforms RSF and CPH in individual and merged datasets

In both the individual and combined dataset cases, all three models performed similarly, with minimal differences in accuracy, and DeepSurv and CPH outperformed RSF. This observation is consistent with the benchmark results by Katzman et al., where DeepSurv outperformed RSF [231]. The close similarity in performance between CPH and DeepSurv is explained by the fact that DeepSurv is an implementation of the CPH model in a feed-forward neural architecture and therefore preserves the theoretical hazard proportionality assumption of the CPH model [232]. The results show that, despite the variations in data collection between different clinical sites, the performance of the DeepSurv model remains robust when data are combined and imputed using our proposed framework. In contrast to RSF, where the difference in accuracy between the single datasets and the combined dataset is significant (i.e., under variations in the number of features and instances), the deep learning architecture handles such variations in dataset size more efficiently. Further, the top features in the RSF variable importance ranking show a mixture of variables at different scales, i.e., patient-level parameters (clinical stage) along with biochemical measurements of molecular disease features (ER and PR status) and tissue-based histology information (nuclear grade and tumor size). This result points to the importance of utilizing multimodal data in predictive models by expanding the feature space with extensive disease characterization variables at multiple dimensions, including cell-based and cell-free liquid biopsy data.

4.3.2.
Survival Prediction on Integrated Demographic, Clinical, and Liquid Biopsy Data in a Prostate Cancer Cohort

Based on the results above, we hypothesize that liquid biopsy data, when added to demographic and clinical data, may improve the precision and accuracy of survival prediction models built from demographic and clinical data alone. We develop proof-of-principle models in which we integrate HDSCA morpho-genomics liquid biopsy data with clinical data in a prostate cancer cohort. The structure and architectures of the survival models benchmarked here (DeepSurv, CPH, RSF) are kept consistent with those in the previous section.

4.3.2.1. Patient data, imputation, and augmentation

Individual-level data from a randomized, open-label study (NCT01505868) conducted at The University of Texas MD Anderson Cancer Center (MDA), investigating the efficacy of carboplatin in combination with cabazitaxel, were used to investigate this hypothesis [234,235]. This trial included patients who were previously treated for metastatic castration-resistant prostate cancer (mCRPC). The entire study consisted of 170 participants, 1 of whom was excluded from treatment; however, only 68 provided peripheral blood samples for analysis via the HDSCA workflow. Of the remaining 169 participants, 92 received carboplatin in combination with cabazitaxel and 77 received cabazitaxel alone. Demographic and clinical variables were collected at initial screening visits and prior to liquid biopsy sample collection (Table 4.2). Demographic and clinical variables for each patient included age, race, ECOG performance status, Gleason score, prior metastatic locations, and blood biomarkers. Other common clinical variables associated with mCRPC were also included.

Table 4.2 Demographic and clinical variables in the integrated prostate cancer dataset
List of demographic and baseline clinical variables utilized in survival prediction models for the integrated prostate cancer dataset.
Statistics represent data before missing value imputation.

Variable | Total | Clinical Only | Clinical + LB
Treatment Arm, N (%)
  Cabazitaxel | 77 (45.6) | 47 (46.5) | 30 (44.1)
  Cabazitaxel+Carboplatin | 92 (54.4) | 54 (53.5) | 38 (55.9)
Age (years), Median (Min-Max) | 61 (42-84) | 61 (42-79) | 61 (43-84)
Race, N (%)
  White | 129 (76.3) | 75 (74.3) | 54 (79.4)
  African American | 24 (14.2) | 16 (15.8) | 8 (11.8)
  Hispanic | 12 (7.1) | 7 (6.9) | 5 (7.4)
  Asian American | 3 (1.8) | 2 (2.0) | 1 (1.5)
  Native American | 1 (0.6) | 1 (1.0) | 0 (0.0)
Height (cm), Median (Min-Max) | 177 (156-190) | 177 (156-190) | 177 (160-190)
Weight (kg), Median (Min-Max) | 91 (56.6-152) | 88.2 (56.6-141.2) | 93.4 (61-152)
BMI (kg/m²), Median (Min-Max) | 28.8 (17.6-47.7) | 28.45 (17.6-47.7) | 29 (19.5-46.4)
Radical Prostatectomy, N (%)
  Yes | 67 (39.6) | 33 (32.7) | 34 (50.0)
  No | 88 (52.1) | 54 (53.5) | 34 (50.0)
Gleason Score, N (%)
  2+3 | 2 (1.2) | 2 (2.0) | 0 (0.0)
  3+3 | 9 (5.3) | 5 (5.0) | 4 (5.9)
  3+4 | 10 (5.9) | 7 (6.9) | 3 (4.4)
  3+5 | 2 (1.2) | 1 (1.0) | 1 (1.5)
  4+3 | 15 (8.9) | 9 (8.9) | 6 (8.8)
  4+4 | 20 (11.8) | 11 (10.9) | 9 (13.2)
  4+5 | 65 (38.5) | 36 (35.6) | 29 (42.6)
  5+4 | 8 (4.7) | 3 (3.0) | 5 (7.4)
  5+5 | 9 (5.3) | 5 (5.0) | 4 (5.9)
Metastatic Setting at Diagnosis, N (%)
  Localized at diagnosis | 62 (36.7) | 36 (35.6) | 26 (38.2)
  De Novo metastatic | 48 (28.4) | 30 (29.7) | 18 (26.5)
Prostate Specific Antigen (ng/mL), Median (Min-Max) | 30 (0-4847.1) | 35.5 (0.1-4847.1) | 20.4 (0-681.6)
ECOG Performance Status, N (%)
  0 | 43 (25.4) | 23 (22.8) | 20 (29.4)
  1 or 2 | 126 (74.6) | 78 (77.2) | 48 (70.6)
Total Alkaline Phosphatase (IU/L), Median (Min-Max) | 124 (36-1434) | 140 (36-1434) | 120 (40-1080)
Bone Alkaline Phosphatase (IU/L), Median (Min-Max) | 32 (5.1-692) | 36 (5.2-692) | 29 (5.1-316)
Lactate Dehydrogenase (U/L), Median (Min-Max) | 539 (165-8595) | 551 (165-8595) | 498.5 (239-2171)
Urinary N-telopeptide (nmol/mmol), Median (Min-Max) | 48 (7-790) | 51 (9-777) | 39 (7-790)
Albumin (g/dL), Median (Min-Max) | 4.1 (2.4-5.1) | 4.1 (2.4-5.1) | 4.1 (3.2-4.8)
Hemoglobin (g/dL), Median (Min-Max) | 12.1 (8.4-34.1) | 11.95 (8.4-15.2) | 12.15 (8.4-34.1)
White Blood Cell Count (#/L), Median (Min-Max) | 6.8 (2.2-16.6) | 6.4 (2.2-13.4) | 6.9 (3.7-16.6)
Bone Metastases, N (%)
  Yes | 155 (91.7) | 94 (93.1) | 61 (89.7)
  No | 14 (8.3) | 7 (6.9) | 7 (10.3)
Lymph Node Metastases, N (%)
  Yes | 72 (42.6) | 43 (42.6) | 29 (42.6)
  No | 97 (57.4) | 58 (57.4) | 39 (57.4)
Visceral Metastases, N (%)
  Yes | 42 (24.9) | 23 (22.8) | 19 (27.9)
  No | 127 (75.1) | 78 (77.2) | 49 (72.1)
AVPC Signature (Clinical), N (%)
  Pos | 45 (26.6) | 4 (4.0) | 41 (60.3)
  Neg | 34 (20.1) | 7 (6.9) | 27 (39.7)
AVPC Signature (IHC), N (%)
  Pos | 12 (7.1) | 2 (2.0) | 10 (14.7)
  Neg | 14 (8.3) | 2 (2.0) | 12 (17.6)
Progression Free Survival (months), Median (Min-Max) | 5.4 (0.7-31) | 5.2 (0.7-31) | 5.45 (1.3-19.2)
Progression Status, N (%)
  Progressed | 160 (94.7) | 93 (92.1) | 67 (98.5)
  Censored | 9 (5.3) | 8 (7.9) | 1 (1.5)
Overall Survival (months), Median (Min-Max) | 17.4 (1.5-71.1) | 16.7 (1.5-69.2) | 20.25 (4.2-71.1)
Overall Status, N (%)
  Deceased | 156 (92.3) | 92 (91.1) | 64 (94.1)
  Censored | 6 (3.6) | 2 (2.0) | 4 (5.9)

The liquid biopsy variables included enumeration of cellular phenotypes based on channel positivity/negativity as well as genomic features (large-scale transitions (LST) and instability scores). Across all clinical and demographic variables, there was a missingness of 11.7% (Figure 4.4). When considering the liquid biopsy variables, there was a missingness of 58.0%, in large part due to the 101 patients who did not have liquid biopsy data available. All missing values were imputed via missForest. After imputation, the Gaussian copula algorithm was utilized to create a synthetic dataset of 10,000 patients, approximately the prevalence of mCRPC patients in the United States [236,237].

Figure 4.4 Missingness in demographic and clinical prostate cancer data
Missing data plot of demographic and clinical variables associated with prostate cancer dataset used for survival prediction models.

4.3.2.2.
Survival prediction with a CPH, an RSF, and DeepSurv

For predicting overall survival in this mCRPC cohort, we split the augmented dataset of demographic, clinical, and liquid biopsy variables into two sets for training (80%) and testing (20%). The original dataset of 169 patients was used as a validation set. We first utilize only the demographic and clinical data to build a CPH, an RSF, and a DeepSurv model, and subsequently tune hyperparameters for optimization. As before, prediction accuracy for the DeepSurv model was measured by c-index and NLL. The input variables were ranked based on the importance of their contribution to the accuracy of the RSF model.

Figure 4.5A shows predicted risk for the DeepSurv, RSF, and CPH models that utilized only the clinical and demographic data, as well as the ground truth (Kaplan-Meier of actual patient survival). Our deep learning model achieves a c-index of 0.680 and NLL of 7.95. The corresponding CPH model and RSF achieve c-indexes of 0.683 and 0.667, respectively. Figure 4.5B shows the variable importance for each predictor as measured by the RSF. The three most important variables for model prediction were the patient's lactate dehydrogenase (LDH), hemoglobin, and albumin levels.

Figure 4.5 Predicted survival and variable importance in multimodal BC and PC data
A. Predicted survival curves show survival probability over time as calculated by DeepSurv model, CPH model, and RSF built from demographic and clinical prostate cancer data. B. Feature importance generated by corresponding RSF indicating most important variables in predicting overall survival. C. Predicted survival curves show survival probability over time as calculated by DeepSurv model, CPH model, and RSF built from integrated demographic, clinical, and liquid biopsy prostate cancer data. D. Feature importance generated by corresponding RSF indicating most important variables in predicting overall survival.
To investigate the added benefit of liquid biopsy data, we next build models in which these variables are included alongside the demographic and clinical data. Consistent with the prior models, the augmented dataset is split into two sets for training (80%) and testing (20%), with the original dataset used as a validation set. We build a CPH, an RSF, and a DeepSurv model with tuned hyperparameters, and compare them to the Kaplan-Meier of actual patient survival (Figure 4.5C). Our deep learning model achieves a c-index of 0.697 and NLL of 7.94. The corresponding CPH model and RSF achieve c-indexes of 0.925 and 0.739, respectively. Figure 4.5D shows the variable importance for each predictor as measured by the RSF. The most important variables for model prediction were the patient's average LST deletion on chromosome 7, LDH level, average LST deletion on chromosome 2, average amplification instability score on chromosome 17, and hemoglobin level.

Comparing these metrics to those generated by the demographic and clinical only models, we see that the addition of liquid biopsy parameters improves concordance in both the DeepSurv and RSF models (0.680 vs 0.697 and 0.667 vs 0.739, respectively). While the concordance of the CPH also improves with the addition of liquid biopsy data, the integrated model clearly does not fit the data, as seen in Figure 4.5C.

4.3.2.3. Integrated survival prediction models outperform those built on clinical data only

This modeling study integrates multimodal liquid biopsy data with demographic and clinical data to demonstrate the feasibility and utility of adding single-cell resolution data to survival prediction models in a small cohort of mCRPC where data augmentation is needed. The top-ranked features in the RSF variable importance provide insights into the utility of incorporating liquid biopsy data into the model.
While lactate dehydrogenase, hemoglobin, and albumin levels are the top features in the demographic and clinical only model, average genomic HDSCA liquid biopsy features outrank them in the combined model. In fact, only six of the top 50 features are non-liquid biopsy variables, with the majority (n=40) being average instability scores and average LST. This observation suggests that these liquid biopsy morpho-genomic features are important in the prediction of overall survival. Overall, while we present a framework for multimodal data integration for liquid biopsy by applying Gaussian copulas for augmentation and missForest for imputation in a small cohort of mCRPC, the results call for further exploration and development of theoretical and practical methods that robustly combine extensive demographic and clinical datasets with smaller, more limited liquid biopsy ones across disease areas in oncology.

4.4. Conclusion

We present advances in mathematical oncology modeling to spotlight how modern methods of mathematics and computer science can leverage sparse multimodal clinical and liquid biopsy data to build and interpret models of both cohorts and individual patients. We focus in particular on how liquid biopsy data, when incorporated into novel deep and machine learning survival prediction models, have the potential to improve the robustness of model accuracy in comparison to models based only on demographic and clinical data. Integrating multianalyte liquid biopsy data expands the algorithmic learning space into cellular and subcellular features, which add deeper biological resolution to the model. The future of precision oncology will rely on cutting-edge advances in liquid biopsy and clinically applicable mathematical oncology developments that incorporate multimodal and multi-dimensional datasets into the quantitative assessment of a tumor's spatiotemporal progressive evolution [238].
In liquid biopsy, as the technologies mature and more large-scale datasets become available through public and private databases and data repositories, the need for advanced methods for data curation, robust data commons for storage, and sharing across institutions and research teams will become greater, along with the need to standardize data elements for optimized data exchange and machine readability [141,239]. In predictive mathematical oncology, developments in combining AI approaches with mechanistic models will drive the experimental validation of theoretical biology [240] and push liquid biopsy data to become an integral component in cancer detection, disease monitoring, and therapy response prediction. Transfer learning [241–243] will become a critical tool for modeling within sparse liquid biopsy datasets [244], as it will allow sharing and repurposing of pretrained models across platforms and modeling tasks. The implementation of federated learning [245] to access data across data commons and repositories to build and optimize prediction models will be critical [246] for biomedical applications, where there is a need to homogenize the heterogeneity across institutions and to solve the challenge of interoperability and standardization. Explicit model interpretation in deep learning remains an open area for further research and development [247], and since interpretable AI is critical for biology and medicine, further developments in the field will be needed to motivate healthcare practitioners to implement and adopt integrative predictive models [248]. Finally, as liquid biopsy technologies and mathematical oncology models continue to mature and reach clinical utility, oncologists and clinical teams are called to play an integral role in the development and validation of next-generation mathematical models for the optimization of precision oncology and improvement of cancer patient outcomes.
Once mathematical models that integrate clinical data and liquid biopsy are robust enough to reach clinical utility for prediction of individual patient outcomes, implementation of adaptive therapy [240,249,250] and episode-based treatment efficacy [251] will be realized. Beyond enabling discovery of blood-based biomarkers in clinical practice, such predictive models will provide competitive intelligence for drug development enterprises and pharmaceutical organizations running clinical trials of novel agents and in new indications.

Chapter 5 Ongoing and future work in PC neoplasia characterization: from discovery to clinical implementation

This chapter briefly describes current and prospective research that was enabled by this Ph.D. dissertation; its sections represent manuscripts at different stages of preparation. Sections 5.1 and 5.2 highlight the inherent flexibility of the HDSCA workflow and how it can enable the development of custom 4-plex assays for the detection and morphological analysis of virtually any rare cells of interest. Section 5.3 describes the development of targeted proteomics for multiplex single-cell characterization. Machine learning projects are highlighted in sections 5.4 and 5.5.

Figure 5.1 The versatility of HDSCA for custom 4-plex assays

5.1 Leveraging the versatility of HDSCA for custom 4-plex assays in PC neoplasia

The versatility of the HDSCA workflow will enable the development of custom immunofluorescence assays for profiling B lymphoid cells. New clinically promising protein targets are continually identified and pursued for validation, either for diagnostic and prognostic biomarker use or as therapeutic targets in oncology. In myeloma and other hematologic malignancies, such recent targets of interest include GPRC5D [91], FcRH5 (FCRL5), CD73 (5'-Nucleotidase) [95], and others [96].
With the versatility of the clinically validated HDSCA workflow, practically any such marker can be adapted into the 4-plex immunofluorescence technology for custom and targeted cell-based measurement of disease and of response to treatment agents targeting these proteins in pre-clinical and clinical development. To explore the flexibility for developing bespoke HDSCA assays for profiling cells of interest in myeloma, we procured an anti-GPRC5D antibody and, following the established immunofluorescence protocols [21,24], stained using a three-color panel consisting of DAPI, CD45 (in Cy5), and GPRC5D (in TRITC). Using an NBD sample spiked with MCF7 myeloma cell lines, we show staining only in MCF cell lines and not in other cells in the blood sample (Figure 5.2). As a control, staining at 0 µg/mL of GPRC5D confirmed the marker to be specific to myeloma cell lines.

Figure 5.2 GPRC5D IF assay staining in NBD samples spiked with MCF cell lines
A. Staining at 0 µg/mL concentration for negative control. B. Staining at 5 µg/mL. Blue=DAPI, Green=CD45, Red=GPRC5D.

This initial work adapting GPRC5D for myeloma, which expands on the work on the CD56 and BCMA assays, shows the flexibility with which a given marker of interest can be adapted to the HDSCA workflow for various purposes. Key use cases with higher clinical potential include developing a custom assay for a clinical trial to monitor therapy response in patients undergoing therapies targeting the marker of interest. With the high sensitivity of the assays presented here, a custom assay will provide a therapy response readout in a rare-cell context.
5.2 Detection and Profiling of VIM, CK, and CD31 Expression in PC Neoplasms

Beyond the detection and analysis of lymphocytes, HDSCA assays can be applied to any tissue type to profile TiME cells, particularly the endothelial, epithelial, and mesenchymal cell compartments in PB and BMA of patients with PC dyscrasia, cells traditionally not of pathological interest for clinical decisions. As in solid tumors, there is evidence that malignant plasma cells rely on a remodeled bone marrow to establish a permissive tumor microenvironment. Specifically, for neoplastic PCs to survive and proliferate in the BM, extensive remodeling and vascularization are required, processes that involve surrounding endothelial, mesenchymal, and epithelial BM cells. In particular, stromal cells, osteoclasts, macrophages, and fibroblasts are directly engaged in interactions with malignant plasma cells via secreted vascularization and angiogenic factors [252–254]. Consequently, cases of myeloma patients having carcinoma-like presentations and vice versa have been reported [255–257]. Beyond the involvement in the local BM tumor, circulating ECs have been observed in myeloma and shown to correlate positively with serum M protein and β2-microglobulin, suggesting their role in MM systemic disease activity and as a biomarker for disease progression [258]. Cytokeratin expression and platelets have also been reported in patients with myeloma. Studies have shown that platelets (CD31+ analytes) are highly activated in myeloma patients compared to controls, with MM/SMM being higher than MGUS, indicating that platelet activation is associated with disease stage [259]. Since the work in these reports utilizes traditional enrichment-based methods and conventional pathology, we reason that a "no-cell-left-behind" method would robustly detect all the cells in the patient samples expressing the markers used to profile solid tumors and other pathologies.
To investigate the presence of endothelial, epithelial, and mesenchymal cells in PC neoplasms, paired PB and BM samples from patients with PC neoplasms were stained and analyzed through the HDSCA landscape assay (CK, VIM, CD45/CD31, DAPI) [21–23]. Initial analysis shows widespread presence of rare endothelial-, epithelial-, and mesenchymal-like cells in PB and BM samples of MGUS, SMM, NDMM, and RRMM patients (Figure 5.3).

Figure 5.3 Morphology and enumeration of detected MM TiME cells expressing CK, Vim, and CD31
A. Representative images of candidate epithelial cells. B. Representative images of candidate CD31+ cells and endothelial cells. C. Representative images of candidate mesenchymal cells. D. Preliminary quantification of the circulating rare cells in the blood of patients' baseline/diagnosis draw. E. Quantification of circulating rare cells at follow-up timepoints.

Ongoing analysis will quantify the extended distribution of the cell types across the myeloma study cohort and potentially establish the role of non-lymphocyte cells in the progression of myeloma from precursor conditions.

5.3 Single-cell multiplex proteomics to deconvolute myeloma progression

A fundamental limitation of 4-plex immunofluorescence assays is that they can only describe disease with up to 4 channels. A potential difficulty is attaining sufficient specificity and sensitivity to detect plasma cells and their clonality (aberrant vs normal), given the variability in expression of common plasma cell markers, with a 4-color immunofluorescence assay. The promise of multiplex proteomics lies in the ability to expand the panel up to 40 markers by leveraging the capability for ion-tagging using heavy metals. The multiplex nature of the IMC platform [260,48] will permit us to assess the robustness of additional markers by protein expression and probe PC clonality by incorporating kappa, lambda, and heavy chain staining antibodies in the IMC marker panels.
In the initial IMC work for myeloma, we developed and validated a 40-plex panel for single-cell proteomics in myeloma and precursor states. The markers (Table 5.1) include surface and cytoplasmic markers for extended subcellular and functional characterization of the single cells of the BM and PB samples in plasma cell neoplasms.

Table 5.1 40-plex IMC panel for targeted proteomics of the PC neoplasm TiME

The ongoing and future analysis will elucidate and compare the architectural differences between the local BM microenvironment and systemic disease in precursor conditions and under treatment.

5.4 Predicting disease progression from precursor conditions to malignant states

Overt MM is almost always preceded by MGUS, a clinically benign precursor condition, and only about 20% of MGUS cases ever progress to malignancy. The ability to predict which MGUS patients will develop overt myeloma would be enabling in the management of plasma cell neoplasms. In this vein, machine learning and computational models of disease progression would provide a quantitative framework to estimate the probabilities of progressing from MGUS to SMM to overt myeloma. A team at the Moffitt Cancer Center built a Markov chain model and quantified transition probabilities between plasma cell neoplasia disease states [261]. In machine learning, given a large dataset of patients with longitudinal monitoring, one can build models to predict time to progression to the next state. Using the Mayo Clinic longitudinal study on MGUS patients in Minnesota, we built regression models to predict the probability of progression from MGUS to a plasma cell malignancy. While this preliminary modeling exercise showed the feasibility of applying machine learning to this task, the Mayo MGUS dataset lacks enough features for models to be properly tested and validated. A major challenge remains the availability of longitudinal datasets.
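To make the idea of a Markov chain over disease states concrete, the sketch below estimates transition probabilities from longitudinal state sequences and propagates a starting state forward in time. It illustrates the general technique only and is not the model of reference [261]: the function names, the three-state simplification, the treatment of unobserved exits as an absorbing state, and the toy follow-up histories are all hypothetical.

```python
STATES = ["MGUS", "SMM", "MM"]


def estimate_transition_matrix(sequences, states=STATES):
    """Estimate per-visit transition probabilities by counting consecutive
    state pairs in each patient's follow-up history."""
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[index[current]][index[nxt]] += 1
    matrix = []
    for k, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No exits observed from this state: treat it as absorbing
            # (an assumption of this sketch).
            matrix.append([1.0 if j == k else 0.0 for j in range(len(states))])
        else:
            matrix.append([c / total for c in row])
    return matrix


def state_distribution(matrix, start, n_steps, states=STATES):
    """Probability of occupying each state after n_steps visits."""
    dist = [1.0 if s == start else 0.0 for s in states]
    for _ in range(n_steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(len(states)))
                for j in range(len(states))]
    return dict(zip(states, dist))


# Hypothetical follow-up histories (one state per visit), for illustration only.
toy_histories = [
    ["MGUS", "MGUS", "SMM", "MM"],
    ["MGUS", "MGUS", "MGUS", "MGUS"],
    ["SMM", "SMM", "MM"],
    ["MGUS", "SMM", "SMM", "SMM"],
]
```

With these toy histories, `estimate_transition_matrix(toy_histories)` gives the per-visit transition matrix and `state_distribution(matrix, "MGUS", n_steps)` gives the probability of each state after a chosen number of visits. A real analysis would also need to account for irregular visit intervals and censoring.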
Since precursor conditions are traditionally untreated, research has focused on NDMM and post-therapy states, namely RRMM and PCL, to improve patient survival. Recently, large-scale studies in precursor disease such as the ICELANDIC study 262 and the Center for the Prevention of Progression of Blood Cancers’ PCROWD study 263 have been initiated, and datasets will become available in the coming years to enable the application of machine learning algorithms to quantitatively map the progression of disease.

5.5 Therapy response prediction in plasma cell malignancies

Despite improvements in treatment options and survival in the past decades, MM remains incurable, and few studies have investigated the ability to predict therapy response and elucidate the role of therapy resistance in myeloma evolution under treatment pressure. Beyond initial genomic studies showing that clonal evolution during treatment leads to therapy resistance 264, the drivers of relapse in myeloma remain an area of intensive research. With the availability of consortium datasets, it is plausible that computational models could be used to predict therapy response and identify the features that describe which patients will likely relapse. Using the CoMMpass dataset described in chapter 3, we built multiclass classification models to predict a patient’s response to first-line therapy. Further, we developed collaborative filtering models as a recommendation system to assess the feasibility of predicting therapy response based on patient similarity. From these modeling efforts, preliminary observations show good prediction accuracy for the first line of therapy. The accuracy decreases across subsequent therapy lines. Since this is an ongoing study with patients either being censored out of the study or remaining in remission over multiple years, the longitudinal model performance reflects the decreasing number of data points, which leaves the models unable to generalize to later lines of therapy.
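To make the patient-similarity idea concrete, the following is a minimal sketch of neighborhood-based collaborative filtering using cosine similarity. It is not the model built in this work: the patient identifiers, therapy names, and response scores are hypothetical placeholders, and the choice of cosine similarity over co-observed therapies is an assumption made for illustration.

```python
import math

# Hypothetical response scores (higher = deeper response); None = not observed.
# Patient and therapy names are placeholders, not CoMMpass data.
responses = {
    "patient_a": {"therapy_1": 3, "therapy_2": 1, "therapy_3": None},
    "patient_b": {"therapy_1": 3, "therapy_2": 1, "therapy_3": 2},
    "patient_c": {"therapy_1": 1, "therapy_2": 3, "therapy_3": 0},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over therapies observed for both patients."""
    shared = [t for t in u if u[t] is not None and v.get(t) is not None]
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in shared))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(table, patient, therapy):
    """Similarity-weighted average of other patients' responses to a therapy."""
    target = table[patient]
    weighted_sum = 0.0
    weight_total = 0.0
    for other, scores in table.items():
        if other == patient or scores.get(therapy) is None:
            continue
        weight = cosine_similarity(target, scores)
        weighted_sum += weight * scores[therapy]
        weight_total += abs(weight)
    # Return None when no comparable patient has received the therapy.
    return weighted_sum / weight_total if weight_total else None


predicted = predict(responses, "patient_a", "therapy_3")
print(f"Predicted response of patient_a to therapy_3: {predicted:.2f}")
```

The sketch also shows why accuracy would be expected to fall in later therapy lines: as fewer patients have an observed response for a given line, the weighted average rests on fewer neighbors.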
Chapter 6 Broad Conclusions and Perspectives

Much like other cancers, PC malignancies remain virtually incurable despite significant advances in early detection, a continuously expanding therapeutic arsenal that includes traditional and novel modalities, and improvement in patient survival over the past decades. Diagnosis and disease monitoring still rely on BM biopsies, which are painfully invasive, costly, and risky for patients, who are mostly elderly; non-invasive blood-based liquid biopsies remain critically needed for diagnosis and management of PC cancers. Further, the heterogeneity within the network of PC neoplasms remains a challenge for delineating disease progression, and robust mathematical models to dissect the subtypes and map the spatiotemporal progression of precursor and overt conditions constitute a significant unmet need. The convergent oncology work in this PhD thesis lays the foundation for multimodal single-cell profiling and machine learning to add precision and predictability in the characterization and delineation of the heterogeneity in plasma cell cancers and to pave the way towards next-generation techniques in the clinical management of hematologic malignancies. In liquid biopsy, the multiparametric, enrichment-free single-cell liquid biopsy assays described in chapter 2 have enabled ongoing and future experimental pursuits in developing custom 4-plex assays to detect and immunophenotype virtually all cells of the tumor immune microenvironment. The initial proof-of-concept with GPCR, which demonstrates the versatility of the HDSCA workflow for custom immunofluorescence assays, along with ongoing validation of the BCMA assay in a large cohort study and expanded profiling of endothelial, mesenchymal, and epithelial cells in the TiME of patients, will demonstrate the path from discovery to clinical implementation.
One of the key challenges in clinically implementing single-cell liquid biopsies for lymphoid cancers is the ability to robustly discriminate cancer cells by their clonotypes. Beyond the intrinsic variability in expression of the immunofluorescence markers in our assays (CD45, CD138, CD56, and BCMA), these markers are surface glycoproteins and are limited in their ability to discriminate single cells by immunoglobulin and to differentiate the clonal heavy and light chain (kappa and lambda) phenotypes, which are part of the established clinical markers for disease stratification. To this end, work in multiplex single-cell targeted proteomics 48,260,265 utilizing a validated panel of 40 markers is ongoing to establish comprehensive single-cell proteomic profiling. Beyond a comprehensive set of surface protein markers, this panel includes Ig kappa and lambda and key emerging therapeutic targets such as CD47, which has enabled extended downstream single-cell characterization of the lymphoid and myeloid compartments. Ongoing proteomic work is focused on the single-cell deconvolution of plasma cells and their precursors, non-immune TiME cells, their respective subtypes, and the role of these cells in disease initiation and progression. Once implemented for clinical impact, the multimodal liquid biopsy methodologies developed in this work will place liquid biopsy for delineation of disease progression at the forefront of precision medicine in plasma cell neoplasms 64. In predictive machine learning, the models described in chapters 3 and 4 highlight the efforts in building models for disease stratification and survival prediction, critical steps towards meeting the challenges in delineating the biological and clinical factors responsible for MM progression from precursor to late-stage disease states. Besides survival, models that robustly predict patient outcomes such as therapy response will be clinically impactful in treatment stratification for improved treatment outcomes.
One of the fundamental challenges in predictive modeling remains the sparsity of data for modeling efforts. From a data science perspective, large cohort studies applying both conventionally validated and next-generation technologies to generate curated datasets with well-defined data elements across timescales and synchronized data guidance provide datasets better optimized for machine learning tasks. The CoMMpass study 97 in chapter 3 represents such a dataset and enables multiple opportunities to explore various modeling tasks. On the other hand, small-cohort liquid biopsy datasets that are mostly sparse and lack pre-defined standardized data elements present a challenge for modeling tasks. Similarly, clinical datasets from retrospective data collection result in data matrices with fewer complete instances. While chapter 4 presents a methodological framework to address these challenges, the success of future modeling work will rely on larger training datasets to build time-series models. Expanding on the survival prediction models in chapter 4 for BC and PC, the increase in single-cell multimodal data in MM from the ongoing large-scale validation studies will enable clinical and liquid biopsy data integration and the application of machine learning models to predict patient outcomes in myeloma and provide new insights in disease management. The increasing multimodal liquid biopsy datasets from large-scale and high-throughput experiments, coupled with the accelerating advances in machine intelligence, will usher in new algorithms more capable of handling the hierarchical complexity of oncology data. Of particular interest, geometric deep learning 65 incorporates network topology in representation learning of multidimensional data, where each data modality can be represented as a graph, allowing prediction of node-, edge-, and graph-level tasks using a graph neural network architecture 66.
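As a rough illustration of how a graph neural network uses topology, the sketch below implements a single mean-aggregation message-passing layer followed by a graph-level readout. It is a generic textbook-style construction, not an architecture proposed in this work; the toy graph, feature values, and identity weight matrix are assumptions made purely for demonstration.

```python
def relu(x):
    """Rectified linear unit applied to a scalar."""
    return max(0.0, x)


def message_passing_layer(features, edges, weight):
    """One round of mean-aggregation message passing on an undirected graph.

    Each node averages the feature vectors of itself and its neighbors,
    applies a linear map (weight), and passes the result through ReLU.
    """
    n = len(features)
    neighbors = {i: {i} for i in range(n)}  # self-loops keep a node's own signal
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    in_dim, out_dim = len(weight), len(weight[0])
    updated = []
    for i in range(n):
        group = neighbors[i]
        mean = [sum(features[j][k] for j in group) / len(group) for k in range(in_dim)]
        updated.append(
            [relu(sum(mean[k] * weight[k][m] for k in range(in_dim))) for m in range(out_dim)]
        )
    return updated


def graph_readout(node_embeddings):
    """Graph-level embedding: the average of all node embeddings."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(h[k] for h in node_embeddings) / n for k in range(dim)]


# Toy graph: three nodes (for example, three cells) joined in a chain 0 - 1 - 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity map, so only the aggregation is visible

node_embeddings = message_passing_layer(features, edges, weight)  # node-level output
graph_embedding = graph_readout(node_embeddings)                  # graph-level output
print(node_embeddings)
print(graph_embedding)
```

Node-level outputs could feed a per-cell classifier, while the readout vector could feed a patient-level predictor; in a trained model the weight matrix would be learned rather than fixed.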
In a hierarchical graph neural network architecture, graphs may represent molecule and cell phenotypes of interest 67, tissue pathology readouts 68, and patient-level outcomes to capture the full spectrum of the biological complexity in representation learning and prediction. Such integrative methodologies will enable enhanced patient stratification, inform the next frontier in defining therapeutic and clinical pathways 266,267, and enable the realization of precision medicine in this century and beyond.

Bibliography

1. Solly S. Remarks on the pathology of mollities ossium; with cases. Med Chir Trans. 1844;27:435-498.8. doi:10.1177/095952874402700129
2. Ribatti D. A historical perspective on milestones in multiple myeloma research. European Journal of Haematology. 2018;100(3):221-228. doi:10.1111/ejh.13003
3. Landgren O, Kyle RA, Pfeiffer RM, et al. Monoclonal gammopathy of undetermined significance (MGUS) consistently precedes multiple myeloma: a prospective study. Blood. 2009;113(22):5412-5417. doi:10.1182/blood-2008-12-194241
4. Fernández de Larrea C, Kyle RA, Durie BGM, et al. Plasma cell leukemia: consensus statement on diagnostic requirements, response criteria and treatment recommendations by the International Myeloma Working Group. Leukemia. 2013;27(4):780-791. doi:10.1038/leu.2012.336
5. Granell M, Calvo X, Garcia-Guiñón A, et al. Prognostic impact of circulating plasma cells in patients with multiple myeloma: implications for plasma cell leukemia definition. Haematologica. 2017;102(6):1099-1104. doi:10.3324/haematol.2016.158303
6. Kumar SK, Rajkumar SV. The multiple myelomas — current concepts in cytogenetic classification and therapy. Nat Rev Clin Oncol. 2018;15(7):409-421. doi:10.1038/s41571-018-0018-y
7. Lakshman A, Rajkumar SV, Buadi FK, et al. Risk stratification of smoldering multiple myeloma incorporating revised IMWG diagnostic criteria. Blood Cancer Journal. 2018;8(6):1-10. doi:10.1038/s41408-018-0077-4
8.
Multiple myeloma: 2018 update on diagnosis, risk‐stratification, and management - Rajkumar - 2018 - American Journal of Hematology - Wiley Online Library. Accessed April 1, 2019. https://onlinelibrary.wiley.com/doi/epdf/10.1002/ajh.25117
9. Furukawa Y, Kikuchi J. Molecular pathogenesis of multiple myeloma. Int J Clin Oncol. 2015;20(3):413-422. doi:10.1007/s10147-015-0837-0
10. Landgren O, Graubard BI, Katzmann JA, et al. Racial Disparities in the Prevalence of Monoclonal Gammopathies: A population-based study of 12,482 persons from the National Health and Nutritional Examination Survey. Leukemia. 2014;28(7):1537-1542. doi:10.1038/leu.2014.34
11. Manojlovic Z, Christofferson A, Liang WS, et al. Comprehensive molecular profiling of 718 Multiple Myelomas reveals significant differences in mutation frequencies between African and European descent cases. PLOS Genetics. 2017;13(11):e1007087. doi:10.1371/journal.pgen.1007087
12. Waxman AJ, Mink PJ, Devesa SS, et al. Racial disparities in incidence and outcome in multiple myeloma: a population-based study. Blood. 2010;116(25):5501-5506. doi:10.1182/blood-2010-07-298760
13. Pérez-Persona E, Vidriales MB, Mateo G, et al. New criteria to identify risk of progression in monoclonal gammopathy of uncertain significance and smoldering multiple myeloma based on multiparameter flow cytometry analysis of bone marrow plasma cells. Blood. 2007;110(7):2586-2592. doi:10.1182/blood-2007-05-088443
14. Zavidij O, Haradhvala NJ, Mouhieddine TH, et al. Single-cell RNA sequencing reveals compromised immune microenvironment in precursor stages of multiple myeloma. Nat Cancer. 2020;1(5):493-506. doi:10.1038/s43018-020-0053-3
15. Lomas OC, Tahri S, Ghobrial IM. The microenvironment in myeloma. Curr Opin Oncol. 2020;32(2):170-175. doi:10.1097/CCO.0000000000000615
16. Witzig TE, Dhodapkar MV, Kyle RA, Greipp PR.
Quantitation of circulating peripheral blood plasma cells and their relationship to disease activity in patients with multiple myeloma. Cancer. 1993;72(1):108-113. doi:10.1002/1097-0142(19930701)72:1<108::AID-CNCR2820720121>3.0.CO;2-T
17. Kraan J, Sleijfer S, Strijbos MH, et al. External quality assurance of circulating tumor cell enumeration using the CellSearch® system: A feasibility study. Cytometry Part B: Clinical Cytometry. 2011;80B(2):112-118. doi:10.1002/cyto.b.20573
18. Qasaimeh MA, Wu YC, Bose S, et al. Isolation of Circulating Plasma Cells in Multiple Myeloma Using CD138 Antibody-Based Capture in a Microfluidic Device. Sci Rep. 2017;7(1):45681. doi:10.1038/srep45681
19. Rawstron AC, Orfao A, Beksac M, et al. Report of the European Myeloma Network on multiparametric flow cytometry in multiple myeloma and related disorders. Haematologica. 2008;93(3):431-438. doi:10.3324/haematol.11080
20. Jelinek T, Bezdekova R, Zatopkova M, et al. Current applications of multiparameter flow cytometry in plasma cell disorders. Blood Cancer J. 2017;7(10):e617. doi:10.1038/bcj.2017.90
21. Chai S, Matsumoto N, Storgard R, et al. Platelet-coated circulating tumor cells are a predictive biomarker in patients with metastatic castrate resistant prostate cancer. Mol Cancer Res. Published online January 1, 2021. doi:10.1158/1541-7786.MCR-21-0383
22. Shishido SN, Sayeed S, Courcoubetis G, et al. Characterization of Cellular and Acellular Analytes from Pre-Cystectomy Liquid Biopsies in Patients Newly Diagnosed with Primary Bladder Cancer. Cancers. 2022;14(3):758. doi:10.3390/cancers14030758
23. Kolenčík D, Narayan S, Thiele JA, et al. Circulating Tumor Cell Kinetics and Morphology from the Liquid Biopsy Predict Disease Progression in Patients with Metastatic Colorectal Cancer Following Resection. Cancers. 2022;14(3):642. doi:10.3390/cancers14030642
24. Ndacayisaba LJ, Rappard KE, Shishido SN, et al.
Enrichment-Free Single-Cell Detection and Morphogenomic Profiling of Myeloma Patient Samples to Delineate Circulating Rare Plasma Cell Clones. Current Oncology. 2022;29(5):2954-2972. doi:10.3390/curroncol29050242
25. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer Statistics, 2021. CA: A Cancer Journal for Clinicians. 2021;71(1):7-33. doi:10.3322/caac.21654
26. Morgan GJ, Walker BA, Davies FE. The genetic architecture of multiple myeloma. Nat Rev Cancer. 2012;12(5):335-348. doi:10.1038/nrc3257
27. Bergsagel PL, Kuehl WM. Chromosome translocations in multiple myeloma. Oncogene. 2001;20(40):5611-5622. doi:10.1038/sj.onc.1204641
28. K.C. A, R.D C. Pathogenesis of Myeloma. Annu Rev Pathol Mech Dis. 2011;6(249–274). doi:10.1146/annurev-pathol-011110-130249.
29. R.A. K, S.V R. Multiple myeloma. 2008;111:2962-2972. doi:10.1182/blood-2007-10-078022.
30. Terpos E, Ntanasis-Stathopoulos I, Gavriatopoulou M, Dimopoulos MA. Pathogenesis of bone disease in multiple myeloma: from bench to bedside. Blood Cancer Journal. 2018;8(1):1-12. doi:10.1038/s41408-017-0037-4
31. L. R, D. A, M. K, et al. Combination of flow cytometry and functional imaging for monitoring of residual disease in myeloma. 2019;33:1713-1722. doi:10.1038/s41375-018-0329-0.
32. Schürch CM, Rasche L, Frauenfeld L, Weinhold N, Fend F. A review on tumor heterogeneity and evolution in multiple myeloma: pathological, radiological, molecular genetics, and clinical integration. Virchows Arch. 2020;476(3):337-351. doi:10.1007/s00428-019-02725-3
33. L. R, S.S. C, O.W. S, et al. Spatial genomic heterogeneity in multiple myeloma revealed by multi-region sequencing. 2017;8(268). doi:10.1038/s41467-017-00296-y.
34. Abdallah N, Rajkumar SV, Greipp P, et al. Cytogenetic abnormalities in multiple myeloma: association with disease characteristics and treatment response. Blood Cancer J. 2020;10(8):1-9. doi:10.1038/s41408-020-00348-5
35. Cardona-Benavides IJ, de Ramón C, Gutiérrez NC.
Genetic Abnormalities in Multiple Myeloma: Prognostic and Therapeutic Implications. Cells. 2021;10(2):336. doi:10.3390/cells10020336
36. Nowakowski GS, Witzig TE, Dingli D, et al. Circulating plasma cells detected by flow cytometry as a predictor of survival in 302 patients with newly diagnosed multiple myeloma. Blood. 2005;106(7):2276-2279. doi:10.1182/blood-2005-05-1858
37. Gonsalves WI, Rajkumar SV, Gupta V, et al. Quantification of clonal circulating plasma cells in newly diagnosed multiple myeloma: implications for redefining high-risk myeloma. Leukemia. 2014;28(10):2060-2065. doi:10.1038/leu.2014.98
38. Lohr JG, Kim S, Gould J, et al. Genetic interrogation of circulating multiple myeloma cells at single-cell resolution. Sci Transl Med. 2016;8(363):363ra147. doi:10.1126/scitranslmed.aac7037
39. Mishima Y, Paiva B, Shi J, et al. The Mutational Landscape of Circulating Tumor Cells in Multiple Myeloma. Cell Rep. 2017;19(1):218-224. doi:10.1016/j.celrep.2017.03.025
40. Sanoja-Flores L, Flores-Montero J, Garcés JJ, et al. Next generation flow for minimally-invasive blood characterization of MGUS and multiple myeloma at diagnosis based on circulating tumor plasma cells (CTPC). Blood Cancer Journal. 2018;8(12):1-11. doi:10.1038/s41408-018-0153-9
41. Paiva B, Vídriales MB, Rosiñol L, et al. A multiparameter flow cytometry immunophenotypic algorithm for the identification of newly diagnosed symptomatic myeloma with an MGUS-like signature and long-term disease control. Leukemia. 2013;27(10):2056-2061. doi:10.1038/leu.2013.166
42. Weiss B, Sasser K, Rao C, et al. Circulating Multiple Myeloma Cells (CMMCs): A Novel Method for Detection and Molecular Characterization of Peripheral Blood Plasma Cells in Multiple Myeloma Precursor States. Blood. 2014;124(21):2031. doi:10.1182/blood.V124.21.2031.2031
43. Foulk B, Schaffer M, Gross S, et al. Enumeration and characterization of circulating multiple myeloma cells in patients with plasma cell disorders.
British Journal of Haematology. 2018;180(1):71-81. doi:10.1111/bjh.15003
44. Zhang L, Beasley S, Prigozhina NL, et al. Detection and Characterization of Circulating Tumour Cells in Multiple Myeloma. J Circ Biomark. 2016;5. doi:10.5772/64124
45. Marrinucci D, Bethel K, Kolatkar A, et al. Fluid Biopsy in Patients with Metastatic Prostate, Pancreatic and Breast Cancers. Phys Biol. 2012;9(1):016003. doi:10.1088/1478-3975/9/1/016003
46. Ruiz C, Li J, Luttgen MS, et al. Limited genomic heterogeneity of circulating melanoma cells in advanced stage patients. Phys Biol. 2015;12(1):016008. doi:10.1088/1478-3975/12/1/016008
47. Malihi PD, Morikado M, Welter L, et al. Clonal diversity revealed by morphoproteomic and copy number profiles of single prostate cancer cells at diagnosis. Converg Sci Phys Oncol. 2018;4(1):015003. doi:10.1088/2057-1739/aaa00b
48. Gerdtsson E, Pore M, Thiele JA, et al. Multiplex protein detection on circulating tumor cells from liquid biopsies using imaging mass cytometry. Converg Sci Phys Oncol. 2018;4(1):015002. doi:10.1088/2057-1739/aaa013
49. Shishido SN, Welter L, Rodriguez-Lee M, et al. Preanalytical Variables for the Genomic Assessment of the Cellular and Acellular Fractions of the Liquid Biopsy in a Cohort of Breast Cancer Patients. The Journal of Molecular Diagnostics. 2020;22(3):319-337. doi:10.1016/j.jmoldx.2019.11.006
50. Rodríguez-Lee M, Kolatkar A, McCormick M, et al. Effect of Blood Collection Tube Type and Time to Processing on the Enumeration and High-Content Characterization of Circulating Tumor Cells Using the High-Definition Single-Cell Assay. Arch Pathol Lab Med. 2018;142(2):198-207. doi:10.5858/arpa.2016-0483-OA
51. O’Connell FP, Pinkus JL, Pinkus GS. CD138 (Syndecan-1), a Plasma Cell Marker: Immunohistochemical Profile in Hematopoietic and Nonhematopoietic Neoplasms. American Journal of Clinical Pathology. 2004;121(2):254-263. doi:10.1309/617DWB5GNFWXHW4L
52. Sanderson RD, Yang Y.
Syndecan-1: A dynamic regulator of the myeloma microenvironment. Clin Exp Metastasis. 2008;25(2):149-159. doi:10.1007/s10585-007-9125-3
53. Chang H, Samiee S, Yi QL. Prognostic relevance of CD56 expression in multiple myeloma: A study including 107 cases treated with high-dose melphalan-based chemotherapy and autologous stem cell transplant. Leukemia & Lymphoma. 2006;47(1):43-47. doi:10.1080/10428190500272549
54. Fujino M. The histopathology of myeloma in the bone marrow. J Clin Exp Hematop. 2018;58(2):61-67. doi:10.3960/jslrt.18014
55. Paiva B, Almeida J, Pérez-Andrés M, et al. Utility of flow cytometry immunophenotyping in multiple myeloma and other clonal plasma cell-related disorders. Cytometry Part B: Clinical Cytometry. 2010;78B(4):239-252. doi:10.1002/cyto.b.20512
56. J.J.M. D, L. L, S. B, et al. EuroFlow antibody panels for standardized n-dimensional flow cytometric immunophenotyping of normal, reactive and malignant leukocytes. 2012;26:1908-1975. doi:10.1038/leu.2012.120.
57. Robillard N, Wuillème S, Moreau P, Béné MC. Immunophenotype of Normal and Myelomatous Plasma-Cell Subsets. Front Immunol. 2014;5:137. doi:10.3389/fimmu.2014.00137
58. Sanoja-Flores L, Flores-Montero J, Pérez-Andrés M, Puig N, Orfao A. Detection of Circulating Tumor Plasma Cells in Monoclonal Gammopathies: Methods, Pathogenic Role, and Clinical Implications. Cancers. 2020;12(6):1499. doi:10.3390/cancers12061499
59. Welter L, Xu L, McKinley D, et al. Treatment response and tumor evolution: lessons from an extended series of multianalyte liquid biopsies in a metastatic breast cancer patient. Cold Spring Harb Mol Case Stud. 2020;6(6):a005819. doi:10.1101/mcs.a005819
60. Pau G, Fuchs F, Sklyar O, Boutros M, Huber W. EBImage—an R package for image processing with applications to cellular phenotypes. Bioinformatics. 2010;26(7):979-981. doi:10.1093/bioinformatics/btq046
61. Dago AE, Stepansky A, Carlsson A, et al.
Rapid Phenotypic and Genomic Change in Response to Therapeutic Pressure in Prostate Cancer Inferred by High Content Analysis of Single Circulating Tumor Cells. PLoS One. 2014;9(8). doi:10.1371/journal.pone.0101777
62. Thiele JA, Pitule P, Hicks J, Kuhn P. Single-Cell Analysis of Circulating Tumor Cells. In: Murray SS, ed. Tumor Profiling: Methods and Protocols. Methods in Molecular Biology. Springer; 2019:243-264. doi:10.1007/978-1-4939-9004-7_17
63. Baslan T, Kendall J, Ward B, et al. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling. Genome Res. 2015;25(5):714-724. doi:10.1101/gr.188060.114
64. McInnes L, Healy J, Saul N, Großberger L. UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software. 2018;3(29):861. doi:10.21105/joss.00861
65. Becht E, McInnes L, Healy J, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2019;37(1):38-44. doi:10.1038/nbt.4314
66. A H, N H, S A, et al. Detection and characterization of plasma cells in peripheral blood: correlation of IgE+ plasma cell frequency with IgE serum titre. Clinical and experimental immunology. 2002;130(3). doi:10.1046/j.1365-2249.2002.02025.x
67. Caraux A, Klein B, Paiva B, et al. Circulating human B and plasma cells. Age-associated changes in counts and detailed characterization of circulating normal CD138- and CD138+ plasma cells. Haematologica. 2010;95(6):1016-1020. doi:10.3324/haematol.2009.018689
68. Peterson JF, Rowsey RA, Marcou CA, et al. Hyperhaploid plasma cell myeloma characterized by poor outcome and monosomy 17 with frequently co-occurring TP53 mutations. Blood Cancer J. 2019;9(3):20. doi:10.1038/s41408-019-0182-z
69. Bolli N, Avet-Loiseau H, Wedge DC, et al. Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nat Commun. 2014;5:2997. doi:10.1038/ncomms3997
70. Manier S, Park J, Capelletti M, et al.
Whole-exome sequencing of cell-free DNA and circulating tumor cells in multiple myeloma. Nat Commun. 2018;9:1691. doi:10.1038/s41467-018-04001-5
71. P. L, R. O, G. T, C.S W. Flow cytometric immunophenotypic analysis of 306 cases of multiple myeloma. Am J Clin Pathol. 2004;121(482–488). doi:10.1309/74R4TB90BUWH27JX.
72. Banner DW, D’Arcy A, Janes W, et al. Crystal structure of the soluble human 55 kd TNF receptor-human TNFβ complex: Implications for TNF receptor activation. Cell. 1993;73(3):431-445. doi:10.1016/0092-8674(93)90132-A
73. Madry C, Laabi Y, Callebaut I, et al. The characterization of murine BCMA gene defines it as a new member of the tumor necrosis factor receptor superfamily. Int Immunol. 1998;10(11):1693-1702. doi:10.1093/intimm/10.11.1693
74. Coquery CM, Erickson LD. Regulatory Roles of the Tumor Necrosis Factor Receptor BCMA. Crit Rev Immunol. 2012;32(4):287-305. doi:10.1615/CritRevImmunol.v32.i4.10
75. Gras MP, Laâbi Y, Linares-Cruz G, et al. BCMAp: an integral membrane protein in the Golgi apparatus of human mature B lymphocytes. Int Immunol. 1995;7(7):1093-1106. doi:10.1093/intimm/7.7.1093
76. Hatzoglou A, Roussel J, Bourgeade MF, et al. TNF Receptor Family Member BCMA (B Cell Maturation) Associates with TNF Receptor-Associated Factor (TRAF) 1, TRAF2, and TRAF3 and Activates NF-κB, Elk-1, c-Jun N-Terminal Kinase, and p38 Mitogen-Activated Protein Kinase. The Journal of Immunology. 2000;165(3):1322-1330. doi:10.4049/jimmunol.165.3.1322
77. Ryan MC, Hering M, Peckham D, et al. Antibody targeting of B-cell maturation antigen on malignant plasma cells. Mol Cancer Ther. 2007;6(11):3009-3018. doi:10.1158/1535-7163.MCT-07-0464
78. Cho SF, Anderson KC, Tai YT. Targeting B Cell Maturation Antigen (BCMA) in Multiple Myeloma: Potential Uses of BCMA-Based Immunotherapy. Front Immunol. 2018;9:1821. doi:10.3389/fimmu.2018.01821
79. Dogan A, Siegel D, Tran N, et al.
B-cell maturation antigen expression across hematologic cancers: a systematic literature review. Blood Cancer J. 2020;10(6):73. doi:10.1038/s41408-020-0337-y
80. Ware CF. The TNF receptor super family in immune regulation. Immunological Reviews. 2011;244(1):5-8. doi:10.1111/j.1600-065X.2011.01065.x
81. Sanchez L, Dardac A, Madduri D, Richard S, Richter J. B-cell maturation antigen (BCMA) in multiple myeloma: the new frontier of targeted therapies. Therapeutic Advances in Hematology. 2021;12:2040620721989585. doi:10.1177/2040620721989585
82. Shah N, Chari A, Scott E, Mezzi K, Usmani SZ. B-cell maturation antigen (BCMA) in multiple myeloma: rationale for targeting and current therapeutic approaches. Leukemia. 2020;34(4):985-1005. doi:10.1038/s41375-020-0734-z
83. Salem DA, Maric I, Yuan CM, et al. Quantification of B-cell maturation antigen, a target for novel chimeric antigen receptor T-cell therapy in Myeloma. Leuk Res. 2018;71:106-111. doi:10.1016/j.leukres.2018.07.015
84. Frigyesi I, Adolfsson J, Ali M, et al. Robust isolation of malignant plasma cells in multiple myeloma. Blood. 2014;123(9):1336-1340. doi:10.1182/blood-2013-09-529800
85. Laurent SA, Hoffmann FS, Kuhn PH, et al. γ-secretase directly sheds the survival receptor BCMA from plasma cells. Nat Commun. 2015;6(1):7333. doi:10.1038/ncomms8333
86. Bujarski S, Soof C, Chen H, et al. Serum b-cell maturation antigen levels to predict progression free survival and responses among relapsed or refractory multiple myeloma patients treated on the phase I IRUX trial. JCO. 2018;36(15_suppl):e24313-e24313. doi:10.1200/JCO.2018.36.15_suppl.e24313
87. Visram A, Soof C, Rajkumar SV, et al. Serum BCMA levels predict outcomes in MGUS and smoldering myeloma patients. Blood Cancer J. 2021;11(6):1-7. doi:10.1038/s41408-021-00505-4
88. γ-Secretase inhibition increases efficacy of BCMA-specific chimeric antigen receptor T cells in multiple myeloma | Blood | American Society of Hematology. Accessed June 9, 2022.
https://ashpublications.org/blood/article/134/19/1585/374996/Secretase-inhibition-increases-efficacy-of-BCMA
89. Johnsen HE, Bøgsted M, Schmitz A, et al. The myeloma stem cell concept, revisited: from phenomenology to operational terms. Haematologica. 2016;101(12):1451-1459. doi:10.3324/haematol.2015.138826
90. Gao M, Bai H, Jethava Y, et al. Identification and Characterization of Tumor-Initiating Cells in Multiple Myeloma. J Natl Cancer Inst. 2020;112(5):507-515. doi:10.1093/jnci/djz159
91. Smith EL, Harrington K, Staehr M, et al. GPRC5D is a target for the immunotherapy of multiple myeloma with rationally designed CAR T cells. Sci Transl Med. 2019;11(485):eaau7746. doi:10.1126/scitranslmed.aau7746
92. Fernández de Larrea C, Staehr M, Lopez AV, et al. Defining an Optimal Dual-Targeted CAR T-cell Therapy Approach Simultaneously Targeting BCMA and GPRC5D to Prevent BCMA Escape–Driven Relapse in Multiple Myeloma. Blood Cancer Discov. 2020;1(2):146-154. doi:10.1158/2643-3230.BCD-20-0020
93. Elkins K, Zheng B, Go M, et al. FcRL5 as a target of antibody-drug conjugates for the treatment of multiple myeloma. Mol Cancer Ther. 2012;11(10):2222-2232. doi:10.1158/1535-7163.MCT-12-0087
94. Stewart AK, Krishnan AY, Singhal S, et al. Phase I study of the anti-FcRH5 antibody-drug conjugate DFRF4539A in relapsed or refractory multiple myeloma. Blood Cancer Journal. 2019;9(2):1-12. doi:10.1038/s41408-019-0178-8
95. Ray A, Song Y, Du T, et al. Identification and validation of ecto-5’ nucleotidase as an immunotherapeutic target in multiple myeloma. Blood Cancer J. 2022;12(4):1-9. doi:10.1038/s41408-022-00635-3
96. Leow CCY, Low MSY. Targeted Therapies for Multiple Myeloma. J Pers Med. 2021;11(5):334. doi:10.3390/jpm11050334
97. Skerget S, Penaherrera D, Chari A, et al. Genomic Basis of Multiple Myeloma Subtypes from the MMRF CoMMpass Study.; 2021:2021.08.02.21261211. doi:10.1101/2021.08.02.21261211
98. Brunet JP, Tamayo P, Golub TR, Mesirov JP.
Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A. 2004;101(12):4164-4169. doi:10.1073/pnas.0308531101
99. Chapuy B, Stewart C, Dunford AJ, et al. Molecular Subtypes of Diffuse Large B-cell Lymphoma are Associated with Distinct Pathogenic Mechanisms and Outcomes. Nat Med. 2018;24(5):679-690. doi:10.1038/s41591-018-0016-8
100. Pagès H, Carlson M, Falcon S, Li N. AnnotationDbi: Manipulation of SQLite-based annotations in Bioconductor. Published online 2021. https://bioconductor.org/packages/AnnotationDbi.
101. Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/bioconductor package biomart. Nature Protocols. 2009;4(8):1184-1191. doi:10.1038/nprot.2009.97
102. Lun ATL, McCarthy DJ, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with bioconductor. F1000Research. 2016;5:2122. doi:10.12688/f1000research.9501.2
103. Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics. 2010;11(1). doi:10.1186/1471-2105-11-367
104. Lee DD, Seung HS. Learning the parts of objects by non-negative matrix. Published online 1999.
105. Devarajan K. Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology. PLoS Comput Biol. 2008;4(7):e1000029. doi:10.1371/journal.pcbi.1000029
106. Taslaman L, Nilsson B. A Framework for Regularized Non-Negative Matrix Factorization, with Application to the Analysis of Gene Expression Data. PLoS One. 2012;7(11):e46331. doi:10.1371/journal.pone.0046331
107. Pascual-Montano A, Carazo JM, Kochi K, Lehmann D, Pascual-Marqui RD. Nonsmooth nonnegative matrix factorization (nsNMF). IEEE Trans Pattern Anal Mach Intell. 2006;28(3):403-415. doi:10.1109/TPAMI.2006.60
108. Kassambara A. Survminer package in R v0.4.9. Published online 2021.
109. Therneau T. A Package for Survival Analysis in R. Published online 2021.
110.
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545-15550. doi:10.1073/pnas.0506580102
111. Mootha VK, Lindgren CM, Eriksson KF, et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics. 2003;34(3):267-273. doi:10.1038/ng1180
112. Dolgalev I. Msigdbr package in R v7.4.1. Published online 2021.
113. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with deseq2. Genome Biology. 2014;15(12). doi:10.1186/s13059-014-0550-8
114. Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A. Fast gene set enrichment analysis. Published online February 1, 2021:060012. doi:10.1101/060012
115. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32(18):2847-2849. doi:10.1093/bioinformatics/btw313
116. Ferrantini M, Capone I, Belardelli F. Interferon-α and cancer: Mechanisms of action and new perspectives of clinical use. Biochimie. 2007;89(6-7):884-893. doi:10.1016/j.biochi.2007.04.006
117. Yu B, Jiang T, Liu D. BCMA-targeted immunotherapy for multiple myeloma. Journal of Hematology & Oncology. 2020;13(1):125. doi:10.1186/s13045-020-00962-7
118. Institute NC. Strategic Planning at NCI, <https://www.cancer.gov/about-nci/overview/strategic-planning#ui-id-2>. 2019.
119. Institute NC. Cancer Detection and Diagnosis Research, <https://www.cancer.gov/research/areas/diagnosis>. Published online 2020.
120. Institute NC. Artificial Intelligence - Opportunities in Cancer Research, <https://www.cancer.gov/research/areas/diagnosis/artificial-intelligence>. Published online 2020.
121. Hofman P, Heeke S, C AP. Liquid biopsy in the era of immuno-oncology: is it ready for prime-time use for cancer patients?
Annals of oncology : official journal of the European Society for Medical Oncology. 2019;30:1448-1459,. doi:10.1093/annonc/mdz196 122. Benjamens S, P D, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit Med. 2020;3:118,. doi:10.1038/s41746-020-00324-0 123. Bera K, Schalper KA, DL R. Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nature reviews Clinical oncology. 2019;16:703-715,. doi:10.1038/s41571-019-0252-y 96 124. Y M, R B. Introduction to artificial intelligence in medicine. Minimally invasive therapy & allied technologies. MITAT : official journal of the Society for Minimally Invasive Therapy. 2019;28:73-81,. doi:10.1080/13645706.2019.1575882 125. Amisha MP, M P. Overview of artificial intelligence in medicine. Journal of family medicine and primary care. 2019;8:2328-2331,. doi:10.4103/jfmpc.jfmpc_440_19 126. Ramesh AN MJ Kambhampati C et al. Artificial intelligence in medicine. Annals of the Royal College of Surgeons of England. 2004;86:334-338,. doi:10.1308/147870804290 127. TURING AM. I.—COMPUTING MACHINERY AND IN℡LIGENCE. Mind. 1950;LIX:433-460,. doi:10.1093/mind/LIX.236.433 128. McCarthy J. in Philosophical Logic and Artificial Intelligence. Thomason RH, ed. Published online 1989. 129. Newton PK, Mason J, N V. Spatiotemporal progression of metastatic breast cancer: a Markov chain model highlighting the role of early metastatic sites. NPJ Breast Cancer. 2015;1:15018,. doi:10.1038/npjbcancer.2015.18 130. G MJ, S L. Development of metastatic brain disease involves progression through lung metastases in EGFR mutated non-small cell lung cancer. Convergent science physical oncology. 2017;3. 131. Fujii T, Mason J, A C. Prediction of Bone Metastasis in Inflammatory Breast Cancer Using a Markov Chain Model. Oncologist. 2019;24:1322-1330,. doi:10.1634/theoncologist.2018- 0713 132. Hasnain Z, Mason J, K G. 
Machine learning models for predicting post-cystectomy recurrence and survival in bladder cancer patients. PLoS One. 2019;14:0210976,. doi:10.1371/journal.pone.0210976 133. Spooner A, Chen E, A S. A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction. Sci Rep. 2020;10:20410,. doi:10.1038/s41598-020-77220-w 134. Chi CL, WN S, Wolberg WH. Application of artificial neural network-based survival analysis on two breast cancer datasets. AMIA Annu Symp Proc. Published online 2007:130- 134. 135. Zhu W, Xie L, J H. The Application of Deep Learning in Cancer Prognosis Prediction. Cancers (Basel. 2020;12. doi:10.3390/cancers12030603 136. Institute NC. NCI Dictionaries: liquid biopsy, <https://www.cancer.gov/publications/dictionaries/cancer-terms/def/liquid-biopsy>. Published online 2021. 97 137. E L, K P. Liquid biopsies. Genes Chromosomes Cancer. 2019;58:219-232,. doi:10.1002/gcc.22695 138. Lim SB, Lee W, J V. Liquid biopsy: one cell at a time. NPJ Precis Oncol. 2019;3:23,. doi:10.1038/s41698-019-0095-0 139. G R, S RK, M B. Liquid Biopsies in Cancer Diagnosis, Monitoring, and Prognosis. Trends Pharmacol Sci. 2019;40:172-186,. doi:10.1016/j.tips.2019.01.006 140. Underwood JJ, Quadri RS, SP K. Liquid Biopsy for Cancer: Review and Implications for the Radiologist. Radiology. 2020;294:5-17,. doi:10.1148/radiol.2019182584 141. Ignatiadis M, GW S, Jeffrey SS. Liquid biopsy enters the clinic - implementation issues and future challenges. Nature reviews Clinical oncology. Published online 2021. doi:10.1038/s41571-020-00457-x 142. Todenhöfer T, Pantel K. Stenzl A et Al. (Biopsies TL, ed.). Springer International Publishing; 2020. 143. Spiliotaki M, Kallergi G, C N. Dynamic changes of CTCs in patients with metastatic HR(+)/HER2(-) breast cancer receiving salvage treatment with everolimus/exemestane. Cancer Chemother Pharmacol. 2021;87:277-287,. doi:10.1007/s00280-020-04227-5 144. Bratulic S, F G, Nielsen J. 
The Translational Status of Cancer Liquid Biopsies. Regenerative Engineering and Translational Medicine. Published online 2019. doi:10.1007/s40883-019-00141-2 145. Lee J, DY H, Hwang D. Single-cell multiomics: technologies and data analysis methods. Exp Mol Med. 2020;52:1428-1442,. doi:10.1038/s12276-020-0420-2 146. Hodara E, Morrison G, A C. Multiparametric liquid biopsy analysis in metastatic prostate cancer. JCI Insight. 2019;4. doi:10.1172/jci.insight.125529 147. BloodPAC. Blood Profiling Atlas in Cancer, <https://www.bloodpac.org/>. Published online 2021. 148. Grossman RL, Abel B, S A. Collaborating to Compete: Blood Profiling Atlas in Cancer (BloodPAC) Consortium. Clin Pharmacol Ther. 2017;101:589-592,. doi:10.1002/cpt.666 149. Institute NC. Cancer Moonshot, <https://www.cancer.gov/research/key- initiatives/moonshot-cancer-initiative>. Published online 2021. 150. Godsey JH, Silvestro A, JC B. Generic Protocols for the Analytical Validation of Next- Generation Sequencing-Based ctDNA Assays: A Joint Consensus Recommendation of the BloodPAC’s Analytical Variables Working Group. Clin Chem. 2020;66:1156-1166,. doi:10.1093/clinchem/hvaa164 98 151. Mahal BA, Chen YW, V M. National sociodemographic disparities in the treatment of high-risk prostate cancer: Do academic cancer centers perform better than community cancer centers? Cancer. 2016;122:3371-3377,. doi:10.1002/cncr.30205 152. Zavala VA, Bracci PM, JM C. Cancer health disparities in racial/ethnic minorities in the United States. Br J Cancer. 2021;124:315-332,. doi:10.1038/s41416-020-01038-6 153. Rodriguez-Alcalá ME, H Q, Jeanetta S. The Role of Acculturation and Social Capital in Access to Health Care: A Meta-study on Hispanics in the US. J Community Health. 2019;44:1224-1252,. doi:10.1007/s10900-019-00692-z 154. Graves KD, Huerta E, J C. Perceived risk of breast cancer among Latinas attending community clinics: risk comprehension and relationship with mammography adherence. Cancer Causes Control. 
2008;19:1373-1382,. doi:10.1007/s10552-008-9209-7 155. HealthIT.gov. Blue Button <https://www.healthit.gov/topic/health-it-initiatives/blue- button>. Published online 2019. 156. Stankiewicz M. CMS rebrands Meaningful Use, reduces reporting measures, <https://www.fiercehealthcare.com/payer/cms-releases-ipps-ehr-meaningful-measure>. Published online 2018. 157. Office HP. HHS Finalizes Historic Rules to Provide Patients More Control of Their Health Data, <https://www.hhs.gov/about/news/2020/03/09/hhs-finalizes-historic-rules-to-provide- patients-more-control-of-their-health-data.html>. Published online 2020. 158. UFaD A. 21st Century Cures Act, <https://www.fda.gov/regulatory-information/selected- amendments-fdc-act/21st-century-cures-act>. Published online 2020. 159. Reisman M. EHRs: The Challenge of Making Electronic Data Usable and Interoperable. P t. 2017;42:572-575. 160. Sullivan T. Why EHR data interoperability is such a mess in 3 charts. 2018;(ttps://www.healthcareitnews.com/news/why-ehr-data-interoperability-such-mess-3- charts>). 161. Sui W, Ou M, J C. Comparison of immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH) assessment for Her-2 status in breast cancer. World J Surg Oncol. 2009;7:83,. doi:10.1186/1477-7819-7-83 162. Gill MK, Asefa T, Y K. Effect of missing data on performance of learning algorithms for hydrologic predictions: Implications to an imputation technique. Water Resources Research. 2007;43. doi:10.1029/2006WR005298 163. Barakat MS, Field M, A G. The effect of imputing missing clinical attribute values on training lung cancer survival prediction model performance. Health Inf Sci Syst. 2017;5:16,. doi:10.1007/s13755-017-0039-4 99 164. Jerez JM, Molina I, PJ GL. Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artif Intell Med. 2010;50:105-115,. doi:10.1016/j.artmed.2010.05.002 165. Musil CM, Warner CB, PK Y. 
A comparison of imputation techniques for handling missing data. West J Nurs Res. 2002;24:815-829,. doi:10.1177/019394502762477004 166. Richman MB. Trafalis TB & Adrianto I. In: Haupt SE, Pasini A, Marzban C, eds. Artificial Intelligence Methods in the Environmental Sciences. ; 2009:153-169. 167. Troyanskaya O, Cantor M, G S. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520-525,. doi:10.1093/bioinformatics/17.6.520 168. Zhang Z. Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann Transl Med. 2016;4:30,. doi:10.3978/j.issn.2305-5839.2015.12.63 169. DJ S, P B. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112-118,. doi:10.1093/bioinformatics/btr597 170. Keomanee-Dizon K, SN S, Kuhn P. (Biopsies TL, ed.). Springer International Publishing; 2020. 171. Cho EH, Wendel M, M L. Characterization of circulating tumor cell aggregates identified in patients with epithelial tumors. Phys Biol. 2012;9:016001,. doi:10.1088/1478- 3975/9/1/016001 172. Phillips KG, Kolatkar A, Rees KJ, et al. Quantification of cellular volume and sub-cellular density fluctuations: comparison of normal peripheral blood cells and circulating tumor cells identified in a breast cancer patient. Front Oncol. 2012;2. doi:10.3389/fonc.2012.00096 173. Lazar DC, Cho EH, Luttgen MS, et al. Cytometric comparisons between circulating tumor cells from prostate cancer patients and the prostate tumor derived LNCaP cell line. Phys Biol. 2012;9(1):016002. doi:10.1088/1478-3975/9/1/016002 174. Scher HI, Graf RP, NA S. Assessment of the Validity of Nuclear-Localized Androgen Receptor Splice Variant 7 in Circulating Tumor Cells as a Predictive Biomarker for Castration-Resistant Prostate Cancer. JAMA Oncol. 2018;4:1179-1186,. doi:10.1001/jamaoncol.2018.1621 175. Nieva J, Wendel M, MS L. 
High-definition imaging of circulating tumor cells and associated cellular events in non-small cell lung cancer patients: a longitudinal analysis. Phys Biol. 2012;9:016004,. doi:10.1088/1478-3975/9/1/016004 176. Wendel M, Bazhenova L, R B. Fluid biopsy for circulating tumor cell identification in patients with early-and late-stage non-small cell lung cancer: a glimpse into lung cancer biology. Phys Biol. 2012;9:016005,. doi:10.1088/1478-3967/9/1/016005 100 177. Shishido SN, Carlsson A, J N. Circulating tumor cells as a response monitor in stage IV non-small cell lung cancer. J Transl Med. 2019;17:294,. doi:10.1186/s12967-019-2035-8 178. Carlsson A, Nair VS, Luttgen MS, et al. Circulating Tumor Microemboli Diagnostics for Patients with Non–Small-Cell Lung Cancer. Journal of Thoracic Oncology. 2014;9(8):1111-1119. doi:10.1097/JTO.0000000000000235 179. Polski A, Xu L, RK P. Cell-Free DNA Tumor Fraction in the Aqueous Humor Is Associated With Therapeutic Response in Retinoblastoma Patients. Transl Vis Sci Technol. 2020;9:30,. doi:10.1167/tvst.9.10.30 180. Scher HI, Lu D, NA S. Association of AR-V7 on Circulating Tumor Cells as a Treatment- Specific Biomarker With Outcomes and Survival in Castration-Resistant Prostate Cancer. JAMA Oncol. 2016;2:1441-1449,. doi:10.1001/jamaoncol.2016.1828 181. Graf RP, Hullings M, ES B. Clinical Utility of the Nuclear-localized AR-V7 Biomarker in Circulating Tumor Cells in Improving Physician Treatment Choice in Castration-resistant Prostate Cancer. Eur Urol. 2020;77:170-177,. doi:10.1016/j.eururo.2019.08.020 182. Armstrong AJ, Luo J, DM N. Prospective Multicenter Study of Circulating Tumor Cell AR- V7 and Taxane Versus Hormonal Treatment Outcomes in Metastatic Castration-Resistant Prostate Cancer. JCO Precis Oncol. 2020;4. doi:10.1200/po.20.00200 183. Armstrong AJ, Halabi S, J L. 
Prospective Multicenter Validation of Androgen Receptor Splice Variant 7 and Hormone Therapy Resistance in High-Risk Castration-Resistant Prostate Cancer: The PROPHECY Study. J Clin Oncol. 2019;37:1120-1129,. doi:10.1200/jco.18.01731 184. Brown LC, Lu C, ES A. Androgen receptor variant-driven prostate cancer II: advances in clinical investigation. Prostate Cancer Prostatic Dis. 2020;23:367-380,. doi:10.1038/s41391-020-0215-5 185. Gerdtsson AS, Setayesh SM, PD M. Large Extracellular Vesicle Characterization and Association with Circulating Tumor Cells in Metastatic Castrate Resistant Prostate Cancer. Published online 2021. doi:10.3390/cancers13051056 186. Dago AE, Stepansky A, A C. Rapid phenotypic and genomic change in response to therapeutic pressure in prostate cancer inferred by high content analysis of single circulating tumor cells. PLoS One. 2014;9:101777,. doi:10.1371/journal.pone.0101777 187. Greene SB, Dago AE, LJ L. Chromosomal Instability Estimation Based on Next Generation Sequencing and Single Cell Genome Wide Copy Number Variation Analysis. PLoS One. 2016;11, e0165089. doi:10.1371/journal.pone.0165089 188. Malihi PD, Graf RP, A R. Single-Cell Circulating Tumor Cell Analysis Reveals Genomic Instability as a Distinctive Feature of Aggressive Prostate Cancer. Clin Cancer Res. 2020;26:4143-4153,. doi:10.1158/1078-0432.Ccr-19-4100 101 189. Welter L, Xu L, D M. Treatment response and tumor evolution: lessons from an extended series of multianalyte liquid biopsies in a metastatic breast cancer patient. Cold Spring Harbor molecular case studies. 2020;6. doi:10.1101/mcs.a005819 190. Giesen C, Wang HA, D S. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat Methods. 2014;11:417-422,. doi:10.1038/nmeth.2869 191. Poreba M, Groborz KM, W R. Multiplexed Probing of Proteolytic Enzymes Using Mass Cytometry-Compatible Activity-Based Probes. J Am Chem Soc. 2020;142:16704-16715,. doi:10.1021/jacs.0c06762 192. 
Liang M, Li Z, T C. Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach. IEEE/ACM Trans Comput Biol Bioinform. 2015;12:928-937,. doi:10.1109/tcbb.2014.2377729 193. Tan X, Su AT, H H. Applying Machine Learning for Integration of Multi-Modal Genomics Data and Imaging Data to Quantify Heterogeneity in Tumour Tissues. Methods Mol Biol. 2021;2190:209-228,. doi:10.1007/978-1-0716-0826-5_10 194. Ray B, Henaff M, S M. Information content and analysis methods for multi-modal high- throughput biomedical data. Sci Rep. 2014;4:4411,. doi:10.1038/srep04411 195. Johnson K, Howard GR, D M. Integrating multimodal data sets into a mathematical framework to describe and predict therapeutic resistance in cancer. bioRxiv. Published online 2020. doi:10.1101/2020.02.11.943738 196. Yan R, Ren F, X R. 460-469. Springer International Publishing 197. Sandfort V, Yan K, PJ P. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019;9:16884,. doi:10.1038/s41598-019-52737-x 198. Goodfellow I, Pouget-Abadie J, M M. Generative Adversarial Networks. Published online 2014. 199. Crystal DT, Cuccolo NG, AMS I. Photographic and Video Deepfakes Have Arrived: How Machine Learning May Influence Plastic Surgery. Plast Reconstr Surg. 2020;145:1079- 1086,. doi:10.1097/prs.0000000000006697 200. Hwang Y, JY R, Jeong SH. Effects of Disinformation Using Deepfake: The Protective Effect of Media Literacy Education. Cyberpsychol Behav Soc Netw. 2021;24:188-193,. doi:10.1089/cyber.2020.0174 201. Sample I. What are deepfakes – and how can you spot them?, <https://www.theguardian.com/technology/2020/jan/13/what-are-deepfakes-and-how-can- you-spot-them>. Published online 2020. 102 202. Tschuchnig ME, GJ O, Gadermayr M. Generative Adversarial Networks in Digital Pathology: A Survey on Trends and Future Potential. Patterns. 2020;N Y) 1:100089,. doi:10.1016/j.patter.2020.100089 203. 
Lafarge MW, Pluim JPW, KAJ E. 83-91. Springer International Publishing 204. Luedtke A, Carone M, N S. Learning to learn from data: Using deep adversarial learning to construct optimal statistical procedures. Sci Adv. 2020;6:2140,. doi:10.1126/sciadv.aaw2140 205. Jeong B, Lee W, DS K. Copula-Based Approach to Synthetic Population Generation. PLoS One. 2016;11, e0159496. doi:10.1371/journal.pone.0159496 206. Sun Y, A CI, Veeramachaneni K. Learning Vine Copula Models for Synthetic Data Generation. In: Proceedings of the AAAI Conference on Artificial Intelligence 33. ; 2019:5049-5057,. doi:10.1609/aaai.v33i01.33015049 207. Durante F, J FS, Sempi C. A topological proof of Sklar’s theorem. Applied Mathematics Letters. 2013;26:945-948,. doi:10.1016/j.aml.2013.04.005 208. Wilson AG, Ghahramani Z. Copula Processes. In: Advances in Neural Information Processing Systems. Vol 23. Curran Associates, Inc.; 2010. Accessed December 12, 2022. https://papers.nips.cc/paper/2010/hash/fc8001f834f6a5f0561080d134d53d29-Abstract.html 209. Benth FE, Di Nunno G, Schroers D. Copula Measures and Sklar’s Theorem in Arbitrary Dimensions. Published online December 22, 2020. doi:10.48550/arXiv.2012.11530 210. M O, Y L. A Gaussian Copula Model for Multivariate Survival Data. Stat Biosci. 2010;2:154-179,. doi:10.1007/s12561-010-9026-x 211. Murray JS, Dunson DB, L C. Bayesian Gaussian Copula Factor Models for Mixed Data. J Am Stat Assoc. 2013;108:656-665,. doi:10.1080/01621459.2012.762328 212. Kamthe S, S A, Deisenroth M. Copula Flows for Synthetic Data Generation. ArXiv. 2021;2101(00598). 213. L X, K V. Synthesizing Tabular Data using Generative Adversarial Networks. Published online 2018. 214. Patki N, R W, Veeramachaneni K. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA. ; :399-410. 215. Halicek M, Ortega S, H F. Conditional generative adversarial network for synthesizing hyperspectral images of breast cancer cells from digitized histology. 
Proc SPIE Int Soc Opt Eng. 2020;11320. doi:10.1117/12.2549994 216. Xu L, Skoularidou M. Cuesta-Infante A et al. In: NeurIPS. 103 217. Clark TG, Bradburn MJ, SB L. Survival analysis part I: basic concepts and first analyses. Br J Cancer. 2003;89:232-238,. doi:10.1038/sj.bjc.6601118 218. Dudley WN, R W, Coombs N. An Introduction to Survival Statistics: Kaplan-Meier Analysis. J Adv Pract Oncol. 2016;7:91-100,. doi:10.6004/jadpro.2016.7.1.8 219. Goel MK, P K, Kishore J. Understanding survival analysis: Kaplan-Meier estimate. Int J Ayurveda Res. 2010;1:274-278,. doi:10.4103/0974-7788.76794 220. Kotz S, Johnson NL, eds. Cox DR. in Breakthroughs in Statistics: Methodology and Distribution. Published online 1992. 221. Breiman L. Random Forests. Machine Learning. 2001;45:5-32,. doi:10.1023/A:1010933404324 222. H W, G L. A Selective Review on Random Survival Forests for High Dimensional Data. Quant Biosci. 2017;36:85-96,. doi:10.22283/qbs.2017.36.2.85 223. Ishwaran H, Kogalur UB, E B. Random survival forests. The Annals of Applied Statistics. 2008;2:841-860. 224. Wongvibulsin S, KC W, Zeger SL. Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis. BMC Med Res Methodol. 2019;20:1,. doi:10.1186/s12874-019-0863-0 225. in ZD. In: 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC. ; 2011:6816-6819. 226. Brard C, Le Teuff G, MC LD. Bayesian survival analysis in clinical trials: What methods are used in practice? Clin Trials. 2017;14:78-87,. doi:10.1177/1740774516673362 227. Biard L, Bergeron A, V L. Bayesian survival analysis for early detection of treatment effects in phase 3 clinical trials. Contemp Clin Trials Commun. 2021;21:100709,. doi:10.1016/j.conctc.2021.100709 228. Zupan B, Demsar J, MW K. Machine learning for survival analysis: a case study on recurrence of prostate cancer. Artif Intell Med. 2000;20:59-75,. doi:10.1016/s0933- 3657(00)00053-1 229. 
Wang P, Y L, Reddy C. Machine Learning for Survival Analysis. ACM Computing Surveys (CSUR. 2017;51:1-36. 230. D F, R S. A neural network model for survival data. Stat Med. 1995;14:73-82,. doi:10.1002/sim.4780140108 231. Katzman JL, Shaham U, A C. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18:24,. doi:10.1186/s12874-018-0482-1 104 232. Roodnat JI, Mulder PG, ET T. The Cox proportional hazards analysis in words: examples in the renal transplantation field. Transplantation. 2004;77:483-488,. doi:10.1097/01.tp.0000110424.27977.a1 233. Harrell FE Jr, Califf RM, DB P. Evaluating the yield of medical tests. Jama. 1982;247:2543-2546. 234. Carlsson A, Kuhn P, Luttgen MS, et al. Paired High-Content Analysis of Prostate Cancer Cells in Bone Marrow and Blood Characterizes Increased Androgen Receptor Expression in Tumor Cell Clusters. Clin Cancer Res. 2017;23(7):1722-1732. doi:10.1158/1078- 0432.CCR-16-1355 235. Corn PG, Heath EI, A Z. Cabazitaxel plus carboplatin for the treatment of men with metastatic castration-resistant prostate cancers: a randomised, open-label, phase 1-2 trial. The Lancet Oncology. 2019;20:1432-1443,. doi:10.1016/s1470-2045(19)30408-5 236. Kelly SP, Anderson WF, PS R. Past, Current, and Future Incidence Rates and Burden of Metastatic Prostate Cancer in the United States. Eur Urol Focus. 2018;4:121-127,. doi:10.1016/j.euf.2017.10.014 237. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;68(1):7- 30. doi:10.3322/caac.21442 238. Alix-Panabières C. The future of liquid biopsy. Nature. 2020;579:9,. doi:10.1038/d41586- 020-00844-5 239. Keating SM, Taylor DL, AL P. Opportunities and Challenges in Implementation of Multiparameter Single Cell Analysis Platforms for Clinical Translation. Clin Transl Sci. 2018;11:267-276,. doi:10.1111/cts.12536 240. Rockne RC, Hawkins-Daarud A, KR S. The 2019 mathematical oncology roadmap. Phys Biol. 
2019;16:041005,. doi:10.1088/1478-3975/ab1a09 241. Y S, K R. Patient Representation Transfer Learning from Clinical Notes based on Hierarchical Attention Network. AMIA Jt Summits Transl Sci Proc. Published online 2020:597-606. 242. Kensert A, PJ H, Spjuth O. Transfer Learning with Deep Convolutional Neural Networks for Classifying Cellular Morphological Changes. SLAS Discov. 2019;24:466-475,. doi:10.1177/2472555218818756 243. Estiri H, S V, Murphy SN. Generative transfer learning for measuring plausibility of EHR diagnosis records. J Am Med Inform Assoc. 2021;28:559-568,. doi:10.1093/jamia/ocaa215 244. Wainrib G. Transfer Learning and the Rise of Collaborative AI, <https://owkin.com/collaborative-ai/transfer-learning/>. Published online 2021. 105 245. Brisimi TS, Chen R, T M. Federated learning of predictive models from federated Electronic Health Records. Int J Med Inform. 2018;112:59-67,. doi:10.1016/j.ijmedinf.2018.01.007 246. Vaid A, Jaladanki SK, J X. Federated Learning of Electronic Health Records Improves Mortality Prediction in Patients Hospitalized with COVID-19. medRxiv. Published online 2020. doi:10.1101/2020.08.11.20172809 247. A G, D K. Interpretable Artificial Intelligence: Why and When. AJR Am J Roentgenol. 2020;214:1137-1138,. doi:10.2214/ajr.19.22145 248. Hao J, Kim Y, T M. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genomics. 2019;12:189,. doi:10.1186/s12920-019-0624-2 249. Gatenby RA, Silva AS, RJ G. Adaptive therapy. Cancer Res. 2009;69:4894-4903,. doi:10.1158/0008-5472.Can-08-3658 250. West J, You L, J Z. Towards Multidrug Adaptive Therapy. Cancer Res. 2020;80:1578- 1589,. doi:10.1158/0008-5472.Can-19-2669 251. Hussey PS, Sorbero ME, A M. Episode-based performance measurement and payment: making it a reality. Health Aff (Millwood. 2009;28:1406-1417,. doi:10.1377/hlthaff.28.5.1406 252. Berardi S, Caivano A, Ria R, et al. 
Four proteins governing overangiogenic endothelial cell phenotype in patients with multiple myeloma are plausible therapeutic targets. Oncogene. 2012;May;31(18):2258–69. 253. Dutsch-Wicherek M. REVIEW ARTICLE: RCAS1, MT, and Vimentin as Potential Markers of Tumor Microenvironment Remodeling: MARKERS OF TUMOR MICROENVIRONMENT REMODELING. Am J Reprod Immunol. 2010;19;63(3):181–8. 254. Ribatti D, Nico B, Vacca A. Importance of the bone marrow microenvironment in inducing the angiogenic response in multiple myeloma. Oncogene. 2006;Jul;25(31):4257–66. 255. Mathur P, Alapat D, Kumar M, Thanendrarajan S. Metastatic prostate cancer with bone marrow infiltration mimicking multiple myeloma. Clin Case Rep. 2017;6(2):269-273. doi:10.1002/ccr3.1308 256. Cho SY, Jeong JH, Lee WI, et al. Plasma Cell Myeloma Initially Presenting as Lung Cancer. Ann Lab Med. 2013;33(3):225-228. doi:10.3343/alm.2013.33.3.225 257. Prasad R, Verma SK, Sodhi R. Multiple myeloma with lung plasmacytoma. Lung India. 2011;28(2):136-138. doi:10.4103/0970-2113.80331 258. Zhang H. Circulating endothelial progenitor cells in multiple myeloma: implications and significance. Blood. 2005;15;105(8):3286–94. 106 259. Takagi S, Tsukamoto S, Park J, et al. Platelets Enhance Multiple Myeloma Progression via IL-1β Upregulation. Clin Cancer Res. 2018;24(10):2430-2439. doi:10.1158/1078- 0432.CCR-17-2003 260. Bodenmiller B, Zunder ER, Finck R, et al. Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators. Nat Biotechnol. 2012;30(9):858-867. doi:10.1038/nbt.2317 261. Altrock PM, Ferlic J, Galla T, Tomasson MH, Michor F. Computational Model of Progression to Multiple Myeloma Identifies Optimum Screening Strategies. JCO Clinical Cancer Informatics. 2018;(2):1-12. doi:10.1200/CCI.17.00131 262. Rögnvaldsson S, Love TJ, Thorsteinsdottir S, et al. 
Iceland screens, treats, or prevents multiple myeloma (iStopMM): a population-based screening study for monoclonal gammopathy of undetermined significance and randomized controlled trial of follow-up strategies. Blood Cancer J. 2021;11(5):1-13. doi:10.1038/s41408-021-00480-w 263. Bustoros M, Sklavenitis-Pistofidis R, Park J, et al. Genomic Profiling of Smoldering Multiple Myeloma Identifies Patients at a High Risk of Disease Progression. J Clin Oncol. 2020;38(21):2380-2389. doi:10.1200/JCO.20.00437 264. Maura F, Bolli N, Angelopoulos N, et al. Genomic landscape and chronological reconstruction of driver events in multiple myeloma. Nat Commun. 2019;10(1):3835. doi:10.1038/s41467-019-11680-1 265. Chang Q, Ornatsky OI, Siddiqui I, Loboda A, Baranov VI, Hedley DW. Imaging Mass Cytometry. Cytometry A. 2017;91(2):160-169. doi:10.1002/cyto.a.23053 266. Mikhael J, Ismaila N, Cheung MC, et al. Treatment of Multiple Myeloma: ASCO and CCO Joint Clinical Practice Guideline. JCO. 2019;37(14):1228-1263. doi:10.1200/JCO.18.02096 267. Clinical Pathways Updates in Multiple Myeloma: New and Expected Treatments for Refractory and Resistant Multiple Myeloma. HMP Global Learning Network. Published January 14, 2021. Accessed September 2, 2022. https://www.hmpgloballearningnetwork.com/site/jcp/clinical-pathways-updates-multiple- myeloma-new-and-expected-treatments-refractory-and-resistant
Abstract
Plasma cell neoplasms are a network of benign and malignant disorders that are clinically and biologically distinct. Molecular profiling of overt and precursor conditions reveals additional subtypes defined by the associated clonal immunoglobulins, and with them significant heterogeneity in disease risk, progression rate, treatment response, prognosis, and survival outlook. While the progression of plasma cell cancers is clinically defined, the biological mechanisms and sequential events that drive progression between states remain poorly understood, which hinders accurate prediction of state change. Further, the intrinsic heterogeneity of the multi-state spatiotemporal progression makes it difficult to predict which patients will or will not progress. Additionally, standard monitoring for disease management in both pre-malignant and overt myeloma relies on repeated, highly invasive, and costly bone marrow biopsies. These unmet needs motivated the research in this dissertation, which applies a convergent oncology approach integrating liquid biopsy via single-cell multiparametric profiling with machine and deep learning for disease subtyping and outcome prediction in cancer patients. Single-cell morphoproteogenomics assays were developed for rare cell detection and characterization in the bone marrow and peripheral blood of patients with lymphoproliferative cancers, and were validated as potential blood-based liquid biopsy technologies for clinical applications. Machine learning methods to delineate subtypes and predict therapy response in newly diagnosed multiple myeloma are presented as a quantitative framework for characterizing the heterogeneity of myeloma. Finally, work to integrate multimodal liquid biopsy data with predictive models is discussed as a promising quantitative approach towards personalized medicine and improved patient outcomes.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Molecular signature of aggressive disease and clonal diversity revealed by single-cell copy number analysis of prostate cancer cells across multiple disease states
Generation and characterization of anti-CD138 chimeric antigen receptor T (CAR-T) cells for the treatment of hematologic malignancies
Deconvolution of circulating tumor cell heterogeneity and implications for aggressive variant prostate cancer
Applying multi-omics in cancer liquid biopsy for improved patient monitoring and biomarker discovery
Early detection of lung cancer by characterizing circulating rare cells using peripheral blood liquid biopsy
Exploring the effects of CXCR4 inhibition on circulating tumor cell populations in metastatic prostate cancer
Heterogeneity and plasticity of malignant and non-malignant circulating analytes in breast carcinomas
Genetic risk factors in multiple myeloma
Evaluation of preservatives in blood collection tubes for cell-free RNA transcriptional profiles in human plasma
Role of the bone marrow niche components in B cell malignancies
Malignant cell fraction prediction using deep learning: from point estimate to uncertainty quantification
RNA methylation in cancer plasticity and drug resistance
Model development of breast cancer detection and staging via rare event enumeration from a liquid biopsy: a retrospective descriptive clinical research study
Potential of aqueous humor as a liquid biopsy for uveal melanoma
Mechanisms that dictate beta cells’ response to stress in the context of genetic mutation, pregnancy, and infection
Immune infiltrates in papillary thyroid carcinomas
Developing a robust single cell whole genome bisulfite sequencing protocol to analyse circulating tumor cells
A synthetic lethal screen for NF-κB-dependent plasma cell disorders
Determining the epigenetic contribution of basal cell identity in cystic fibrosis
A single cell time course of senescence uncovers discrete cell trajectories and transcriptional heterogeneity
Asset Metadata
Creator
Ndacayisaba, Libère Jensen (author)
Core Title
Multimodal single-cell biology and machine learning to characterize plasma cell neoplasms
Contributor
Electronically uploaded by the author (provenance)
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Medical Biophysics
Degree Conferral Date
2022-12
Publication Date
12/30/2023
Defense Date
08/08/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
assay development,early detection,hematology,liquid biopsy,machine learning,mathematical oncology,morphoproteogenomics,multimodal data,multiomics,multiple myeloma,OAI-PMH Harvest,oncology,plasma cell,predictive modeling,single cell biology
Format
theses (aat)
Language
English
Advisor
Lee, Jerry S.H. (committee chair), Kelly, Kevin (committee member), Kuhn, Peter (committee member), Oberai, Assad (committee member)
Creator Email
nalibere@gmail.com,ndacayis@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC112661531
Unique identifier
UC112661531
Identifier
etd-Ndacayisab-11397.pdf (filename)
Legacy Identifier
etd-Ndacayisab-11397
Document Type
Dissertation
Rights
Ndacayisaba, Libère Jensen
Internet Media Type
application/pdf
Type
texts
Source
20230104-usctheses-batch-999 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu