Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Investigating hematopoietic stem cells via multiscale modeling, single-cell genomics, and gene regulatory network inference
(USC Thesis Other)
Investigating hematopoietic stem cells via multiscale modeling, single-cell genomics, and gene regulatory network inference
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
INVESTIGATING HEMATOPOIETIC STEM CELLS VIA MULTISCALE MODELING, SINGLE-CELL GENOMICS, AND GENE REGULATORY NETWORK INFERENCE by Megan Kristen Rommelfanger A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTATIONAL BIOLOGY AND BIOINFORMATICS) May 2023 Copyright 2023 Megan Kristen Rommelfanger Epigraph ...there’s a little twist, a flame, a small shift. It is subtle, it comes when you least expect it. Wait for it... No matter how awful and long your journey, I can promise you the turn. One day it will lift. - Chanel Miller ii Dedication To my love, Nick. To Ken, Karen, Ramona, and Alan. To Michael, Emma, and Annalise. A piece of this achievement belongs to each of you for helping me make it to this moment. To Adam, for offering constant, undeserved kindness and patience. iii Acknowledgements I would like to thank Rong Lu and members of the Lu lab for many helpful discussions on multiple sections of this work. I am grateful to Michael P.H. Stumpf for insightful comments and feedback on the manuscript associated with the second chapter of this work. I would also like to thank K. Lenhard Rudolph, MiaoMiao Suo, Yulin Chen, and Marthe Behrends for their excellent collaboration which produced the third and fourth chapters of this work. Thank you to Jonathan Martinez for helping with the inference method testing in the fourth chapter. This work was partially supported by a USC WiSE Top-up Fellowship (to M.K.R.). iv TableofContents Epigraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Chapter 2: A single-cell resolved cell-cell communication model explains lineage commitment in hematopoiesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.3.1 A multiscale model of cell-cell communication between single cells . . . . . . . . . 8 2.3.2 Model specification and choice of parameters . . . . . . . . . . . . . . . . . . . . . 11 2.3.3 Intrinsic and extrinsic cell-cell communication noise . . . . . . . . . . . . . . . . . 14 2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.4.1 Cell-cell communication over a wide range of signaling topologies leads to divergent cell fates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.4.2 Varying the strength of cell-cell coupling results in different ratios of stable subpopulations of heterogeneous cell types . . . . . . . . . . . . . . . . . . . . . . 20 2.4.3 Intrinsic and extrinsic noise alter cell fate decision-making boundaries . . . . . . . 24 2.4.4 Spatial structure in communication networks impacts cell fate outcomes . . . . . . 26 2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 Chapter 3: Interrogating HSC phenotypes with single-cell RNA sequencing data analysis . . . . . 33 3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.1 Bulk RNA sequencing and Gene ontology (GO) enrichment analysis. . . . . . . . . 35 3.3.2 Single cell RNA-seq (scRNA-seq) data normalization and quality control . . . . . . 36 3.3.3 Visualization, clustering, and differential expression of scRNA-seq data using Seurat. 36 3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 v 3.4.1 Igf2bp2 controls mitochondrial metabolism/protein synthesis related genes in HSC of young mice but its gene regulatory function is lost in aged HSC. . . . . . . 37 3.4.2 Igf2bp2 expression is enriched in a subcluster in HSCs of young mice cosegregating with expression of Lin28, Igf/Pi3k, and stemness-related genes. . . . . . . . . . . . 38 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Chapter 4: Gene regulatory network inference for hematopoietic stem cell state transitions with dietary interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.3.1 Animal Experiments and 10x Multiome Protocol . . . . . . . . . . . . . . . . . . . 49 4.3.2 HSPC multiomic data normalization and quality control . . . . . . . . . . . . . . . 50 4.3.3 HSPC multiomic data clustering and peak calling . . . . . . . . . . . . . . . . . . . 51 4.3.4 Feature Selection for Network Inference . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3.5 HSPC multiomic pseudotime ordering, feature selection, and pseudocell construction 52 4.3.6 HSPC multiomic ATAC accessibility scoring for pseudocells . . . . . . . . . . . . . 53 4.3.7 Pseudotime-ordered pseudocell (POP) inference model . . . . . . . . . . . . . . . . 54 4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.4.1 The stem-to-multipotent hematopoietic transition is maintained throughout lifetime and dietary perturbations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.4.2 Contemporaneous gene expression and chromatin accessibility measurements enable inference of dynamic gene regulatory interactions . . . . . . . . . . . . . . 56 4.4.3 popInfer outperforms other methods that run on scRNA-seq alone . . . . . . . . . 62 4.4.4 popInfer identifies Dnmt1 and Satb1 as top regulators with different roles across the aging and DR phenotypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Chapter 5: Conclusions and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Appendix A: Supplementary information for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . 88 Appendix B: Supplementary information for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . 94 vi ListofFigures 1.1 Cartoon of stem cell differentiation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2.1 Overview of multiscale cell fate commitment model. . . . . . . . . . . . . . . . . . . . . . . 17 2.2 Sample trajectories of GATA1 for cells signaling in a loop. . . . . . . . . . . . . . . . . . . . 18 2.3 Cell fate commitment across a range of signaling topologies. . . . . . . . . . . . . . . . . . 21 2.4 Cell fate coupling is controlled by the external signaling parameter. . . . . . . . . . . . . . 23 2.5 Extrinsic and intrinsic sources of noise reduce fidelity of cell fate choice. . . . . . . . . . . 26 2.6 Spatial organization of cells regulates patterns of cell fate commitment. . . . . . . . . . . . 29 3.1 Igf2bp2 deletion decreases the expression of genes related to mitochondrial metabolism and protein synthesis in young myeloid biased HSCs . . . . . . . . . . . . . . . . . . . . . 39 3.2 Igf2bp2-expression in young HSC co-segregates with Lin28b, Igf/Pi3k, and stemness- related gene expression signature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.3 Igf2bp2-high cluster is enriched for the expression of imprinted genes that are expressed in long-term HSC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.4 High expression of megakaryocyte/thrombocyte and erythroid marker genes indicates that HSCs in clusters 1, 2, 6, and 7 are lineage primed. . . . . . . . . . . . . . . . . . . . . . 44 4.1 Overview of experimental study design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.2 popInfer workflow for single-cell multiome gene regulatory network inference. . . . . . . 62 4.3 popInfer performance comparison with scRNA-seq GRN inference methods. . . . . . . . . 65 4.4 popInfer identifies Dnmt1 and Satb1 as key regulators of the aged HSC phenotype. . . . . 69 A.1 Sample simulation of a loop of 20 cells with concentrations of both G and P over time. Parameters are set toλ = 28.0 andA 0 = 1.0. . . . . . . . . . . . . . . . . . . . . . . . . . 88 vii A.2 Chains and loops of cells from Fig. 3A-3B along with curves where the number of cells n = 100. The number of simulation iterations was reduced to 100 per A 0 value in simulations of 100 cell topologies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 A.3 Cell fate probability distributions for a loop of three cells with one inverse signal. . . . . . 89 A.4 Probability distributions of (A) cell two in a chain and (B) two cell loop converging to the G high state resulting from a consensus signal to the parameterB and dual consensus signals to both parametersA andB. A 0 is fixed as 0.65. The signaling parameters for the signals changingA andB areλ A = 28.0 andλ B = 28.0 respectively. . . . . . . . . . . . 89 A.5 Probability distributions of cell fate commitment to G high steady state for cell 2 in a chain of two cells with a dissensus signal whereκ ∈ [16,40]. . . . . . . . . . . . . . . . . 90 A.6 (A)-(D) Sample trajectories of a loop of two cells withA 0 = 1.0175 andλ = 40. . . . . . . 90 A.7 Distributions of loops of 10 cells with different values of λ . The values of A 0 were selected so that the distributions had similar expected values. For the blue distribution, A 0 = 0.9973 and the expected value is5.284. For the grey distribution,A 0 = 1.016 and the expected value is5.154 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 A.8 Probability distributions that a given cell in a two cell loop will converge to the erythroid state with varying amounts of (A) extrinsic noise or (B) intrinsic noise. . . . . . . . . . . . 91 A.9 (A) Sample trajectory of a chain of four cells whereA 0 = 1.01 andλ = 38.0. (B)-(D) Plots ofA 2 (t),A 3 (t), andA 4 (t) over the same timescale as the trajectory. . . . . . . . . . . . . 92 A.10 Probability distributions of cell 2 in a chain of cells converging to theG high state where λ = 18 with different mean wait times, µ , given in the legend. . . . . . . . . . . . . . . . . 93 A.11 Sample of simulated data points along with the fitted curve for cell 2 in a chain of cells whereλ = 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 B.1 Comparison of gene expression and gene accessibility score for Cd34. . . . . . . . . . . . 95 B.2 Overview of scRNA-seq processing in the old ad libitum (oAL) sample. . . . . . . . . . . . 96 viii Abstract During hematopoiesis, a concert of intrinsic and extrinsic factors guide hematopoietic stem cells (HSCs) towards different hematopoietic lineages. In this work, a variety of these factors are probed using mul- tiscale mathematical modeling, data analysis of single-cell genomics data, and gene regulatory network inference. The influence of cell-cell communication on hematopoietic progenitor lineage decision making is explored using a multiscale mathematical model of the well-characterized GATA1-PU.1 bifurcation. The relationship between HSC aging and changes in metabolism is evaluated by analyzing bulk and single-cell RNA-sequencing data from HSCs extracted from wild type andIgf2bp2 knockout mice of varying ages. Lastly, changes in gene regulation during hematopoiesis is evaluated using a proposed novel method, popInfer, for inferring gene regulatory networks from single-cell multiome (RNA and ATAC) data. popIn- fer is applied to the transition from HSCs to multipotent progenitors from young and old mice fed ad libitum and diet restricted to compare how the gene regulatory networks are perturbed with each condi- tion. In total, this dissertation further elucidates how HSC function and lineage decision-making occurs and is influenced by cell-cell communication, metabolism, aging, and diet. ix Chapter1 Introduction Hematopoietic stem cells (HSCs) are stem cells found in the bone marrow that are responsible for populat- ing the blood with the full spectrum of mature hematopoietic cell types and maintaining proper amounts of each cell type throughout life [1]. These essential cells are capable of differentiating into any special- ized hematopoietic cell, via hematopoiesis: the complex process of sequential lineage decisions, ultimately leading to a final cell fate [2]. Stem cells undergoing differentiation become committed to a specific cell fate through a sequence of often binary and irreversible lineage specification choices (Fig. 1.1). Hematopoiesis has been highly studied as a model stem cell differentiation system, however, there remain many unan- swered questions in the study of hematopoiesis: what factors initiate cell fate decisions? How do progeni- tor cells determine what type of mature cell to become? How does aging affect HSC function? What gene regulatory network is responsible for the transition from HSCs out of the stem state? The development of next-generation sequencing technologies has provided the opportunity to profile a variety of molecules at the resolution of both whole samples of cells and single cells. One such data type is single-cell RNA sequencing experiments which measure the abundance of each mRNA in a cell or a population of cells, thus providing a snapshot of the transcriptome [3]. Processing the output of an RNA sequencing experiment gives us a count matrix of the number of occurrences of each mRNA (denoted by the corresponding gene) in the samples or single cells, allowing for the application of computational 1 tools to perform tasks such as high-resolution cell type classification and cell type marker gene identifi- cation. Another data type utilized in this work to assess chromatin accessibility is single-cell assay for transposase-accessible chromatin (ATAC) sequencing, which gives us the counts of reads from the acces- sible chromatin overlapping genomic regions [4]. The ability to study gene expression and accessibility at single-cell resolution offers an excellent opportunity to apply mathematical and statistical tools to probe biological questions like those proposed above. In Chapter 2, I present a multiscale model built to describe how cell-cell signaling influences cell lineage decision making. As a model system, I studied the granulocyte-monocyte vs. megakaryocyte-erythrocyte fate decision, dictated by the GATA1-PU.1 network. Incorporating cell-cell signaling into the model of this lineage decision explains how cell fate decisions are probabilistic with respect to initial gene expression states, how specific distributions of heterogeneous cell types are maintained and adjusted in response to environmental cues, how noise in signaling increases variability of cell fate decisions, and how proximal cells couple cell fate. Figure 1.1: Cartoon of stem cell differentiation. Colors represent the transcriptional state and branching points represent bifurcations in cell fate. 2 Chapter 3 aims to characterize the role of the gene Igf2bp2 in HSC aging. This is achieved through analyzing bulk and single-cell RNA sequencing datasets provided by our collaborators at the Leibniz In- stitute of Aging.Igf2bp2 is identified to be involved with regulating mitochondrial metabolism and protein synthesis in young myeloid biased HSCs. As a result, the expression of these genes becomes more similar to old mice as opposed to young, implying that loss of Igf2bp2-mediated regulation is involved with the aging phenotype. In Chapter 4, I present a new computational method, popInfer, for inferring gene regulatory networks of dynamic processes from single-cell multiome (RNA and ATAC measurements from the same single- cells). I apply popInfer to four single-cell multiome samples from our collaborators at the Leibniz Institute of Aging, comprised of HSCs from mice of varying ages and diet treatments. popInfer outperforms other methods that are applied to RNA measurements alone and is able to infer top regulators of the HSC to multipotent progenitor cell transition, as well as identify how these regulators change in response to diet and age. Lastly, Chapter 5 provides an overview and discussion of the results presented in this work. Next steps for future work are suggested for each of the three projects presented in Chapters 2-4. Finally, ideas for more comprehensive projects combining the ideas from Chapters 2-4 are proposed. 3 Chapter2 Asingle-cellresolvedcell-cellcommunicationmodelexplainslineage commitmentinhematopoiesis Parts of this chapter are included in a manuscript which is already published in Development [5]. 2.1 Abstract Cells do not make fate decisions independently. Arguably, every cell fate decision occurs in response to environmental signals. In many cases cell-cell communication alters the dynamics of a cell’s internal gene regulatory network to initiate cell fate transitions, yet models rarely take this into account. Here we develop a multiscale perspective to study the granulocyte-monocyte vs. megakaryocyte-erythrocyte fate decisions. This transition is dictated by the GATA1-PU.1 network, a classical example of a bistable cell fate system. We show that, for a wide range of cell communication topologies, even subtle changes in signaling can have pronounced effects on cell fate decisions. We go on to show how cell-cell coupling through signaling can spontaneously break the symmetry of a homogenous cell population. Noise, both intrinsic and extrinsic, shapes the decision landscape profoundly, and affects the transcriptional dynamics underlying this important hematopoietic cell fate decision-making system. 4 2.2 Introduction The production of mature cell types from stem cells and progenitors is essential for development and organ homeostasis. Nevertheless, in few cases are we able to fully specify the conditions necessary to drive stem cell differentiation towards a particular cell lineage. Stem cell differentiation is controlled by cell-internal and external signals that, in turn, control the transcriptional state of the cell and specify its eventual fate [6, 7, 8]. These changes in transcriptional state are often irreversible and involve binary choices. Thus, multipotent cells, through successive binary lineage specification choices, eventually become committed to a specific lineage and cell fate. Significant unanswered questions remain regarding the cell fate decisions that dictate lineage specification of stem/progenitor cells: how large is the role of extracellular signaling in regulating cell differentiation? What mechanisms allow cells from initially homogeneous or clonal populations to converge to different lineages? How does one population of progenitor cells maintain stable heterogeneous subpopulations of committed cell types? During lineage specification, changes in gene expression are controlled by gene regulatory networks (GRNs), consisting of genes and their protein products (transcription factors), which are able to regulate the expression of various genes, including other transcription factors, creating feedback loops [9]. Codified through mathematical models, GRNs can be studied in light of their steady states, where each steady state can represent a committed cell fate. Certain GRN topologies permit bistability, that is, more than one steady state can be reached for a single set of biological conditions [10]: such topologies are frequently observed in networks that control cell fate decisions [11, 12]. The GRN topology that dictates a particular lineage decision is generally conserved across cells, thereby providing insight into the intracellular dynamics that occur during cell differentiation. Crucially, the GRNs that instigate or reinforce lineage decisions are not only controlled by cell-intrinsic networks, but also by cell-extrinsic signals. The GRN topology considered in this work consists of two mutually repressive genes; a topology frequently observed among gene networks mediating lineage decisions [13, 14]. One such lineage decision 5 occurs during hematopoiesis: myeloid progenitor cells make a binary choice between commitment to the megakaryocyte-erythroid (ME) lineage or the granulocyte-monocyte (GM) lineage. GATA1 andPU.1(SPI1) mutually inhibit one another; GATA1 is expressed in the ME lineage and PU.1 is expressed in the GM lineage. This mutually repressive GRN has been extensively studied and characterized in models, mostly consisting of ordinary differential equations that permit bistability, thus enabling investigation into the dynamics of this myeloid lineage decision [15, 16, 17, 18, 19] Given the GATA1-PU.1 mutual inhibitory loop that leads to bistability, changing the initial conditions (gene expression levels) within a bistable region is sufficient to change the cell fate. It has thus been proposed that random fluctuations of GATA1 and PU.1 levels are primarily responsible for determining cell fate in the bipotent progenitor population that has dual ME and GM lineage potential [18]. More recently this notion was challenged by a study that used a double reporter mouse (PU.1 eGFP Gata1 mCherry ) to show that that random fluctuations of PU.1 and GATA1 are insufficient to initiate the cell fate decision between ME and GM lineages [20]. Hoppe et al. also provide strong evidence that ME v. GM lineage specification cannot be determined solely from initial ratios of PU.1 to GATA1 expression. As suggested by Hoppe et al. [20], extracellular signaling could resolve this controversy. Cell signaling pathways including Jak/Stat, Wnt/β -catenin, MAPK/ERK/p38, and PI3K/Akt have been recently identified as critically important during GM and ME cell fate specification [21]. Previous studies of hematopoiesis both in vivo and in vitro have also hinted that cell-cell signaling influences cell fate decision making. In vivo, it has been shown that the number of hematopoietic cells transplanted affects the phenotype of the regenerated blood system in the recipient [22]. In vitro, hematopoietic stem and progenitor cells placed into wells of different cavity sizes, and therefore different numbers of neighhboring cells, gave rise to different cell fate patterns [23]. These two lines of evidence suggest that cell-cell signaling plays a role in hematopoietic cell fate specification. Both the theoretical gap and the experimental data thus point to a 6 pressing need to consider the impact of extracellular signaling dynamics when modeling hematopoietic differentiation. The role of extracellular signaling, however, has been chronically understudied in studies of gene regulatory network models and their stability. Extracellular signaling through paracrine factors enables populations of cells to share information and coordinate behaviors over short distances, and is implicated in a range of cell fate behaviors from developmental branching [24, 25] to patterning [26] and migration [27]. A seemingly simple question motivated this work: how do intercellular signaling networks impact myeloid lineage specification as controlled by the GATA1-PU.1 gene regulatory network? Several studies have characterized the impact on cell fate of cell-cell communication through extra- cellular signals. One approach taken is to model the internal gene dynamics with differential equations and model cell-cell communication by allowing for molecular diffusion of proteins between neighboring cells [28, 29, 30]. These methods must neglect anisotropic effects, and assume that signaling between cells occurs on the same timescale as intracellular dynamics; yet the processes can occur on vastly different timescales. Other approaches have relaxed this assumption, permitting different timescales by modeling the response time of a cell to a signal received as a random variable [31]. This approach however omits any detail of the intracellular dynamics, which are often crucial to the response; GRNs are the enforcing mechanisms of the phenotypic switches. In order to model the individual behaviors of a population of communicating cells, the gap between intracellular dynamics and extracellular signaling must be bridged. A model of cell fate decision-making is needed that permits cell-cell communication while allowing for both description of the cell-internal dynamics and of the extracellular signaling dynamics without making diffusion-like approximations. We present a multiscale model that bridges this gap. We assume deterministic dynamics within each cell and thus model the cell-internal GRN with ordinary differential equations (ODEs). We assume that signals sent between cells can be described by a Poisson process. Signals received by cells alter the internal 7 GRN dynamics through their effects on parameters of the ODE model. We test this model in a large range of different intercellular signaling topologies. We find that the addition of cell-cell communication to GRN dynamics leads to model outcomes (cell fate distributions) becoming probabilistic: cell fate choice probabilities now depend on the cell’s position in a particular signaling topology. We discover that the model can intuitively characterize cell-cell coupling, changes to which impact the cell fate decisions made, which can lead to mixtures of heterogeneous cell types within a population. We also study how noise impacts fate outcomes: we find that although both intrinsic and extrinsic noise alter the cell fate decision- making boundaries, extrinsic noise is the dominant driver of cell fate variability. Finally, we study how cell fates are influenced by spatial architecture, and we demonstrate that the extent to which cell fates are coupled is controlled by proximity in spatially structured cell-cell signaling systems. 2.3 MaterialsandMethods 2.3.1 Amultiscalemodelofcell-cellcommunicationbetweensinglecells To investigate how external signaling impacts intracellular dynamics during cell fate decisions, we first must select an internal GRN topology to model. We choose to model a mutual inhibitory GRN: such net- work topologies arise frequently in cell fate decision making and developmental biology [32, 33]. Within certain parameter regions, such models give rise to two stable steady states and bistability (Fig. 2.1A). For example, it has been observed that the GRN determining the bifurcation of erythroid/megakaryocyte lineages from granulocyte/monocyte lineages contains a core mutual inhibitory loop topology consisting of mutual antagonism of GATA1 (G) and PU.1 (P ). For this genetic switch, high expression of G corre- sponds to commitment to the erythroid/megakaryocyte lineage and high expression ofP corresponds to commitment to the granulocyte/monocyte lineage. 8 This myeloid progenitor cell lineage decision has been rigorously studied, and there exist robust models of the intracellular GRN dynamics [15, 16, 18, 17, 19, 34]. Here we implement the ODE model defined by Chickarmane et al. which follows the Shea-Ackers formalism for transcription factor dynamics [35, 15]. The ODE model is given by Eqs. 1-3, where X is a transcription factor recruited by GATA-1 to bind to and inhibit PU.1 expression. These ODEs give rise to two stable steady states for a region of parameter space with respect to the parametersA,B andC, whereA,B, andC are parameters summarizing environmental signals (Fig. 2.1B). The parameters through which the extracellular environment is summarized will be modified below to implement a multiscale model of cell-cell communication. The parameter values used throughout this work can be found in Table S1. dG dt =− β 1 G+ α 1 A+α 2 G 1+γ 1 A+γ 2 G+γ 3 GP (2.1) dP dt =− β 2 P + α 3 B+α 4 P 1+γ 4 B+γ 5 P +γ 6 GP +γ 7 GX (2.2) dX dt =− β 3 X + α 5 G 1+γ 8 G+γ 9 C (2.3) To incorporate cell-cell signaling into the model, we must modify the cell-internal ODEs in a way that reflects the signal received by the cell without unrealistically changing the internal dynamics (e.g. a loss of bistability). For simplicity, we choose a single parameterA to summarize the effects of external signaling. We study both signals that recruit nearby cells to commit to the same lineage and the opposite lineage of the sender. For the first case, we consider two cells, cell 1 and cell 2, where cell 1 signals to cell 2. Let G 1 (t) andP 1 (t) denote the concentrations ofG andP in cell 1 at timet respectively and letA 2 (t) be the value of the parameterA in cell 2 at timet. Since cell 1 signals to cell 2 to coordinate fate decisions (“be like me” signal), thenA 2 (t) should increase if the ratioG 1 (t) : P 1 (t) indicates that cell 1 has committed to – or is increasingly likely to commit to – the ME state (whereG is highly expressed). Conversely,A 2 (t) should 9 decrease if the ratioG 1 (t) :P 1 (t) indicates that cell 1 has committed to – or is increasingly likely to commit to – the GM state where P is highly expressed. To model the arrival process of the signals received by cells, we use a Poisson process. This choice is based upon previous results which have demonstrated that the arrival of independent Brownian particles to an absorbing boundary is a Poisson process [36]. This result does assume that ligands are continuously produced and secreted from the “sender” cell, which does not fully reflect the underlying biology, but we reason that it is an appropriate limiting assumption to allow for a simple, tractable signaling model. Based on these assumptions, we constructed a communication model: we treat signaling as a Poisson process, where the wait time between signals being sent by a cell follows an exponential random variable with meanµ . After this wait time, at timet s , a signal is sent and we change the value ofA 2 by: A 2 (t) = G 1 (ts) λ +P 1 (ts) A 2 (t s ), for t∈ [t s ,t s +τ ) ,λ> 0 A 0 , otherwise (2.4) After a delayτ ,A 2 returns to it’s original value,A 0 (Fig. 2.1C). Returning to the same valueA 0 between signals ensures that attractor states (Fig. 2.1B) do not change. Sampled wait time values are rounded to integer values to avoid integration errors. In Fig. 2.1D is a schematic of this signaling model. Fig. A.9 gives a sample simulation of a chain of four cells and the values ofA 2 ,A 3 , andA 4 plotted over time. In the case of signals that promote nearby cells to commit to the opposite lineage, we can follow the same principles to define a signal: 10 A 2 (t) = P 1 (ts) κ +G 1 (ts) A 2 (t s ), for t∈ [t s ,t s +τ ) ,κ> 0 A 0 , otherwise (2.5) These definitions of signaling also allow for multiple cells to signal a single cell at similar times while still having A 2 well defined. We implemented this model in Julia [37], where we used the DifferentialEqua- tions.jl [38] and Distributions.jl [39] packages for numerical simulation. The model developed here shares similarities with models specified by piecewise-deterministic Markov processes (PDMPs), first introduced by Davis [40]. Such models are comprised of continuous dynamics and a discrete Markov process which can alter the continuous dynamics at discrete time points. PDMPs have previously been applied to biological systems for the study of gene expression and genetic networks [41, 42]. In the case of two cells sharing one directed signal between them, the model described here closely resembles a PDMP, with the exception that here, due to information transfer between cells, the discrete process is not Markovian. 2.3.2 Modelspecificationandchoiceofparameters To elucidate our choice of signaling model, we briefly discuss some alternatives. Examples of alternative signaling model choices include: signals that change more than one ODE parameter, additive rather than multiplicative signaling, or signals that permanently change ODE parameters. We choose to modify a sin- gle parameter for interpretability and to constrain the model space: the ODE system exhibits bifurcations with respect to many of its parameters, and we do not seek to explore bifurcations of codimension 2 or greater here. Modeling with additive signals brings complications, such as the possibility of obtaining negative values of A. Also, if a large number of committed cells are communicating with a single cell, the value of|A| may be unrealistically large and dominate the ODE. This problem is dealt with in the 11 signaling model we present here through the parameterλ . Lastly, permitting signals to cause permanent changes to ODE parameters (rather than over time intervals) can result in the divergence ofA. Thus, we have selected a signaling model that can capture how changes in the extracellular environment can alter cell internal GRN dynamics while still preserving the overall behaviors (i.e. the attractor states) of the dynamical system. In this work, we will assume that cells are homogeneous, i.e. all cells share the initial conditions (G,P,X) = (20,1,20). The internal GRNs are identical with the same parameter values, including the same initial value ofA, denotedA 0 . We set the signaling period to beτ = 5, and the mean of the signaling wait time distribution to be µ = 50. Changing the values of µ and τ does cause qualitative changes as long as τ is sufficiently large relative to µ . For more insight on the relationship between the wait time distribution and τ, see Fig. A.10. Unless otherwise specified, we set λ = κ = 1. We initially chose λ = κ = 1 to avoid unrealistic behaviors when G or P ≈ 0. We will provide an in depth discussion on the roles of these parameters in the next section. For all topologies depicted schematically below, regular arrowhead arrows correspond to consensus signaling (Eq. 4) and flat (inhibition) arrowhead arrows correspond to dissensus signaling (Eq. 5). As a result of signaling, cell fate decisions became probabilistic rather than deterministic. Then, for each cell in a signaling topology, we can examine the probability distribution of the cell converging to a certain lineage. To do so, for each signaling topology tested, we selected a range of values of A 0 to simulate. The step size between theA 0 values was scaled between each topology depending on the range of the probability distribution. For each value of A 0 , we simulated N trajectories, counting the number of times each cell converged to both fates. From these counts, we estimated the probability that each cell converged to theG high state and plotted: P(G High)≈ # of trajectories converging toG high state N 12 A sample of the simulated data points can be found in Fig. A.11. Unless stated otherwise, probabilities are estimated by runningN = 1,000 simulations. Whileλ = 1 was an intuitive first choice to avoid divergence in signaling parameters, it relies on the assumption that a cell converging to theP high state implies thatP is more highly expressed thanG+1 and vice versa. This turns out to not necessarily be true. Rather, for a given value of A 0 , being in the P high state means that P is more highly expressed relative to the other stable steady state value of P . Recalling Fig. 2.1B, we see that for some values of A 0 , for example A 0 = 0.8, G is always more highly expressed than P regardless of which stable steady state a cell converges to. Looking at the bifurcation diagram, we see that in theP high state, the maximum steady state value ofG is approximately16.97 and the minimum value ofP is approximately1.475. Similarly, in theG high state, the maximum steady state value ofP is approximately0.3529 and the minimum value ofG is approximately40.35. From here, we wanted to select values ofλ which satisfy the following: G {max P High State} λ +P {min P High State} < 1, G {min G High State} λ +P {max G High State} > 1, λ> 0. These inequalities are satisfied for λ ∈ [15.495,39.997]. Similarly forκ , in order to satisfy the analo- gous inequalities for the dissensus signal, we redefine the dissensus signal as A 2 (t) = P 1 (ts)+κ G 1 (ts) A 2 (t s ), for t∈ [t s ,t s +τ ) ,κ> 0 A 0 , otherwise (2.6) whereκ ∈ [15.495,39.997]. We will explore how selecting values ofλ in or near this range changes the behaviors of both individual cells and populations of cells. 13 2.3.3 Intrinsicandextrinsiccell-cellcommunicationnoise In this work, we focus on how intercellular signaling alters cell decision making. Therefore, we only consider noise with respect to signaling (not the internal GRN dynamics). Noise in the signal can arise due to the noisy extracellular environment (extrinsic noise) or can arise during the signal transduction within a cell (intrinsic noise) [30, 31]. Here we define models for both sources of noise. For extrinsic noise, i.e. noise with respect to the extracellular environment, letη e be a random variable such that η e ∼ N (0,σ 2 e ), where σ 2 e > 0. For Cell k in a given topology, at the start of each new wait period, we sample a value ofη e and updateA k (t) by A k (t) = max(0,A 0 +η e ) (2.7) We truncateA k (t) at zero to avoid negative values. However, we select values ofσ 2 e small enough that the truncation is rarely necessary in simulation. Note that noise inA k (t) results in noise during the signaling period as well by Eqs. (4) and (5). For intrinsic noise, i.e. noise with respect to signal transduction, letη p be a random variable such that η p ∼ N (0,σ 2 p ), whereσ 2 p > 0. For Cellk in a given topology, at the start of each new pulse period, we sample a new value ofη and updateA k (t). A k (t s ) = max 0, G 1 (t s ) λ +P 1 (t s ) A 2 (t s )+η p (2.8) In the analysis of noise effects below, we decrease the mean wait time to µ = 40, resulting in a larger signal to wait time ratio, allowing signals to have a greater influence on target cells. Doing so accentuates the effects of intrinsic and extrinsic noise because cells are spending more time in a pulse state. 14 We would like to thank Rong Lu and members of the Lu lab for many helpful discussions. We are grateful to Michael P.H. Stumpf for insightful comments and feedback on the manuscript. No competing interests declared. MKR and ALM conceived the project and designed the work. MKR designed and implemented code and performed simulations. MKR and ALM analyzed the data. ALM supervised the work. MKR and ALM wrote the manuscript. The model was developed in Julia, and all the code associated with simulation and analysis is available under an MIT license at: https://github.com/maclean-lab/Cell-Cell-Communication 2.4 Results 2.4.1 Cell-cellcommunicationoverawiderangeofsignalingtopologiesleadstodivergent cellfates To determine how cell-cell communication instructs hematopoietic lineage specification, we construct a multiscale model of the bifurcation point in hematopoiesis that separates erythroid/megakaryocyte from granulocyte/monocyte lineages (Fig. 2.1A). This lineage decision is controlled in part by the mutually antagonistic genes GATA1 (G) and PU.1 (P), giving rise to a bistable model (Fig. 2.1B). We model the cell-internal GRN coupled to a cell-cell communication motif that can alter the cell-internal expression of G in neighboring cells, according to a Poisson process (Fig. 2.1C-D). Full details of the model specification can be found in the Materials and Methods. We test the model for a wide range of signaling topologies. We sought to ensure that core assumptions were upheld, namely that all cells eventually reach a steady state, even in the presence of noisy signaling, and that cells exhibit bistability with respect toA 0 (the initial value ofA). These behaviors were preserved 15 in all signaling topologies tested. Moreover, we found that the addition of cell-cell signaling to a previously deterministic model results in probabilistic cell fate choices. In Fig. 2.2, simulated trajectories for a 20-cell loop topology (corresponding to Fig. 2.3B) are shown. We plot only the concentration ofGATA1 (G) as a proxy for fate choice, since its state is sufficient to determine the steady state of the full system (ifG is high at steady state,P is low, and vice versa); as illustrated by the trajectories of bothG andP (Fig. A.1). These trajectories illustrate the variety of cell fate distributions that are observed in the presence of cell-cell communication (fate convergence to either the high or low state, and fate divergence). In Fig. 2.2C-D, the values of A 0 are equal, demonstrating that A 0 does not determine the proportion of cells in each state. Rather, for each value of A 0 , the probability of each cell converging to a certain state can be computed. In analyzing how signals propagate down chains of cells, two striking observations are made. First, each additional signal shifts the P(G) curve (the probability of reaching the G high state) to the left, although these shifts are successively smaller down the chain such that eventually the distributions con- verge. Second, a “ sharpening” of the P(G) distributions occurs whereby the region of uncertain fate decreases for cells further down a chain (Fig. 2.3A). This is a hallmark of cooperativity: accumulated sig- nals by cell-cell communication act to reinforce cell fate decision-making, leading to regions of uncertainty that are decreasing with the total number of signals that are acting on a cell. In loops, all cells converge to the same fate for each simulated trajectory, therefore every cell in the model has the same probability distribution (below we will see that the coordination of fates in all cells within a loop signaling topology can be broken by changing the cell coupling strength). The probability distribution for a cell loop depends on the number of cells in the loop, and we see that for loops of size n, both of the behaviors observed above are recapitulated (Fig. 2.3B). For both cell loops and chains, we observe that as the number of cells in the signaling topology increases, so does cell response fidelity, i.e. sharpness of the curve (Fig. 2.3A-B), a prediction that matches experimental data on paracrine signaling 16 A Time Wait Time B C D Concentration 10 1.5 10 10 0.5 1 10 -0.5 ... } Cell 1 Cell 2 } G P X } G P X Figure 2.1: Overview of multiscale cell fate commitment model. (A) Illustration of cells in a two- attractor state model (such as the GATA1-PU.1 inhibitory loop). (B) Bifurcation diagram for the system of ODEs described by Eqs. 1-3, with respect to the external signaling parameterA. (C) Cell-cell communi- cation via signals sent between cells (e.g. from cell 1 to cell 2) is modeled by a Poisson process; with wait times sampled from an exponential distribution and fixed signaling ‘pulses’ of length τ , where the value ofA 2 is set by Eqs. 4 or 5. (D) Full model schematic, depicting the cell-internal gene regulatory network of {G,P,X} modeled by ODEs and the external signalA 2 modeled by the Poisson process in (C). 17 Time Time Time Time G Concentration G Concentration G Concentration G Concentration A B C D Figure 2.2:SampletrajectoriesofGATA1(G)forcellssignalinginaloop. Representative simulations of a 20-cell loop signaling topology. Colors used only to distinguish the trajectories. Across all trajectories, the signaling parameter is set toλ = 28. (A)A 0 = 0.995. (B)A 0 = 0.9955. (C-D)A 0 = 1.0. in wound healing [43]. This holds true for large topologies up to at least 100 cells in size (Fig. A.2). Moreover, the limiting distributions for the cell chain and the cell loop are the same, highlighting an important underlying convergence. If we consider a dissensus signal (the opposite of a consensus signal), the fate probability distribution of cell 2 shifts in the opposite direction than previously (Fig. 2.3C), relative to the value of A 0 at which cell 1 switches lineages. When an inhibition signal is incorporated into a loop (Fig. 2.3D), the two cells in the loop no longer coordinate lineages for every trajectory as they did when all signals were regular. Notably, in Fig. 2.3D there exists a large range of A 0 for which cell 1 converges to the G high state and cell 2 converges to theG low state, i.e. dissensus signals significantly changes cell fate behaviors in both chains and loops of cells. 18 In a feed-forward signaling motif (Fig. 2.3E), multiple promoting signals reinforce the lineage choice of cell 3, thus increasing cooperativity and as a result the cell commits to theG high state at smaller values ofA 0 than it would in a cell chain. Other three-cell signaling topologies also display interesting behaviors (Fig. 2.3F-G and Fig. A.3). In an incoherent feed-forward motif, we observe a multiplicative effect between the dissensus signal from cell 1 and the consensus signal from cell 1 via cell 2 (Fig. 2.3F). The addition of the dissensus signal results in the distribution of cell 3 being shifted to the right relative to a corresponding cell in a chain. For a doubly dissensus topology (Fig. 2.3G), we find that for a region of A 0 the cell fates can diverge (two cells in the high state; the third in the low state) with nonzero probability. Consensus or dissensus signaling can occur when the signaling pathway mediating the signal contains a positive or a negative feedback loop, respectively. There are many such recorded examples of signaling in hematopoiesis and other contexts. One example of a consensus signal directly implicated in the GM vs. ME cell fate choice is mediated by the cytokine TNF-α , which activates the p38 MAPK pathway, in turn activating TNF-α as a target gene [44, 21]. A principal example of dissensus signaling, both in the context of hematopoiesis and other cellular processes, is the Delta-Notch signaling pathway [45], which leads to consecutively divergent patterns of activity. We expanded our analysis of cell-cell communication to consider other extracellular signals, and their effects alone or in competition with each other. We studied a model of consensus signaling that impacts the activation ofPU.1 rather thanGATA1, i.e. changes the parameterB. Cell-cell communication in this model impacts cell fate similarly to in the original, i.e. for a bistable region with respect to B 0 , lineage specification becomes probabilistic for simple cell signaling topologies (Fig. A.4). We also studied a model with two competing extracellular signals, i.e. we change both parametersA andB through independent Poisson processes. In the context of a two cell chain, we found that the joint influence of the signals sharpens the cell fate distribution. Full characterization of the two-signals model has yet to be conducted 19 since the size of the parameter space doubles in this model, leading to challenges in fully elucidating the region of bistability. The signaling topologies studied here pertain to small and idealized cell signaling networks. Nonethe- less, their behaviors point to general and important effects of cell-cell communication. In all of the above experiments, the addition of cell-cell signaling transformed a population of homogeneous, independent, and deterministic cells into one of heterogeneous cells that choose fates non-deterministically. Thus, given cells with identical initial conditions (i.e. transcriptional states), external signaling changes cell fate out- comes at the population level. This corroborates a major finding of Hoppe et al. [20]: that GMP/MEP cell fate decisions are not predictable from initial transcription factor ratios alone. Our model not only supports this result, but offers an explanation. The missing determinant of cell fate commitment – which acts to break the symmetry of an initially homogenous population of cells – is cell-cell communication. 2.4.2 Varyingthestrengthofcell-cellcouplingresultsindifferentratiosofstablesubpopulations ofheterogeneouscelltypes Through investigation of cell-cell communication in the multiscale model introduced above, we discovered that cell fate divergence can occur under the control of the model parameter λ . We thus characterize λ as a “cell-cell decoupling” mechanism. Analysis of cell-cell coupling and decoupling led to two significant findings: cell-cell communication can explain how a population of cells maintains stable subpopulations of heterogeneous cell types; and the distribution of cells that converge to each fate depends both on cell-cell coupling and on the external environment. Analysis of the effects of λ on consensus signaling topologies shows that – for a chain of two cells – asλ increases the probability distribution of the second cell shifts to the right (Fig. 2.4A). Opposite results are observed when we analyzed the effects of κ on dissensus signaling topologies, where the probability distribution of the second cell shifts to the left asκ increases (Fig. A.5). We identified λ = 22 as close to 20 1 2 n 1 2 1 2 1 2 3 1 2 3 1 2 3 1 2 n 2 5 10 Distribution of n cell in a chain or loop topology of n cells th Deterministic cell Cell 1 Cell 2 Cell 3 Cells 1 and 2 in the G high state and Cell 3 in the G low State A B C D E F G 1 Cell 2 in a chain Cell 3 in a chain Cells 1 and 3 in the G high state and Cell 2 in the G low State Figure 2.3: Cellfatecommitmentacrossarangeofsignalingtopologies. Schematic of different cell signaling topologies (left) and the corresponding probability distributions: the probability that each cell in the network will commit to the G high steady state for different values of A 0 . (A) Chains of cells of length n. Each probability distribution corresponds to one cell in the chain. (B) Loops of cells of sizen. In a loop of fixed size, each cell from 2 ton has the same probability distribution; thus one curve is plotted for each loop ofn cells. (C) A chain of two cells with an dissensus signal: the distribution of cell 2 fates is shifted to the right, i.e. it commits at higher values, compared with consensus signal in (A) where the distribution of cell 2 is shifted to the left. (D) A loop of two cells with one consensus and one dissensus signal; dashed line denotes the distribution of a single cell that receives no signals (deterministic). Note that we cannot observe the fate where cell 1 is in the G low state and cell 2 is in the G high state. (E) Three-cell feed forward motif; cells 3 (dashed orange) and 4 (dashed red) from a chain of cells marked for comparison. We see that receiving multiple signals results in a non-additive synergistic effect. (F) Feed forward motif with a dissensus signal; cell 3 (dashed orange) from a chain of cells marked for comparison. (G) Loop of three cells with two dissensus signals. Dashed lines denote the probabilities that: i) cells 1 & 2 are in G high state while cell 3 is in G low state (dark green); or ii) cells 1 and 3 are in G high state while cell 2 is in the G low state (light green). a critical value, where cell fate decision-making becomes switch-like. The value of A 0 where the curve forλ = 22 switches from one lineage to another is extremely close to the value at which a deterministic cell with the same initial conditions switches. Overall,λ determines the range over which the probability curve for the second cell is nonzero. 21 We next assessed how different values of λ changed coordination of cell fate decisions between lin- eages. Previously, we observed that for any trajectory of a loop topology of any size, where λ = 1, all cells converged to the same state, regardless of the value ofA 0 . Further, although all cells in the topology converge to the same lineage, the probability that they all converge to a given lineage depended onA 0 . To see whether or not this behavior was conserved for other values ofλ , we tested a two cell loop topology with a range ofλ values. We observed that there exists a valueλ ∗ ≈ 22 such that forλ<λ ∗ , all the cells coordinate their lineage just as we had seen before. Interestingly, forλ > λ ∗ , the two cells do not always coordinate their lineage decisions. For example, Fig. A.6 gives sample trajectories showing two cells in a loop converging to each combination of states. Fig. 2.4D shows the two cell loop results for λ = 38, giving the probabilities that each cell converges to the G high state as well as the probabilities that the cells are decoupled, converging to opposite lineages. For eachλ , we recorded the maximum value of the probability that the cells converge to different lineages. Fig. 2.4C shows how the maximum probability of cells converging to different lineages increases with λ . Next, we looked at larger loops of 10 cells. For each value of A 0 , we simulated 500 trajectories and recorded how many cells were in the G high state verses the G low state. Fig. 2.4D shows results for λ = 30 and A 0 = 1.0,1.002, and 1.004. We see that the distribution of cells converging to each state changes with the value ofA 0 . Further analysis of distributions with different values of λ can be found in Fig. A.7. We have identified an explicit mechanism by which cell-cell communication can break the symmetry of a homogeneous population of progenitor cells, and give rise to stable, heterogeneous populations of lineage-committed cells. The proportion of cells committed to each lineage depends on the external en- vironment, A 0 , and the strength of cell-cell coupling due to signaling, λ . Moreover, these results show that fluctuations in the external environment can lead to shifts in the relative abundances of committed 22 Value Probability Probability Probability 0 2 4 8 6 10 0.1 0.2 0.3 0 † Number of Cells in the G High State = 1.0 = 1.002 = 1.004 A B C D 40 28 22 16 32 † = 22 0 2 4 8 6 10 0 2 4 8 6 10 0.1 0.2 0.3 0 0.1 0.2 0.3 0 Figure 2.4: Cell fate coupling is controlled by the external signaling parameter. (A) Probability distributions of cell fate commitment (to G high state) for cell 2 in a chain of two cells for λ ∈ [16,40]. (B) Probability distributions for a loop of two cells as a function of A 0 , with λ = 38. Probability of cell fate commitment to G high state (black); probability that the two cells will not commit to the same fate (gray), with the maximum of the distribution marked (red cross). (C) For a loop of two cells, the maximum probability that the cells do not converge to the same lineage as λ varies; red cross equivalent to (B) is marked for comparison. (D) Comparison of distributions for the number of cells that commit to the G high state in a loop of 10 cells withλ = 30. cell types. These results corroborate previous work that studied the generation of heterogeneous cell pop- ulations through stem cell differentiation [46], and showed that external signals (e.g. through cell-cell communication) are required both to maintain heterogeneous cell populations and to shift relative cell types abundances in response to environmental perturbations. 23 2.4.3 Intrinsicandextrinsicnoisealtercellfatedecision-makingboundaries We have until now assumed that signals are passed between cells with perfect fidelity. In fact, multiple sources of noise contribute to imperfect communication between cells, and the modeling framework here lends itself well to the investigation of the effects of intrinsic vs. extrinsic noise [47, 48]. We investigated the impact of two different sources of noise in cell-cell communication: due to cell-extrinsic factors, i.e. noise with respect to the extracellular environment (Eq. 7); or due to cell-intrinsic factors, i.e. noise due to signal transduction downstream of a paracrine signaling factor (Eq. 8). These sources of noise are represented in the model by varying either the baseline level of cell-cell communication (extrinsic noise) or the cell signaling pulse level, i.e. the intrinsic signal transduction noise (Fig. 2.5A). For simple topologies, as we would expect, the observed variability in cell fate outcomes increases as the variance of either the extrinsic or the intrinsic noise increases (Fig. A.8). This results also holds for larger cell-cell communication topologies, e.g. a ten-cell loop, intrinsic (η p ) and extrinsic (η e ) noise (Fig. 2.5B-D) both reduce the sensitivity of the cell fate decision-making boundary. We also observe unexpected and striking results. First, not only are the probability distributions flattened by either noise source, but they are shifted to the left, i.e. intrinsic and extrinsic noise directly affect the decision-making boundary (by shifting its mean), as well as the sensitivity of cell-fate decision-making. Second, even under the presence of noise, we still observe the effects of cooperativity at play through the sharpening of distributions down a chain of cells (Fig. 2.5D). That is, individual noisy signals increase the variability (reduce the sensitivity) of cell fate decision-making, but the cumulative effects of noisy signaling can at least partially compensate for this, and decrease the variability in cell fate decision-making. Through comparison of the relative effects of the intrinsic signal transduction noise (Fig. 2.5B) and the extrinsic extracellular noise (Fig. 2.5C-D), we see that the impact on the cell fate decision-making boundary is much larger for extrinsic rather than intrinsic noise contributions. From inspection of Eqs. 6-7, this is in part due to smaller duration of the pulseτ relative to the mean wait timeµ . The result of which is that 24 whenη e ∼N (0,0.1), (Fig. 2.5D), the probability curves for cells 2-10 in the chain flatten to the extent that the sensitivity with which to distinguish cell fate decision-making by cell position along the chain is lost entirely. Moreover, the probability curves for cells≥ 2 along the chain intersect with the point at which cell one (which is deterministic) switches fates from the low to the high state (black line in Fig. 2.5D), thus forcing all other cells also into the high state with probability= 1. This coordination of cell fates is influenced by cell-cell coupling (here we use λ = 18<λ ∗ , i.e. a value ofλ for which we observe complete coupling (fate coordination) between cells). The dominant impact of extrinsic over intrinsic noise here is in agreement with previous works, including a study that quantified the contributions of extrinsic and intrinsic noise in the MAPK signaling pathway, and showed that extrinsic noise is the dominant driver of cell-to-cell variability [49]. It has also been shown that explicit extrinsic noise contributions are necessary to explain mRNA abundance distributions [50]. In summary, the effects of intrinsic and extrinsic noise on cell-cell communication topologies are to increase the variance of the resulting cell fate distributions and (surprisingly) to alter the mean values of these probability distributions. In other words, the presence of noise alone can force cells to change lineages. The observed increases in the variability of cell fate decision-making are maintained for large cell-cell communication topologies. A similar result was described in a study of cell fate decision-making during early mouse gastrulation [51]: transcriptional noise is greatest at the point of cell fate decision- making (when epiblasts begin to differentiate). Our results reiterate the same point made by Mohammed et al., that gene expression noise is beneficial during windows of cell fate decision-making as it leads to an increased possible repertoire of cell fates. Our findings go further in that they suggest a rationale: that the increase in transcriptional noise results from noisy extracellular factors influencing cell-cell signaling during differentiation. 25 Time C D A B Probability Probability Probability 2 5 10 1 Distribution of n cell in a chain of 10 cells th Figure 2.5: Extrinsicandintrinsicsourcesofnoisereducefidelityofcellfatechoice. (A) Schematic depicting how extrinsic noise (η e ) and intrinsic noise (η p ) are modeled through their effects on the signaling strength A. (B) Probability distributions of cell fate commitment (to G high state) for each cell in a ten cell chain, with λ = 18: no added noise (solid lines); and with added noise (dashed lines). The colors darken as the position of the cell along the chain increases. The type of noise modeled is intrinsic, with Var(η p ) =σ 2 p = 0.1. (C) As for (B), but for extrinsic noise modeled, with Var(η e ) =σ 2 e = 0.04. (D) As for (B), but for extrinsic noise modeled, with Var(η e ) =σ 2 e = 0.1. 2.4.4 Spatialstructureincommunicationnetworksimpactscellfateoutcomes Years of thorough investigation into the spatial organization of the bone marrow have revealed complex environmental interactions and stem cell niches [52, 53, 54, 55, 56], however much less focus has been given to the locations of progenitor cells including multipotent cells of GM and ME lineage potential. Moreover, the freedom of cell movement relative to e.g. epithelial tissues limit our ability to characterize spatial cell locations. Coupled with the current limitations on detecting cell-cell communication directly from data, it is not yet possible to infer spatial cell-cell communication networks in hematopoietic cells from 26 data. Thus, to investigate the impact of spatial organization on cell fate, we investigated two topologies that characterize spatial extrema: a fully connected topology and a bidirectional chain of cells (Fig. 2.6A- B). These models use the “connectedness” of a cell to characterize environments either lacking in spatial structure (fully connected topology) or highly structured (bidirectional cell chain). We simulate ten-cell models for both the well-mixed and spatially-restricted topologies, with the same model parameters and initial conditions. We set λ = 28 and reduced the signaling pulse period to 1.0, to better distinguish the signals due to the large total number of signals being sent and received in the fully connected topology. In the fully connected model, each cell is topologically equivalent and has the same probability distribution of converging to the GM lineage (G high) state (Fig. 2.6C). However, in the spatially restricted bidirectional chain, the fate choice probability distribution is position dependent: the farther from the center of the chain the more the distributions diverge (Fig. 2.6D). The spatial organization encapsulated in the bidirectional chain impacts hematopoietic lineage specification, even in a population of homogeneous cells. To explore the spatial patterns of cell fate that emerge, we computed the probability of an individual cell’s commitment to the GM lineage conditioned on the fate of cell 5, at the center of the bidirectional chain (Fig. 2.6 E-G). We found that the influence of the fate of cell 5 decreased with increasing distance from cell 5 in the chain. For cell 2, the probability that cell 2 commits to the GM lineage is only marginally affected by the fate of cell 5 (Fig. 2.6E). (As a function of A 0 , over range of uncertainty in the fate of cell 5, cell 1 commits to the GM lineage with certainty.) Cells nearby to cell 5, in contrast, are much more likely to commit to the GM lineage if cell 5 itself commits to the GM lineage (Fig. 2.6F-G). That is, local spatial architecture is captured in this model and manifests through the fate coupling of cells in close proximity. This is in stark contrast with cells signaling between each other in a fully connected topology, whereby cells all share the same distributions of cell fate choice. 27 We expect hematopoietic organization in the bone marrow to reflect a niche intermediate between the two spatially-extreme models studied here. As we have shown through simulation, spatial cell arrange- ments in the bone marrow will lead to a location-dependent coupling of cell fates in local neighborhoods of hematopoietic progenitor cells. This prediction of the reliance of cell fate on the fate of nearby cells is supported by experimental evidence. In recent work studying hematopoietic progenitor cells in vivo, [57] show that, unlike at steady state, during regeneration (when requirement for new myeloid cell production is high) progenitor cells cluster together in groups. In agreement with our model, the spatial structure introduced by spatially proximal progenitor cells influences lineage commitment: the clustered GMPs ac- tivate cell cycle pathways and are driven towards differentiation into granulocytes, relative to the steady state GMPs. Our results could be further tested in culture using similar methods to [20] and analyzing the impact of spatial proximity on lineage commitment. While it is currently beyond the capacity of experi- mental technologies to infer single-cell communication networks directly from data, this is changing [58], leading to future opportunities to fit single-cell signaling models to data. 2.5 Discussion Despite many theoretical and experimental advances in our understanding of gene regulatory network (GRN) dynamics, our ability to use GRN models to explain cell fate decision-making during differentiation of multipotent progenitor cells remains limited. Here, with application to a well-studied cell fate GRN: the GATA1-PU.1 mutual inhibition loop [15, 16, 17, 18, 19], we introduced a new model that can simultaneously describe GRN dynamics and single cell-resolved cell-cell communication. Notably, although cell-cell com- munication is often assumed to be a critical component of cell differentiation, it is rarely incorporated into models. The previous studies that have characterized cell-cell communication in models did not capture the detailed complexity of these dynamics, by making simplifying assumptions either regarding the GRN dynamics [31] or regarding the mechanisms by which cells signal [30, 29]. We found that by combining 28 1 2 10 . . . 3 1 2 10 . . . A B C D G F E Probability Probability Probability Probability Probability Conditional cell fate probabilities for cell 2 P(Cell high | Cell 5 low) P(Cell high | Cell 5 high) i i Cell i, i Conditional cell fate probabilities for cell 3 Conditional cell fate probabilities for cell 4 3 4 5 Figure 2.6: Spatialorganizationofcellsregulatespatternsofcellfatecommitment. Schematic rep- resentation of models for(A) a fully connected signaling topology and(B) a bidirectional chain signaling topology. (C)-(D) Probability distributions of the probability that each cell in the bidirectional chain (C) and fully connected (D) signaling topologies will commit to the G high steady state for different values of A 0 . Probabilities that: (E) cell 2;(F) cell 3; or(G) cell 4 will commit to the G high steady state, conditional on the fate of cell 5. Cell 5 committing to the G high state (blue); cell 5 committing to the G low state (red). The model was simulated in n=10 independent experiments each with 100 trajectories. The mean of all 1000 simulations is given, and the error bars denote the standard deviation observed across the 10 samples. these dynamic processes that describe noisy cell fate decision-making at single-cell resolution we were able to reconcile several outstanding controversies in the field. 29 Over a large domain of possible cell-cell communication topologies, we found that cell-cell communi- cation alters cell fate decision-making boundaries, which become probabilistic in response to the levels of external signaling factors. This helped to reconcile a controversy: that of whether or not transcriptional stochasticity is sufficient to initiate the granulocyte-monocyte vs. megakaryocyte-erythrocyte cell fate decision. Previous models supported the hypothesis, however Hoppe et al. presented compelling evidence to contradict it [20]. Our results agree with Hoppe et al., in that we show that eventual cell fate cannot be inferred from the initial gene expression state alone—and go further in emphasizing the stochastic nature of these cell fate decisions [59]. Analysis of cell-cell coupling led to the discovery that stable distributions of heterogeneous cell types can be robustly maintained by a set of external signals. This result offered insight into another open question: that of how population-level cell fate behaviors emerge during cell differentiation [46]. We also showed how (primarily extrinsic) noise increases the variability of cell fate decision-making, in line with previous analyses of transcriptional noise during development [51]. Finally, simulation of two topologies representing spatial extrema (well-mixed versus highly structured) suggest that increased coupling of cell fates occurs in proximal cells, a behavior observed experimentally during hematopoietic regeneration [57]. We sought to tightly constrain model complexity here, for reasons of parsimony and interpretabil- ity. Relaxing some of the constraints imposed will likely yield further interesting results. We assumed throughout that cells were initially homogeneous, i.e. they shared the same initial conditions and internal GRN networks. Heterogeneous initial conditions and heterogeneous cell fate decision-making (i.e. differ- ent GRNs in different cells) ought to be explored, for example by considering interactions between two progenitor cell types, each controlled by their own GRN. An alternative approach naturally lending itself to the analysis of spatially organized tissues and spatial heterogeneity is agent-based modeling. However, the methods that our framework relies on (tractability of the steady states and bifurcations) remain out of reach for most agent-based modeling approaches. 30 There is also much room for exploration of larger and more varied cell signaling network topologies. Here, future work should be guided by data, as it becomes harder to justify large signaling networks chosen a priori. Spatial transcriptomics [60] and new technologies [58] will assist in the inference of cellular networks from data; inferred topologies could then be input to our model framework. These may include dissensus as well as consensus signals. A central challenge for the model introduced here is that of fitting parameters to data. Ideally this would require both spatially and temporally resolved single-cell transcriptomic data—at the limits of cur- rent technologies, although this is changing [61]. Thus in the current work we rely on comparison of qualitative features arising from the model with previous experimental studies. Moreover, due to the hy- brid deterministic-stochastic formulation of the model, resulting in time-dependent signaling parameters, we doubt that it will be possible to derive a likelihood function for Bayesian parameter inference. Thus, approximate Bayesian computation will likely be required. Yet, even here, simulation times may be pro- hibitive or require further approximations to be made. Future work should also treat more carefully the stochastic nature of gene expression, for example by replacing the deterministic GRN dynamics with discrete stochastic simulation or stochastic differential equations (SDEs). This would be straightforward to accomplish numerically, although it will complicate the analysis of model bifurcations. While the complete tractability of the deterministic model made it an ideal first candidate with which to study the effects of cell-cell communication, noise undoubtedly plays important roles in regulating single cell phenotypes [62]. In conclusion, the introduction of a single-cell resolved cell-cell communication model of GRN dy- namics has helped to explain various cell fate decision-making phenomena. Even in a tightly constrained model space, we have shown that changes in the distribution of cell fates due to cell-cell communication can be broad and varied. More generally, we have highlighted the need to consider multiscale effects in models of cell fate dynamics. We anticipate that the application of these methods to other GRNs will lead 31 to greater understanding of specific cell fate decision-making control points, as well as general principles of control of stem cell differentiation. 32 Chapter3 InterrogatingHSCphenotypeswithsingle-cellRNAsequencingdata analysis Parts of this chapter are included in a manuscript which is already published in Blood [63]. 3.1 Abstract Detrimental phenotypic changes in hematopoietic stem and progenitor cells (HSPCs) during aging have been linked recently to changes in metabolism and protein synthesis. In theLin28b/Hmga2 pathway, down- regulation is correlated with loss of self-renewal of adult vs fetal hematopoietic stem cells (HSCs). Igf2bp2, a key downstream target of the Lin28b/Hmga2 pathway, is an RNA binding protein that regulates the stability and translation of messenger RNA. While the changes of the upstreamLin28b/Hmga2 pathway in aging has begun to be characterized, Igf2bp2 has not yet been interrogated for it’s role in hematopoietic aging. In this chapter, bulk RNA-sequencing data from young and old, wild-type and Igf2bp2 knockout mice HSCs and single-cell RNA-sequencing data from young wild-type HSCs are analyzed. In young mice, it is discovered that Igf2bp2 regulates mitochondrial metabolism, protein synthesis, and stemness-related genes. However, this regulation byIgf2bp2 is lost with aging, as no difference was detected in these genes between old samples. From the single-cell data, a sub-cluster enriched for Igf2bp2 expression exhibits 33 differentially expressed genes that align with those identified in the bulk samples, suggesting that perhaps that this sub-cluster of HSCs diminishes with aging. 3.2 Introduction Aging has pronounced impacts on hematopoietic stem cell (HSC) function and hematopoiesis. HSC phe- notypic changes during aging include heightened HSC self-renewal and an increase in the HSC pool, specifically myeloid-biased (My-bi) HSCs [64, 65, 66, 67, 68]. The elevated numbers of My-bi HSCs cause an influx of mature myeloid cells, another hallmark of aging in the hematopoietic system [69, 70]. The goal of this project is to explore whether or not the Lin28/let-7 pathway contributes to these phenotypic changes observed during aging. The Lin28/let-7 pathway influences many cellular functions, specifically some metabolic functions [71, 72]. Cell metabolism has previously been linked to hallmarks of stem cell function, like self renewal and quiescence [73, 74, 75]. Thus, we were particularly interested in studying how the metabolic functions regulated via the Lin28/let-7 pathway were related to the aging phenotype. In the Lin28/let-7 pathway, one downstream target is Igf2bp2 which is known to regulate cellular metabolism and mitochondrial function [76, 77, 78, 79, 80]. We will investigate how this gene contributes to HSC aging. To study HSC aging, we will use RNA sequencing datasets, which provide us with a snapshot of the transcriptome of a single cell or sample of cells. Bulk RNA-seq is useful in many contexts, in particular in this work we will compare bulk RNA-seq measurements of samples of cells from different types of mice. Additionally, scRNA-seq allows us to sequence the transcriptome of each individual cell in our sample. The full scope of scRNA-seq applications is still yet to be realized, but notably scRNA-seq has advanced our understanding of HSC differentiation [7, 81, 82]. 34 Our collaborators from the Rudolph Lab at the Leibniz Institute of Aging generated two datasets to study the role of Igf2bp2 in aging. The first experiment they performed was bulk RNA-sequencing of My- bi HSCs from four different groups of mice: young and old wild type (WT) mice ( Igf2bp2 +/+ ), and young and old homozygous germline knockout mice (Igf2bp2 − /− ). Each of the four groups of mice contained samples from four mice, for a total of 16 mice in the experiment. The second experiment was scRNA-seq of My-bi HSCs from young WT mice. From my data analysis, we found that loss of Igf2bp2 expression indeed contributes to aging HSC phenotype, thus elucidating the role of the Lin28/let-7 pathway in the HSC aging. 3.3 Methods 3.3.1 BulkRNAsequencingandGeneontology(GO)enrichmentanalysis. The raw reads were pseudoaligned to the GRCm38 mouse transcriptome using Salmon (v1.4.0) with de- fault parameters. The transcript per million values outputted by Salmon were imported to R (v4.0.2) and summarized into a gene-level matrix using the tximport R package (v1.16.1). Differential gene expression was carried out using the DESeq2 R package (v1.28.1). A gene was considered differentially expressed if the Benjamini-Hochberg adjusted p-value was less than 0.05. 1,421 differentially expressed genes (DEG) were identified in young my-bi HSCs from Igf2bp2 − /− versusIgf2bp2 +/+ mice; 26 DEG were identified in old my-bi HSCs fromIgf2bp2 − /− versusIgf2bp2 +/+ mice. An overlap analysis was performed on the DEG in young my-bi HSCs fromIgf2bp2 − /− vs. Igf2bp2 +/+ mice with 83 identified target RNAs directly bound to Igf2bp2 in brown fat as reported previously [78]. Gene ontology (GO) enrichment analysis was performed on the differentially expressed genes in young my-bi HSCs from Igf2bp2 − /− vs. Igf2bp2 +/+ mice using the R package GOstats (v2.54.0). Significant GO terms were selected if the Benjamini-Hochberg adjusted p-value was less than 0.05. Taking the DEGs 35 down-regulated in youngIgf2bp2 − /− versusIgf2bp2 +/+ mice which are associated with the GO terms “Mi- tochondrial organization”, “Peptide metabolic process”, “Mitochondrial respiratory chain complex assem- bly”, “Proteasomal ubiquitin-independent protein catabolic process”, “Cellular amide metabolic process”, and “Translation”, we used the Quantro R package (v1.22.0) to evaluate the differences in the expression of these genes across groups of samples. The quantro function outputs an F statistic and the corresponding p-value for each pair of sample types: Young Igf2bp2 +/+ vs. Old Igf2bp2 +/+ , young Igf2bp2 − /− vs. old Igf2bp2 − /− and young Igf2bp2 − /− vs. old Igf2bp2 +/+ . 3.3.2 SinglecellRNA-seq(scRNA-seq)datanormalizationandqualitycontrol ScRNA-seq data were processed using 10X Genomics Cell Ranger (v5.0.0) and mapped to the GRCm38 reference genome. Seurat R package (v3.2.3) was used for all further analysis. Data were read into R as a count matrix and log-transformed using the Seurat function SCTransform. For quality control, we began with 7,887 cells with mean number of genes 3,683.5, mean read count 14,037.9, and mean percentage of mitochondrial reads 8.2. We removed cells with fewer than 500 genes, fewer than 3,000 reads, or over50% mitochondrial reads, selecting 7,435 cells to use for further analysis. 3.3.3 Visualization, clustering, and differential expression of scRNA-seq data using Seurat. For Uniform Manifold Approximation and Projection (UMAP) and clustering, the first 30 principal com- ponents were used. UMAP plots of the expressions of Lin28b, Hmga2, Igf2bp2, Igf2, Igf1, H19, Rian, and Cdkn1c exclude the top 5%,1%,0%,5%,5%,0%,1%, and 1% of values, respectively (Figs. 3.1 and 3.3). Clustering was performed using the Seurat function FindClusters with resolution 0.3, resulting in 9 unique clusters. The Seurat function FindMarkers with a Bonferroni adjusted p-value cutoff of 0.01 was used to de- termine specific markers for each cluster. These markers were then compared with common hematopoietic 36 lineage markers in order to identify which clusters were already primed towards a certain lineage. Clusters 1, 2, 6, and 7 were excluded based on their expression of: Pf4, Itga2b, Vwf, Gata1, Mki67, Car1, and Klf1, markers of megakaryocyte/thrombocyte and erythroid lineages. Clusters 0, 3, 4, 5, and 8 were not clearly primed towards a certain lineage. To perform differential expression on Cluster 3 relative to Cluster 0, 4, 5, and 8, we again used FindMarkers with an adjusted p-values cutoff of 0.01. 3.4 Results 3.4.1 Igf2bp2 controls mitochondrial metabolism/protein synthesis related genes in HSCofyoungmicebutitsgeneregulatoryfunctionislostinagedHSC. To identifyIgf2bp2-controlled genes in HSCs, mRNA sequencing was conducted on freshly isolated, myeloid biased HSCs (CD150highCD34-LSK) from homozygous, germline knockout mice (Igf2bp2 − /− , versus wild- type mice (Igf2bp2 +/+ ) mice. HSC were isolated from young adult mice (3 months) and aged mice (22-26 month, n= 4 mice per group). A principal component (PC) analysis of the mRNA sequencing revealed that Igf2bp2 deletion leads to clustering separation of My-bi HSCs from young mice but not of HSC from old mice (Fig 3.1A). Concordantly, a much larger number of differentially expressed genes (DEG) were identified in My-bi HSCs of Igf2bp2 − /− vs. Igf2bp2 +/+ mice at young age (1421 DEG) compared to old age (26 DEG). These data indicated that Igf2bp2 has a profound function in controlling gene expression in HSC specifically in young mice, but its gene regulatory function in HSC is lost during aging. Since Igf2bp2 is an RNA binding protein to regulate gene expression, the DEG in My-bi HSC fromIgf2bp2 − /− vs. Igf2bp2 +/+ from young mice were compared with mRNAs that were previously identified to be directly bound toIgf2bp2 in mouse brown fat [78]. Of note, mRNA that are bound byIgf2bp2 exclusively overlaped with downregulated but not with upregulated DEG in Igf2bp2 − /− HSC versus Igf2bp2 +/+ HSC (Fig 3.1B). 37 To analyze which biological processes were regulated by Igf2bp2 in HSC from young mice, a “Gene Ontology (GO)” analysis was conducted on DEG that were downegulated in My-bi HSCs from from young Igf2bp2 − /− vs. Igf2bp2 +/+ mice. This analysis revealed a significant reduction in expression of genes related to GO terms of mitochondrial metabolism and protein synthesis (Fig 3.1C,D). The gene status of Igf2bp2 had no clear effect on the expression of the subset of DEGs in old mice (Fig 3.1D). Using the subset of DEGs (idenitifed via the youngIgf2bp2 − /− vs. Igf2bp2 +/+ comparison, shown in Fig 3.1D), we performed ANOVA, and compared the gene signature of old Igf2bp2 +/+ mice with young Igf2bp2 − /− or Igf2bp2 +/+ mice. Interestingly, the expression profile of the DEGs showed a significant age-dependent decline in HSC from Igf2bp2 +/+ mice (Fig 3.1D, p=0.048). There is no evidence that the expression signature of the DEGs is different in Igf2bp2 +/+ HSC from aged mice compared with Igf2bp2 − /− HSC in young mice (p=0.49). Together, these data indicated that Igf2bp2 controls the expression of genes related to mitochondrial metabolism and protein synthesis in My-bi HSC from young mice but its gene regulatory function is lost in HSC from old mice. Moreover, the deletion of Igf2bp2 mimics the aging-related changes in the expression of these genes in My-bi HSC, which may contribute to changes in HSC function during aging. 3.4.2 Igf2bp2expressionisenrichedinasubclusterinHSCsofyoungmicecosegregating withexpressionofLin28,Igf/Pi3k,andstemness-relatedgenes. In order to determine the role of Igf2bp2 in unperturbed hematopoiesis of wildtype mice, scRNA-seq was conducted on freshly isolated My-bi HSCs from a pool of 6 week old, male, wildtype mice (n=5). Through cell clustering (via the Louvain algorithm in Seurat), scRNA-seq identified a cluster (Cluster 3), of My- bi HSC, which was enriched for Igf2bp2 expressing cells compared to the other 8 sub-clusters that were identified (Fig 3.2A-C, Fig 3.3A). In agreement with this observation, Lin28b, the upstream regulator of let7/Hmga2/Igf2bp2, was also enriched in Cluster 3 (Fig 3.2D,E), while the expression of Hmga2 expression was more ubiquitous across the different clusters (Fig 3.3B,C). Interestingly, the expression of Igf2 and Igf1, 38 Figure 3.1: (Caption on next page) two prominent mRNAs that are bound and regulated by Igf2bp2, also showed an enrichment in Cluster 3 of the HSCs (Fig 3.2F-I). Together, these data indicated that Lin28b/Igf2bp2 expression is enriched in a sub-cluster of My-bi HSCs in young mice co-segregating with increased expression of bonafide mRNA targets of Igf2bp2. To identify mRNA and pathways that are regulated by Igf2bp2 during unperturbed hematopoiesis in My-bi HSCs of wildtype mice, gene expression in Cluster 3 was compared to gene expression of clusters that were unbiased/unprimed towards particular hematpoietic lineages. To determine which clusters were 39 Figure 3.1: Igf2bp2 deletion decreases the expression of genes related to mitochondrial metabolism and protein synthesis in young myeloid biased HSCs. (A) Principal component analysis (PCA) plot indicates that HSC of young mice have a different transcriptome compared to HSC of old mice. Interestingly, Igf2bp2 gene status strongly separates the principal components of genes expression profiles in HSC from young mice but has almost no effect on the principal components of genes expression profiles in HSC from old mice. (B) Venn diagram depicting the overlap of DEG in HSC of youngIgf2bp2 − /− versusIgf2bp2 +/+ mice with 83 mRNA that were previously identified to be directly bound by Igf2bp2 in brown fat1. Note that mRNA that are bound byIgf2bp2 exclusively overlap with downregulated DEG inIgf2bp2 − /− HSC versus Igf2bp2 +/+ HSC. (C) Bar graph depicts the top ten Gene Ontology (GO)-terms enriched for downregulated DEG in young Igf2bp2 − /− mice vs. Igf2bp2 +/+ mice. The gene number enriched in each term is shown at the end of bar. Asterisks highlight mitochondria metabolism (red) & proteins synthesis-related (blue) GO- terms. The DEG that were included in these GO terms are depicted in (D) – a heat map on the expression pattern of DEG included in the mitochondria metabolism & protein synthesis-related GO-terms marked by asterisks in panel C. Color scale indicates the expression level. The heatmap includes 10 target genes that were previously shown1 to be bound by Igf2bp2 (green circle in B) including 8 genes are related to mitochondrial metabolism (marked by red asterisks) and 2 genes are related to protein synthesis (marked by blue asterisks). An ANOVA test was applied to compare the expression of genes shown in heat map be- tween youngIgf2bp2 − /− HSC and oldIgf2bp2 − /− HSC (p=0.26111) or between youngIgf2bp2 +/+ HSC vs. old Igf2bp2 +/+ HSC (p=0.04794) or between young Igf2bp2 − /− HSC vs. old Igf2bp2 +/+ HSC (p=0.48618). Note that Igf2bp2 deletion makes the expression profile of mitochondria metabolism & protein synthesis relate genes of young HSC more similar to old HSC. unprimed, we initially looked at the top DEG by cluster (Fig. 3.3A) and found that Clusters 1,2, and 6 up- regulated Pf4, Mki67, and Car1 respectively which are associated with megakaryocyte/erythrocyte (ME) commitment. Further investigating this, many other ME markers were highly expressed in Clusters 1, 2, 6, and 7 (Fig. 3.4). In Clusters 1 and 2, megakaryocyte markers Pf4, Vwf, and Itga2b were enriched (Fig. 3.4B,C,D). From here, we concluded that the cells in Clusters 1 and 2 were stem cell-like megakaryocyte progenitors. While transcriptionally they are very similar to HSCs, they are primed towards the megakary- ocyte lineage and differentiate in response to inflammation signals [83]. From the high expression of Gata1 and Car1 in Cluster 6, we determined that this cluster was also primed towards the ME lineage. Lastly, since Cluster 7 had moderate expression of Itga2b, Mki67, and Gata1, we also labeled it as a ME primed cluster. Thus, Clusters 1,2,6, and 7 were excluded from comparison with the Igf2bp2 enriched Cluster 3. Compared to the unprimed HSC clusters (Cluster 0,4,5,8, Fig 3.2A), the expression signature of Cluster 3 showed an upregulation of IGF/PI3K signaling (Igf1, Pik3r1) and a downregulation of inhibitors of AKT 40 Figure 3.2: (Caption on next page) 41 Figure 3.2: Igf2bp2-expression in young HSC co-segregates with Lin28b, Igf/Pi3k, and stemness-related gene expression signature. (A) UMAP plot with Seurat clustering analysis revealing 9 distinct clusters (Cluster 0 to Cluster 8). (B-I) Feature plots and histograms on target genes of the Lin28/Igf2bp2 pathway depicting the expression levels and the percentage of positive cells in the HSC-subclusters for (B,C) Igf2bp2, (D,E) Lin28b, (F,G) Igf2, and (H,I) Igf1. Gray dots indicate no expression, and red intensity indicates the expression level of each gene. (J) Heatmap of differentially expressed genes (DEG) in HSCs from Cluster 3 (enriched for Igf2bp2 expressing cells) compared with HSCs from Cluster 0, 4, 5 and 8 (adjusted p-value <0.01, Bonferroni). Color scale indicates gene expression level. Asterisks mark known stemness-related genes (red), imprinted genes expressed in quiescent-enriched, long-term HSCs2 (blue), and regulators of Igf/Akt signaling (green). The DEG (marked by yellow asterisks) that were overlapping with DEG in HSC pool analysis are depicted in (K) – a heatmap on the expression pattern of the overlapping DEG (marked by yellow asterisks in panel J) in young Igf2bp2 − /− HSC versus Igf2bp2 +/+ HSC from bulk RNA sequencing analysis (Fig. 7). signaling (Igfbp4 and Cmtm7) (Fig 3.2J, green asterisks). IGF/PI3K/AKT represent major regulators of cell metabolism and growth that are known to be activated by Igf2bp2 mediated RNA expression [84, 85, 86, 87]. Interestingly, Cluster 3 is also enriched for the expression of genes related to quiescence and stemness of HSC including Mllt342, Cdkn1c/p5743, Procr44, Xbp145 and Slfn246 (Fig 3.2J, red asterisks). Similarly, Cluster 3 was enriched for the expression of genes that are regulated by imprinting and known to be highly expressed in quiescent-enriched, long-term HSCs [88], including Rian, Cdkn1c/p57, H19 and Meg348 (Fig 3.3D-I and Figure 3.2J, blue asterisks). Together, these data indicated that Igf2bp2 expression marks a subcluster of My-bi HSC, which is enriched in expression of stemness-related genes and genes that regulate IGF/PI3K/AKT activity. The connection between Igf2bp2 expression in Cluster 3 with the co-enriched genes in this cluster was supported by a comparison with the pooled RNA seq data on young Igf2bp2 − /− vs. Igf2bp2 +/+ HSC (Fig 3.1). The differentially enriched/depleted genes in Cluster 3 that overlapped with DEG in Igf2bp2 − /− vs. Igf2bp2 +/+ HSC were all in agreement. Specifically, eight genes with enriched expression in Cluster 3 overlapped with DEG in Igf2bp2 − /− vs. Igf2bp2 +/+ HSC and all of these genes were significantly downregulated in Igf2bp2 − /− HSC vs. Igf2bp2 +/+ HSC (Fig 3.2K). In contrast, three genes with decreased expression in Cluster 3 overlapped with DEG inIgf2bp2 − /− vs. Igf2bp2 +/+ HSC and 42 Figure 3.3: Igf2bp2-high cluster is enriched for the expression of imprinted genes that are expressed in long-term HSC. (A) Heat map showing top differentially expressed genes in each subcluster as depicted in Figure 3.2A. Color scale indicates the level of gene expression. (B-I) Feature plots and histograms on HSC-related, imprinted genes depicting the expression level and the percentage of positive cells in the HSC-subclusters (as shown in Fig. 3.2A) for (B,C) Hmga2, (D,E) Rian, (F,G) Cdkn1c and (H,I) H19. Gray dots indicate no expression, and red intensity indicates the expression level of each gene. all of these genes were significantly upregulated in HSCs from young Igf2bp2 − /− vs. Igf2bp2 +/+ mice (Fig 3.2K). 43 Figure 3.4: High expression of megakaryocyte/thrombocyte and erythroid marker genes indicates that HSCs in clusters 1, 2, 6, and 7 are lineage primed. (A) UMAP plot with Seurat clustering (B-H) Feature plots of common marker genes of megakaryocyte/thrombocyte and erythroid lineages: (B) Itga2b, (C) Vwf, (D) Pf4, (E) Klf1, (F) Mki67, (G) Gata1, (H) Car1. 3.5 Discussion In this work, we aimed to learn about HSC aging, specifically how metabolic processes regulated by the Lin28/let− 7 pathway contribute to the aging HSC phenotype. Through data analysis of two different RNA sequencing experiments, we found that in old mice, Igf2bp2 function is lost, and thus regulation 44 of downstream Igf2bp2 targets is altered. This was demonstrated by the bulk RNA-sequencing experi- ment, where My-bi HSCs from oldIgf2bp2 − /− andIgf2bp2 +/+ mice had similar expression acrossIgf2bp2- regulated genes. The functions of the targets ofIgf2bp2 are primarily related to cellular metabolic function and protien synthesis. Interestingly, My-bi HSCs from youngIgf2bp2 − /− mice and My-bi HSCs from all old mice had similar expression among Igf2bp2-mediated genes, implying that loss of Igf2bp2 is a contributor to the aged-HSC phenotype. In the scRNA-seq experiment, a sub-cluster was identified which had differential up-regulation ofIgf2bp2 as well as differential expression of metabolism-related pathways (IGF/PI3K/AKT) and stemness- related genes. Thus, these results imply that Igf2bp2 function as a regulator of metabolism is closely tied to globalHSC aging. 45 Chapter4 Generegulatorynetworkinferenceforhematopoieticstemcellstate transitionswithdietaryinterventions 4.1 Abstract Delineating the regulatory mechanisms underlying cell differentiation and cell fate decision points has long been a central question to stem cell biology. Full characterization of cell differentiation requires accu- rate reconstruction of the underlying gene regulatory networks (GRNs) governing the system - a problem which has been extensively studied using methods which draw from a wide range of fields. These methods have nearly exclusively been developed for application to single-cell RNA-sequencing (scRNA-seq) data. More recently, advancements in single-cell multiome data - where both RNA and chromatin accessibility measurements can be extracted from the same single cells - offers an exciting new opportunity to investi- gate GRN structures by leveraging these multiple measurements. In this work, we present popInfer, a new method for inferring GRNs on single-cell mulitome data, and demonstrate that popInfer outperforms meth- ods developed to run on scRNA-seq data alone. We apply popInfer to four single-cell mulitome datasets of hematopoietic stem and progenitor cells from mice of different ages and diet treatments. popInfer is able to identify changes in regulatory mechanisms that explain phenotypic changes in hematopoiesis observed in response to aging and diet. 46 4.2 Introduction Hematopoietic stem cells (HSCs) maintain the blood system through the continual production of myeloid and lymphoid cell types throughout life. Mammalian hematopoiesis occurs primarily in the bone marrow, where HSCs, residing in restricted niches [89], give rise to multipotent progenitor (MPP) cells, and subse- quently lineage-restricted progenitors of various identities through a complex set of regulatory processes. Certain regulatory steps during hematopoiesis are well-characterized, e.g. the control of myeloid-restricted lineages via GATA1-PU.1 mutual inhibition [17], and of granulocyte/monocyte lineage commitment via IRF8-GFI1 mutual inhibition [90]. However, elsewhere in hematopoiesis, knowledge of the crucial gene regulatory network (GRN) interactions controlling cell state transitions is lacking. This includes one of the earliest cell state transitions, whereby HSCs lose their self-renewal and stemness capacity as they transi- tion to MPPs [91]. Transcriptional as well as epigenetic factors are implicated in the HSC loss of stemness transition [92, 93]. HSC function deteriorates with age. HSC aging phenotypes include reduced regenerative capacity, myeloid skewing, and increased clonality that carries with it the risk of cancer [94, 95, 96]. While it was previously hypothesized that these aging-related changes in HSCs were largely due to DNA damage, recent studies have revealed a more complex view of HSC aging where many processes, including the epigenetic landscape and metabolism contribute to the phenotype [97]. Caloric restriction by various forms of fasting or diet restriction increases lifespan [98]. While the molecular mechanisms underlying this phenotype are still being elucidated, the attenuation of age-associated effects by caloric restriction has been demonstrated in various tissues and organisms [99]. Diet restriction (DR) is defined as caloric restriction by10− 30%, a range which avoids undernutrition [100]. In HSCs, stem cell aging-related effects (including expansion of total HSC pool and myeloid-biased skewing) have been reduced upon DR in mice [100]. In response to DR, HSC quiescence increased, leading to improved repopulation capacity after diet treatment, however lymphopoiesis was impaired during DR treatment [100]. The regulatory mechanisms 47 underlying this attenuation of HSC aging through treatment by DR are unknown; their identification could permit not only finer control of HSC aging, but also understanding of how the aging phenotype in HSCs is established. Building on models that seek to infer GRNs from buk RNA-seq data [101], many new methods for GRN inference have been developed in light of single-cell data, particularly scRNA-seq. These are build on the premise that the higher-resolution offered by decomposing bulk samples into single cells can improve GRN inference, and that the single-cell noise does not overwhelm the signal. Methods derive from fields ranging from statistical learning [102], dynamical systems theory [103], tree-based approaches [101, 104], information theory [105, 106], and time series [107], all with varying assumptions, computational time, and overall performances. Despite their promise: it has been challenging to achieve high performance on GRN inference with single-cell data overall, especially temporally ordered data; perhaps due to the levels of noise in these data. It has been shown that in some cases randomly ordered cells outperform the “correctly” ordered cells [107]. Inferring the gene regulatory processes controlling hematopoiesis under normal conditions or dietary interventions requires new methods. Single-cell genomics approaches hold great promise, and have shed light onto cell types [108], GRNs [109], transition dynamics [110, 111], and cell-cell communication [112, 113] but gene expression alone may not be enough. Recently, advances in experimental technologies enable the joint measurement of gene expression by RNA-sequencing (RNA-seq) and chromatin accessibility by assay for transposase-accessible chromatin by sequencing (ATAC-seq) in the same single cells [114, 115]. These multiomic data present a new opportunity to learn complex dynamic gene regulatory processes. Here we present popInfer: network inference with pseudocells over pseudotime, a new method to infer gene regulatory networks using single-cell multiomic data. popInfer learns directed signed GRNs, i.e. able to predict activating vs. inhibitory interactions between transcription factor and target gene. We apply popInfer to study hematopoiesis during dietary interventions in young and aged mice, focusing on 48 the dynamics of the transition from HSCs to MPPs. Through comparison with reference ChIP-seq data, we demonstrate that during healthy hematopoiesis (ad libitum diet) that popInfer outperforms alternative GRN inference methods using scRNA-seq data. We show that popInfer performance comes from two key factors: i) dynamic gene expression and chromatin accessibility data; and ii) joint measurements from the same single cells (matched data sampled separately for scRNA-seq and scATAC-seq do not suffice). Comparative analysis of the GRNs predicted by popInfer to control the transition from HSCs to MPPs during DR reveals that DNA-methyltransferase 1 (Dnmt1) acts as a crucial regulator of increased HSC quiescence upon diet restriction in young mice, the effect of which is lost upon aging. Overall, we show how joint measurements of gene expression and chromatin accessibility can be leveraged using popInfer to predict the gene regulatory interactions that control cell state transitions. 4.3 Methods 4.3.1 AnimalExperimentsand10xMultiomeProtocol Young (∼ 6.5 months old) and aged (∼ 26 months old) mice were randomly distributed into age- and weight-matched groups. Over the course of the first week the food intake of the individual mice was measured. At the beginning of the second week, the DR groups were fed once a day with a food portion corresponding to70% of the normal intake of the respective mouse. The mice’s hind limbs (including hip bones joints), forelimbs and spines were dissected. Cells were collected by centrifugation and stained with anti-cKIT-APC AB – 8µl for young samples, 10µl for old samples to account for the age-related increase in the number of c-kit positive cells [116, 117]. c-kit positive cells (hematopoietic stem and progenitor cells) were sorted. Samples were loaded separately onto the channels of the 10x Genomics Chromium Controller and processed with Single Cell Multiome ATAC + Gene Expression following the standard manufacturer’s protocol. 49 4.3.2 HSPCmultiomicdatanormalizationandqualitycontrol yAL, yDR, oAL, and oDR single cell multiome ATAC + RNA data were processed using 10X Genomics Cell Ranger ARC (v2.0.0) mapped to the GRCm38 reference genome. Seurat (v3.2.3) and ArchR (v1.0.2) packages were used for all further analysis. For oAL, we began with4754 cells. We first removed ambient RNA using SoupX [118], manually setting the contamination to 10%. Using ArchR for ATAC quality control, cells with TSS enrichment < 5.0 or number of fragments< 1200 were removed. Using Seurat for RNA quality control, we remove cells with RNA counts > 15000 or < 1500, cells with number of features < 1300, or cells with mitochondrial percentage> 20%. After this initial quality control, 4082 cells remained. Lastly, we ran Doublet Finder [119] with3% doublet formation rate and PC neighborhood sizepK = 0.29, removing 122 doublets and leaving us with a final set of 3960 cells (Fig. B.2A). The oAL RNA counts were transformed by regressing out G2M and S phase cell cycle scores (from the CellCycleScoring Seurat function) using the Seurat function SCTransform (Fig. B.2B-C). For oDR, we began with2576 cells. We first removed ambient RNA using SoupX, manually setting the contamination to10%. Using ArchR for ATAC quality control, cells with TSS enrichment< 8.0 or number of fragments < 2000 were removed. Using Seurat for RNA quality control, we remove cells with RNA counts> 10000 or< 1000, cells with number of features< 500, or cells with mitochondrial percentage > 35%. After this initial quality control, 2299 cells remained. Lastly, we ran Doublet Finder with 3% doublet formation rate and PC neighborhood sizepK = 0.17, removing69 doublets and leaving us with a final set of 2230 cells. The oDR RNA counts were transformed by regressing outG2M andS phase cell cycle scores (from the CellCycleScoring Seurat function) using the Seurat function SCTransform. For yAL, we began with4185 cells. We first removed ambient RNA using SoupX, manually setting the contamination to10%. Using ArchR for ATAC quality control, cells with TSS enrichment< 5.0 or number of fragments < 2000 were removed. Using Seurat for RNA quality control, we remove cells with RNA 50 counts > 9000 or < 1000, cells with number of features < 700, or cells with mitochondrial percentage > 25%. After this initial quality control, 3423 cells remained. Lastly, we ran Doublet Finder with 3% doublet formation rate and PC neighborhood sizepK = 0.29, removing103 doublets and leaving us with a final set of 3320 cells. The yAL RNA counts were transformed by regressing outG2M andS phase cell cycle scores (from the CellCycleScoring Seurat function) using the Seurat function SCTransform. For yDR, we began with 3616 cells. We first removed ambient RNA using SoupX, manually setting the contamination to 10%. Using ArchR for ATAC quality control, cells with TSS enrichment < 5.0 or number of fragments< 3000 were removed. Using Seurat for RNA quality control, we remove cells with RNA counts> 9000 or< 700, cells with number of features< 500, or cells with mitochondrial percentage > 40%. After this initial quality control, 3146 cells remained. Lastly, we ran Doublet Finder with 3% doublet formation rate and PC neighborhood sizepK = 0.27, removing94 doublets and leaving us with a final set of 3052 cells. The yDR RNA counts were transformed by regressing outG2M andS phase cell cycle scores (from the CellCycleScoring Seurat function) using the Seurat function SCTransform. 4.3.3 HSPCmultiomicdataclusteringandpeakcalling In all datasets, the first 30 principal components were used for louvain clustering and Uniform Manifold Approximation and Projection (UMAP) in Seurat. Clustering resolution was set to0.25 foroAL andoDR, 0.35 foryAL, and0.31 foryDR, resulting in 6 distinct clusters in oAL, 7 distinct clusters in oDR and yAL, and 5 clusters in yDR. Corresponding cell types were annotated using canonical hematopoietic cell type markers. For peak calling, all four datasets were used to create a single ArchR object (Fig. B.2D-E). Using the clusters defined using the RNA, the two smallest clusters in both oAL andoDR were removed prior to peak calling and the smallest cluster inyDR was removed prior to peak calling. Pseudo-bulk replicates were made in ArchR using the addGroupCoverages function with maxReplicates set to 10 and groupBy set 51 to the samples and clusters defined using the RNA data. The pseudo-bulk replicates were used for peak calling via the addReproduciblePeakSet ArchR function. 4.3.4 FeatureSelectionforNetworkInference To select features to use as input to the gene regulatory network inference model, we first identified genes that were differentially expressed between the HSC and multipotent progenitor clusters for each of the four samples using Seurat’s FindMarkers function. We defined a gene to be differentially expressed if it’s adjustedp-value was less than0.05. We then took the union over these four sets of DEGs, giving us107 genes. From this set of genes, we removed three genes that weren’t found in the reference chromatin annotation. The remaining104 genes were used as the input features to the network inference method. 4.3.5 HSPCmultiomicpseudotimeordering,featureselection,andpseudocellconstruction On each dataset, DPT [120] was applied to the RNA assay to assign pseudotime values. The root cell of DPT was set to be the cell with highest expression of the sum of Mecom and Mpl, two canonical HSC markers. For yAL, the root cell was defined as the cell with the fourth highest expression of the sum of Mecom and Mpl because the top three cells either did not lie in the HSC cluster or were on the border of the UMAP with another cluster. The expression of the top 2000 variable features were used as input to DPT. For each dataset, we constructed pseudocells along pseudotime. To do so, for each dataset, cells were first ordered along pseudotime. Cells are then partitioned into similarly sized pseudocell bins where the first n modℓ bins contain⌊ n ℓ ⌋+1 cells and the remainingn− (n modℓ) bins contain⌊ n ℓ ⌋ cells. Pseudocell expression (x p ) was defined as the average of the expression of the k cells within it’s corresponding bin (Fig. 4.2B): x p = 1 k k X j=1 x j 52 wherex 1 ,x 2 ,...x k are the expression values of the cells in the pseudocell bin for pseudocellp. 4.3.6 HSPCmultiomicATACaccessibilityscoringforpseudocells Using the same bins of cells defined above, we define an ATAC accessibility score which is draws upon ideas from the accessibility score defined in Janssens et al. [121]: From the RNA pseudocell construction above, for each psudeocellp we have a set ofk cells that assigned to that pseudocell’s bin. To compute the pseudocell accessibility score for a given gene, we first identify the gene’s transcription start site (TSS). Then, we use the window 5kb upstream from the TSS and the entire gene body within which to look for accessible genomic regions. If there is an overlapping gene in the 5kb upstream window, we trim the upstream window to exclude the overlapping region. If there is only one accessible genomic region in our window, weights are not useful and thus we skip computation. Assume that we have m > 1 accessible regions in our window. From here, for each cell in our pseudocell’s bin, we compute the Gini score [122] of the vector of ATAC counts for each genomic region in our window. We thenz-standardize the Gini scores and denote this valuez Gini . This allows us to compute our weight for theith region for cellj: w j (i) =e z Gini j (i) For each cell, we average over the product of the weight with the ATAC counts for each region ourm regions to get our gene accessibility scorey j : y j = 1 m m X j=1 a j (i)w j (i) 53 wherea j (i) is the ATAC count for theith peak of thejth cell. Averaging over this accessibility score for allk cells and normalizing by the number of reads in the TSS gives us our final gene accessibility score: y p = P k j=1 y k P k j=1 RIT(j) whereRIT(j) is the number of reads in a TSS for thejth cell (Fig. B.1). 4.3.7 Pseudotime-orderedpseudocell(POP)inferencemodel From our pre-processing, we haven pseudocells, for which we have pseudocell expression scoresx p1 ,x p2 ,...,x pn and pseudocell gene accessibility scores y p1 ,y p2 ,...,y pn . From our feature selection, we have genes G. For each geneg∈G, we run a lagged LASSO regularized linear regression model using glmnet [123]: min β 0 ,β 1 n n X i=1 (y i (g)− β 0 − x T i (g)β ) 2 +λ g ||β || 1 +λ ||β || 1 whereλ g is selected via the optimization: λ g = min λ α MSE λ (MSE trivial − (1− α ) (# nonzeroβ ) λ (# nonzeroβ ) λ MSE whereα ∈ [0,1],MSE λ is the MSE of the LASSO model forλ ,MSE trivial is the MSE when we have a trivial model (β = 0), (# nonzero β ) λ is the number of nonzero coefficients of the LASSO model for a givenλ value, andλ MSE is the value ofλ for which the LASSO model achieves optimal MSE. The need for gene-specific sparsity λ g , is discussed in the Results section. We evaluate this optimization for a given α value by running glmnet for200λ values and solving. 54 After running for a fixed alpha value, we define a G× G output matrix, W α (g i ,g j ) = 1, ifβ g i ,g j > 0 − 1, ifβ g i ,g j < 0 0, otherwise for allg i ,g j ∈G,g i ̸=g j . Finally, we run this for a sequence ofα valuesα 1 ,α 2 ,...,α s ∈ [0,1] and define the final output of popInfer to be the weight matrix: W = 1 s s X j=1 W αj . In this work, we use the sequence ofα values: α ∈{0.001,0.002,...,0.3}. 4.4 Results 4.4.1 Thestem-to-multipotenthematopoietictransitionismaintainedthroughoutlifetime anddietaryperturbations We studied early hematopoiesis across lifetime and dietary perturbations via single-nucleus sequencing of gene expression (snRNA-seq) and chromatin accessibility (snATAC-seq) in the same single cells (Fig. 4.1A). We generated data from four experimental conditions: mice fed ad libitum at a young (yAL) or old (oAL) age, and mice fed diet-restricted (DR) at a young (yDR) or old (oDR) age. All mice were fed ad libitum until two weeks prior to dissection, at which point the yDR and oDR groups received a30% calorie reduced diet (Fig. 4.1A). Bone marrow was sampled at the end of the treatment or control-matched groups, between 26 and 27 weeks of age for young mice, and between 100 and 107 weeks for old. HSPCs were sorted and isolated for single-nucleus multiome sequencing and downstream analyses (Fig. 4.1 B). RNA 55 and ATAC-sequencing data were preprocessed and each modality was analyzed before being integrated for GRN inference using popInfer. Aging and diet can each result in changes to HSC capacity and function, with the potential to affect self-renewal and quiescence, repopulation capacity, and lineage bias. Each of these processes implicates early steps in HSC differentiation whereby HSCs lose their stemness capacity and transition to multipotent progenitor cells. We thus sought to investigate how the hematopoietic subpopulations change during these early stages of differentiation and how these transitions are regulated. Each HSPC sample was clustered; and the hematopoietic subpopulations were identified using canonical marker genes (Fig. 4.1C and Fig. B.2). The yAL and yDR samples were more homogenous overall; in oAL and oDR samples we observed ex- pansions in the HSC/Multipotent pool and captured a greater number of lineage-restricted cells at various stages of hematopoietic differentiation. We analyzed the stemness/multipotent gene expression signatures by differential gene expression. The top differentially expressed genes between HSCs and Multipotent cells from yAL were selected by total log-fold change, and are plotted for all four samples in (Fig. 4.1D). Marker genes for HSCs include known stem cell markers Mecom and Mllt3; and for the Multipotent cells include known markers Flt3 and Mpo [90, 124]. The differential expression pattern between stem and multipotent cells was observed in all samples: more distinctly in the old samples, and less distinctly in the young samples where hematopoietic cells are more homogeneous. 4.4.2 Contemporaneous gene expression and chromatin accessibility measurements enableinferenceofdynamicgeneregulatoryinteractions GRN inference methods that seek to infer gene regulatory interactions governing dynamic processes fall under two categories: time-dependent and time-agnostic methods. Several time-dependent GRN infer- ence methods have developed specifically for application to non-temporal (“snapshot”) data. These infer 56 HSC Multipotent MEP Primed GMP MEP Other Satb1 Plac8 Cdk6 Flt3 H2afy Mpo Plxdc2 Itsn1 Med12l Sgms1 Pbx1 Csgalnact1 Pcdh7 Mllt3 Rora Il1rapl2 Samd12 Dlg2 Mecom Prkg1 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP Other HSC Multipotent MEP Primed GMP MEP CLP Primed CLP Satb1 Plac8 Cdk6 Flt3 H2afy Mpo Plxdc2 Itsn1 Med12l Sgms1 Pbx1 Csgalnact1 Pcdh7 Mllt3 Rora Il1rapl2 Samd12 Dlg2 Mecom Prkg1 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP CLP Primed CLP HSC Multipotent MEP Primed GMP MEP Other Satb1 Plac8 Cdk6 Flt3 H2afy Mpo Plxdc2 Itsn1 Med12l Sgms1 Pbx1 Csgalnact1 Pcdh7 Mllt3 Rora Il1rapl2 Samd12 Dlg2 Mecom Prkg1 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP Other −2.5 0.0 2.5 5.0 −2 0 2 4 UMAP_1 UMAP_2 HSC Multipotent MEP Primed GMP MEP CLP Primed CLP −4 0 4 −5 0 5 UMAP_1 UMAP_2 HSC Multipotent MEP Primed GMP MEP −5 0 5 −5 0 UMAP_1 UMAP_2 HSC Multipotent MEP Primed GMP MEP −2.5 0.0 2.5 5.0 −2 0 2 4 UMAP_1 UMAP_2 HSC Multipotent MEP Primed GMP MEP CLP Primed CLP −4 −2 0 2 −10 −5 0 5 UMAP_1 UMAP_2 HSC Multipotent MEP Primed GMP MEP HSC Multipotent MEP Primed GMP MEP Other Satb1 Plac8 Cdk6 Flt3 H2afy Mpo Plxdc2 Itsn1 Med12l Sgms1 Pbx1 Csgalnact1 Pcdh7 Mllt3 Rora Il1rapl2 Samd12 Dlg2 Mecom Prkg1 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP Other HSC Multipotent MEP Primed GMP MEP Other Satb1 Plac8 Cdk6 Flt3 H2afy Mpo Plxdc2 Itsn1 Med12l Sgms1 Pbx1 Csgalnact1 Pcdh7 Mllt3 Rora Il1rapl2 Samd12 Dlg2 Mecom Prkg1 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP Other HSC Multipotent MEP Primed GMP MEP Other Satb1 Plac8 Cdk6 Flt3 H2afy Mpo Plxdc2 Itsn1 Med12l Sgms1 Pbx1 Csgalnact1 Pcdh7 Mllt3 Rora Il1rapl2 Samd12 Dlg2 Mecom Prkg1 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP Other UMAP_1 UMAP_2 A B C D Age (weeks) 0 24 25 26 98 99 100 DR DR Old mice HSCPs sampled Young mice HSCPs sampled Isolate HSPCs Single-cell suspension scRNA-seq scATAC-seq Gene regulatory network inference with popInfer HSC Multipotent MEP Primed GMP MEP Other Cdk6 Cd34 Rnf220 H2afy 2610307P16Rik Satb1 Flt3 Med12l Trpc6 Dlg2 Mecom Mllt3 Plxdc2 Gda Magi2 Rora Ncam2 Plcl1 Il1rapl2 Chrm3 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP Other Mecom GMP HSC MEP MEP_Primed MPP Peaks Genes 30000000 30200000 30400000 Coverage (Norm. ATAC Signal Range (0−0.46) by ReadsInTSS) chr3:29948007−30553008 Figure 4.1: Overview of experimental study design. (A) Young and old DR mice are diet restricted for two weeks prior to bone marrow sampling. (B) HSCPs are sorted from the bone marrow and isolated to undergo single nuclear RNA and ATAC sequencing. popInfer, our gene regulatory network inference method for single-cell multiome data, is applied to the data to infer gene regulatory networks for each sample. (C) UMAPs of cell type clusters for all 4 samples (from left to right: oAL, oDR, yAL, yDR) (D) Heatmaps of the differentially expressed genes with top 20 log-fold changes between the HSC and Multipotent clusters of yAL. Each heatmap corresponds to the same sample above in (C). 57 the temporal process from a population of heterogeneous single cells via trajectory inference/use of pseu- dotime [105, 103, 107, 125]. However, whether pseudotime values for individual genes contain useful dynamical information is highly dataset-dependent and remains an open question overall. Deshpande et al. provided compelling evidence that, for their system of study, interpreting pseudotime values explic- itly did not improve inference versus considering only the order of cells along the pseudotime axis [107]. Moreover, inference results were comparable with randomly-ordered cells as with pseudotime-ordered cells. A motivating factor in using time-dependent methods to infer gene regulatory networks for time- evolving processes is that we expect some time-lag between regulator gene expression and target gene expression (Fig. 4.2A). By assigning pseudotime values to cells, time-lags can be incorporated into in- ference models, allowing for appropriate correlation between regulator gene and target gene expression. The key to implementing these time-series based methods is having cells equally-spaced along time (or interpolating time equidistant cells). That is, the methods rely on explicit use of pseudotime values in their model, which, as discussed above, may not be a correct interpretation of pseudotime in many cases. however, without some measure of pseudotime to determine or interpolate a cell’s temporal position, we cannot link regulator gene expression with target gene expression dynamically and therefore gain little to no additional information about our network from pseudotime when used in this manor. From here, we asked: are there more contemporaneous values that we can correlate within the same single cells that are able to capture the dynamic nature of a time-evolving process like cell differentiation? This led to our model popInfer, in which we propose to use the gene expression of the transcription factor (TF) to predict the chromatin accessibility of the target gene. The reasoning behind linking regulator gene expression with target gene accessibility is simply that if a regulator is expressed and is activating it’s target, generally the target gene’s chromatin needs to open prior to the target gene’s expression (Fig. 4.2A). While there are certainly caveats to this approach, these two values occur much more closely in time 58 than regulator expression with target expression, thus allowing us to avoid making any uninformative presuppositions about pseudotime. Thus, we assumed that if a TF regulates a given target, then we expect that there should be some type of relationship between TF gene expression and TF target gene chromatin accessibility. The sparsity of both RNA and ATAC single-cell data necessitated construction of pseudocells to obtain consistent results from gene network inference. This led us back to our conclusion about pseudotime: that it is a useful similarity ordering of cells along a trajectory. In order to bin cells together that were in a similar stage of differentiation, we used pseudotime to order the cells and then partition them. Each bin of cells was used to construct pseudocells, which served as the input to our GRN inference method (Fig. 4.2B). Pseudocell expression (x p ) is defined to be the average expression of the cells in that bin. From the single-cell ATAC data, we aimed to summarize the accessibility of a gene and it’s promoter region into a single score (Fig. 4.2 B). Inspired by the score from Janssens et al., we define a single cell’s gene accessibility (y) to be the sum over the peaks in the gene body and 5kb upstream of the transcription start site (TSS) of the product of each peak’s accessibility with thez-standardized Gini score of the peak (Fig. 4.2B) [121]. Pseudocell gene accessibilty (y p ) is then defined to be the sum of the gene accessibility score over all cells in the pseudocell bin, normalized by the sum of the number reads in a TSS for all cells in the pseudocell bin (Fig. 4.2B). Normalizing by read in TSS is integral because it accounts for single cell sequencing depth, sample data quality, and number of cells in the bin. After constructing pseudocells with expression and gene accessibility scores, popInfer performs LASSO regression using pseudocell gene expression values to predict pseudocell gene accessibility scores (Fig. 4.2 B). For each target gene(g), gene-specific sparsity (λg ) is determined via a tradeoff between model sparsity and model mean-squared error (MSE). This tradeoff is controlled by the parameter α ∈ [0,1], which is fixed for all genes. Whenα = 0, λ g is selected which gives a trivial model whereβ = 0. Whenα = 1, λ g is selected which gives a model with optimal MSE. 59 For a given α value, the LASSO model outputs a matrix of inferred coefficients β . These values are binarized, although still including sign, to get a network matrix W α 4.2 B). Preserving the sign of the LASSO coefficients allows for popInfer to produce information about activation versus inhibition between gene pairs. The final output of popInfer is the averaged sum of matrices W α for a sequence of α values in the range [0,1.0]. Values ofα closer to zero produces sparser networks. In this work we selected the sequence ofα values{0,0.001,0.002,...,0.3}. As a result of this construction, most gene-gene pairs are assigned weights of 0. We believe this construction to be advantageous because it is often unclear how to select an appropriate weight cutoff to use in other GRN inference methods. With popInfer, gene pairs with assigned weights of zero can automatically be ruled out. Weight thresholds can also be applied to the popInfer results to examine the highest confidence inferred relationships. 60 Target gene body TF TF gene body Time delay between TF and target transcription TF RNA and target accessibility contemporaneous Single-cell multiome data Order over pseudotime pseudotime Parition into n bins of k cells Bin 1 Bin 2 Bin n Bin 3 … … k cells Aggregate binned cells into pseudocells ! " = 1 % & '() * ! ' LASSO regression with gene-specific sparsity min (/0,/) 1 24 & 5() 6 (7 5 (8) −: ; −! 5 < 8 :) = + ? @ : ) ? @ = min A B CDE A (CDE trivial) −(1−B) (# nonzeroβ) A (# nonzeroβ) A GHI A B Sum over networks with varying sparsity J K (L,M) = 1, if : 5,' > 0 -1, if : 5,' < 0 0, o.w. B ) ,B = ,…,B R ∈ [0,1] J = 1 V & '() R J K W 7 " = ∑ '() * 7 ' ∑ '() * AllTSSreadsincellM 7 ' = &c ' d ∗z−standardizedGiniscore p∈ peaks in promoter and gene body Gene expression ! " Gene accessibility 7 " pseudotime Pseudocells 1 2 n 3 … … pseudotime pseudotime 61 Figure 4.2: popInfer workflow for single-cell multiome gene regulatory network inference. (A) Schematic of TF gene expression and regulation of a target gene’s expression. After TF transcription, there is a time delay before the influence is detected in target gene expression. However, TF expression must co-occur with accessibility of it’s binding site with it’s target gene, so TF expression and target accessibility should be contemporaneous. (B) Overview of popInfer methods. Cells are ordered over pseudotime and binned into n equally sized bins of k cells. Cells belonging to a given bin are aggregated to form a pseudocell, wherex j is single-cell expression of thejth cell in a pseudocell bin,x p is pseudocell expression,a j (p) is single-cell accessibility of a peakp of thejth cell in a pseudocell bin,y j is the gene accessibility score of thejth cell in a pseudocell bin, andy p is pseudocell gene accessibility score. LASSO regression is applied to the n pseudocells, where pseudocell accessibility y p is predicted from pseudocell gene expression x p . The selection of the regularization termλ g is gene-specific and governed by a tradeoff between sparsity and MSE which is controlled by α ∈ [0,1]. In the tradeoff, MSE λ is the MSE of the LASSO model for a given λ value, MSE trivial is the MSE when we have a trivial model (β = 0), (# nonzero β ) λ is the number of nonzero coefficients of the LASSO model for a given λ value, andλ MSE is the value ofλ for which the LASSO model achieves optimal MSE. 4.4.3 popInferoutperformsothermethodsthatrunonscRNA-seqalone To evaluate the performance of popInfer, we applied popInfer along with top performing scRNA-seq GRN inference methods to our control ad libitum datasets, comparing the inferred networks with reference ChIP-seq data from ChIP-atlas, as defined by the benchmarking framework BEELINE [126]. For the scRNA- seq GRN inference methods, we selected GENIE3 [101], PIDC [127], TENET [105], and SINCERITIES [125], all of which consistently perform robustly and are built upon a wide variety of mathematical frameworks. Benchmarking methods is inherently difficult because ChIP-seq is an imperfect reference. For example, we are looking specifically at HSCs and Multipotent cells while the ChIP-seq was gathered from whole bone marrow, so we cannot expect to discover all true positive relationships while looking at a restricted set of cells. Thus, relative performance of methods is more important than the actual values of performance metrics. We began by looking at the HSC to Multipotent transition in the yAL sample. We used the same set of104 input genes as we will for when we compare the HSC to Multipotent transition across all four datasets (see Methods). Because this is a relatively small set of genes, there were relatively few (111) TF-target 62 pairs in the reference ChIP-seq where both TF and target genes were in the feature set. As a result, we wanted to select a second dataset with a larger set of cells and genes to use to test these methods as well. For this, we selected theoAL dataset, but looked at the transition from HSCs to Granulocyte-Monocyte Progenitors (GMPs), selecting cells from the HSC, Multipotent, and GMP clusters. We selected oAL because non-treatment samples are necessary in order to compare with ChIP-seq data and we selected this branch of differentiation because in the UMAP it is clearly contiguous with the HSC to Multipotent transition. For features, we selected the set of differentially expressed genes between the HSC and GMP clusters in the oAL dataset, giving us 565 genes. As a result of the larger set of input genes, there were 4374 TF-target gene pairs included in the reference. For evaluation, we use area under early precision-recall curve (AUEPRC) to compare method results. For networks that we expect to be sparse and have few true positives, AUEPRC seems to be the best suited metric as we care most about the quality of the highest-weighted gene pairs. In yAL, we evaluate the AUEPRC over the range of recall values[0,0.1] while in oAL we use a smaller window of recall values in [0,0.05]. In yAL, popInfer outperforms the other methods (Fig. 4.3A). We also wondered - how necessary was the use of pseudotime in construction of pseudocells to the performance of popInfer? To test this, we randomly sampled cells from within each cluster (either HSC or Multipotent) to create pseudocells. Figure 4.3B shows the AUEPRC of popInfer applied to these cell-type specific random pseudocells for 50 replicates. The dashed green line is the AUEPRC value of the original popInfer, showing that constructing psuedocells using pseudotime results in far-outperforming results than if we just use cell-type resolution to create pseudocells. Looking at the differentiation of HSCs to GMPs in oAL, popInfer again outperforms other methods (Fig. 4.3C). Looking at the precision-recall curve, the precision values of popInfer are large relative to the other methods for small recall values, meaning that popInfer performs well at early detection of true positives (Fig. 4.3D). 63 As a final method comparison, we wondered about the consistency of results between these different methods. To explore this, we looked at the top regulators by method, defined as the top 10 genes with the largest number of outgoing edges. Looking at the yAL results, we began with popInfer and defined absolute gene pair weights greater than0.25 (the same cutoff weight that we will use later in this work) to be considered true positives, leading to233 network edges. In order to compare the methods appropriately, we took the top233 weighted edges for all methods. We found that popInfer, GENIE3, PIDC, and TENET had highly conserved top regulators (Figure 4.3E). SINCERITIES had very few overlapping top regulators relative to the other methods, so it was omitted from comparison. The high overlap in top regulators highlights the ability for these different methods - despite their vastly different underlying models - to detect similar, useful information. However, as will be discussed in the following section, popInfer’s top regulator in yAL, Dnmt1, is not detected by any other method, yet is revealed to be an integral regulator in the HSC to Multipotent transition. 4.4.4 popInferidentifiesDnmt1andSatb1astopregulatorswithdifferentrolesacross theagingandDRphenotypes Returning to our four different samples, we applied popInfer to the HSC to Multipotent transition to each sample individually to see if we could use the inferred networks to discover what changes in gene regula- tion direct the aging and DR phenotypes. In order to properly compare network results across all samples, we used the same set of input genes in each, defined as the union over all four samples of the differentially expressed genes between the HSC and Multipotent clusters. For each sample, we created 70 pseudocells and ran popInfer forα ∈{0,0.001,0.002,...,0.3}. True positive edges were defined as being gene pairs with absolute weight greater than0.25. As noted earlier when comparing top regulators across methods, Dnmt1 is identified as top regulator in yAL by popInfer, but not by any other GRN inference method (Fig. 4.3E). Indeed, in yAL Dnmt1 regulates a 64 0.00 0.01 0.02 0.03 0.04 Diff−Sampled−Cells Same−Sampled−Cells AUEPRC 0.000 0.005 0.010 0.015 POP−infer GENIE3 PIDC TENETSINCERITIES AUEPRC oAL AUEPRC 0.00 0.01 0.02 0.03 POP−Infer GENIE3 PIDC TENET SINCERITIES AUEPRC yAL AUEPRC A C D Cell-type sampled pseudocell Inference POP-Infer GENIE3 PIDC TENET Chrm3 Flt3 Rnf220 Il1rapl2 Dlg2 H2afy Prkg1 Mecom Txnip Rad51b Cdk6 Dock10 2610307P16RIK Rora Itsn1 Magi2 Mllt3 Dnmt1 Stmn1 Ikzf2 Meis1 Slc18a2 E Top 10 regulators by method B 0.00 0.01 0.02 0.03 0.04 Diff−Sampled−Cells Same−Sampled−Cells AUEPRC Figure 4.3: popInfer performance comparison with scRNA-seq GRN inference methods. (A) Bar graph of area under early precision recall (AUPRC for recall values in the range[0.0.1], normalized by0.1) for the HSC to Multipotent transition in yAL. Networks were compared with true positive TF-target pairs from ChIP-Atlas. (B) Violin plot of the area under early precision recall for yAL using popInfer, but applied to pseudocells that were comprised of single-cells randomly sampled from cell-type clusters (as opposed to pseudotime-ordered). The green dashed line indicates the AUEPRC for the regular popInfer. (C) Bar graph of area under early precision recall (AUPRC for recall values in the range [0.0.05], normalized by 0.05) for the HSC to GMP transition in oAL. Networks were compared with true positive TF-target pairs from ChIP-Atlas. (D) Early precision-recall curve for the HSC to GMP transition in oAL. (E) Venn diagram of the top 10 genes with the highest number of outgoing edges by method for the HSC to GMP transition in oAL. Absolute value of popInfer weights greater than 0.25 are considered true positives, leading to 233 network edges. For all other methods tested, the top 233 edge weights were considered true positives to ensure equal sparsity across networks in this comparison. 65 large number of targets, all with a positive/activating relationship (Fig. 4.4A). Comparing popInfer results across all four samples, it was revealed that Dnmt1 is a top regulator in all samples except yDR (Fig. 4.4B). It has been shown that Dnmt1 plays a critical regulatory role in hematopoiesis, especially for HSC and progenitor cell states through maintaining methylation status [128]. DR has also long been linked to methylation status, but the underlying mechanisms have remained unknown as to why, although it has been hypothesized that Dnmt1 is responsible for methylation changes [129, 130]. Dnmt1 regulates methylation through two different pathways: inherited methylation and targed methylation mediated by TFs [131, 132]. Since Dnmt1 differs from other TFs in that it methylates it’s targets, we hypothesized that despite the inferred activation between Dnmt1 and it’s targets, that in fact Dnmt1’s expression would be negatively correlated with it’s inferred target gene’s expression as it methylates and silences expression. Indeed, we found that in yAL, the correlation between pseudocell expression of Dnmt1 and the pseudocell expression of Dnmt1’s targets as identified by popInfer is negative for more that two-thirds of the targets (Fig. 4.4C). Plotting the pseudocell expression and accessibility of Dnmt1 and one of it’s inferred targets Nfia, we see the clear pattern of yAL is lost in yDR (Fig. 4.4D-F). In the ad libitum samples, six of the Dnmt1 targets are conserved between them, many of which are related to maintenance of HSC quiescence (Fig. 4.4G). Thus, we hypothesize that Dnmt1 functions in hematopoiesis by mediating methylation of quiescence related genes to guide HSCs into differentiation. However, in yDR, Dnmt1’s silencing of quiescence-related genes is lost, thus resolving the observed yDR phenotype of increased HSC quiescence. Looking for other significant differences between networks in order to explain phenotypic changes, we also found that regulation of Satb1 varies greatly across samples. In yDR, Satb1 expression is inhibited by nearly all of it’s regulators (Fig. 4.4H). Moreover, this negative regulation seems to be unique to yDR (Fig. 4.4I). Satb1 is responsible for directing HSCs towards lymphopoiesis - a function that is impaired in yDR 66 [133]. From this, we hypothesize that inhibition of Satb1 in yDR results in the impaired lymphopoiesis observed. 67 Satb1 Alcam Prdm16 Hdac9 Neat1 Irf2bp2 Tmsb10 Chrm3 H2afy Itsn1 Dnmt1 Vwf Irf2bp2 Nfia A C yAL yDR oAL oDR Dnmt1 number of outgoing edges 21 3 18 8 Dnmt1 hub score ranking 1 16 1 5 yAL yDR oAL oDR Satb1 number of inhibitory incoming edges 1 4 0 0 Satb1 authority score ranking 43 9 90 40 0 1 2 3 4 −1.0 −0.5 0.0 0.5 1.0 Dnmt1 RNA Correlation Frequency yAL Dnmt1 target RNA correlation Chrm3 Gda Magi2 H2afy Itsn1 Slc22a3 Sox4 Tespa1 Exoc6b Alcam Pan3 Dnmt1 Prdm16 Airn Samd12 Vwf Plxnc1 Slc9a9 Csgalnact1 Irf2bp2 Nfia Meis1 B E F D G H I Dnmt1 pseudocell RNA expression yAL Dnmt1 pseudocell RNA expression yDR Nfia pseudocell RNA expression yAL Nfia pseudocell RNA expression yDR Nfia pseudocell ATAC accessibility yDR Nfia pseudocell ATAC accessibility yAL 68 Figure 4.4: popInfer identifies Dnmt1 and Satb1 as key regulators of the aged HSC phenotype. (A) popInfer inferred network of Dnmt1 and it’s targets in yAL. Gray arrows are for activation, red arrows are for inhibition. (B) Dnmt1 network scores by sample. (C) Histogram of correlations between Dnmt1 pseudocell expression and popInfer identified Dnmt1 targets pseudocell expression in yAL. (D) Pseudocell expression of Dnmt1 in yAL and yDR. (E) Pseudocell accessibility of Nfia (an inferred Dnmt1 target in yAL) in yAL and yDR. (F) Pseudocell expression of Nfia in yAL and yDR. (G) Conserved popInfer inferred Dnmt1 targets between yAL and oAL. (H) popInfer inferred network of Satb1 and it’s targets in yDR. (I) Satb1 network scores by sample. 4.5 Discussion In this chapter, popInfer was presented as a new method to infer GRNs from single-cell multiome data. Testing performance using reference ChIP-seq data, popInfer was found to outperform other GRN infer- ence methods which run on scRNA-seq alone. While scRNA-seq GRN inference methods that use pseu- dotime do not necessarily perform better as a result of assigning pseudotime values to cells, popInfer’s performance is far better when using pseudotime values to create pseudocells than by merely using cell type clusters. Applying popInfer to single-cell multiome data from young mice fed ad libitum (yAL), young mice fed DR (yDR), old mice fed ad libitum (oAL), and old mice fed DR (oDR), regulators of key phenotypic changes observed across aging a dietary treatment were identified. First, Dnmt1 was identified to be a top regu- lator of all samples except for yDR, which is known to result in increased quiescence of HSCs. Thus, we concluded that epigenetic regulation of DNA methylation by Dnmt1 upon cell division is a main driver of differentiation, except when stem cell quiescence is increased as is the case in yDR. Second, Satb1, which is partially responsible for driving HSCs towards the lymphoid lineage, was found to be inhibited by many HSC-associated genes in yDR, but not in any other sample. This explains the impaired lymphopoiesis observed in yDR mice. While popInfer was able to identify regulators responsible for observed phenotypic changes, there are certainly limitations to the method. One limitation is that the gene accessibility score does not take into 69 account distal regulatory elements like enhancers. Thus, it is possible that TFs that are regulating a target gene might not be correlated with an accessibility score looking only at the gene body and promoter re- gion. Another drawback about popInfer is that the underlying LASSO model assumes linear dependencies between RNA and chromatin accessibility. This assumption of linearity, while restricting with respect to inference of non-linear interactions, is necessary to make the computational time tractable when looking at large numbers of cells and genes. One final choice regarding popInfer worth noting is the treatment of cell-cyle. While cell-cycle phases G2M and S were regressed for cluster designation, it was not for any other computation, including pseudotime. Deciding whether or not to regress cell-cycle is a complex one in the case of HSCs because HSCs typically divide infrequently. This means that, to a certain extent, part of hematopoietic progenitor identity is their cell-cycle activity. Thus, we decided to abstain from regressing cell-cycle aside from cell clustering. Thus, the networks results of popInfer presented in this work can be interpreted as related to cell-cycle status. Since single-cell multiome data is in it’s infancy, few methods currently exist towards the task of GRN in- ference from single-cell multiome data. One recent method is RENIN, which correlates peaks with target gene expression, then identifies potential TFs that could bind in those peak regions [134]. A drawback however is that RENIN currently is not compared with any existing GRN inference methods that run on scRNA-seq alone, so it is unclear how RENIN’s performance compares. Others have used lineage tracing methods for single-cell multiome data to learn about regulatory mechanisms by comparing motif enrich- ment for differentially accessible regions during reprogramming [135]. In conclusion, popInfer has elucidated changes in regulatory mechanisms governing the transition from HSCs to multipotent progenitors in response to aging and diet restriction. The performance of popInfer highlights how integrating multiple measurements for the same single cells properly can improve GRN inference capabilities. While only in it’s infancy, single-cell multiome data is a powerful tool, providing a more robust perspective on the layers of gene regulation. 70 Chapter5 ConclusionsandFutureDirections Hematopoiesis is a complex process comprised of transitions from HSCs to a spectrum of mature blood cell types. Cells do not travel these differentiation paths in isolation, rather, as explored in Chapter 2, population-level coordination of fates emerges as a result of communication between hematopoietic pro- genitors. Healthy hematopoiesis depends critically on the state of root cell type, HSCs, and, as highlighted in Chapter 3, changes in HSC metabolism contribute to declines in HSC function as observed with ag- ing. Aging-related changes observed in HSCs subsequently lead to modifications in the state transitions of hematopoiesis. In Chapter 4, it was shown that the gene regulatory network directing the first hematopoi- etic state transition, from HSC to multipotent progenitor, is detrimentally perturbed with aging, but that dietary interventions alter the gene-gene interactions. These changes in gene regulatory networks account for the advantageous outcomes observed in hematopoiesis as a result of dietary interventions. Just as the intricacies of a fractal persist as one moves closer, each seemingly small piece of hematopoiesis holds it’s own unique complexity, a complexity which is interwoven with every other component of the process. The model presented in Chapter 2 not only explained multiple controversies existing in the field of HSC differentiation, but more broadly highlighted the critical need to consider multiple scales of interactions when modeling biological processes. Incorporating the higher-order interaction of cell-cell signaling to the cell fate bifurcation model gave us deep insight into how the population dynamics impact the single-cell 71 dynamics. Integrating multiple scales of dynamics into biological models is necessary to understand the complexity of such systems. There are many aspects of the model presented in Chapter 2 which have yet to be explored. One option for future work is to change the initial conditions of the cell from being homogeneous to heterogeneous, representing cells beginning at different transcriptional states. Changes can also be made to the cell’s internal GRNs so that not all cells have identical GRN parameters or perhaps completely different GRNs. Another important possibility that could be incorporated to the framework is allowing for cells to move. In the model, this would be represented by changing signaling topologies during the simulation. These next steps would bring us closer to the true biological picture of heterogeneous (both in transcription and regulation), mobile cells. In Chapter 3, bulk RNA-seq samples fromIgf2bp2 knockout and wild type HSCs from young and old mice were analyzed in conjunction with a scRNA-seq sample of young wild type HSCs. It is discovered that Igf2bp2 regulates genes related to mitochondrial metabolism and protein synthesis, resulting in young Igf2bp2 − /− HSCs displaying expression patterns of these genes more similar to old wild type samples than the young wild type. A sub-cluster of HSCs in the young wild type is also identified as being enriched for Igf2bp2 expression, with differentially expressed genes enriched for IGF/PI3K signaling, a pathway which regulated cellular metabolism. This work implies that the loss of Igf2bp2 function in aging may be responsible for some of the impairments of HSC function in aging, including those identified in the bulk sequencing (mitochondrial metabolism and protein synthesis). In future work, it would be interesting to understand how the relationship between metabolism and aging impact HSC function. For example, transplantation experiments of old and youngIgf2bp2 − /− HSCs into irradiated young mice would tell us about changes in the repopulation capacities as a result of metabolic impairments. Such transplantation experiments would also be helpful in identifying how metabolic path- ways are involved with different branches of HSC differentiation. The relationship between metabolism 72 and HSC differentiation is a non-trivial one. For example, it is known that myeloid lineage skewing occurs with aging, but is ameliorated inIgf2bp2 − /− HSCs. Further, it is known that both aging and diet restriction impair lymphopoiesis, so perhaps healthy metabolism is necessary for HSCs to select transition into the lymphoid fate. Given these somewhat counter-intuitive pieces of evidence, there is certainly a need to further work to flesh out the interactions between aging, metabolism, and hematopoiesis. Chapter 4 presented popInfer, a for gene regulatory network inference from single-cell multiome data and demonstrated that it outperforms other methods that use scRNA-seq alone. popInfer was applied to HSC data from young and old mice with different dietary treatments and was able to uncover key regulators of changes in the HSC to multipotent progenitor transition and how they are perturbed depending on the state of the mouse. popInfer’s performance demonstrated how integrating multiple measurements carefully into a model can greatly improve our inference capabilities. The performance of popInfer by integrating RNA and ATAC information serves as an important reminder that differentiation is a complex process, coordinated between many layers of regulation. There are many other pieces of information about regulation that can be incorporated into popInfer in the future like en- hancer information or transcription factor motif enrichment information. Taking this a step further, hav- ing DNA methylation information, giving us another layer of regulatory information, would be extremely useful in improving our GRN inference capabilities. For example, perhaps genes which have a decrease in expression over pseudotime, but an increase in methylation of the gene body, should be excluded from inference since they are being regulated by methylation as opposed to TF binding. Towards gaining a more complete picture of hematopoiesis, the proposed ideas from each of these chapters can also be unified. Advances in spatial transcriptomics offers the opportunity to finely tune the cell-cell communication model to real bone marrow data. From here, we could examine how perturbations, like Igf2bp2 knockout, aging, or dietary interventions, alter cell-cell communication. popInfer can then be applied to the cells globally, or spatially proximal cells. One fascinating problem to work on would be 73 how to use the cell-cell communication model parameters to inform popInfer. Cell-cell communication introduces dependence between cells, so perhaps proximal cells could be used to make pseudocells as input to popInfer. In all, the layered levels of dynamics of hematopoiesis can better be addressed by combining approaches in the future. 74 Bibliography [1] Anne Wilson and Andreas Trumpp. “Bone-marrow haematopoietic-stem-cell niches”. In: Nature Reviews Immunology 6.2 (2006), pp. 93–106. [2] Stuart H Orkin and Leonard I Zon. “Hematopoiesis: an evolving paradigm for stem cell biology”. In: Cell 132.4 (2008), pp. 631–644. [3] Aleksandra A Kolodziejczyk, Jong Kyoung Kim, Valentine Svensson, John C Marioni, and Sarah A Teichmann. “The technology and biology of single-cell RNA sequencing”. In: Molecular cell 58.4 (2015), pp. 610–620. [4] Sebastian Pott and Jason D Lieb. “Single-cell ATAC-seq: strength in numbers”. In: Genome Biology 16 (2015), pp. 1–4. [5] Megan K Rommelfanger and Adam L MacLean. “A single-cell resolved cell-cell communication model explains lineage commitment in hematopoiesis”. In: Development 148.24 (2021), dev199779. [6] Ingo Roeder. “Quantitative stem cell biology: computational studies in the hematopoietic system”. In: Current opinion in hematology 13.4 (2006), pp. 222–228. [7] Elisa Laurenti and Berthold Göttgens. “From haematopoietic stem cells to complex differentiation landscapes”. In: Nature 553.7689 (2018), pp. 418–426. [8] Thomas Graf and Tariq Enver. “Forcing cells to change lineages”. In: Nature 462.7273 (2009), pp. 587–594. [9] John J. Tyson, Katherine C. Chen, and Bela Novak.Sniffers,buzzers,togglesandblinkers:Dynamics of regulatory and signaling pathways in the cell. 2003.doi: 10.1016/S0955-0674(03)00017-6. [10] Gheorghe Craciun, Yangzhong Tang, and Martin Feinberg. “Understanding bistability in complex enzyme-driven reaction networks”. In: Proceedings of the National Academy of Sciences 103.23 (2006), pp. 8697–8702.issn: 0027-8424. [11] Uri Alon. “Network motifs: theory and experimental approaches”. In: Nature Reviews Genetics 8.6 (2007), pp. 450–461. 75 [12] Julia Kamenz, Lendert Gelens, and James E. Ferrell. “Bistable, Biphasic Regulation of PP2A-B55 Accounts for the Dynamics of Mitotic Substrate Phosphorylation”. In: Current Biology 31.4 (2021), pp. 794–808.issn: 18790445.doi: 10.1016/j.cub.2020.11.058. [13] Mainak Pal, Sayantari Ghosh, and Indrani Bose. “Non-genetic heterogeneity, criticality and cell differentiation”. In: Physical biology 12.1 (2014), p. 016001. [14] James E. Ferrell. “Self-perpetuating states in signal transduction: Positive feedback, double-negative feedback and bistability”. In: Current Opinion in Cell Biology 14.2 (2002), pp. 140–148.issn: 09550674.doi: 10.1016/S0955-0674(02)00314-9. [15] Vijay Chickarmane, Tariq Enver, and Carsten Peterson. “Computational modeling of the hematopoietic erythroid-myeloid switch reveals insights into cooperativity, priming, and irreversibility”. In: PLoS Comput Biol 5.1 (2009), e1000268. [16] Campbell Duff, Kate Smith-Miles, Leo Lopes, and Tianhai Tian. “Mathematical modelling of stem cell differentiation: the PU. 1–GATA-1 interaction”. In: Journal of mathematical biology 64.3 (2012), pp. 449–468. [17] Sui Huang, Yan Ping Guo, Gillian May, and Tariq Enver. “Bifurcation dynamics in lineage-commitment in bipotent progenitor cells”. In: Developmental Biology 305.2 (2007), pp. 695–713.issn: 00121606. [18] Hannah H. Chang, Martin Hemberg, Mauricio Barahona, Donald E. Ingber, and Sui Huang. “Transcriptome-wide noise controls lineage choice in mammalian progenitor cells”. In: Nature 453.7194 (May 2008), pp. 544–547.issn: 14764687. [19] Ingo Roeder and Ingmar Glauche. “Towards an understanding of lineage specification in hematopoietic stem cells: A mathematical model for the interaction of transcription factors GATA-1 and PU.1”. In: Journal of Theoretical Biology 241.4 (2006), pp. 852–865. [20] Philipp S Hoppe, Michael Schwarzfischer, Dirk Loeffler, Konstantinos D Kokkaliaris, Oliver Hilsenbeck, Nadine Moritz, Max Endele, Adam Filipczyk, Adriana Gambardella, Nouraiz Ahmed, et al. “Early myeloid lineage choice is not initiated by random PU. 1 to GATA1 protein ratios”. In: Nature 535.7611 (2016), pp. 299–302. [21] Jinheng Wang, Chenggong Tu, Hui Zhang, Yongliang Huo, Eline Menu, and Jinbao Liu. “Single-cell analysis at the protein level delineates intracellular signaling dynamic during hematopoiesis”. In: BMC Biology 19.1 (2021), p. 201.doi: 10.1186/s12915-021-01138-6. [22] Casey Brewer, Elizabeth Chu, Mike Chin, and Rong Lu. “Transplantation Dose Alters the Differentiation Program of Hematopoietic Stem Cells”. In: Cell Reports 15.8 (2016), pp. 1848–1857. doi: https://doi.org/10.1016/j.celrep.2016.04.061. [23] Eike Müller, Tatyana Grinenko, Tilo Pompe, Claudia Waskow, and Carsten Werner. “Space constraints govern fate of hematopoietic stem and progenitor cells in vitro”. In: Biomaterials 53 (2015), pp. 709–715.doi: https://doi.org/10.1016/j.biomaterials.2015.02.095. 76 [24] Ben Lambert, Adam L. MacLean, Alexander G. Fletcher, Alexander N. Combes, Melissa H. Little, and Helen M. Byrne. “Bayesian inference of agent-based models: a tool for studying kidney branching morphogenesis”. In: Journal of Mathematical Biology 76.7 (2018), pp. 1673–1697.issn: 14321416. [25] Dagmar Iber and Denis Menshykau. “The control of branching morphogenesis”. In: Open biology 3.9 (2013), p. 130088. [26] Natalie S Scholes, David Schnoerr, Mark Isalan, and Michael PH Stumpf. “A comprehensive network atlas reveals that Turing patterns are common but not robust”. In:Cellsystems 9.3 (2019), pp. 243–257. [27] Rebecca McLennan, Linus J. Schumacher, Jason A. Morrison, Jessica M. Teddy, Dennis A. Ridenour, Andrew C. Box, Craig L. Semerad, Hua Li, William McDowell, David Kay, Philip K. Maini, Ruth E. Baker, and Paul M. Kulesa. “VEGF signals induce trailblazer cell identity that drives neural crest migration”. In: Developmental Biology 407.1 (2015), pp. 12–25.issn: 0012-1606. [28] David Ellison, Andrew Mugler, Matthew D Brennan, Sung Hoon Lee, Robert J Huebner, Eliah R Shamir, Laura A Woo, Joseph Kim, Patrick Amar, Ilya Nemenman, et al. “Cell–cell communication enhances the capacity of cell ensembles to sense shallow gradients during morphogenesis”. In: Proceedings of the National Academy of Sciences 113.6 (2016), E679–E688. [29] Andrew Mugler, Andre Levchenko, and Ilya Nemenman. “Limits to the precision of gradient sensing with spatial communication and temporal integration”. In: Proceedings of the National Academy of Sciences 113.6 (2016), E689–E695.issn: 10916490. [30] Stephen Smith and Ramon Grima. “Single-cell variability in multicellular organisms”. In: Nature Communications 9.1 (2018), p. 345. [31] Kevin Thurley, Lani F Wu, and Steven J Altschuler. “Modeling cell-to-cell communication networks using response-time distributions”. In: Cell systems 6.3 (2018), pp. 355–367. [32] Peijia Yu, Qing Nie, Chao Tang, and Lei Zhang. “Nanog induced intermediate state in regulating stem cell differentiation and reprogramming”. In: BMC Systems Biology 12.1 (2018).issn: 17520509.doi: 10.1186/s12918-018-0552-3. [33] Tian Hong, Kazuhide Watanabe, Catherine Ha Ta, Alvaro Villarreal-Ponce, Qing Nie, and Xing Dai. “An Ovol2-Zeb1 Mutual Inhibitory Circuit Governs Bidirectional and Multi-step Transition between Epithelial and Mesenchymal States”. In: PLoS Computational Biology 11.11 (2015), e1004569.issn: 15537358. [34] Michael Strasser, Fabian J. Theis, and Carsten Marr. “Stability and multiattractor dynamics of a toggle switch based on a two-stage model of stochastic gene expression”. In: Biophysical Journal 102.1 (2012), pp. 19–29.issn: 00063495. [35] M.A. Shea and G.K. Ackers. “The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation.” In: Journal of molecular biology 181.2 (1985), pp. 211–230.doi: 10.1016/0022-2836(85)90086-5. 77 [36] B. Nadler, T. Naeh, and Z. Schuss. “The stationary arrival process of independent diffusers from a continuum to an absorbing boundary is poissonian”. In: SIAM Journal on Applied Mathematics 62.2 (2001), pp. 433–447.doi: 10.1137/S0036139900372363. [37] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. “Julia: A Fresh Approach to Numerical Computing”. In: SIAM Review 59.1 (2017), pp. 65–98. [38] Christopher Rackauckas and Qing Nie. “Differentialequations.jl–a performant and feature-rich ecosystem for solving differential equations in julia”. In: Journal of Open Research Software 5.1 (2017). [39] Mathieu Besançon, David Anthoff, Alex Arslan, Simon Byrne, Dahua Lin, Theodore Papamarkou, and John Pearson. Distributions.jl: Definition and Modeling of Probability Distributions in the JuliaStats Ecosystem. 2021. arXiv: 1907.08611. [40] Mark HA Davis. “Piecewise-deterministic Markov processes: a general class of non-diffusion stochastic models”. In: Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984), pp. 353–388. [41] Ryszard Rudnicki and Marta Tyran-Kamińska. Piecewise deterministic processes in biological models. Vol. 1. Springer, 2017. [42] Stefan Zeiser, Uwe Franz, Olaf Wittich, and Volkmar Liebscher. “Simulation of genetic networks modelled by piecewise deterministic Markov processes”. In: IET systems biology 2.3 (2008), pp. 113–135. [43] LN Handly, A Pilko, and R Wollman. “Paracrine communication maximizes cellular response fidelity in wound signaling”. In: eLife 4 (2015), e09652.doi: 10.7554/eLife.09652. [44] Tyler Zarubin and Jiahuai Han. “Activation and signaling of the p38 MAP kinase pathway”. In: Cell Research 15.1 (2005), pp. 11–18.doi: 10.1038/sj.cr.7290257. [45] K. Ohishi, N. Katayama, H. Shiku, B. Varnum-Finney, and I.D. Bernstein. “Notch signalling in hematopoiesis”. In: Seminars in Cell and Developmental Biology 14.2 (2003), pp. 143–150.doi: https://doi.org/10.1016/S1084-9521(02)00183-0. [46] Mitra Mojtahedi, Alexander Skupin, Joseph Zhou, Ivan G Castaño, Rebecca YY Leong-Quong, Hannah Chang, Kalliopi Trachana, Alessandro Giuliani, and Sui Huang. “Cell fate decision as high-dimensional critical state transition”. In: PLoS biology 14.12 (2016), e2000640. [47] Andreas Hilfinger and Johan Paulsson. “Separating intrinsic from extrinsic fluctuations in dynamic biological systems”. In: Proceedings of the National Academy of Sciences of the United States of America 108.29 (2011), pp. 12167–12172.issn: 00278424. [48] Peter S. Swain, Michael B. Elowitz, and Eric D. Siggia. “Intrinsic and extrinsic contributions to stochasticity in gene expression”. In: Proceedings of the National Academy of Sciences of the United States of America 99.20 (2002), pp. 12795–12800.issn: 00278424. 78 [49] Sarah Filippi, Chris P. Barnes, Paul D.W. Kirk, Takamasa Kudo, Katsuyuki Kunida, Siobhan S. McMahon, Takaho Tsuchiya, Takumi Wada, Shinya Kuroda, and Michael P.H. Stumpf. “Robustness of MEK-ERK Dynamics and Origins of Cell-to-Cell Variability in MAPK Signaling”. In: Cell Reports 15.11 (2016), pp. 2524–2535.issn: 22111247. [50] Lucy Ham, Rowan D. Brackston, and Michael P.H. Stumpf. “Extrinsic Noise and Heavy-Tailed Laws in Gene Expression”. In: Physical Review Letters 124.10 (2020), p. 108101.issn: 10797114. doi: 10.1103/PhysRevLett.124.108101. [51] Hisham Mohammed, Irene Hernando-Herraez, Aurora Savino, Antonio Scialdone, Iain Macaulay, Carla Mulas, Tamir Chandra, Thierry Voet, Wendy Dean, Jennifer Nichols, John C. Marioni, and Wolf Reik. “Single-Cell Landscape of Transcriptional Heterogeneity and Cell Fate Decisions during Mouse Early Gastrulation”. In:CellReports 20.5 (Aug. 2017), pp. 1215–1228.issn: 22111247. [52] Cristina Lo Celso and David T. Scadden. “The haematopoietic stem cell niche at a glance”. In: Journal of Cell Science 124.21 (Nov. 2011), pp. 3529–3535.doi: 10.1242/jcs.074112. eprint: https://journals.biologists.com/jcs/article-pdf/124/21/3529/1441798/3529.pdf. [53] Sean J. Morrison and David T. Scadden. “The bone marrow niche for haematopoietic stem cells”. In: Nature 505.7483 (2014), pp. 327–334. [54] Ninib Baryawno, Dariusz Przybylski, Monika S. Kowalczyk, Youmna Kfoury, Nicolas Severe, Karin Gustafsson, Konstantinos D. Kokkaliaris, Francois Mercier, Marcin Tabaka, Matan Hofree, Danielle Dionne, Ani Papazian, Dongjun Lee, Orr Ashenberg, Ayshwarya Subramanian, Eeshit Dhaval Vaishnav, Orit Rozenblatt-Rosen, Aviv Regev, and David T. Scadden. “A Cellular Taxonomy of the Bone Marrow Stroma in Homeostasis and Leukemia”. In: Cell 177.7 (2019), 1915–1932.E16. [55] Samik Upadhaya, Oleg Krichevsky, Ilseyar Akhmetzyanova, Catherine M. Sawai, David R. Fooksman, and Boris Reizis. “Intravital Imaging Reveals Motility of Adult Hematopoietic Stem Cells in the Bone Marrow Niche”. In: Cell Stem Cell 27.2 (2020), 336–345.e4.doi: https://doi.org/10.1016/j.stem.2020.06.003. [56] Constantina Christodoulou, Joel A. Spencer, Shu-Chi A. Yeh, Raphaël Turcotte, Konstantinos D. Kokkaliaris, Riccardo Panero, Azucena Ramos, Guoji Guo, Negar Seyedhassantehrani, Tatiana V. Esipova, Sergei A. Vinogradov, Sarah Rudzinskas, Yi Zhang, Archibald S. Perkins, Stuart H. Orkin, Raffaele A. Calogero, Timm Schroeder, Charles P. Lin, and Fernando D. Camargo. “Live-animal imaging of native haematopoietic stem and progenitor cells”. In: Nature 578.7794 (2020), pp. 278–283.doi: https://doi.org/10.1038/s41586-020-1971-z. [57] Aurélie Hérault, Mikhail Binnewies, Stephanie Leong, Fernando J. Calero-Nieto, Si Yi Zhang, Yoon-A Kang, Xiaonan Wang, Eric M. Pietras, S. Haihua Chu, Keegan Barry-Holson, Scott Armstrong, Berthold Göttgens, and Emmanuelle Passegué. “Myeloid progenitor cluster formation drives emergency and leukaemic myelopoiesis”. In: Nature 544.7648 (2017), pp. 53–58. 79 [58] Amir Giladi, Merav Cohen, Chiara Medaglia, Yael Baran, Baoguo Li, Mor Zada, Pierre Bost, Ronnie Blecher-Gonen, Tomer-Meir Salame, Johannes U. Mayer, Eyal David, Franca Ronchese, Amos Tanay, and Ido Amit. “Dissecting cellular crosstalk by sequencing physically interacting cells”. In: Nature Biotechnology 38.5 (2020), pp. 629–637. [59] Naomi Moris, Cristina Pina, and Alfonso Martinez Arias. Transition states and cell fate decisions in epigenetic landscapes. Oct. 2016.doi: 10.1038/nrg.2016.98. [60] Andreas E. Moor and Shalev Itzkovitz. Spatial transcriptomics: paving the way for tissue-level systems biology. 2017.doi: 10.1016/j.copbio.2017.02.004. [61] Robert Foreman and Roy Wollman. “Mammalian gene expression variability is explained by underlying cell state”. In: Molecular Systems Biology 16.2 (2020), e9146.issn: 1744-4292.doi: 10.15252/msb.20199146. [62] Megan A. Coomer, Lucy Ham, and Michael P. H. Stumpf. “Shaping the Epigenetic Landscape: Complexities and Consequences”. In: bioRxiv (2020).doi: 10.1101/2020.12.21.423724. [63] Miaomiao Suo, Megan K Rommelfanger, Yulin Chen, Elias Moris Amro, Bing Han, Zhiyang Chen, Karol Szafranski, Sundaram Reddy Chakkarappan, Bernhard O Boehm, Adam L MacLean, et al. “Age-dependent effects of Igf2bp2 on gene regulation, function, and aging of hematopoietic stem cells in mice”. In: Blood, The Journal of the American Society of Hematology 139.17 (2022), pp. 2653–2665. [64] Gerald de Haan and Gary Van Zant. “Dynamic changes in mouse hematopoietic stem cell numbers during aging”. In: Blood, The Journal of the American Society of Hematology 93.10 (1999), pp. 3294–3301. [65] Wendy W. Pang, Elizabeth A. Price, Debashis Sahoo, Isabel Beerman, William J. Maloney, Derrick J. Rossi, Stanley L. Schrier, and Irving L. Weissman. “Human bone marrow hematopoietic stem cells are increased in frequency and myeloid-biased with age”. In:ProceedingsoftheNational AcademyofSciencesoftheUnitedStatesofAmerica 108.50 (2011), pp. 20012–20017.issn: 00278424. doi: 10.1073/pnas.1116110108. [66] K Sudo, H Ema, Y Morita, and H Nakauchi. “Age-associated characteristics of murine hematopoietic stem cells [In Process Citation]”. In: J Exp Med 192.9 (2000), pp. 1273–1280. [67] Jianwei Wang, Yohei Morita, Bing Han, Silke Niemann, Bettina Löffler, and K. Lenhard Rudolph. “Per2 induction limits lymphoid-biased haematopoietic stem cells and lymphopoiesis in the context of DNA damage and ageing”. In: Nature Cell Biology 18.5 (2016), pp. 480–490.issn: 14764679.doi: 10.1038/ncb3342. [68] Encarnacion Montecino-Rodriguez, Ying Kong, David Casero, Adrien Rouault, Kenneth Dorshkind, and Peter D. Pioli. “Lymphoid-Biased Hematopoietic Stem Cells Are Maintained with Age and Efficiently Generate Lymphoid Progeny”. In: Stem Cell Reports 12.3 (2019), pp. 584–596.issn: 22136711.doi: 10.1016/j.stemcr.2019.01.016. 80 [69] Aysegul V. Ergen and Margaret A. Goodell. “Mechanisms of hematopoietic stem cell aging”. In: Experimental Gerontology 45.4 (2010), pp. 286–290.issn: 05315565.doi: 10.1016/j.exger.2009.12.010. [70] Isabel Beerman, Deepta Bhattacharya, Sasan Zandi, Mikael Sigvardsson, Irving L. Weissman, David Brydere, and Derrick J. Rossia. “Functionally distinct hematopoietic stem cells modulate hematopoietic lineage potential during aging by a mechanism of clonal expansion”. In: Proceedings of the National Academy of Sciences of the United States of America 107.12 (2010), pp. 5465–5470.issn: 00278424.doi: 10.1073/pnas.1000834107. [71] Ng Shyh-Chang and George Q. Daley. Lin28: Primal regulator of growth and metabolism in stem cells. 2013.doi: 10.1016/j.stem.2013.03.005. [72] Jin Zhang, Sutheera Ratanasirintrawoot, Sriram Chandrasekaran, Zhaoting Wu, Scott B. Ficarro, Chunxiao Yu, Christian A. Ross, Davide Cacchiarelli, Qing Xia, Marc Seligson, Gen Shinoda, Wen Xie, Patrick Cahan, Longfei Wang, Shyh Chang Ng, Supisara Tintara, Cole Trapnell, Tamer Onder, Yuin Han Loh, Tarjei Mikkelsen, Piotr Sliz, Michael A. Teitell, John M. Asara, Jarrod A. Marto, Hu Li, James J. Collins, and George Q. Daley. “LIN28 Regulates Stem Cell Metabolism and Conversion to Primed Pluripotency”. In: Cell Stem Cell 19.1 (2016), pp. 66–80. issn: 18759777.doi: 10.1016/j.stem.2016.05.009. [73] Ng Shyh-Chang, George Q. Daley, and Lewis C. Cantley. Stem cell metabolism in tissue development and aging. 2013.doi: 10.1242/dev.091777. [74] Ruotong Ren, Alejandro Ocampo, Guang Hui Liu, and Juan Carlos Izpisua Belmonte. Regulation of Stem Cell Aging by Metabolism and Epigenetics. 2017.doi: 10.1016/j.cmet.2017.07.019. [75] Anne Brunet and Thomas A. Rando. Interaction between epigenetic and metabolism in aging stem cells. 2017.doi: 10.1016/j.ceb.2016.12.009. [76] Zhizhong Li, Jason A. Gilbert, Yunyu Zhang, Minsi Zhang, Qiong Qiu, Krishnan Ramanujan, Tea Shavlakadze, John K. Eash, Annarita Scaramozza, Matthew M. Goddeeris, David G. Kirsch, Kevin P. Campbell, Andrew S. Brack, and David J. Glass. “An HMGA2-IGF2BP2 Axis Regulates Myoblast Proliferation and Myogenesis”. In: Developmental Cell 23.6 (2012), pp. 1176–1188.issn: 15345807.doi: 10.1016/j.devcel.2012.10.019. [77] Valerie Schaeffer, Kim M. Hansen, David R. Morris, Renée C. LeBoeuf, and Christine K. Abrass. “RNA-binding protein IGF2BP2/IMP2 is required for laminin-β 2 mRNA translation and is modulated by glucose concentration”. In: American Journal of Physiology - Renal Physiology 303.1 (2012), pp. 75–82.issn: 03636127.doi: 10.1152/ajprenal.00185.2012. [78] Ning Dai, Liping Zhao, Diedra Wrighting, Dana Krämer, Amit Majithia, Yanqun Wang, Valentin Cracan, Diego Borges-Rivera, Vamsi K. Mootha, Matthias Nahrendorf, David R. Thorburn, Liliana Minichiello, David Altshuler, and Joseph Avruch. “IGF2BP2/IMP2-deficient mice resist obesity through enhanced translation of Ucp1 mRNA and other mRNAs encoding mitochondrial proteins”. In: Cell Metabolism 21.4 (2015), pp. 609–621. issn: 19327420.doi: 10.1016/j.cmet.2015.03.006. 81 [79] Marco Mineo, Franz Ricklefs, Arun K. Rooj, Shawn M. Lyons, Pavel Ivanov, Khairul I. Ansari, Ichiro Nakano, E. Antonio Chiocca, Jakub Godlewski, and Agnieszka Bronisz. “The Long Non-coding RNA HIF1A-AS2 Facilitates the Maintenance of Mesenchymal Glioblastoma Stem-like Cells in Hypoxic Niches”. In: Cell Reports 15.11 (2016), pp. 2500–2509.issn: 22111247. doi: 10.1016/j.celrep.2016.05.018. [80] Chenguang Gong, Zhizhong Li, Krishnan Ramanujan, Ieuan Clay, Yunyu Zhang, Sophie Lemire-Brachat, and David J. Glass. “A Long Non-coding RNA, LncMyoD, Regulates Skeletal Muscle Differentiation by Blocking IMP2-Mediated mRNA Translation”. In: Developmental Cell 34.2 (2015).issn: 18781551.doi: 10.1016/j.devcel.2015.05.009. [81] Jason D. Buenrostro, M. Ryan Corces, Caleb A. Lareau, Beijing Wu, Alicia N. Schep, Martin J. Aryee, Ravindra Majeti, Howard Y. Chang, and William J. Greenleaf. “Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation”. In: Cell 173.6 (2018), pp. 1535–1548.issn: 10974172.doi: 10.1016/j.cell.2018.03.074. [82] Rong Lu, Agnieszka Czechowicz, Jun Seita, Du Jiang, and Irving L. Weissman. “Clonal-level lineage commitment pathways of hematopoietic stem cells in vivo”. In:ProceedingsoftheNational Academy of Sciences 116.4 (2019), pp. 1447–1456.issn: 0027-8424.doi: 10.1073/pnas.1801480116. [83] Simon Haas, Jenny Hansson, Daniel Klimmeck, Dirk Loeffler, Lars Velten, Hannah Uckelmann, Stephan Wurzer, Áine M. Prendergast, Alexandra Schnell, Klaus Hexel, Rachel Santarella-Mellwig, Sandra Blaszkiewicz, Andrea Kuck, Hartmut Geiger, Michael D. Milsom, Lars M. Steinmetz, Timm Schroeder, Andreas Trumpp, Jeroen Krijgsveld, and Marieke A.G. Essers. “Inflammation-Induced Emergency Megakaryopoiesis Driven by Hematopoietic Stem Cell-like Megakaryocyte Progenitors”. In: Cell Stem Cell 17.4 (2015), pp. 422–434.issn: 18759777.doi: 10.1016/j.stem.2015.07.007. [84] Jason S.L. Yu and Wei Cui. Proliferation, survival and metabolism: The role of PI3K/AKT/ mTOR signalling in pluripotency and cell fate determination. 2016.doi: 10.1242/dev.137075. [85] Joydeep Ghosh and Reuben Kapur. Regulation of Hematopoietic Stem Cell Self-Renewal and Leukemia Maintenance by the PI3K-mTORC1 Pathway. 2016.doi: 10.1007/s40778-016-0067-z. [86] Qingchun Mu, Lijun Wang, Fengbo Yu, Haijun Gao, Ting Lei, Peiwen Li, Pengfei Liu, Xu Zheng, Xitong Hu, Yong Chen, Zhenfeng Jiang, Arash J. Sayari, Jia Shen, and Haiyan Huang. “Imp2 regulates GBM progression by activating IGF2/PI3K/Akt pathway”. In: Cancer Biology and Therapy 16.4 (2015), pp. 623–633.issn: 15558576.doi: 10.1080/15384047.2015.1019185. [87] Junguo Cao, Qingchun Mu, and Haiyan Huang. “The roles of insulin-like growth factor 2 mRNA-binding protein 2 in cancer and cancer stem cells”. In: Stem Cells International 2018 (2018). issn: 16879678.doi: 10.1155/2018/4217259. 82 [88] Aparna Venkatraman, Xi C. He, Joanne L. Thorvaldsen, Ryohichi Sugimura, John M. Perry, Fang Tao, Meng Zhao, Matthew K. Christenson, Rebeca Sanchez, Jaclyn Y. Yu, Lai Peng, Jeffrey S. Haug, Ariel Paulson, Hua Li, Xiao Bo Zhong, Thomas L. Clemens, Marisa S. Bartolomei, and Linheng Li. “Maternal imprinting at the H19-Igf2 locus maintains adult haematopoietic stem cell quiescence”. In:Nature 500.7462 (2013), pp. 345–349.issn: 00280836.doi: 10.1038/nature12303. [89] Avital Mendelson and Paul S Frenette. “Hematopoietic stem cell niche maintenance during homeostasis and regeneration”. In: Nature medicine 20.8 (2014), pp. 833–846. [90] Andre Olsson, Meenakshi Venkatasubramanian, Viren K Chaudhri, Bruce J Aronow, Nathan Salomonis, Harinder Singh, and H Leighton Grimes. “Single-cell analysis of mixed-lineage states leading to a binary cell fate choice”. In: Nature 537.7622 (2016), pp. 698–702. [91] Scott W Boyer, Aaron V Schroeder, Stephanie Smith-Berdan, and E Camilla Forsberg. “All hematopoietic cells develop from hematopoietic stem cells through Flk2/Flt3-positive progenitor cells”. In: Cell stem cell 9.1 (2011), pp. 64–73. [92] Elisa Laurenti and Berthold Göttgens. “From haematopoietic stem cells to complex differentiation landscapes”. In: Nature 553.7689 (2018), pp. 418–426. [93] Carola Ingrid Weidner, Thomas Walenda, Qiong Lin, Monika Martina Wölfler, Bernd Denecke, Ivan Gesteira Costa, Martin Zenke, and Wolfgang Wagner. “Hematopoietic stem and progenitor cells acquire distinct DNA-hypermethylation during in vitro culture”. In: Scientific reports 3.1 (2013), pp. 1–8. [94] Gerald de Haan and Seka Simone Lazare. “Aging of hematopoietic stem cells”. In: Blood, The Journal of the American Society of Hematology 131.5 (2018), pp. 479–487. [95] Brad Dykstra and Gerald de Haan. “Hematopoietic stem cell aging and self-renewal”. In: Cell and tissue research 331.1 (2008), pp. 91–101. [96] Emily Mitchell, Michael Spencer Chapman, Nicholas Williams, Kevin J Dawson, Nicole Mende, Emily F Calderbank, Hyunchul Jung, Thomas Mitchell, Tim HH Coorens, David H Spencer, et al. “Clonal dynamics of haematopoiesis across the human lifespan”. In: Nature 606.7913 (2022), pp. 343–350. [97] Eva Mejia-Ramirez and Maria Carolina Florian. “Understanding intrinsic hematopoietic stem cell aging”. In: haematologica 105.1 (2020), p. 22. [98] William R Swindell. “Dietary restriction in rats and mice: a meta-analysis and review of the evidence for genotype-dependent effects on lifespan”. In: Ageing research reviews 11.2 (2012), pp. 254–270. [99] Shuai Ma, Shuhui Sun, Lingling Geng, Moshi Song, Wei Wang, Yanxia Ye, Qianzhao Ji, Zhiran Zou, SI Wang, Xiaojuan He, et al. “Caloric restriction reprograms the single-cell transcriptional landscape of rattus norvegicus aging”. In: Cell 180.5 (2020), pp. 984–1001. 83 [100] Duozhuang Tang, Si Tao, Zhiyang Chen, Ievgen Oleksandrovich Koliesnik, Philip Gerald Calmes, Verena Hoerr, Bing Han, Nadja Gebert, Martin Zörnig, Bettina Löffler, et al. “Dietary restriction improves repopulation but impairs lymphoid differentiation capacity of hematopoietic stem cells in early aging”. In: Journal of Experimental Medicine 213.4 (2016), pp. 535–553. [101] Vân Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts. “Inferring regulatory networks from expression data using tree-based methods”. In: PloS one 5.9 (2010), e12776. [102] Caleb C Reagor, Nicolas Velez-Angel, and AJ Hudspeth. “Depicting pseudotime-lagged causality across single-cell trajectories for accurate gene-regulatory inference”. In: bioRxiv (2022), pp. 2022–04. [103] Hirotaka Matsumoto, Hisanori Kiryu, Chikara Furusawa, Minoru S.H. Ko, Shigeru B.H. Ko, Norio Gouda, Tetsutaro Hayashi, and Itoshi Nikaido. “SCODE: An efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation”. In: Bioinformatics 33.15 (2017).issn: 14602059.doi: 10.1093/bioinformatics/btx194. [104] Sara Aibar, Carmen Bravo González-Blas, Thomas Moerman, Vân Anh Huynh-Thu, Hana Imrichova, Gert Hulselmans, Florian Rambow, Jean Christophe Marine, Pierre Geurts, Jan Aerts, Joost Van Den Oord, Zeynep Kalender Atak, Jasper Wouters, and Stein Aerts. “SCENIC: Single-cell regulatory network inference and clustering”. In: Nature Methods 14.11 (2017).issn: 15487105.doi: 10.1038/nmeth.4463. [105] Junil Kim, Simon T. Jakobsen, Kedar N. Natarajan, and Kyoung Jae Won. “TENET: Gene network reconstruction using transfer entropy reveals key regulatory factors from single cell transcriptomic data”. In: Nucleic Acids Research 49.1 (2021).issn: 13624962.doi: 10.1093/nar/gkaa1014. [106] Thalia E Chan, Michael PH Stumpf, and Ann C Babtie. “Gene regulatory network inference from single-cell data using multivariate information measures”. In: Cell systems 5.3 (2017), pp. 251–267. [107] Atul Deshpande, Li-Fang Chu, Ron Stewart, and Anthony Gitter. “Network inference with Granger causality ensembles on single-cell transcriptomics”. In: Cell reports 38.6 (2022).doi: 10.1016/j.celrep.2022.110333. [108] Cole Trapnell. “Defining cell types and states with single-cell genomics”. In: Genome research 25.10 (2015), pp. 1491–1498. [109] Kyle Akers and TM Murali. “Gene regulatory network inference in single-cell biology”. In: Current Opinion in Systems Biology 26 (2021), pp. 87–97. [110] Rong Lu, Norma F Neff, Stephen R Quake, and Irving L Weissman. “Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding”. In: Nature biotechnology 29.10 (2011), pp. 928–933. [111] Alejo E Rodriguez-Fraticelli, Samuel L Wolock, Caleb S Weinreb, Riccardo Panero, Sachin H Patel, Maja Jankovic, Jianlong Sun, Raffaele A Calogero, Allon M Klein, and Fernando D Camargo. “Clonal analysis of lineage fate in native haematopoiesis”. In: Nature 553.7687 (2018), pp. 212–216. 84 [112] Suoqin Jin, Christian F Guerrero-Juarez, Lihua Zhang, Ivan Chang, Raul Ramos, Chen-Hsiang Kuan, Peggy Myung, Maksim V Plikus, and Qing Nie. “Inference and analysis of cell-cell communication using CellChat”. In: Nature communications 12.1 (2021), p. 1088. [113] Shuxiong Wang, Matthew Karikomi, Adam L MacLean, and Qing Nie. “Cell lineage and communication network inference via optimization for single-cell transcriptomics”. In: Nucleic acids research 47.11 (2019), e66–e66. [114] Yuhan Hao, Stephanie Hao, Erica Andersen-Nissen, William M Mauck, Shiwei Zheng, Andrew Butler, Maddie J Lee, Aaron J Wilk, Charlotte Darby, Michael Zager, et al. “Integrated analysis of multimodal single-cell data”. In: Cell 184.13 (2021), pp. 3573–3587. [115] Junyue Cao, Darren A Cusanovich, Vijay Ramani, Delasa Aghamirzaie, Hannah A Pliner, Andrew J Hill, Riza M Daza, Jose L McFaline-Figueroa, Jonathan S Packer, Lena Christiansen, et al. “Joint profiling of chromatin accessibility and gene expression in thousands of single cells”. In: Science 361.6409 (2018), pp. 1380–1385. [116] Aleah L Smith, Felicia M Ellison, J Philip McCoy Jr, and Jichun Chen. “c-Kit expression and stem cell factor-induced hematopoietic cell proliferation are up-regulated in aged B6D2F1 mice”. In: The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 60.4 (2005), pp. 448–456. [117] Eui-Young So, Euy-Myoung Jeong, Keith Q Wu, Patrycja M Dubielecka, Anthony M Reginato, Peter J Quesenberry, and Olin D Liang. “Sexual dimorphism in aging hematopoiesis: an earlier decline of hematopoietic stem and progenitor cells in male than female mice”. In: Aging (Albany NY) 12.24 (2020), p. 25939. [118] Matthew D Young and Sam Behjati. “SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data”. In: Gigascience 9.12 (2020), giaa151. [119] Christopher S McGinnis, Lyndsay M Murrow, and Zev J Gartner. “DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors”. In: Cell systems 8.4 (2019), pp. 329–337. [120] Laleh Haghverdi, Maren Büttner, F Alexander Wolf, Florian Buettner, and Fabian J Theis. “Diffusion pseudotime robustly reconstructs lineage branching”. In: Nature methods 13.10 (2016), pp. 845–848. [121] Jasper Janssens, Sara Aibar, Ibrahim Ihsan Taskiran, Joy N Ismail, Alicia Estacio Gomez, Gabriel Aughey, Katina I Spanier, Florian V De Rop, Carmen Bravo González-Blas, Marc Dionne, et al. “Decoding gene regulation in the fly brain”. In: Nature 601.7894 (2022), pp. 630–636. [122] Frank A Farris. “The Gini index and measures of inequality”. In: The American Mathematical Monthly 117.10 (2010), pp. 851–864. [123] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. “Regularization paths for generalized linear models via coordinate descent”. In: Journal of statistical software 33.1 (2010), p. 1. 85 [124] Vincenzo Calvanese, Andrew T Nguyen, Timothy J Bolan, Anastasia Vavilina, Trent Su, Lydia K Lee, Yanling Wang, Fides D Lay, Mattias Magnusson, Gay M Crooks, et al. “MLLT3 governs human haematopoietic stem-cell self-renewal and engraftment”. In: Nature 576.7786 (2019), pp. 281–286. [125] Nan Papili Gao, SM Minhaz Ud-Dean, Olivier Gandrillon, and Rudiyanto Gunawan. “SINCERITIES: inferring gene regulatory networks from time-stamped single cell transcriptional expression profiles”. In: Bioinformatics 34.2 (2018), pp. 258–266. [126] Aditya Pratapa, Amogh P Jalihal, Jeffrey N Law, Aditya Bharadwaj, and TM Murali. “Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data”. In: Nature methods 17.2 (2020), pp. 147–154. [127] Thalia E Chan, Michael PH Stumpf, and Ann C Babtie. “Gene regulatory network inference from single-cell data using multivariate information measures”. In: Cell systems 5.3 (2017), pp. 251–267. [128] Jennifer J Trowbridge, Jonathan W Snow, Jonghwan Kim, and Stuart H Orkin. “DNA methyltransferase 1 is essential for and uniquely regulates hematopoietic stem and progenitor cells”. In: Cell stem cell 5.4 (2009), pp. 442–449. [129] Fatma Zehra Kadayifci, Shasha Zheng, and Yuan-Xiang Pan. “Molecular mechanisms underlying the link between diet and DNA methylation”. In: International journal of molecular sciences 19.12 (2018), p. 4055. [130] Karen A Lillycrop, Emma S Phillips, Alan A Jackson, Mark A Hanson, and Graham C Burdge. “Dietary protein restriction of pregnant rats induces and folic acid supplementation prevents epigenetic modification of hepatic gene expression in the offspring”. In: The Journal of nutrition 135.6 (2005), pp. 1382–1386. [131] Eric Hervouet, François M Vallette, and Pierre-François Cartron. “Dnmt1/transcription factor interactions: an alternative mechanism of DNA methylation inheritance”. In: Genes & cancer 1.5 (2010), pp. 434–443. [132] Eric Hervouet, Paul Peixoto, Régis Delage-Mourroux, Michaël Boyer-Guittaut, and Pierre-François Cartron. “Specific or not specific recruitment of DNMTs for DNA methylation, an epigenetic dilemma”. In: Clinical epigenetics 10.1 (2018), pp. 1–18. [133] Yusuke Satoh, Takafumi Yokota, Takao Sudo, Motonari Kondo, Anne Lai, Paul W Kincade, Taku Kouro, Ryuji Iida, Koichi Kokame, Toshiyuki Miyata, et al. “The Satb1 protein directs hematopoietic stem cell differentiation toward lymphoid lineages”. In: Immunity 38.6 (2013), pp. 1105–1115. [134] Nicolas Ledru, Parker C Wilson, Yoshiharu Muto, Yasuhiro Yoshimura, Haojia Wu, Amish Asthana, Stefan G Tullius, Sushrut S Waikar, Giuseppe Orlando, and Benjamin Humphreys. “Predicting regulators of epithelial cell state through regularized regression analysis of single cell multiomic sequencing”. In: bioRxiv (2022), pp. 2022–12. 86 [135] Kunal Jindal, Mohd Tayyab Adil, Naoto Yamaguchi, Xue Yang, Helen C Wang, Kenji Kamimoto, Guillermo C Rivera-Gonzalez, and Samantha A Morris. “Multiomic single-cell lineage tracing to dissect fate-specific gene regulatory programs”. In: bioRxiv (2022), pp. 2022–10. 87 AppendixA SupplementaryinformationforChapter2 α 1 α 2 α 3 α 4 α 5 β 1 β 2 β 3 γ 1 γ 2 1.0 0.25 1.0 0.25 0.01 0.01 0.01 0.01 1.0 0.25 γ 3 γ 4 γ 5 γ 6 γ 7 γ 8 γ 9 B C 1.0 1.0 0.25 1.0 0.13 0.01 10.0 0.5 0 Table A.1: Parameter values used in the ODE system. Time Time G Concentration P Concentration A B Figure A.1: Sample simulation of a loop of 20 cells with concentrations of bothG and P over time. Param- eters are set toλ = 28.0 andA 0 = 1.0. 88 Probability Probability 2 5 10 Distribution of n cell in a chain or loop topology of n cells th 1 100 Cell Loops Cell Chains A B Cell Loops Cell Chains Figure A.2: Chains and loops of cells from Fig. 3A-3B along with curves where the number of cellsn = 100. The number of simulation iterations was reduced to 100 perA 0 value in simulations of 100 cell topologies. 1 2 3 Cell 1 Cell 2 Cell 3 Probability Figure A.3: Cell fate probability distributions for a loop of three cells with one inverse signal. A B Figure A.4: Probability distributions of (A) cell two in a chain and (B) two cell loop converging to the G high state resulting from a consensus signal to the parameterB and dual consensus signals to both parameters A andB. A 0 is fixed as 0.65. The signaling parameters for the signals changingA andB areλ A = 28.0 andλ B = 28.0 respectively. 89 40 28 22 16 32 Probability Figure A.5: Probability distributions of cell fate commitment toG high steady state for cell 2 in a chain of two cells with a dissensus signal whereκ ∈ [16,40]. Time G Concentration Time G Concentration Time G Concentration Time G Concentration A B C D Figure A.6: (A)-(D) Sample trajectories of a loop of two cells withA 0 = 1.0175 andλ = 40. 90 Figure A.7: Distributions of loops of 10 cells with different values of λ . The values ofA 0 were selected so that the distributions had similar expected values. For the blue distribution,A 0 = 0.9973 and the expected value is5.284. For the grey distribution,A 0 = 1.016 and the expected value is5.154 Probability B A Probability Figure A.8: Probability distributions that a given cell in a two cell loop will converge to the erythroid state with varying amounts of (A) extrinsic noise or (B) intrinsic noise. 91 Time Time Time Time A B C D A2 Value A3 Value A4 Value G Concentration Figure A.9: (A) Sample trajectory of a chain of four cells whereA 0 = 1.01 andλ = 38.0. (B)-(D) Plots of A 2 (t),A 3 (t), andA 4 (t) over the same timescale as the trajectory. 92 Probability Figure A.10: Probability distributions of cell 2 in a chain of cells converging to the G high state where λ = 18 with different mean wait times, µ , given in the legend. Probability Figure A.11: Sample of simulated data points along with the fitted curve for cell 2 in a chain of cells where λ = 1. 93 AppendixB SupplementaryinformationforChapter4 94 Cd34 Gm16897 1HSC 2Multipotent 3MEP Primed 4GMP 5MEP Peaks Genes 194935000 194940000 194945000 194950000 194955000 194960000 Coverage (Norm. ATAC Signal Range (0−0.55) by ReadsInTSS) chr1:194933820−194961821 HSC Multipotent MEP Primed GMP MEP A −4 0 4 −5 0 5 UMAP1 UMAP2 0.0 0.1 0.2 0.3 0.4 Accessibility −4 0 4 −5 0 5 UMAP_1 UMAP_2 0.0 0.4 0.8 1.2 1.6 Cd34 B C Cd34 Expression Cd34 Accessibility Figure B.1: Comparison of gene expression and gene accessibility score for Cd34. (A) UMAP of the SC- Transform normalized expression of multipotent progenitor marker Cd34. (B) UMAP of the gene accessi- bility score for Cd34. (C) Coverage plot of Cd34 and 5kb upstream of Cd34’s transcription start site. 95 HSC Multipotent MEP Primed GMP MEP Other Mthfd1 Cpox Hspd1 Pvt1 Hdgf Rhag Aqp1 Minpp1 Zg16 Mt1 Spire1 Ugcg Ank1 Kel Ext1 Nfia Abcb4 Ermap Car1 Slc25a21 Gria3 Slc24a3 Hk3 Cers6 Ms4a3 Cdk8 F630028O10Rik Fndc3b Emb Calr Elane Ctsg Prtn3 Rab44 Atp8b4 Mpo Cavin2 Dach1 Rab27b Inpp4b Prkca Gata2 Pf4 Slc14a1 Plxnc1 Mef2c Mfsd2b Rap1b Pdcd4 Psd3 Zeb2 Tgfbr3 Med12l Itga2b Pbx1 Clnk Runx2 Cdk6 Ikzf2 Tespa1 Sell Plac8 Ptpre Parp8 Arhgap15 Adgrl4 Maml3 Satb1 Slco3a1 Sox4 Flt3 H2afy 2610307P16Rik Cd34 Rnf220 B3galt1 Prdm16 Hlf Prkg1 Alcam Dlg2 Trpc6 Itsn1 Tox Samd12 Mllt3 Gda Plxdc2 Mecom Magi2 Rora Ncam2 Il1rapl2 Plcl1 Chrm3 −2 −1 0 1 2 Expression Identity HSC Multipotent MEP Primed GMP MEP Other −4 0 4 −5 0 5 UMAP_1 UMAP_2 HSC Multipotent MEP Primed GMP MEP −5 0 5 10 −10 −5 0 UMAP_1 UMAP_2 Singlet Doublet A B C D E Figure B.2: Overview of scRNA-seq processing in the old ad libitum (oAL) sample. (A) UMAP of doublets identified by DoubletFinder. (B) UMAP of cell cycle phases of the cells. Since the cells are fairly homo- geneous, they cluster together by cell cycle phase. (C) UMAP of cell cycle phases of the cells after cell cycle phases G2M and S are regressed out. (D) UMAP of hematopoietic cell type clusters. (E) Heatmap of the top 20 log-fold change markers per cluster from (D). Among these top marker genes are canonical hematopoietic cell type markers, including Mecom (HSC), Cd34 (Multipotent), Itga2b (MEP Primed), Mpo (GMP), and Car1 (MEP). 96
Abstract (if available)
Abstract
During hematopoiesis, a concert of intrinsic and extrinsic factors guide hematopoietic stem cells (HSCs) towards different hematopoietic lineages. In this work, a variety of these factors are probed using multiscale mathematical modeling, data analysis of single-cell genomics data, and gene regulatory network inference. The influence of cell-cell communication on hematopoietic progenitor lineage decision making is explored using a multiscale mathematical model of the well-characterized GATA1-PU.1 bifurcation. The relationship between HSC aging and changes in metabolism is evaluated by analyzing bulk and single-cell RNA-sequencing data from HSCs extracted from wild type and Igf2bp2 knockout mice of varying ages. Lastly, changes in gene regulation during hematopoiesis is evaluated using a proposed novel method, popInfer, for inferring gene regulatory networks from single-cell multiome (RNA and ATAC) data. popInfer is applied to the transition from HSCs to multipotent progenitors from young and old mice fed ad libitum and diet restricted to compare how the gene regulatory networks are perturbed with each condition. In total, this dissertation further elucidates how HSC function and lineage decision-making occurs and is influenced by cell-cell communication, metabolism, aging, and diet.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Dissecting the heterogeneity of mouse hematopoietic stem cells in vivo
PDF
Membrane-bound regulation of hematopoietic stem cells
PDF
Scalable latent factor models for inferring genetic regulatory networks
PDF
Statistical modeling of sequence and gene expression data to infer gene regulatory networks
PDF
Investigating the evolution of gene networks through simulated populations
PDF
Data-driven learning for dynamical systems in biology
PDF
Regional localization and regulation of hematopoietic stem cells in the bone marrow stem cell niche
PDF
Metabolic profiling of single hematopoietic stem cells for developing novel ex vivo culture strategies
PDF
Systems approaches to understanding metabolic vulnerabilities in cancer cells
PDF
A single cell time course of senescence uncovers discrete cell trajectories and transcriptional heterogeneity
PDF
Mechanistic modeling of angiogenic factors network and cancer therapy
PDF
Deciphering protein-nucleic acid interactions with artificial intelligence
PDF
Neural network integration of multiscale and multimodal cell imaging using semantic parts
PDF
Genome-wide characterization of the regulatory relationships of cell type-specific enhancer-gene links
PDF
The calcium-sensing receptor in the specification of normal and malignant hematopoietic cell localization in the bone marrow microenvironment
PDF
Essays on bioinformatics and social network analysis: statistical and computational methods for complex systems
PDF
Genome-wide studies reveal the isoform selection of genes
PDF
Inferring gene flow across spatial landscapes
PDF
Ancestral inference and cancer stem cell dynamics in colorectal tumors
PDF
Understanding anti-angiogenic signaling and treatment for cancer through mechanistic modeling
Asset Metadata
Creator
Rommelfanger, Megan Kristen
(author)
Core Title
Investigating hematopoietic stem cells via multiscale modeling, single-cell genomics, and gene regulatory network inference
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Computational Biology and Bioinformatics
Degree Conferral Date
2023-05
Publication Date
04/06/2023
Defense Date
03/28/2023
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
gene regulatory networks,Hematopoiesis,multiscale modeling,OAI-PMH Harvest,systems biology
Format
theses
(aat)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
MacLean, Adam (
committee chair
), Finley, Stacey (
committee member
), Lu, Rong (
committee member
)
Creator Email
meganfra@usc.edu,mkrommelfanger@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC112952621
Unique identifier
UC112952621
Identifier
etd-Rommelfang-11578.pdf (filename)
Legacy Identifier
etd-Rommelfang-11578
Document Type
Dissertation
Format
theses (aat)
Rights
Rommelfanger, Megan Kristen
Internet Media Type
application/pdf
Type
texts
Source
20230406-usctheses-batch-1017
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu
Tags
gene regulatory networks
multiscale modeling
systems biology