Understanding ancestry-specific disease allelic effect sizes
by leveraging multi-ancestry single-cell RNA-seq data
by
Juehan Wang
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
May 2022
Copyright 2022 Juehan Wang
Acknowledgments
It is a genuine pleasure to express my sense of thanks and gratitude to people who have offered
me support for my thesis.
First and foremost, I would like to express my deepest appreciation to the Gazal lab at USC and
my mentor, Professor Steven Gazal of the Department of Population and Public Health. His dedication,
keen interest and, above all, his overwhelming willingness to help our lab members have always encouraged me
step by step. Without his guidance and persistent help this dissertation would not have been possible. The
time spent studying in the lab has been the best experience of my master's studies.
Secondly, I would also like to acknowledge Professor Nicholas Mancuso and Professor Adam de Smith
as my committee members; I am gratefully indebted to them for their very valuable comments and
suggestions on this thesis.
Finally, I must express my very profound gratitude to my parents for providing me with unfailing
support and continuous encouragement throughout my years of study and through the process of
researching and writing this thesis. This accomplishment would not have been possible without them.
Thank you.
TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Abstract
Introduction
Chapter 1: Method
2-1 Single-cell RNA-seq data in peripheral blood mononuclear cells
2-2 Genes differentially expressed across ancestries
2-3 Estimating enrichment of stratified squared multi-ancestry genetic correlation using S-LDXR
Chapter 2: Results
3-1 Characterizing genes differentially expressed in Asians and Europeans
3-2 Characterizing multi-ancestry correlation of disease variants in genes differentially expressed in Europeans and East Asians
3-2-a/ Characterizing multi-ancestry correlation in lymphoid and myeloid cell-types
3-2-b/ Characterizing multi-ancestry correlation at the cell-type level
3-2-c/ Illustrative example
Chapter 3: Discussion
References
Appendices
Appendix A: Supplementary Figures
Appendix B: Supplementary Tables
List of Tables
Table 1: Number of cells for the 7 main PBMC cell-types.
Table 2: Sample size of the 15 immune traits.
Table 3: Gene ontology (GO) enrichment analysis for ANC-DE genes in B cells.
List of Figures
Figure 1: UMAP projection and assignment of all cells to cell types.
Figure 2: Results of genes differentially expressed.
Figure 3: S-LDXR results for lymphoid and myeloid ANC-DE genes.
Figure 4: S-LDXR results for enrichment of top 100 differentially expressed ANC-DE genes in 7 cell-types.
Figure 5: Discordant results between ASI and EUR Lymphocyte Count GWAS around the B-cell ANC-DE gene MCL1.
Abstract
Genome-wide association studies (GWAS) have highlighted that human diseases and complex traits are
dominated by common regulatory risk variants, and that some of these variants have ancestry-specific
effects. Here we tested if these variants tend to be enriched around genes that have ancestry-specific levels
of transcription. First, we analyzed single-cell RNA-seq (scRNA-seq) data in peripheral blood mononuclear
cells (PBMCs) from 48 individuals of Asian (ASI) or European (EUR) ancestry and detected 370 genes
significantly differentially expressed across ASI and EUR ancestries (ANC-DE genes) in at least one cell
type. These genes tend to be differentially expressed in a single cell type and were enriched in genes
involved in immune response to the environment. Second, we leveraged 15 GWAS datasets of blood and
immune-related traits available in both ASI and EUR populations and ran S-LDXR to test if genetic variants
surrounding ANC-DE genes were enriched in ancestry-specific effects. We determined that squared
multi-ancestry genetic correlation is depleted around lymphoid ANC-DE genes (0.53 ± 0.05 times the
genome-wide value) and around myeloid ANC-DE genes (0.67 ± 0.06 times), and that the stronger lymphoid
depletion is driven by the genes most differentially expressed in B cells (0.32 ± 0.06 times). Finally, we
leveraged our findings to interpret discordant results between ASI and EUR Lymphocyte Count GWAS around
the MCL1 gene. Altogether, our results highlight that genes with ancestry-specific levels of regulation in lymphoid and myeloid
cell types are enriched in genes involved in immune response to the environment and enriched in genetic
variants with different effect sizes in blood and immune-related traits. These results could be explained by
gene-environment interaction within immune genes.
Introduction
Allele frequency differences across human ancestries have been widely observed [1–3], have been linked
to human adaptation [4–6], and have been shown to impact the genetic architecture of human complex traits
[7–9]. Regulatory differences across human ancestries have also been observed on different functional
scales, such as gene expression [10–16], eQTLs [17–19], methylation [20,21] and enhancers [22], and could
lead to ancestry specific genetic architecture of diseases. Indeed, human diseases are dominated by
regulatory genetic variants [23–25], and poor transportability of polygenic risk scores across populations
[26,27] suggests ancestry specific effects for these regulatory variants. Thus, characterizing regulatory
differences across human ancestries could help us to understand recent human adaptation and the ancestry
specific genetic architecture of human diseases, which could ultimately improve polygenic risk prediction
in non-European populations and alleviate health disparities [26–28].
Understanding how human regulatory differences across ancestries impact disease risk has been
nearly impossible, mainly because both publicly available well-powered GWAS datasets [26,29]
and functional datasets [28,30] focus on individuals of European descent. However, non-European
GWAS datasets and functional datasets have started to be released by the genetic community.
Here we analyzed a unique dataset of single-cell RNA-seq (scRNA-seq) data in peripheral blood
mononuclear cells (PBMCs) for 23 and 25 individuals of Asian (ASI) and European (EUR) ancestry,
respectively. We detected genes differentially expressed across these ancestries (labeled ANC-DE genes)
at the cell-type level. Then, we applied the S-LDXR method on 15 blood and immune traits [8] to investigate
if variants surrounding ANC-DE genes are enriched in discordant allele effect sizes between GWAS from
ASI and EUR ancestries. Our analyses highlighted that genes with ancestry-specific levels of regulation in
lymphoid and myeloid cell types are enriched in genes involved in immune response to the environment
and enriched in genetic variants with different effect sizes in blood and immune-related traits.
Chapter 1: Method
2-1 Single-cell RNA-seq data in peripheral blood mononuclear cells
We used processed scRNA-seq data in PBMCs from Perez et al. Science In Press. This dataset is, to
our knowledge, the largest multi-ancestry single-cell dataset, with 106 and 100 individuals of Asian (ASI)
and European (EUR) ancestry, respectively. This is a case-control dataset with 158 systemic lupus
erythematosus (SLE) cases (83 ASI and 75 EUR), and 48 controls (23 ASI and 25 EUR). For further
analyses we primarily considered controls only in order to have a homogeneous population (despite the lower
sample size), but used the full dataset (i.e., SLE cases and controls) to test the robustness of some of our
results (despite phenotype heterogeneity).
This scRNA-seq dataset includes 948,001 cells from 11 cell-types in total. Only the 936,467 cells from
the 7 most abundant cell-types were considered in our analysis: natural killer cells (NK), B cells (B),
CD4+ and CD8+ T cells (T4 and T8), classical and non-classical monocytes (cM and ncM) and conventional dendritic
cells (cDC). Uniform Manifold Approximation and Projection (UMAP) of the single-cell transcriptomic
profiles revealed distinct regions of the embedding occupied by different cell types (Figure 1A). Lymphoid
cells (NK, B, T4 and T8) and myeloid cells (cDC, cM and ncM), which all play major roles in innate
immunity, clustered together (Figure 1). Cells did not cluster according to their disease status or ancestry
(Supplementary Figure 1). For further analyses, we performed additional quality control. Specifically, we
removed cells with more than 20% of their reads in 13 mitochondrial (MT) genes (a high fraction of reads in
MT genes indicates damaged or dying cells), with fewer than 500 reads, or with more than
10,000 reads.
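The three quality-control filters can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the `Cell` record, its field names and the example cells are hypothetical, and a cell with exactly 20% MT reads is assumed to be kept because the text removes cells with "more than 20%".

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str      # hypothetical cell identifier
    total_reads: int  # total number of reads in the cell
    mt_reads: int     # reads falling in the mitochondrial (MT) genes


def passes_qc(cell, max_mt_fraction=0.20, min_reads=500, max_reads=10_000):
    """Return True if the cell survives the three filters described above."""
    if cell.total_reads < min_reads or cell.total_reads > max_reads:
        return False
    return cell.mt_reads / cell.total_reads <= max_mt_fraction


# Hypothetical cells, one per filtering outcome
cells = [
    Cell("c1", 2_000, 100),   # 5% MT reads: kept
    Cell("c2", 2_000, 600),   # 30% MT reads: removed
    Cell("c3", 300, 10),      # fewer than 500 reads: removed
    Cell("c4", 15_000, 300),  # more than 10,000 reads: removed
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

The read-count check runs first, so a cell with zero reads is rejected before the MT fraction is computed.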
Cell type                              Controls only                 Controls + SLE
                                       ASI      EUR      Total       ASI      EUR      Total
Lymphoid
  B    B cells                         9,130    12,442   21,572      65,132   46,803   111,935
  NK   Natural killer cells            5,208    10,612   15,820      30,716   37,223   67,939
  T4   CD4+ T cells                    32,792   43,959   76,751      125,365  143,165  268,530
  T8   CD8+ T cells                    13,952   16,042   29,994      106,158  78,581   184,739
Myeloid
  cDC  Conventional dendritic cells    1,045    1,475    2,520       6,559    6,531    13,090
  cM   Classical monocytes             13,578   20,932   34,510      133,505  117,906  251,411
  ncM  Non-classical monocytes         2,026    3,617    5,643       19,754   19,069   38,823
Total                                  77,731   109,079  186,810     487,189  449,278  936,467
Table 1: Number of cells for the 7 main PBMC cell-types. We report the number of cells in controls and
the full dataset for ASI and EUR ancestries after quality control.
Figure 1: UMAP projection and assignment of all cells to cell types. We report UMAP projection and
assignment of all cells to 11 cell types within the whole dataset (A) and within controls (B) (cells from SLE
cases in grey). Prolif: Proliferating lymphocytes; PB: Plasmablasts; pDC: Plasmacytoid dendritic cells;
Progen: CD34+ progenitors.
2-2 Genes differentially expressed across ancestries
We tested, within each cell-type, whether each gene was differentially expressed between individuals of Asian
and European ancestry. We used a Poisson linear mixed-effects model (PLMM) with the number of reads
as the outcome variable, the individual as a random effect, and age, gender, batch, the first 5 PCs
computed at the cell-type level, the log of the total number of reads per cell, the proportion of genes expressed in a
single cell (CDR: cellular detection rate) and the fraction of reads in MT genes as fixed-effect covariates. When
analyzing the whole dataset, we added disease status (i.e., SLE case or control) as a fixed-effect covariate. We
restricted our analyses to 19,995 genes [31], and for each cell-type we restricted our analyses to expressed
genes, which we define as genes with at least 50 reads across all samples. We defined genes as differentially
expressed across ancestries (labeled ANC-DE genes) if they were significant at a false discovery rate (FDR)
of 5%.
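One way to write the model described above, for a given cell-type, is sketched below. The notation is introduced here for illustration only, and the log link is an assumption (the canonical choice for a Poisson model), since the text does not state the link function. Here y_{gic} is the number of reads of gene g in cell c of individual i, ANC_i is an ancestry indicator, x_{ic} collects the fixed-effect covariates, and u_{gi} is the random intercept of individual i.

```latex
\begin{aligned}
y_{gic} \mid u_{gi} &\sim \mathrm{Poisson}(\mu_{gic}), \\
\log \mu_{gic} &= \alpha_g + \beta_g\,\mathrm{ANC}_i + \boldsymbol{\gamma}_g^{\top}\mathbf{x}_{ic} + u_{gi}, \\
u_{gi} &\sim \mathcal{N}(0, \sigma_g^2).
\end{aligned}
```

Under this formulation, testing whether gene g is differentially expressed across ancestries amounts to testing whether \beta_g = 0.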
For comparison purposes, we also detected ANC-DE genes using two alternative approaches
widely used to detect genes differentially expressed in scRNA-seq data. First, we computed pseudo-bulk
expression values and regressed the individual logtp10k (i.e., the log of 1 + 10,000 * the proportion of reads in
the gene). Second, we considered a linear mixed model regressing the cell logtp10k.
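The logtp10k transformation can be sketched as below, at the cell level and at the pseudo-bulk (individual) level. Three assumptions are made here that the text does not confirm: the proportion is the fraction of reads assigned to the gene, the logarithm is the natural log, and pseudo-bulk values are obtained by pooling reads over an individual's cells before transforming. The read counts are hypothetical.

```python
import math


def logtp10k(gene_reads, total_reads):
    """log(1 + 10,000 * proportion of reads assigned to the gene)."""
    return math.log(1 + 10_000 * gene_reads / total_reads)


# Hypothetical cells of one individual: (reads for the gene, total reads in the cell)
cells = [(3, 2_000), (0, 1_500), (5, 4_000)]

# Cell-level values, as used by the linear mixed model
cell_values = [logtp10k(g, t) for g, t in cells]

# Pseudo-bulk value: pool reads over the individual's cells, then transform once
gene_sum = sum(g for g, _ in cells)
total_sum = sum(t for _, t in cells)
pseudo_bulk = logtp10k(gene_sum, total_sum)

print(cell_values, pseudo_bulk)
```

A cell with no read in the gene gets a value of exactly 0, which is the purpose of the "1 +" term.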
We performed Gene ontology (GO) enrichment analysis of ANC-DE genes using the R package goseq.
Our reference set of genes consisted of the genes expressed in the considered cell-types. ANC-DE gene sets
were considered significantly enriched in a given category if the FDR-adjusted P value was < 0.05.
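The text does not say which FDR procedure was used. The sketch below assumes the Benjamini-Hochberg step-up adjustment, a common default, and applies it to hypothetical P values to show how an "FDR-adjusted P value < 0.05" rule selects categories (or genes).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical P values
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```

In this toy example the third and fourth tests have raw P values below 0.05 but adjusted values just above it, so only the first two are retained.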
2-3 Estimating enrichment of stratified squared multi-ancestry genetic correlation using S-LDXR
S-LDXR [8] is a method to estimate enrichment of stratified squared multi-ancestry genetic correlation
across functional categories of SNPs using GWAS summary statistics and population-matched linkage
disequilibrium (LD) reference panels. S-LDXR determines that a category of SNPs is enriched for multi-
ancestry genetic covariance if SNPs with high LD to that category have a higher product of Z-scores than
SNPs with low LD to that category. S-LDXR models per-allele effect sizes (accounting for
allele frequency differences between populations) of SNP $j$ in two populations (labeled as $\beta_{1j}$ and $\beta_{2j}$) with
variance and covariance

$$\mathrm{Var}(\beta_{1j}) = \sum_{C} a_C(j)\,\tau_{1C}, \qquad \mathrm{Var}(\beta_{2j}) = \sum_{C} a_C(j)\,\tau_{2C}, \qquad \mathrm{Cov}(\beta_{1j}, \beta_{2j}) = \sum_{C} a_C(j)\,\theta_C \tag{1}$$

where $a_C(j)$ is the value of SNP $j$ for annotation $C$, which can be binary or continuous-valued; $\tau_{1C}$
and $\tau_{2C}$ are the net contribution of annotation $C$ to the variance of $\beta_{1j}$ and $\beta_{2j}$, respectively; and $\theta_C$ is the net
contribution of annotation $C$ to the covariance of $\beta_{1j}$ and $\beta_{2j}$.
S-LDXR estimates the stratified squared multi-ancestry genetic correlation, which is defined as

$$r_g^2(C) = \frac{\rho_g^2(C)}{h_{g1}^2(C)\,h_{g2}^2(C)} \tag{2}$$

where $\rho_g(C)$ is the multi-ancestry genetic covariance of each binary annotation $C$:

$$\rho_g(C) = \sum_{j \in C} \Big( \sum_{C'} a_{C'}(j)\,\theta_{C'} \Big) \tag{3}$$

where $\theta_C$ represents the per-SNP contribution to multi-ancestry genetic covariance of the per-allele causal
disease effect size of annotation $C$, and $\theta_{C'}$ are coefficients for all binary and continuous-valued annotations
$C'$ included in the analysis. $h_{g1}^2$ and $h_{g2}^2$ are heritabilities in each population.
Then S-LDXR estimates the enrichment/depletion of squared multi-ancestry genetic correlation,
which is defined as

$$\lambda^2(C) = \frac{r_g^2(C)}{r_g^2} \tag{4}$$

where $r_g^2$ is the genome-wide squared multi-ancestry genetic correlation. Values of $\lambda^2(C)$ below 1 indicate
depletion and values above 1 indicate enrichment. $\lambda^2$ can be meta-analyzed across
traits with different $r_g^2$.
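As a worked illustration of definitions (2) and (4), the sketch below computes a squared genetic correlation for one annotation and for the genome, then takes their ratio. All numbers are hypothetical and chosen only to make the arithmetic easy to follow. S-LDXR itself estimates these quantities from GWAS summary statistics and LD reference panels, which this sketch does not attempt.

```python
def squared_rg(rho_g, h2_pop1, h2_pop2):
    """Squared multi-ancestry genetic correlation: rho_g^2 / (h2_pop1 * h2_pop2)."""
    return rho_g ** 2 / (h2_pop1 * h2_pop2)


# Hypothetical genome-wide covariance and heritabilities
rg2_genome = squared_rg(rho_g=0.18, h2_pop1=0.20, h2_pop2=0.25)

# Hypothetical covariance and heritabilities restricted to one annotation C
rg2_annot = squared_rg(rho_g=0.012, h2_pop1=0.02, h2_pop2=0.02)

# Enrichment/depletion of the annotation relative to the genome
lam2 = rg2_annot / rg2_genome
print(rg2_genome, rg2_annot, lam2)
```

With these inputs the annotation has a lower squared correlation than the genome, so the ratio is below 1, which is the situation described as depletion.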
We applied the S-LDXR method using recommended settings, reference files (i.e. 481 East Asian
and 489 European samples in the 1000 Genomes Project [3]), and a background set of functional
annotations (i.e. the baseline-LD-X model, a set of 62 functional SNP-annotations known to impact per-
allele effect sizes). We analyzed 15 immune traits from the total of 31 traits analyzed in [8], which are
Basophil Count (BASO), Eosinophil Count (EO), Hemoglobin A1c (HBA1C), Hemoglobin (HGB),
Hematocrit (HTC), Lymphocyte Count (LYMPH), Mean Corpuscular Hemoglobin (MCH), MCH
Concentration (MCHC), Mean Corpuscular Volume (MCV), Monocyte Count (MONO), Neutrophil Count
(NEUT), Platelet Count (PLT), Rheumatoid Arthritis (RA), Red Blood Cell Count (RBC) and White Blood
Cell Count (WBC). GWAS were obtained for both ASI and EUR ancestries using Biobank Japan, UK
Biobank, and other sources (see Table 2).
Trait                                  N ASI [ref.]    N EUR [ref.]
BASO   Basophil Count                  62,076 [32]     131,860 [33]
EO     Eosinophil Count                62,076 [32]     337,539 [34]
HBA1C  Hemoglobin A1c                  42,790 [32]     337,539 [34]
HGB    Hemoglobin                      108,769 [32]    132,596 [33]
HTC    Hematocrit                      108,757 [32]    132,699 [33]
LYMPH  Lymphocyte Count                62,076 [32]     337,539 [34]
MCH    Mean Corpuscular Hemoglobin     108,054 [32]    337,539 [34]
MCHC   MCH Concentration               108,728 [32]    132,586 [33]
MCV    Mean Corpuscular Volume         108,256 [32]    132,353 [33]
MONO   Monocyte Count                  62,076 [32]     337,539 [34]
NEUT   Neutrophil Count                62,076 [32]     131,564 [33]
PLT    Platelet Count                  108,208 [32]    337,539 [34]
RA     Rheumatoid Arthritis            22,343 [35]     37,598 [35]
RBC    Red Blood Cell Count            108,794 [32]    337,539 [34]
WBC    White Blood Cell Count          107,964 [32]    337,539 [34]
Table 2: Sample size of the 15 immune traits. We report the sample size of the 15 immune traits for
ASI and EUR ancestries.
Our analyses included SNP-annotations related to gene sets, which were constructed by adding
100-kb windows on either side of the transcribed region of each gene in the set. All analyses included a
SNP-annotation for the 19,995 genes, and seven SNP-annotations representing the set of genes expressed
in each cell-type.
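The construction of a gene-set SNP-annotation can be sketched as follows: a SNP is annotated 1 if it lies within the transcribed region of any gene in the set, extended by 100 kb on either side. The function name, the tuple layout and the coordinates are hypothetical, and the window bounds are assumed to be inclusive.

```python
WINDOW = 100_000  # 100 kb on either side of the transcribed region


def gene_set_annotation(snps, genes, window=WINDOW):
    """Return 1 for SNPs inside any gene window, 0 otherwise.

    snps:  list of (chromosome, position)
    genes: list of (chromosome, transcription_start, transcription_end)
    """
    annotation = []
    for chrom, pos in snps:
        inside = any(
            chrom == g_chrom and start - window <= pos <= end + window
            for g_chrom, start, end in genes
        )
        annotation.append(1 if inside else 0)
    return annotation


genes = [("1", 1_000_000, 1_050_000)]  # hypothetical gene
snps = [
    ("1", 899_999),    # just outside the upstream window
    ("1", 900_000),    # first position of the upstream window
    ("1", 1_020_000),  # inside the transcribed region
    ("1", 1_150_001),  # just outside the downstream window
    ("2", 1_020_000),  # same position, different chromosome
]
print(gene_set_annotation(snps, genes))
```

A linear scan over genes is enough for illustration; a genome-wide implementation would typically use an interval index instead.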
Chapter 2: Results
3-1 Characterizing genes differentially expressed in Asians and Europeans
We aimed here to characterize the regulatory differences between ASI and EUR ancestries by investigating
a PBMC single-cell RNA-seq dataset of 23 ASI individuals and 25 EUR individuals. This dataset does not
contain significant cell-type proportion differences between the two samples (Supplementary Figure 2).
We detected a total of 455 genes significantly (at FDR 5%) differentially expressed across ASI and
EUR ancestries (ANC-DE genes) within the 7 most abundant PBMC cell-types (370 unique ANC-DE genes)
(Figure 2). The number of ANC-DE genes varied from 11 in cDC cells, to 169 in cM cells; overall the
number of ANC-DE genes was (non-significantly) positively correlated with the number of cells in the cell-
type (r = 0.59; P = 0.16). Around half of the ANC-DE genes were overexpressed in ASI (219
overexpressed in ASI vs. 236 in EUR). Out of the 455 associations, 63% were replicated when using SLE cases and
controls (3,019 significant ANC-DE genes within the 7 cell-types, 1,990 unique ANC-DE genes),
replicating a high number of ANC-DE genes, but also revealing potential discrepancies due to heterogeneity
between SLE cases and controls. For comparison purposes, we repeated our analyses using pseudo-bulk
expression and linear mixed models regressing the cell logtp10k, and detected 18 and 187 significant
associations, respectively, highlighting the advantage of using Poisson linear mixed-effects models to detect
ANC-DE genes.
Interestingly, 319 out of the unique 370 ANC-DE genes (86%) were differentially expressed in a
single cell-type, suggesting high cell-type specificity of differential expression across ancestries (Figure 2).
We observed a similar pattern in SLE cases and controls, where out of 1,990 unique ANC-DE genes, 1,397
(70%) were cell-type specific. Within the 51 ANC-DE genes in at least 2 cell-types, 49 have consistent
directions (i.e., overexpressed or underexpressed in all the cell-types), suggesting robust detection of ANC-
DE genes. We observed discordant results for HLA-DPA1 (discordant effects in NK and cDC cells) and
TTC39C (discordant effects in T4 and cM cells); we note that these discordant effects were observed in
cell-types of lymphoid vs. myeloid cells, possibly suggesting different ancestry regulation mechanisms
rather than false positive results.
Figure 2: Results of genes differentially expressed. (A) We report numbers of ANC-DE genes
significantly differentially expressed across ASI and EUR ancestries within the 7 most abundant PBMC
cell types at the FDR 5% level. Each bar is colored by the number of cell-types sharing the ANC-DE
gene. (B) We report the number of ANC-DE genes shared by different numbers of cell-types.
When grouping ANC-DE genes within lymphoid and myeloid cell-types, we found 210 and 186
unique ANC-DE genes, respectively, with only 26 genes being shared, suggesting distinct ancestry specific
regulation within lymphoid and myeloid cell-types. We observed similar but weaker patterns when
considering SLE cases and controls (1,354 and 920 ANC-DE genes, respectively; 286 ANC-DE genes
shared).
Finally, we performed Gene ontology (GO) enrichment analysis of ANC-DE genes. We found 43,
10 and 30 significant GO categories for the 370 PBMC, 210 lymphoid and 186 myeloid ANC-DE genes,
respectively (Supplementary Tables 1-2). Interestingly, the most significant GO category is “immune
response”, which is enriched in PBMC ANC-DE genes (FDR-corrected P = 8 x 10^-4). The three GO categories
that were the most enriched in lymphoid ANC-DE genes were “lumenal side of membrane”, “interferon-
gamma-mediated signaling pathway”, and “peptide antigen binding” (all having FDR-corrected P = 6 x 10^-3),
and the three GO categories that were the most enriched in myeloid ANC-DE genes were “neutrophil
degranulation”, “myeloid neutrophil activation involved in immune response”, and “neutrophil mediated
immunity” (all having FDR-corrected P = 8 x 10^-3). These results suggest that ANC-DE genes have an
expected immune function due to the different environment and selective pressure history of ASI and EUR
populations. At the cell-type level, we only detected significantly enriched GO categories in NK cells (top
three enriched GO categories: “immune response”, “MHC class II protein complex” and “cellular response
to interferon-gamma”, FDR-corrected P < 8 x 10^-3) and cM cells (top three enriched GO categories:
“neutrophil degranulation”, “neutrophil activation involved in immune response” and “neutrophil mediated
immunity”, FDR-corrected P < 0.02) (Supplementary Tables 3-4).
3-2 Characterizing multi-ancestry correlation of disease variants in genes differentially expressed in
Europeans and East Asians
We aimed to estimate the enrichment of stratified squared multi-ancestry genetic correlation (λ²) for SNPs
within ANC-DE genes by meta-analyzing S-LDXR results on 15 GWAS of blood and immune related traits
(Table 2).
3-2-a/ Characterizing multi-ancestry correlation in lymphoid and myeloid cell-types
We started by investigating two SNP-annotations corresponding to the 210 lymphoid and 186 myeloid
ANC-DE genes (2.2% and 1.6% of investigated common SNPs, respectively). We determined that squared
multi-ancestry genetic correlation was extremely depleted for both SNP-annotations: λ² = 0.53 ± 0.05
(P = 1 x 10^-18) for lymphoid ANC-DE genes and λ² = 0.67 ± 0.06 (P = 1 x 10^-7) for myeloid ANC-DE genes
(Figure 3). These λ² are the most and third most depleted when compared with the other annotations
of the baseline-LD-X model (the second most depleted annotation is 5’ UTR, 0.5% of investigated common
SNPs, λ² = 0.64 ± 0.08, P = 7 x 10^-6). We observed a significant negative value for the net contribution of
the lymphoid ANC-DE genes SNP-annotation (P of θ < 0 = 2 x 10^-3), suggesting a unique contribution of
lymphoid ANC-DE genes to squared multi-ancestry genetic correlation. We note that when detecting
ANC-DE genes in SLE cases and controls, we replicated the highly significant depletions but with a lower
magnitude (λ² = 0.83 ± 0.03, P = 2 x 10^-11 for lymphoid ANC-DE genes, and λ² = 0.78 ± 0.03, P = 2 x 10^-12
for myeloid ANC-DE genes) due to the higher number of genes and SNPs in the corresponding
SNP-annotations (10.8% and 7.7%, respectively).
We highlighted four significant (P < 0.05/15) depletions at the trait level (Figure 3): BASO (λ² =
0.04 ± 0.22, P = 1 x 10^-5), LYMPH (λ² = 0.39 ± 0.16, P = 1 x 10^-4), and MCV (λ² = 0.48 ± 0.16, P = 1 x 10^-3)
for lymphoid ANC-DE genes, and PLT (λ² = 0.51 ± 0.14, P = 3 x 10^-4) for myeloid ANC-DE genes.
Finally, we compared the lymphoid and myeloid ANC-DE genes λ² meta-analyzed on 15 blood and
immune traits to the ones meta-analyzed on 16 other types of traits (i.e., non-blood and non-immune traits)
(Figure 3). While we observed similar λ² for myeloid ANC-DE genes (0.67 ± 0.06 vs 0.57 ± 0.13,
respectively), we observed a nominally significant stronger depletion of λ² for lymphoid ANC-DE genes
in blood and immune traits (0.53 ± 0.05 vs 0.80 ± 0.10, respectively; P = 0.02 for difference). This result
suggests that PBMC ANC-DE genes are also depleted in squared multi-ancestry genetic correlation for
non-blood and non-immune traits.
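The text does not state which test produced the "P for difference". One simple possibility, sketched below, is a two-sided z-test on the difference of the two meta-analyzed estimates. It assumes that the values after ± are standard errors and that the two meta-analyses are independent; both are assumptions, not statements from the source. Under these assumptions the lymphoid comparison gives a P value that rounds to the reported 0.02.

```python
import math


def difference_p_value(est1, se1, est2, se2):
    """Two-sided P value for est1 != est2, assuming independent normal estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


# Estimates quoted in the text: blood/immune traits vs. other traits
p_lymphoid = difference_p_value(0.53, 0.05, 0.80, 0.10)
p_myeloid = difference_p_value(0.67, 0.06, 0.57, 0.13)
print(round(p_lymphoid, 2), round(p_myeloid, 2))
```

The same function gives a clearly non-significant difference for the myeloid comparison, consistent with the "similar λ²" described in the text.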
Figure 3: S-LDXR results for lymphoid and myeloid ANC-DE genes. We report a scatter plot of
multi-ancestry genetic correlation enrichment for 15 blood and immune traits and 16 other traits in
lymphoid and myeloid ANC-DE genes, with their corresponding meta-analyzed value (lines represent
95% CI). We labeled the four traits for which λ² was significantly depleted
around lymphoid ANC-DE genes (BASO, LYMPH, and MCV) and myeloid ANC-DE genes (PLT).
Altogether, these results demonstrate that genes differentially expressed in ASI and EUR in both
lymphoid and myeloid cell-types are surrounded by genetic variants with discordant effect sizes in ASI and
EUR GWAS. The depletion has the highest magnitude for lymphoid ANC-DE genes, with different
depletions between blood and immune traits vs. other types of traits.
3-2-b/ Characterizing multi-ancestry correlation at the cell-type level
To refine the lymphoid and myeloid λ² signal, we performed S-LDXR analyses by creating SNP-annotations
for each of the 7 main cell-types. As the number of FDR-significant genes varies substantially by cell-type (between
11 in cDC cells and 169 in cM cells), we constructed annotations with the 100 genes with the smallest
ANC-DE P values per cell-type (around 1.0% of investigated common SNPs per SNP-annotation).
All 7 cell-types had their corresponding SNP-annotation significantly depleted (P < 6 x 10^-7). We observed
the strongest and most significant depletions for the ANC-DE genes in B cells (λ² = 0.32 ± 0.06, P = 6 x 10^-27)
and cDC cells (λ² = 0.43 ± 0.08, P = 5 x 10^-13), making these annotations the most depleted in
multi-ancestry genetic correlation ever investigated with S-LDXR. These two cell-types have corresponding
SNP-annotations with a significant negative value for their net contribution (P of θ < 0 = 5 x 10^-7 for B cells and
5 x 10^-6 for cDC cells), suggesting their unique contribution to squared multi-ancestry genetic correlation.
We replicated our analyses by varying the number of DE genes from 100 to 2,000, although the strength
of the signal decreases when the number of genes increases (Supplementary Figure 3). We note that when
detecting ANC-DE genes in SLE cases and controls, we replicated our conclusions on the top 100
ANC-DE genes in B cells, but not in cDC cells (Supplementary Figure 4).
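Selecting a fixed number of genes per cell-type, rather than relying on the FDR threshold, can be sketched as a simple ranking by P value. The gene names and P values below are hypothetical, and breaking ties by gene name is an arbitrary choice made here so that the output is deterministic.

```python
def top_genes(p_by_gene, n=100):
    """Return the n genes with the smallest P values (ties broken by gene name)."""
    ranked = sorted(p_by_gene.items(), key=lambda item: (item[1], item[0]))
    return [gene for gene, _ in ranked[:n]]


# Hypothetical per-gene ANC-DE P values for one cell-type
p_one_cell_type = {"GENE_A": 1e-6, "GENE_B": 0.2, "GENE_C": 3e-4, "GENE_D": 0.04}

print(top_genes(p_one_cell_type, n=2))
```

Because the same `n` is used for every cell-type, the resulting SNP-annotations have comparable sizes, which is the motivation given in the text for using the top 100 genes.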
We detected five traits with significant (P < 0.05/15) ANC-DE genes in B cells and/or cDC cells
depleted in multi-ancestry genetic correlation: EO (λ² = 0.33 ± 0.19, P = 5 x 10^-4), HGB (λ² = 0.15 ± 0.24,
P = 4 x 10^-4), HTC (λ² = 0.21 ± 0.26, P = 3 x 10^-3), LYMPH (λ² = 0.21 ± 0.26, P = 3 x 10^-3), PLT (λ² =
0.34 ± 0.22, P = 2 x 10^-3) and RBC (λ² = 0.34 ± 0.15, P = 7 x 10^-6) for ANC-DE genes in B cells, and
LYMPH (λ² = 0.26 ± 0.20, P = 2 x 10^-4) and WBC (λ² = 0.22 ± 0.23, P = 6 x 10^-4) for ANC-DE genes in
cDC cells.
We compared the λ² meta-analyzed on 15 blood and immune traits to the ones meta-analyzed on
16 other types of traits (i.e., non-blood and non-immune traits). We observed no significant differences at
the cell-type level, but we note some λ² differences for ANC-DE genes in B cells (λ² = 0.32 ± 0.06 vs λ² =
0.62 ± 0.15; P = 0.06 for difference) and that ANC-DE genes in NK cells were not significantly depleted
in other traits (λ² = 0.83 ± 0.16; P = 0.30).
Finally, to interpret our results, we also performed GO enrichment analyses of the top ANC-DE
genes. First, we failed to find significant pathways (at the FDR 5% level) when using the top 100 ANC-DE
genes in B and cDC cells (Supplementary Tables 5-6), but found significant ones related to immunity and
immune response for NK and ncM cells (Supplementary Tables 7-8). However, when extending our
analyses to the top 200 ANC-DE genes, we observed 10 significant pathways in B cells, including “response
to toxic substance” (Table 3). We still did not observe significant pathways for the top 200 ANC-DE genes
in cDC cells, but many of the most associated pathways were involved in immunity and immune response
(Supplementary Table 9).
Altogether, these results demonstrate the strong discordant effect sizes in ASI and EUR GWAS for
variants surrounding ANC-DE genes in B cells and suggest that different immune adaptations to the environment
lead to different regulatory mechanisms impacting human complex traits and diseases.
Figure 4: S-LDXR results for enrichment of top 100 differentially expressed ANC-DE genes in 7
cell-types. We report bar plot and 95% CI for multi-ancestry genetic correlation enrichment of top 100
ANC-DE genes in immune traits in both ASI and EUR ancestries.
Cell type   GO category                                      P value     FDR-corrected P value
B           positive regulation of cell death                4.27E-06    2.88E-02
B           cytosolic ribosome                               9.74E-06    2.88E-02
B           focal adhesion                                   1.58E-05    2.88E-02
B           cell-substrate junction                          1.64E-05    2.88E-02
B           hydrogen peroxide catabolic process              1.95E-05    2.88E-02
B           hydrogen peroxide metabolic process              2.22E-05    2.88E-02
B           response to toxic substance                      2.44E-05    2.88E-02
B           regulation of cell death                         3.04E-05    3.15E-02
B           protein localization to endoplasmic reticulum    4.54E-05    4.18E-02
B           protein targeting                                5.23E-05    4.33E-02
Table 3: Gene ontology (GO) enrichment analysis for ANC-DE genes in B cells. We report pathways
significant at the FDR 5% level and their corresponding FDR-corrected P-value for the 200 genes with
the smallest ANC-DE P values.
3-2-c/ Illustrative example
We illustrated the implication of our results by interpreting discordant results between ASI and EUR GWAS
around an ANC-DE gene in B cells. Specifically, we focused on MCL1, a gene implicated in every
differentiation step of B cell development, which we found to be a B cell specific ANC-DE gene (P =
1 x 10^-6 in B cells, 7th smallest P among B cell ANC-DE genes; P < 0.002 in the 6 other cell-types) that is more
expressed in ASI than in EUR individuals (Supplementary Figure 5). We observed a strong association in
the EUR Lymphocyte Count (LYMPH) GWAS around this gene (minimum P = 2 x 10^-13), but no significant
association in the ASI LYMPH GWAS (minimum P = 0.07) (Figures 5A-B). We ensured that this strong
difference was not driven by different GWAS sample sizes in ASI and EUR by comparing the marginal
effect sizes of the lead ancestry-specific variants (23 variants with a difference of the -log10 GWAS P >
10; Figure 5C). Marginal effect sizes were on average 7 times larger in the EUR GWAS than in the ASI
GWAS, with significant differences (P < 0.002) across the 23 variants (Figure 5D). We observed similar
allele frequencies and LD scores for those SNPs (Figures 5E-F), suggesting that this difference is also not
driven by different genetic architecture between ASI and EUR populations.
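The selection of lead ancestry-specific variants can be sketched as a filter on the gap between the two GWAS -log10(P) values. The variant identifiers and P values below are hypothetical, and using the absolute value of the difference is an assumption, since the text only says "a difference of the -log10 GWAS P > 10".

```python
import math


def lead_ancestry_specific(variants, min_gap=10.0):
    """Keep variants whose -log10(P) differs by more than min_gap between the two GWAS.

    variants: list of (variant_id, p_asi, p_eur)
    """
    selected = []
    for variant_id, p_asi, p_eur in variants:
        gap = abs(math.log10(p_asi) - math.log10(p_eur))
        if gap > min_gap:
            selected.append(variant_id)
    return selected


# Hypothetical variants around a gene: (id, P in ASI GWAS, P in EUR GWAS)
variants = [
    ("rs_hyp_1", 0.08, 1e-13),  # gap of about 11.9: selected
    ("rs_hyp_2", 0.5, 1e-4),    # gap of about 3.7: not selected
    ("rs_hyp_3", 1e-3, 1e-9),   # gap of 6: not selected
]
print(lead_ancestry_specific(variants))
```

Comparing marginal effect sizes on the selected variants, as done in the text, then addresses whether a P value gap merely reflects the larger EUR sample size.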
Figure 5: Discordant results between ASI and EUR Lymphocyte Count GWAS around the B-cell
ANC-DE gene MCL1. (A, B) We report Lymphocyte Count (LYMPH) GWAS -log10(P) computed in
ASI (A) and EUR (B) populations. The black horizontal line represents P = 5 x 10^-8. The orange region
represents 100-kb windows on either side of the transcribed region of MCL1. (C) Scatter plot of LYMPH
GWAS -log10(P) computed in ASI and EUR populations. (D) Scatter plot of LYMPH marginal effect
sizes computed in ASI and EUR populations. Red lines represent 95% CI. (E) Scatter plot of ASI and
EUR MAF. (F) Scatter plot of ASI and EUR LD scores. SNPs shared within both ASI and EUR GWAS
are in black. SNPs not shared are in grey. Red points represent SNPs with a -log10(P) difference between
ASI and EUR greater than 10.
Chapter 3: Discussion
Here, we analyzed scRNA-seq data in PBMCs from 48 individuals of ASI or EUR ancestry, and detected
370 ANC-DE genes in at least one cell type. These genes tend to be differentially expressed in a single cell
type, and were enriched in genes involved in immune response to the environment. Then, we determined
that squared multi-ancestry genetic correlation is depleted around lymphoid ANC-DE genes (λ² = 0.53 ± 0.05)
and around myeloid ANC-DE genes (λ² = 0.67 ± 0.06), and that the stronger lymphoid depletion is driven
by the genes most differentially expressed in B cells (λ² = 0.32 ± 0.06). Altogether, our results highlight that
genes with ancestry-specific levels of regulation in lymphoid and myeloid cell types are enriched in genes
involved in immune response to the environment and enriched in genetic variants with different effect sizes
in blood and immune-related traits. These results could be explained by gene-environment interaction
within immune genes.
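For reference, the S-LDXR paper [8] measures enrichment or depletion of the squared multi-ancestry genetic correlation in an annotation C relative to the genome-wide value. The expression below restates that definition; reading the depletion values quoted above as estimates of this ratio is an assumption made here for illustration.

```latex
\lambda^2(C) = \frac{r_g^2(C)}{r_g^2}
```

Here r_g(C) is the multi-ancestry genetic correlation of causal effect sizes for SNPs in annotation C, and r_g is its genome-wide counterpart. A value of λ²(C) below 1 indicates depletion, i.e. effect sizes that are more ancestry-specific in C than genome-wide.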
Our findings have several implications for downstream analyses. First, they provide a partial
explanation for the non-transportability of polygenic risk scores across populations [26]. Accounting for
ANC-DE genes in relevant cell types could help to downweight variant effects when computing polygenic
risk scores. Second, our results highlight the benefits of generating functional datasets in non-European
populations. Finally, we propose a framework that could be leveraged to analyze the impact of the
environment on discordant GWAS effects. A possible follow-up analysis would be to investigate the
impact of genes differentially expressed in women and men [36] on the discordant effects of sex-stratified
GWAS [37].
We note several limitations of our work. First, although our dataset was (to our knowledge) the
largest multi-ancestry scRNA-seq dataset, it contains only 48 individuals, which limited us to detecting
about a hundred ANC-DE genes in the most abundant cell types. However, extremely significant S-LDXR
results obtained in the top 100 ANC-DE genes in nCM suggest that the gene ranking of our analyses is
robust, even if many of these genes are not significant at the FDR 5% level. Second, our analyses were
restricted to only two populations, which are the only ones with both large functional and GWAS datasets
available. Ongoing efforts to generate both functional and GWAS datasets in diverse populations would
help to replicate our results. Third, we ran S-LDXR analyses by considering variants within 100kb windows
on either side of the transcribed region of each gene in the gene set, which can lead to a loss of power.
Better (and ancestry-specific) SNP-to-gene linking strategies should improve our analyses [31]. Finally, it
is not clear if the impact of ANC-DE genes on human disease and complex traits is a consequence of
different allele frequencies within the populations, or a consequence of a different environment that would
impact gene regulation (without changing the genetics of the populations). A next step would be to leverage
the genotypes from the 48 individuals of our scRNA-seq data, to see which fraction of ANC-DE genes have
a cis-eQTL with different allele frequencies in ASI and EUR. However, S-LDXR tests the difference in
allelic effects and should account for the difference in allele frequencies between the populations, suggesting that
S-LDXR results are driven by the second explanation. Despite these limitations, our results convincingly
demonstrate that ancestry-specific effect sizes are enriched in genes with ancestry-specific regulation.
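The window-based SNP-to-gene linking described in the third limitation (variants within 100kb on either side of the transcribed region of each gene in the gene set) can be sketched as below. This is a simplified illustration under stated assumptions: the tuple layouts, function names, and toy coordinates are hypothetical, and the actual S-LDXR annotation files are not reproduced here.

```python
def gene_window(tx_start, tx_end, flank=100_000):
    """Return (start, end) of the transcribed region extended by `flank` bp on each side."""
    return max(0, tx_start - flank), tx_end + flank


def annotate_snps(snps, genes, flank=100_000):
    """Return the ids of SNPs that fall in the window of any gene in the gene set.

    snps:  list of (snp_id, chrom, pos)       (hypothetical layout)
    genes: list of (chrom, tx_start, tx_end)  (hypothetical layout)
    """
    windows = [(chrom, *gene_window(start, end, flank)) for chrom, start, end in genes]
    return {
        snp_id
        for snp_id, chrom, pos in snps
        if any(c == chrom and lo <= pos <= hi for c, lo, hi in windows)
    }


# Toy coordinates, not real gene positions.
genes = [("1", 1_000_000, 1_010_000)]
snps = [("rsA", "1", 950_000), ("rsB", "1", 1_200_000), ("rsC", "2", 1_005_000)]
print(annotate_snps(snps, genes))
```

A fixed window assigns every nearby SNP to the gene regardless of regulatory evidence, which is one way to see why the text expects a loss of power relative to better SNP-to-gene linking strategies [31].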
References
1. The International HapMap Consortium. A haplotype map of the human genome.
Nature. 2005. pp. 1299–1320. doi:10.1038/nature04226
2. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, et al. Worldwide Human
Relationships Inferred from Genome-Wide Patterns of Variation. Science. 2008. pp. 1100–1104.
doi:10.1126/science.1153717
3. A global reference for human genetic variation. Nature. 2015;526: 68–74.
4. Karlsson EK, Kwiatkowski DP, Sabeti PC. Natural selection and infectious disease in human
populations. Nature Reviews Genetics. 2014. pp. 379–393. doi:10.1038/nrg3734
5. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. Natural selection has driven population
differentiation in modern humans. Nature Genetics. 2008. pp. 340–345. doi:10.1038/ng.78
6. Fan S, Hansen MEB, Lo Y, Tishkoff SA. Going global by adapting local: A review of recent human
adaptation. Science. 2016;354: 54–59.
7. Guo J, Wu Y, Zhu Z, Zheng Z, Trzaskowski M, Zeng J, et al. Global genetic differentiation of
complex traits shaped by natural selection in humans. Nat Commun. 2018;9: 1865.
8. Shi H, Gazal S, Kanai M, Koch EM, Schoech AP, Siewert KM, et al. Population-specific causal
disease effect sizes in functionally important regions impacted by selection. Nat Commun. 2021;12:
1098.
9. Kanai M, Ulirsch JC, Karjalainen J, Kurki M, Karczewski KJ, Fauman E, et al. Insights from
complex trait fine-mapping across diverse populations. bioRxiv. 2021.
doi:10.1101/2021.09.03.21262975
10. Nédélec Y, Sanz J, Baharian G, Szpiech ZA, Pacis A, Dumaine A, et al. Genetic Ancestry and
Natural Selection Drive Population Differences in Immune Responses to Pathogens. Cell. 2016. pp.
657–669.e21. doi:10.1016/j.cell.2016.09.025
11. Quach H, Rotival M, Pothlichet J, Loh Y-HE, Dannemann M, Zidane N, et al. Genetic Adaptation
and Neandertal Admixture Shaped the Immune System of Human Populations. Cell. 2016. pp. 643–
656.e17. doi:10.1016/j.cell.2016.09.024
12. Idaghdour Y, Czika W, Shianna KV, Lee SH, Visscher PM, Martin HC, et al. Geographical
genomics of human leukocyte gene expression variation in southern Morocco. Nat Genet. 2010;42:
62–67.
13. Lappalainen T, Sammeth M, Friedländer MR, ’t Hoen PAC, Monlong J, Rivas MA, et al.
Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:
506–511.
14. Martin AR, Costa HA, Lappalainen T, Henn BM, Kidd JM, Yee M-C, et al. Transcriptome
sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS
Genet. 2014;10: e1004549.
15. Hughes DA, Kircher M, He Z, Guo S, Fairbrother GL, Moreno CS, et al. Evaluating intra- and inter-
individual variation in the human placental transcriptome. Genome Biol. 2015;16: 54.
16. Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, et al. Human genomics. The
human transcriptome across tissues and individuals. Science. 2015;348: 660–665.
17. Stranger BE, Montgomery SB, Dimas AS, Parts L, Stegle O, Ingle CE, et al. Patterns of cis
regulatory variation in diverse human populations. PLoS Genet. 2012;8: e1002639.
18. Mogil LS, Andaleon A, Badalamenti A, Dickinson SP, Guo X, Rotter JI, et al. Genetic architecture
of gene expression traits across diverse populations. PLoS Genet. 2018;14: e1007586.
19. Shang L, Smith JA, Zhao W, Kho M, Turner ST, Mosley TH, et al. Genetic Architecture of Gene
Expression in European and African Americans: An eQTL Mapping Study in GENOA. Am J Hum
Genet. 2020;106: 496–512.
20. Fagny M, Patin E, MacIsaac JL, Rotival M, Flutre T, Jones MJ, et al. The epigenomic landscape of
African rainforest hunter-gatherers and farmers. Nat Commun. 2015;6: 10047.
21. Carja O, MacIsaac JL, Mah SM, Henn BM, Kobor MS, Feldman MW, et al. Worldwide patterns of
human epigenetic variation. Nat Ecol Evol. 2017;1: 1577–1583.
22. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, et al.
Extensive variation in chromatin states across humans. Science. 2013;342: 750–752.
23. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P-R, et al. Partitioning
heritability by functional annotation using genome-wide association summary statistics. Nat Genet.
2015;47: 1228–1235.
24. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al. Systematic localization
of common disease-associated variation in regulatory DNA. Science. 2012;337: 1190–1195.
25. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjálmsson BJ, Xu H, et al. Partitioning heritability of
regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet. 2014;95:
535–552.
26. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic
risk scores may exacerbate health disparities. Nat Genet. 2019;51: 584–591.
27. Sirugo G, Williams SM, Tishkoff SA. The Missing Diversity in Human Genetic Studies. Cell.
2019;177: 1080.
28. Gurdasani D, Barroso I, Zeggini E, Sandhu MS. Genomics of disease risk in globally diverse
populations. Nat Rev Genet. 2019;20: 520–535.
29. Need AC, Goldstein DB. Next generation disparities in human genomics: concerns and remedies.
Trends Genet. 2009;25: 489–494.
30. Kelly DE, Hansen MEB, Tishkoff SA. Global variation in gene expression and the value of diverse
sampling. Curr Opin Syst Biol. 2017;1: 102–108.
31. Gazal S, Weissbrod O, Hormozdiari F, Dey K, Nasser J, Jagadeesh K, et al. Combining SNP-to-gene
linking strategies to pinpoint disease genes and assess disease omnigenicity. medRxiv. 2021;
2021.08.02.21261488.
32. Kanai M, Akiyama M, Takahashi A, Matoba N, Momozawa Y, Ikeda M, et al. Genetic analysis of
quantitative traits in the Japanese population links cell types to complex human diseases. Nat Genet.
2018;50: 390–400.
33. Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, et al. The Allelic Landscape of Human
Blood Cell Trait Variation and Links to Common Complex Disease. Cell. 2016;167: 1415–1429.e19.
34. Loh P-R, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-model association for biobank-scale
datasets. Nat Genet. 2018;50: 906–908.
35. Machaj F, Rosik J, Szostak B, Pawlik A. The evolution in our understanding of the genetics of
rheumatoid arthritis and the impact on novel drug discovery. Expert Opinion on Drug Discovery.
2020. pp. 85–99. doi:10.1080/17460441.2020.1682992
36. Márquez EJ, Chung C-H, Marches R, Rossi RJ, Nehar-Belaid D, Eroglu A, et al. Sexual-dimorphism
in human immune system aging. Nat Commun. 2020;11: 1–17.
37. Bernabeu E, Canela-Xandri O, Rawlik K, Talenti A, Prendergast J, Tenesa A. Sex differences in
genetic architecture in the UK Biobank. Nat Genet. 2021;53: 1283–1289.
Appendices
Appendix A: Supplementary Figures
Supplementary Figure 1: UMAP projection and assignment of all cells to cell types. We report UMAP
projection and assignment of all cells to 11 cell types within the whole dataset. Cells are shown by cell
type (A), ancestry (B), disease status (C) and both ancestry and status (D). Cells did not cluster according
to their disease status or ancestry.
Supplementary Figure 2: Cell type proportions of all cell types in ASI and EUR individuals. We report
a boxplot of the cell-type proportion differences between the 23 ASI individuals and 25 EUR individuals.
Supplementary Figure 3: S-LDXR results for the top ANC-DE genes in 7 cell types. We report S-
LDXR results meta-analyzed across 15 blood and immune traits for SNP-annotations related to the top 100,
200, 500, 1000, 2000 ANC-DE genes (computed on 48 controls), all expressed genes and all genes. We
report multi-ancestry genetic correlation enrichment (A), the significance of the net contribution of each
annotation (i.e., P value of 𝜃) (B), and heritability enrichment in ASI GWAS (C) and EUR GWAS (D). In
panels (A), (C) and (D) solid points represent values significantly different from the ones obtained in all
expressed genes of the corresponding cell type.
Supplementary Figure 4: Replication of S-LDXR results when detecting ANC-DE genes in both SLE
cases and controls. We report S-LDXR results meta-analyzed across 15 blood and immune traits for SNP-
annotations related to the top 100, 200, 500, 1000, 2000 ANC-DE genes (computed on 206 SLE cases and
controls), all expressed genes and all genes. We report multi-ancestry genetic correlation enrichment (A),
the significance of the net contribution of each annotation (i.e., P value of 𝜃) (B), and heritability
enrichment in ASI GWAS (C) and EUR GWAS (D). In panels (A), (C) and (D) solid points represent values
significantly different from the ones obtained in all expressed genes of the corresponding cell type.
Supplementary Figure 5: MCL1 pseudo-bulk gene expression in B cells between ASI and EUR
individuals. We report a boxplot of logtp10k for MCL1 pseudo-bulk gene expression in B cells between
ASI and EUR individuals.
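One common way to obtain a per-individual pseudo-bulk value of this kind is sketched below. It assumes that "logtp10k" means log(1 + transcripts per 10,000) computed after summing raw counts over all cells of one cell type from the same individual. That reading, the function name, and the input layout are assumptions and may differ from the pipeline used in the thesis.

```python
import math
from collections import defaultdict


def pseudobulk_logtp10k(cells, gene):
    """Return {individual_id: log(1 + TP10K)} for `gene`.

    cells: iterable of (individual_id, {gene_name: raw_count}) for a single
    cell type (hypothetical layout). Counts are summed per individual first,
    then scaled to 10,000 total counts and log-transformed.
    """
    gene_counts = defaultdict(int)
    total_counts = defaultdict(int)
    for individual, counts in cells:
        gene_counts[individual] += counts.get(gene, 0)
        total_counts[individual] += sum(counts.values())
    return {
        ind: math.log1p(1e4 * gene_counts[ind] / total_counts[ind])
        for ind in total_counts
        if total_counts[ind] > 0  # skip individuals with no counts at all
    }


# Toy counts, not taken from the thesis data.
cells = [
    ("ind1", {"MCL1": 2, "OTHER": 8}),
    ("ind1", {"MCL1": 3, "OTHER": 7}),
    ("ind2", {"OTHER": 10}),
]
print(pseudobulk_logtp10k(cells, "MCL1"))
```

The resulting per-individual values are what a boxplot comparing ASI and EUR individuals would display, one point per individual.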
Appendix B: Supplementary Tables
Cell type GO Category FDR P-value
Lymphoid lumenal side of membrane 6.02E-03
Lymphoid interferon-gamma-mediated signaling pathway 6.02E-03
Lymphoid peptide antigen binding 6.05E-03
Lymphoid integral component of lumenal side of endoplasmic reticulum membrane 8.11E-03
Lymphoid lumenal side of endoplasmic reticulum membrane 8.11E-03
Lymphoid MHC protein complex 1.00E-02
Lymphoid response to interferon-gamma 1.28E-02
Lymphoid cellular response to interferon-gamma 1.56E-02
Lymphoid immune system process 2.61E-02
Lymphoid regulation of protein polymerization 3.30E-02
Myeloid neutrophil degranulation 7.73E-03
Myeloid neutrophil activation involved in immune response 7.73E-03
Myeloid neutrophil mediated immunity 7.73E-03
Myeloid regulated exocytosis 7.73E-03
Myeloid neutrophil activation 7.73E-03
Myeloid granulocyte activation 7.73E-03
Myeloid extracellular exosome 7.73E-03
Myeloid extracellular vesicle 7.73E-03
Myeloid extracellular organelle 7.73E-03
Myeloid leukocyte degranulation 7.73E-03
Myeloid myeloid cell activation involved in immune response 8.40E-03
Myeloid myeloid leukocyte mediated immunity 9.07E-03
Supplementary Table 1: Gene ontology (GO) enrichment analysis for ANC-DE genes in lymphoid
and myeloid cell types. We report pathways significant at the FDR 5% level and their corresponding FDR-
corrected P-value.
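The FDR-corrected P-values in these tables can be illustrated with the Benjamini-Hochberg procedure. The sketch below assumes that correction method, which this section does not state explicitly, and uses toy P values rather than thesis results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values, not taken from the tables.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]  # pathways passing the FDR 5% level
print(adj, significant)
```

A pathway is reported as significant at the FDR 5% level when its adjusted value is at most 0.05.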
Cell type GO Category FDR P-value
All immune response 8.39E-04
All interspecies interaction between organisms 1.05E-02
All response to interferon-gamma 1.05E-02
All interferon-gamma-mediated signaling pathway 1.44E-02
All leukocyte activation 1.44E-02
All cellular response to interferon-gamma 1.85E-02
All MHC protein complex 2.61E-02
All response to external biotic stimulus 3.21E-02
All response to other organism 3.21E-02
All cellular response to cytokine stimulus 3.21E-02
All cell surface 3.22E-02
All neutrophil degranulation 3.22E-02
All ER to Golgi transport vesicle membrane 3.22E-02
All lumenal side of membrane 3.22E-02
All neutrophil activation involved in immune response 3.22E-02
All response to biotic stimulus 3.22E-02
All negative regulation of transport 3.31E-02
All leukocyte degranulation 3.35E-02
All neutrophil mediated immunity 3.35E-02
All positive regulation of cellular respiration 3.35E-02
All neutrophil activation 3.35E-02
All peptide antigen binding 3.35E-02
Supplementary Table 2: Gene ontology (GO) enrichment analysis for ANC-DE genes in all cell types.
We report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-value.
Cell type GO Category FDR P-value
NK interferon-gamma-mediated signaling pathway 1.92E-03
NK immune response 2.49E-03
NK MHC class II protein complex 4.25E-03
NK cellular response to interferon-gamma 7.83E-03
NK peptide antigen binding 8.20E-03
NK response to interferon-gamma 1.04E-02
NK integral component of lumenal side of endoplasmic reticulum membrane 1.04E-02
NK lumenal side of endoplasmic reticulum membrane 1.04E-02
NK MHC protein complex 1.07E-02
NK immune system process 1.33E-02
NK clathrin-coated endocytic vesicle membrane 1.51E-02
NK regulation of immune system process 1.51E-02
NK lumenal side of membrane 1.51E-02
NK cellular response to cytokine stimulus 1.51E-02
NK response to cytokine 2.62E-02
NK defense response to other organism 2.62E-02
NK antigen binding 2.62E-02
NK clathrin-coated endocytic vesicle 2.62E-02
NK cytokine-mediated signaling pathway 4.51E-02
Supplementary Table 3: Gene ontology (GO) enrichment analysis for ANC-DE genes in NK cells. We
report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-value.
Cell type GO Category FDR P-value
CM neutrophil degranulation 1.95E-02
CM neutrophil activation involved in immune response 1.95E-02
CM neutrophil mediated immunity 1.95E-02
CM neutrophil activation 1.95E-02
CM granulocyte activation 1.95E-02
CM leukocyte degranulation 0.21E-02
CM regulated exocytosis 0.21E-02
CM myeloid cell activation involved in immune response 0.21E-02
CM extracellular region 0.21E-02
CM extracellular exosome 0.21E-02
CM myeloid leukocyte mediated immunity 0.21E-02
CM extracellular vesicle 0.21E-02
CM extracellular organelle 0.21E-02
CM myeloid leukocyte activation 0.34E-02
CM secretion by cell 0.34E-02
CM secretory granule 0.34E-02
CM vesicle-mediated transport 0.39E-02
CM immune response 0.39E-02
CM export from cell 0.41E-02
CM leukocyte activation 0.46E-02
CM extracellular space 0.46E-02
CM exocytosis 0.46E-02
Supplementary Table 4: Gene ontology (GO) enrichment analysis for ANC-DE genes in CM cells.
We report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-value.
Cell type GO Category P-value FDR P-value
B positive regulation of cell death 8.10E-06 6.71E-02
B positive regulation of apoptotic process 2.01E-04 5.87E-01
B positive regulation of programmed cell death 2.12E-04 5.87E-01
B tertiary granule lumen 5.92E-04 1.00
B macromolecule transmembrane transporter activity 9.86E-04 1.00
B regulation of phospholipid transport 1.12E-03 1.00
B positive regulation of phospholipid transport 1.12E-03 1.00
B protein localization to endoplasmic reticulum 1.13E-03 1.00
B regulation of anatomical structure size 1.58E-03 1.00
B GDP-mannose metabolic process 2.06E-03 1.00
Supplementary Table 5: Gene ontology (GO) enrichment analysis for ANC-DE genes in B cells. We
report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-value.
Cell type GO Category P-value FDR P-value
cDC response to external stimulus 1.39E-05 1.08E-01
cDC cellular response to organic substance 8.34E-05 2.46E-01
cDC response to organic substance 1.12E-04 2.46E-01
cDC response to external biotic stimulus 1.59E-04 2.46E-01
cDC response to other organism 1.59E-04 2.46E-01
cDC response to biotic stimulus 2.17E-03 2.77E-01
cDC cellular response to chemical stimulus 2.50E-03 2.77E-01
cDC extracellular region 4.56E-03 2.96E-01
cDC response to bacterium 4.65E-03 2.96E-01
cDC extracellular exosome 5.31E-03 2.96E-01
Supplementary Table 6: Gene ontology (GO) enrichment analysis for ANC-DE genes in cDC cells.
We report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-value.
Cell type GO Category P-value
NK immune response 2.78E-07
NK interferon-gamma-mediated signaling pathway 9.18E-06
NK immune system process 5.35E-05
NK cellular response to interferon-gamma 2.41E-04
NK response to interferon-gamma 6.75E-04
NK regulation of immune system process 6.82E-04
NK cytokine-mediated signaling pathway 8.52E-04
NK cellular response to cytokine stimulus 1.24E-03
NK regulation of immune response 1.24E-03
NK defense response to other organism 1.39E-03
NK MHC protein complex 2.08E-03
NK defense response 2.49E-03
NK response to cytokine 2.49E-03
NK cell surface receptor signaling pathway 2.71E-03
NK innate immune response 4.18E-03
NK MHC class II protein complex 5.42E-03
NK response to external biotic stimulus 8.18E-03
NK response to other organism 8.18E-03
NK antigen binding 1.05E-02
NK response to biotic stimulus 1.05E-02
NK immune effector process 1.12E-02
NK cell periphery 1.16E-02
Supplementary Table 7: Gene ontology (GO) enrichment analysis for top 100 ANC-DE genes in NK
cells. We report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-
value.
Cell type GO Category P-value
nCM secretion 4.80E-05
nCM immune effector process 4.80E-05
nCM secretory granule 1.35E-04
nCM neutrophil degranulation 1.61E-04
nCM neutrophil activation involved in immune response 1.61E-04
nCM neutrophil mediated immunity 1.61E-04
nCM neutrophil activation 1.61E-04
nCM immune response 1.61E-04
nCM granulocyte activation 1.61E-04
nCM secretion by cell 1.88E-04
nCM export from cell 2.06E-04
nCM regulated exocytosis 2.06E-04
nCM leukocyte degranulation 2.06E-04
nCM myeloid cell activation involved in immune response 2.06E-04
nCM secretory vesicle 2.21E-04
nCM myeloid leukocyte mediated immunity 2.21E-04
nCM myeloid leukocyte activation 2.84E-04
nCM defense response 3.42E-04
nCM leukocyte mediated immunity 3.91E-04
nCM leukocyte activation involved in immune response 5.58E-04
nCM cell activation involved in immune response 5.72E-04
nCM exocytosis 6.43E-04
Supplementary Table 8: Gene ontology (GO) enrichment analysis for top 100 ANC-DE genes in nCM
cells. We report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-value.
Cell type GO Category P-value FDR P-value
cDC response to external stimulus 2.85E-05 1.43E-01
cDC response to bacterium 3.81E-05 1.43E-01
cDC hydrogen peroxide catabolic process 5.52E-05 1.43E-01
cDC response to type I interferon 8.38E-05 1.62E-01
cDC peroxidase activity 1.21E-04 1.75E-01
cDC response to carbohydrate 2.29E-04 1.75E-01
cDC immune system process 2.59E-04 1.75E-01
cDC oxidoreductase activity, acting on peroxide as acceptor 2.65E-04 1.75E-01
cDC aging 2.70E-04 1.75E-01
cDC type I interferon signaling pathway 3.22E-04 1.75E-01
Supplementary Table 9: Gene ontology (GO) enrichment analysis for ANC-DE genes in cDC cells.
We report pathways significant at the FDR 5% level and their corresponding FDR-corrected P-value.
Abstract
Genome-wide association studies (GWAS) have highlighted that human diseases and complex traits are dominated by common regulatory risk variants, and that some of these variants have ancestry-specific effects. Here we tested if these variants tend to be enriched around genes that have ancestry-specific levels of transcription. First, we analyzed single-cell RNA-seq (scRNA-seq) data in peripheral blood mononuclear cells (PBMCs) from 48 individuals of Asian (ASI) or European (EUR) ancestry, and detected 370 genes significantly differentially expressed across ASI and EUR ancestries (ANC-DE genes) in at least one cell type. These genes tended to be differentially expressed in a single cell type and were enriched in genes involved in immune response to the environment. Second, we leveraged 15 GWAS datasets of blood and immune-related traits available in both ASI and EUR populations, and ran S-LDXR to test if genetic variants surrounding ANC-DE genes were enriched in ancestry-specific effects. We determined that the squared multi-ancestry genetic correlation is depleted around lymphoid ANC-DE genes (0.53 ± 0.05) and around myeloid ANC-DE genes (0.67 ± 0.06), and that the stronger lymphoid depletion is driven by the genes most differentially expressed in B cells (0.32 ± 0.06). Finally, we leveraged our findings to interpret discordant results between ASI and EUR Lymphocyte Count GWAS around the MCL1 gene. Altogether, our results highlight that genes with ancestry-specific levels of regulation in lymphoid and myeloid cell types are enriched in genes involved in immune response to the environment and enriched in genetic variants with different effect sizes in blood and immune-related traits. These results could be explained by gene-environment interaction within immune genes.
Asset Metadata
Creator: Wang, Juehan (author)
Core Title: Understanding ancestry-specific disease allelic effect sizes by leveraging multi-ancestry single-cell RNA-seq data
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Biostatistics
Degree Conferral Date: 2022-05
Publication Date: 04/14/2022
Defense Date: 03/18/2022
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: allele effect sizes, Asian and European ancestry, multi-ancestry, OAI-PMH Harvest, PBMCs, single-cell RNA-seq data, S-LDXR
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Gazal, Steven (committee chair), de Smith, Adam (committee member), Nicholas, Mancuso (committee member)
Creator Email: juehanwa@usc.edu, juehanwang310@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC110960498
Unique identifier: UC110960498
Document Type: Thesis
Rights: Wang, Juehan
Type: texts
Source: 20220415-usctheses-batch-924 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu