Using Genetic Ancestry to Improve Between-Population Transferability of a
Prostate Cancer Polygenic Risk Score
by
Ali Sahimi
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
APPLIED BIOSTATISTICS AND EPIDEMIOLOGY
May 2021
Copyright 2021 Ali Sahimi
Dedication
This study is dedicated to my paternal grandparents, Habibollah Sahimi and Fatemeh
Fakour Rashid. Their presence will always be with me, and the memory of their struggles is
what drove me towards medical research.
To my maternal grandparents, Muhammad Ali Babaei and Hajar Shireenkam, whose
kindness, nobility, resilience, and strength have always inspired me and continue to do so.
Finally, to my uncle, Ali Sahimi, who lost his life at the age of 23 believing in a better
tomorrow. I hope to live up to his name and memory.
Acknowledgements
At the time this is being written, the world is making its way to recovery after a year-long
battle with COVID-19. Trying to complete an accelerated Master's program within such
difficult times requires determination, grit, and resiliency. For these attributes, I will forever
be in debt to my friends and family. Primarily, I have to thank my parents, Muhammad and
Mahnoush, for exemplifying resiliency and determination. Both moved to an unknown and
unfamiliar country and built themselves up through hard work and dedication. I am endlessly
grateful to them for their endless love and support. Their example is one that I have used to
push myself forward and walk the unbeaten path. I would also like to thank my sister, Niloofar.
For years she has been a source of endless laughter and a confidant. Most siblings end up
exasperated after such a long period of time quarantined together, but she has been a wellspring
of support. To me, family is everything. Without them I would not have made it as far as I
have. Thank you.
I also have had the support of family members forged through camaraderie rather than
blood. I have known my closest friends since the early days of high school. Their faith in my
capability, their endless jokes, and their staunch loyalty have all been gifts and privileges I
have cherished. I am eternally grateful to them and look forward to more years of laughter and
chaos.
Despite the fact that I only had a little over a year to do a project in a field where I had no
prior experience, Dr. David Conti still welcomed me into his research group with open arms.
He provided me with the tools and opportunities needed to learn the necessary information
quickly and showed enough trust in me to assign me this important project. I will forever be in
his debt.
Finally, I thank my supervisor and mentor, Dr. Burcu Darst. It is difficult enough to teach
an undergraduate with no prior knowledge of biostatistics. It is a Herculean task to do so
through email and Zoom calls. She has demonstrated endless patience and exceptional talent
in adapting to the situation and has helped me develop my skills beyond what would have been
expected given the circumstances. Thank you for always being a steady hand in a time when
everything was anything but steady. I look forward to your successes in the future and am
forever indebted to you as well; I could not have asked for a better mentor.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Methods
  MEC Participants
  Genetic Data and Ancestral Composition Estimation
  PRS Computation and Evaluation
Chapter 3: Results
Chapter 4: Discussion
References
Appendix
List of Tables
Table 1: Average ADMIXTURE ancestral proportions for Reference and MEC populations stratified by self-identified superpopulation group
Table 2: Admixture Ancestral Proportions between Cases and Controls in MEC Male Participants
Table 3: PRS Distribution Decile Cutoff Values for 1000G + PAGE Reference and MEC Control Groups by Superpopulation
Table 4: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC EUR Participants
Table 5: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC AFR Participants
Table 6: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC EAS Participants
Table 7: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC HIS Participants
Table 8: Area Under the Curve (AUC), estimating predictivity of PRS methodologies for each self-identified superpopulation group. The base model included Age and first 10 PCs, which was also included in each PRS model.
Appendix Table 1: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies within MEC EUR Participants
Appendix Table 2: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies within MEC AFR Participants
Appendix Table 3: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies within MEC EAS Participants
Appendix Table 4: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies within MEC HIS Participants
List of Figures
Figure 1: Admixture Estimates of Ancestry in MEC Participants
Figure 2: Polygenic Risk Score Distributions Utilizing Different Effect Sizes in MEC and 1000G + PAGE Reference Participants
Appendix Figure 1: PC Plot for Development of Reference Panel
Abstract
Prostate-specific antigen (PSA) screening for prostate cancer (PCa) has long been prone to
overdiagnosis, especially in minority populations that are at higher risk. Polygenic risk scores (PRS) have the
potential to ameliorate these deficiencies, but effective methods to translate them to admixed
populations are lacking. Several studies have shown that incorporating ancestry proportions
can help bridge this gap in translatability. Here, we describe a new method to construct
individualized ancestry informed PRS distributions by utilizing reference sets that are
representative of a homogeneous sample of superpopulations. Ancestry proportions were
estimated using ADMIXTURE in the Multi-Ethnic Cohort (MEC) (N=41,158) using a
reference of homogeneous European, African, East Asian, and Amerindian individuals from
the 1000 Genomes Project and PAGE Global Reference Panel (N=1,890). A multi-ancestry
PCa PRS was calculated in MEC men (N=18,362) and reference individuals, and reference
PRS distributions were calculated for each reference superpopulation. For each MEC
participant, individualized PRS distributions were calculated as a weighted sum of the
reference PRS distributions, utilizing ancestry proportions as weights, and these individualized
PRS distributions were used to categorize participants into PRS categories. The PRS was
evaluated using logistic regression models, and odds ratios demonstrated an improvement in
predictive ability of our new method across diverse populations compared to the standard PRS
categorization based on controls of the same sample. The improvement in risk estimates is
consistent with other attempts to incorporate ancestry into PRS construction; results in the Hispanic
cohort show the utility of the new method in admixed populations, but further testing on
independent data will be required.
Chapter 1: Introduction
Prostate cancer (PCa) is the most frequently diagnosed cancer in males and the second
leading cause of male cancer-related death in the United States, with 207K new cases and 30K
deaths recorded in 2017 [1]. The disease is understood to be highly heritable, with 57% of the
variation attributed to genetics [2]. Genome-wide association studies (GWAS) have identified
multiple candidate loci associated with risk for developing PCa [3–7], which explain ~40% of the
familial relative risk of PCa [7].
PCa disproportionately impacts African American (AA) men who, compared to non-Hispanic
Whites, are twice as likely to develop PCa and over twice as likely to die from the disease [1].
Research into these disparities has found genetic risk variants present in higher frequencies
within AA populations, such as variants found within the 8q24 region [3,7–9]. This disparity is
exacerbated by false-positive diagnoses resulting from current prostate-specific antigen (PSA)
screening procedures, which have estimated overdiagnosis rates of 17-66% [10–12]. This low
specificity has been shown to result in more overdiagnosis of PCa in AA men [13,14]. As such,
there is a need to bolster screening with new methodologies that work in tandem with current
procedures in order to improve diagnostics.
Utilizing polygenic risk scores (PRS) developed from large-scale GWAS in conjunction
with standard clinical screening has shown improvements in accuracy and early detection for
multiple chronic conditions, such as atherosclerosis and other cardiovascular diseases [15,16].
However, ~80% of participants included in GWAS investigations are of European descent,
limiting the generalizability of PRS, as PRS generated from Euro-centric data have been shown
to have poorer predictive ability in populations that are increasingly divergent from European
ancestry [17]. To address this problem within the context of PCa genetic epidemiology, we
previously performed a multi-ancestry GWAS meta-analysis of over 230,000 men (~107,000
of whom were PCa cases) from European, African, East Asian, and Hispanic populations [7]. We
identified 86 novel PCa risk loci, bringing the total to 269 known risk variants, which we used
to construct a PRS that was highly predictive of PCa risk across populations. However, the
discriminative ability was lower in highly admixed African and Hispanic populations. To
optimize the potential benefit of incorporating PRS into PCa screening across populations, it
will be important to improve predictive ability of PRS in admixed populations.
Here, we expanded on the findings of this previous work by using an individual's unique
ancestral composition to create an individualized reference PCa PRS distribution, which we
subsequently used to evaluate PCa risk across diverse men, utilizing data from the Multiethnic
Cohort (MEC) Study [18], the 1000 Genomes Project [19], and the PAGE Global Reference Panel [20].
Chapter 2: Methods
MEC Participants
The MEC Study is a prospective cohort study established between 1993 and 1996 to
investigate the causes of chronic conditions, especially cancer, across diverse populations in
Los Angeles and Hawaii. Blood samples were collected between 2001 and 2006 in nested
case-control studies, allowing for genetic investigations. PCa cases were identified through registry
entries within the Surveillance, Epidemiology, and End Results (SEER) program in
Hawaii and California. Clinically relevant information, such as sex, age at blood draw, and age
at diagnosis (for cases), was recorded. GWAS data were available for a total of 41,158 MEC
participants across 29 projects and five populations (White, Black, Japanese, Latino, and
Native Hawaiian). Native Hawaiian participants were excluded because ethnic-specific PRS
effect estimates were not available for this population. The final sample used
for PCa association analyses comprised 18,362 male PCa cases and controls, including
982 European, 1,611 African, 1,295 East Asian, and 1,400 Hispanic PCa cases.
Genetic Data and Ancestral Composition Estimation
Genome-wide genotyping data were imputed with Minimac4 v1.0.2 on the Michigan
Imputation Server v1.2.4 [21] using Phase 3 of the 1000 Genomes Project [19] as the reference
panel and Eagle v2.4 [22] for phasing. Variants used to estimate ancestral composition were
autosomal variants that overlapped across the 29 MEC GWAS projects, were common (minor
allele frequency > 0.01), had a genotype call rate > 0.99, and were independent (R² < 0.1),
leading to 16,621 variants available for ancestry analyses.
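The frequency and call-rate filters described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the 0/1/2 genotype coding with `None` for missing calls, and the toy variants are all assumptions, and the linkage-disequilibrium pruning step (R² < 0.1) is omitted.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotype calls coded 0/1/2 (alternate-allele count); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def passes_qc(genotypes, maf_min=0.01, call_rate_min=0.99):
    """Apply the two thresholds quoted in the text (MAF > 0.01, call rate > 0.99)."""
    return (minor_allele_frequency(genotypes) > maf_min
            and call_rate(genotypes) > call_rate_min)


# Hypothetical variants, ten samples each.
variants = {
    "rs_common": [0, 1, 2, 1, 0, 1, 0, 0, 1, 2],
    "rs_monomorphic": [0] * 10,
    "rs_low_call": [0, 1, None, 1, 0, 1, 0, 0, 1, 2],
}
kept = [name for name, calls in variants.items() if passes_qc(calls)]
```

Only the first toy variant survives: the second has no minor allele, and the third has a call rate of 0.9.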
ADMIXTURE [23] was used to determine the proportion of European, African, Amerindian,
and East Asian ancestry within the 41,158 MEC participants. The analysis was supervised at
K=4, using reference populations from Phase 3 of the 1000 Genomes Project [19] and the PAGE
Global Reference Panel [20] for the four ancestral groups and the 16,621 variants. Participants
included in the reference panel were those whose genetic ancestry closely reflected their
population group, determined by performing a principal component analysis (PCA) using the
16,621 variants. For each superpopulation, participants whose PC1 or PC2 values were the
equivalent of the 1 SD value from the median of the Amerindian cluster were removed.
Additionally, first-degree relatives were excluded. Graphics showing the selection of
reference individuals utilized in this study can be seen in Appendix Figure 1. The reference
panel was limited to 517 African, 513 European, 360 Amerindian, and 500 East Asian ancestry
individuals, for a total of 1,890 participants. The first 10 principal components resulting from
a PCA in MEC were used to adjust for potential population stratification in subsequent
association analyses.
PRS Computation and Evaluation
Of the 269 risk variants previously described [7], 268 were present and imputed with an info
score > 0.8 in MEC. PRS were constructed using two different weighting methods. First, for
all MEC cases and controls, polygenic risk was computed using multi-ethnic effect sizes for all
268 variants. Second, for the same MEC participants, the PRS was recalculated utilizing
ethnic-specific weights matching each participant's self-identified superpopulation group. For
this computation, not every superpopulation had effect sizes for all 268 variants;
missing values for a variant were replaced with the respective multi-ethnic effect size to
ensure that the same variants were utilized across all self-identified groups.
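As an illustration of the two weighting schemes, the sketch below assumes the usual definition of a PRS as the sum of risk-allele dosages multiplied by per-variant effect sizes. The function name, variant identifiers, and weights are hypothetical; only the fallback rule (use the multi-ethnic weight when an ethnic-specific weight is missing) comes from the text.

```python
def polygenic_risk_score(dosages, primary_weights, fallback_weights):
    """Weighted sum of risk-allele dosages.

    A variant missing from `primary_weights` (for example, an ethnic-specific
    weight set) falls back to `fallback_weights` (the multi-ethnic set), so
    every score is built from the same variants.
    """
    score = 0.0
    for variant, dosage in dosages.items():
        weight = primary_weights.get(variant)
        if weight is None:
            weight = fallback_weights[variant]
        score += weight * dosage
    return score


# Hypothetical effect sizes (log odds ratios) and one participant's dosages.
multi_ethnic = {"var1": 0.10, "var2": 0.25, "var3": 0.05}
afr_specific = {"var1": 0.12, "var3": 0.04}  # no ethnic-specific weight for var2
dosages = {"var1": 2, "var2": 1, "var3": 0}

prs_multi = polygenic_risk_score(dosages, multi_ethnic, multi_ethnic)
prs_specific = polygenic_risk_score(dosages, afr_specific, multi_ethnic)
```

In this toy case the ethnic-specific score uses the multi-ethnic weight for `var2` only, so the two scores differ solely through `var1`.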
In parallel with the construction of the different PRS, the estimation of risk was evaluated
using three different methods. For these analyses, the PRS was divided into a categorical
variable by constructing indicator variables for population-specific PRS categories determined
in controls: [0% - 10%], (10% - 20%], (20% - 30%], (30% - 40%], (40% - 60%], (60% - 70%],
(70% - 80%], (80% - 90%], and (90% - 100%], utilizing the (40% - 60%] category as the
reference. The different methods would redefine the cutoff values at which each of these decile
categories listed above began and ended within the reference distribution.
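A minimal sketch of this categorization is shown below, assuming cutoffs are taken as percentiles of the control (or reference) PRS distribution with linear interpolation. The helper names and the interpolation rule are assumptions made for illustration; the category boundaries and the inclusive upper bounds follow the intervals listed above.

```python
from bisect import bisect_left

PERCENTILES = [10, 20, 30, 40, 60, 70, 80, 90]
LABELS = ["[0-10]", "(10-20]", "(20-30]", "(30-40]", "(40-60]",
          "(60-70]", "(70-80]", "(80-90]", "(90-100]"]


def percentile(sorted_values, pct):
    """Percentile of an ascending list using linear interpolation."""
    position = (len(sorted_values) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def control_cutoffs(control_scores):
    """Category boundaries derived from the control PRS distribution."""
    ordered = sorted(control_scores)
    return [percentile(ordered, p) for p in PERCENTILES]


def prs_category(score, cutoffs):
    """Assign a score to a category; bisect_left keeps upper bounds inclusive."""
    return LABELS[bisect_left(cutoffs, score)]


# Hypothetical control scores 0, 1, ..., 100, so each cutoff equals its percentile.
cutoffs = control_cutoffs(range(101))
reference_category = prs_category(50, cutoffs)
top_category = prs_category(95, cutoffs)
```

Changing the distribution passed to `control_cutoffs` is what distinguishes the evaluation methods: the score itself is unchanged, only the boundaries move.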
For the multi-ethnic and ethnic-specific weight PRS calculated for MEC participants, the
first method of estimating risk created a reference PRS distribution from MEC control
participants, stratified by self-identified superpopulation. Each PCa case's PRS was then
placed within the control PRS distribution computed with the same effect size type, and odds
ratios were calculated for each of the decile categories with respective confidence intervals
and p-values. The second method utilized PRS distributions from the reference panel, with the
goal of evaluating the performance of a PRS when the reference PRS distribution is more
representative of a homogeneous superpopulation group. The MEC cases and controls were
sorted within the defined reference PRS cutoffs (matching reference PRS distributions to
self-reported ancestry). Logistic regression models adjusted for the first 10 PCs and age were used
to examine the effect of this newly categorized PRS on PCa risk.
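To make the quantity being estimated concrete, the sketch below computes an unadjusted odds ratio for one PRS category against the (40% - 60%] reference category, with a Wald confidence interval on the log scale. This is a simplification and not the analysis reported here, which used logistic regression adjusted for age and the first 10 PCs; the function name and the counts are hypothetical.

```python
import math


def odds_ratio(cases_cat, controls_cat, cases_ref, controls_ref, z=1.96):
    """Unadjusted odds ratio of a PRS category versus the reference category.

    Returns (estimate, lower, upper), where the bounds form a Wald interval
    computed on the log-odds scale (z = 1.96 gives roughly 95% coverage).
    """
    estimate = (cases_cat * controls_ref) / (controls_cat * cases_ref)
    se_log = math.sqrt(1 / cases_cat + 1 / controls_cat
                       + 1 / cases_ref + 1 / controls_ref)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: top category (300 cases, 100 controls)
# versus reference category (200 cases, 200 controls).
estimate, lower, upper = odds_ratio(300, 100, 200, 200)
```

An adjusted model would generally give a different estimate, because age and the ancestry PCs can be associated with both the PRS category and case status.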
In the third method of estimating risk, for each participant within the MEC – with either
case or control status – new PRS decile cutoffs were calculated. The ADMIXTURE ancestral
proportions for each MEC participant were used to calculate personalized PRS decile cutoffs
as a weighted sum of the reference PRS decile cutoffs:
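$$\mathrm{Cutoff}_{\mathrm{Decile.Cat}} = \rho_{\mathrm{EUR}} \cdot \mathrm{Cutoff}_{\mathrm{EUR.Decile.Cat}} + \rho_{\mathrm{AFR}} \cdot \mathrm{Cutoff}_{\mathrm{AFR.Decile.Cat}} + \rho_{\mathrm{EAS}} \cdot \mathrm{Cutoff}_{\mathrm{EAS.Decile.Cat}} + \rho_{\mathrm{AMR}} \cdot \mathrm{Cutoff}_{\mathrm{AMR.Decile.Cat}}$$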
After the new cutoffs were calculated, each individual's PRS was categorized based on
his unique PRS decile cutoffs. All MEC cases and controls were categorized in this fashion,
and categorical analyses comparing cases and controls were carried out, as described above, within
each self-reported ancestral population. The predictive capability of the different
PRS distributions was evaluated using area under the curve (AUC) estimates,
computed with the "pROC" package in R. The baseline model included the age of participants and the
first 10 PCs; the discriminative model included the same covariates in
addition to the PRS.
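The individualized cutoffs can be sketched as follows, assuming each participant has ADMIXTURE proportions that sum to one and each reference superpopulation has a list of category cutoffs. The function name, the reference cutoff values, and the example participant are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from bisect import bisect_left


def individualized_cutoffs(ancestry, reference_cutoffs):
    """Ancestry-weighted sum of the reference cutoffs, one value per boundary."""
    n_cutoffs = len(next(iter(reference_cutoffs.values())))
    return [
        sum(ancestry[pop] * reference_cutoffs[pop][i] for pop in ancestry)
        for i in range(n_cutoffs)
    ]


# Hypothetical reference cutoffs at the 10th, 20th, 30th, 40th, 60th, 70th,
# 80th, and 90th percentiles of each reference PRS distribution.
reference_cutoffs = {
    "EUR": [22.0, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26.0],
    "AFR": [23.0, 24.0, 24.5, 25.0, 25.5, 26.0, 26.5, 27.0],
    "EAS": [21.5, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5],
    "AMR": [21.5, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5],
}

# Hypothetical participant with 25% European and 75% African ancestry.
ancestry = {"EUR": 0.25, "AFR": 0.75, "EAS": 0.0, "AMR": 0.0}
personal = individualized_cutoffs(ancestry, reference_cutoffs)

# Category index 0 is [0% - 10%], 4 is (40% - 60%], and 8 is (90% - 100%].
score = 25.0
index_personal = bisect_left(personal, score)
index_eur_only = bisect_left(reference_cutoffs["EUR"], score)
```

With these toy numbers the same score of 25.0 falls in the (40% - 60%] reference category under the individualized cutoffs but in (60% - 70%] under the European-only cutoffs, which is the kind of reclassification the weighting is intended to produce.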
Chapter 3: Results
Across the full MEC sample of 41,158 participants, those who self-identified as being of
European (EUR) or East Asian (EAS) descent had the greatest mean ancestral proportions of their
own respective superpopulation groups (Fig. 1). More specifically, the MEC participants self-reporting
African ancestry were on average 76.24% African by descent, while those self-reporting
European ancestry were on average 94.72% European by descent and those self-reporting East
Asian ancestry were on average 97.51% East Asian by descent. Self-reported Hispanic
participants were the most admixed, with only 39.36% Amerindian descent and their greatest
ancestral proportion being European at 52.75%, as listed in Table 1. After limiting the MEC
participants to the 18,362 males within the dataset, the Wilcoxon rank sum test was used to
determine whether there were significant differences in ancestry proportions between cases and
controls within each self-identified population group (Table 2).
PRS distributions were graphed for three different sets: the 1000G + PAGE reference
participants using multi-ethnic effect sizes; the MEC male participants using multi-ethnic
effect sizes; and the MEC male participants using ethnic-specific effect sizes. For the third set
of distributions, not all 268 variants had effect sizes available in each of the
superpopulation groups. European weights were available for 266 variants, African
weights for 248, East Asian weights for 222, and Hispanic weights for 254.
For variants without a weight in a given set of ethnic-specific effect sizes,
the multi-ethnic effect size was used to ensure a common number of variants across all four
superpopulations. For the MEC and Reference participants, the African group had the highest
multi-ethnic PRS distribution of all four groups. East Asians in the MEC had the lowest mean
PRS distribution, while PRS distributions in the Reference group overlapped between East
Asians and Hispanics [Fig. 2(a) and 2(b)]. For the ethnic-specific PRS, MEC East Asian men
displayed the highest mean distribution of all four groups, while African men were the lowest
and overlapped with Hispanic men, as shown in Figure 2(c). In all three distributions,
Europeans were in between the two extreme groups.

Figure 1: Distribution of ADMIXTURE ancestral proportions stratified by superpopulation group and data source. Ancestry was
estimated utilizing 16,621 SNPs curated from 25 overlapping MEC projects after passing filters for MAF > 0.01 and R² < 0.1.
ADMIXTURE software was used with a supervised K=4 run. The reference group was a composite of 1000 Genomes Phase 3 and
PAGE Global Reference Panel participants. Superpopulations examined were European, African, Hispanic, and East Asian.

Table 1: Average ADMIXTURE ancestral proportions for Reference and MEC populations stratified by
self-identified superpopulation group

Source     Group  EUR       AFR       EAS       AMR
Reference  EUR    100.00%   0.00%     0.00%     0.00%
Reference  AFR    0.00%     100.00%   0.00%     0.00%
Reference  EAS    0.00%     0.00%     100.00%   0.00%
Reference  AMR    0.00%     0.00%     0.00%     100.00%
MEC        EUR    94.72%    1.61%     2.08%     1.56%
MEC        AFR    20.92%    76.24%    1.30%     1.52%
MEC        EAS    1.12%     0.43%     97.51%    0.92%
MEC        HIS    52.75%    5.88%     2.00%     39.36%

N (Reference): 513 EUR, 517 AFR, 500 EAS, 360 AMR.
N (MEC), in source order, with the row assignment not recoverable from this extraction: 5254, 10149, 9937, 10853.

Table 2: Admixture Ancestral Proportions between Cases and Controls in MEC Male Participants.

Group  N (Case | Control)  EUR (Case | Control)  AFR (Case | Control)  EAS (Case | Control)  AMR (Case | Control)
EUR    982 | 1716          0.944 | 0.940         0.016 | 0.015         0.022 | 0.024         0.019 | 0.016
AFR    1611 | 2896         0.201 | 0.213         0.771 | 0.758         0.014 | 0.015         0.012 | 0.013
EAS    1295 | 3907         0.005 | 0.010         0.003 | 0.004         0.981 | 0.976         0.009 | 0.009
HIS    1400 | 4235         0.540 | 0.529         0.058 | 0.057         0.018 | 0.020         0.383 | 0.392

*p-value < 0.05 | Wilcoxon Rank Test
[Four cells carry an asterisk in the source table; their positions are not recoverable from this extraction.]
The distributions for the multiethnic PRS calculated in the reference panel and MEC
participants were divided into decile categories for each superpopulation and are recorded in
Table 3. Reference decile cutoffs were used in the analyses examining the utilization of a
homogeneous reference PRS distribution and an ancestry weighted PRS distribution.
In multiethnic PRS analyses, we found that East Asian men in the top 90% - 100% PRS
decile had the highest odds of PCa (OR=4.93, 95% CI=3.94-6.15, P=6.21x10^-45), followed by
European (OR=4.50, 95% CI=3.37-6.01, P=2.32x10^-24), African (OR=3.83, 95% CI=3.08-4.75,
P=4.60x10^-34), and Hispanic men (OR=3.60, 95% CI=2.92-4.44, P=2.37x10^-33). A
similar pattern was observed for the lowest 0% - 10% PRS decile from smallest OR value to
highest (Tables 4-7).
In ethnic-specific PRS analyses, the top 90% - 100% PRS decile exhibited the same pattern,
with East Asian men having the highest odds of PCa (OR=6.59, 95% CI=5.23-8.30, P=8.95x10^-58),
followed by European (OR=4.64, 95% CI=3.46-6.21, P=9.24x10^-25), African (OR=4.32,
95% CI=3.47-5.38, P=3.48x10^-39), and Hispanic men (OR=3.89, 95% CI=3.16-4.78,
P=8.39x10^-38). A similar pattern, in the opposite direction, was observed for the 0% - 10% PRS decile
(Tables 4-7).
[Figure 2: panels A, B, and C]
Next, we evaluated the efficacy of using a PRS reference distribution based on
homogeneous superpopulation groups from the 1000G + PAGE reference panel. Using this approach,
East Asian MEC participants showed the highest PCa odds within the 90% - 100% PRS decile
category (OR=6.72, 95% CI=5.30-8.52, P=8.05x10^-56), followed by European (OR=5.13, 95%
CI=3.79-6.95, P=3.62x10^-26), Hispanic (OR=5.04, 95% CI=3.71-6.84, P=4.64x10^-25), and
African men (OR=3.90, 95% CI=3.00-5.08, P=4.44x10^-24). Additionally, both cases and
controls were distributed more densely in the upper decile categories (Tables 4-7). Compared
with the odds calculated utilizing MEC controls as a reference distribution to define decile
categories, using 1000G + PAGE yielded higher estimates of risk in the upper extreme deciles
and lower estimates in the lower extreme deciles across all self-identified superpopulation
groups. Additionally, the numbers of cases and controls in each category differed greatly between
the two methods as a result of the underlying method of distributing MEC participants.
Lastly, our new methodology incorporating weighted PRS decile cutoff values using each
individual's admixture proportions was evaluated. Within the upper 90% - 100% PRS
category, East Asian men displayed the highest PCa odds, which were also the highest odds
observed across all four methods (OR=7.06, 95% CI=5.55-8.98, P=5.95x10^-57). This was followed
by European (OR=5.11, 95% CI=3.77-6.93, P=9.09x10^-26), Hispanic (OR=4.75, 95% CI=3.62-6.23,
P=2.75x10^-29), and African men (OR=4.68, 95% CI=3.72-5.88, P=1.43x10^-39). This pattern
repeated itself for the lowest decile, ordered from the least to the greatest odds. Similar
to the previous method, utilizing admixture proportions to weight the new cutoffs yielded higher
estimates in the upper extreme decile categories and lower ones in the lower extreme categories
compared to the estimates calculated with the multi-ethnic weights only.

Figure 2: Polygenic risk score distributions of reference panel participants combined from 1000
Genomes Phase 3 and the PAGE Global Reference Panel (A), of the Multi-Ethnic Cohort (MEC)
participants using multi-ethnic weights and stratified between cases and controls (B), and of the
MEC participants with ethnic-specific weights and replacement with ME weights (C). A) All
1890 participants are represented with PRS calculated utilizing the multi-ethnic weights for 268
risk loci. Distributions are present for European (x̄ = 23.92, σ = 1.11), African (x̄ = 24.91, σ =
1.14), East Asian (x̄ = 23.67, σ = 0.97), and Hispanic (x̄ = 23.63, σ = 0.67). B) Distributions are
divided between each superpopulation group with European cases (x̄ = 24.74, σ = 0.82) and
controls (x̄ = 24.14, σ = 0.79), African cases (x̄ = 25.41, σ = 0.86) and controls (x̄ = 24.85, σ =
0.82), East Asian cases (x̄ = 24.35, σ = 0.72) and controls (x̄ = 23.75, σ = 0.74), and Hispanic
cases (x̄ = 24.83, σ = 0.80) and controls (x̄ = 24.28, σ = 0.77). C) Distributions are divided
between each superpopulation group with European cases (x̄ = 22.82, σ = 0.75) and controls (x̄
= 22.27, σ = 0.73), African cases (x̄ = 20.16, σ = 0.72) and controls (x̄ = 19.66, σ = 0.69), East
Asian cases (x̄ = 24.97, σ = 0.80) and controls (x̄ = 24.25, σ = 0.81), and Hispanic cases (x̄ =
20.29, σ = 0.77) and controls (x̄ = 19.75, σ = 0.76).
Importantly, the Hispanic group displayed statistical significance across all decile
categories when admixture was utilized to weight the PRS reference cutoffs, in contrast to the
previous methods. Additionally, the Hispanic group was the only one of the four to display a
decrease in odds when shifting from utilizing the reference distributions to using weighted
versions of them. In the weighted reference PRS distribution approach, controls were
distributed more normally within the weighted category cutoffs, as opposed to the unweighted
reference PRS distribution approach, and cases were distributed more densely towards the
upper deciles, as in the first two methodologies employed (Tables 4-7).
Separately, the European and East Asian men, who were the least admixed, displayed the
smallest difference in OR between the methodology that used the reference distributions
without weighting and the method that did weight them. Across all four methodologies
employed, there was no discernible pattern in the differences in mean ancestry proportions for
each group's respective ancestral group across the decile categories (Appendix Tables 1-4).
14
Table 3: PRS Distribution Decile Cutoff Values for 1000G + PAGE Reference and MEC Control Groups by Superpopulation
            Reference                               MEC
PRS Decile  EUR       AFR       EAS       HIS       EUR       AFR       EAS       HIS
0% - 10%    20.19174  21.25428  20.94751  21.99749  21.27072  22.23067  21.01408  21.46139
10% - 20%   22.36000  23.30749  22.40508  22.74543  23.13339  23.82386  22.78927  23.30757
20% - 30%   23.03504  24.01016  22.85179  23.03912  23.49277  24.15042  23.13398  23.63378
30% - 40%   23.44440  24.42157  23.24450  23.28674  23.74100  24.42089  23.37103  23.88263
40% - 60%   23.76355  24.75667  23.46269  23.48808  23.94791  24.63727  23.57018  24.09755
60% - 70%   24.22003  25.28481  23.93221  23.77234  24.32025  25.01601  23.95178  24.48516
70% - 80%   24.52463  25.54898  24.20208  23.98387  24.54736  25.25520  24.15675  24.70242
80% - 90%   24.77741  25.92359  24.51177  24.16840  24.80770  25.54898  24.37763  24.92616
90% - 100%  25.31175  26.30049  24.84907  24.47790  25.14626  25.90284  24.67565  25.26748
0% - 10%    26.42739  28.07198  26.33552  25.61793  26.75556  28.57022  27.10328  26.91453
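The ancestry-weighted cutoffs described above can be illustrated with a minimal Python sketch. It assumes that an individualized cutoff is the ancestry-proportion-weighted sum of the corresponding reference cutoffs, applied value by value; the function name, the two-group restriction, and the example participant are hypothetical and are not taken from the thesis. The numbers are the Reference AFR and EUR values listed in Table 3, kept in the order in which they are listed.

```python
# Reference PRS cutoff values copied from the "Reference" half of Table 3
# (AFR and EUR only, in the order they are listed in the table).
REFERENCE_CUTOFFS = {
    "AFR": [21.25428, 23.30749, 24.01016, 24.42157, 24.75667,
            25.28481, 25.54898, 25.92359, 26.30049, 28.07198],
    "EUR": [20.19174, 22.36000, 23.03504, 23.44440, 23.76355,
            24.22003, 24.52463, 24.77741, 25.31175, 26.42739],
}


def individualized_cutoffs(ancestry, reference=REFERENCE_CUTOFFS):
    """Return ancestry-weighted cutoffs for one participant.

    `ancestry` maps a reference group to the participant's estimated
    ancestry proportion. Assumption: the proportions sum to 1 and the
    weighting is a plain weighted sum of the reference cutoff values.
    """
    total = sum(ancestry.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    n_values = len(next(iter(reference.values())))
    return [
        sum(share * reference[group][i] for group, share in ancestry.items())
        for i in range(n_values)
    ]


# Hypothetical participant: 75% African and 25% European ancestry.
cutoffs = individualized_cutoffs({"AFR": 0.75, "EUR": 0.25})
print([round(c, 3) for c in cutoffs])
```

Under this assumption, a participant with ancestry from a single reference group simply recovers that group's reference cutoffs, which is in line with the small differences reported for the least admixed groups.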
To evaluate the discriminative ability of the four PRS models, we calculated the area under
the curve (AUC). The baseline models were fit for each superpopulation group
accounting only for the first 10 principal components and age of participants. The baseline
AUC was 0.593 (95% CI: 0.570 – 0.616) for Europeans, 0.539 (95% CI: 0.521 – 0.557) for Africans,
0.642 (95% CI: 0.625 – 0.658) for East Asians, and 0.592 (95% CI: 0.575 – 0.609) for Hispanics. The
individualized cutoff PRS method had an AUC of 0.721 (95% CI: 0.701 – 0.741) for Europeans,
0.691 (95% CI: 0.675 – 0.707) for Africans, 0.768 (95% CI: 0.753 – 0.782) for East Asians,
and 0.709 (95% CI: 0.694 – 0.724) for Hispanics. These estimates were slightly higher in
absolute value than those of the multiethnic PRS using within-sample controls to determine
the PRS distribution, but the differences were not statistically significant, as the
confidence intervals overlapped. The European group exhibited only a slight
improvement in discriminative ability when the individualized cutoff PRS approach was
compared to the ethnic-specific PRS (EUR: 0.720, 95% CI: 0.700 – 0.741). With the same
ethnic-specific method, the African (0.699, 95% CI: 0.683 – 0.715), East Asian (0.779, 95% CI:
0.765 – 0.793), and Hispanic (0.713, 95% CI: 0.697 – 0.728) groups had higher
discriminative ability than with the individualized cutoff method. There was substantial
overlap in the confidence intervals of all four methods (Table 8).
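As an illustration of the two ideas used in this comparison, the sketch below computes an AUC as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, and checks whether two confidence intervals overlap. The function names and the toy risks are hypothetical; only the two European intervals are copied from Table 8. Interval overlap is an informal check rather than a formal test of the difference between two AUCs.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outranks a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def intervals_overlap(a, b):
    """True when two (lower, upper) confidence intervals share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Toy predicted risks (hypothetical, not MEC data).
cases = [0.9, 0.8, 0.55, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]
print(auc(cases, controls))

# 95% CIs for the European group as reported in Table 8.
eur_me = (0.699, 0.740)       # multiethnic PRS
eur_cutoffs = (0.701, 0.741)  # individualized cutoff PRS
print(intervals_overlap(eur_me, eur_cutoffs))
```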
Table 4: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC EUR Participants
PRS Decile  PRS ME              N (Case | Control)  PRS EUR             N (Case | Control)  PRS Ref             N (Case | Control)  PRS Cutoff          N (Case | Control)
0% - 10%    0.31 (0.19 – 0.51)  21 | 172            0.33 (0.20 – 0.55)  21 | 172            0.10 (0.01 – 0.77)  1 | 24              0.10 (0.01 – 0.72)  1 | 26
10% - 20%   0.55 (0.37 – 0.83)  40 | 172            0.39 (0.25 – 0.62)  28 | 172            0.38 (0.21 – 0.67)  16 | 115            0.41 (0.24 – 0.69)  19 | 124
20% - 30%   0.54 (0.36 – 0.81)  42 | 171            0.71 (0.48 – 1.04)  48 | 172            0.49 (0.32 – 0.75)  32 | 172            0.56 (0.37 – 0.85)  37 | 171
30% - 40%   0.88 (0.61 – 1.26)  66 | 172            1.06 (0.74 – 1.52)  71 | 172            0.60 (0.42 – 0.85)  60 | 224            0.59 (0.42 – 0.84)  65 | 243
40% - 60%   1.00 (ref.)         140 | 343           1.00 (ref.)         134 | 343           1.00 (ref.)         157 | 394           1.00 (ref.)         160 | 403
60% - 70%   1.32 (0.94 – 1.85)  92 | 171            1.60 (1.15 – 2.22)  107 | 171           1.27 (0.94 – 1.70)  126 | 252           1.37 (1.02 – 1.85)  124 | 230
70% - 80%   1.57 (1.14 – 2.16)  117 | 172           1.86 (1.35 – 2.57)  123 | 172           1.63 (1.19 – 2.24)  111 | 172           1.50 (1.10 – 2.06)  108 | 178
80% - 90%   2.05 (1.51 – 2.80)  158 | 171           2.31 (1.69 – 3.15)  157 | 171           2.44 (1.86 – 3.21)  238 | 237           2.69 (2.04 – 3.55)  234 | 216
90% - 100%  4.50 (3.37 – 6.01)  297 | 172           4.64 (3.46 – 6.21)  289 | 172           5.13 (3.79 – 6.93)  241 | 126           5.11 (3.77 – 6.93)  234 | 125
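To show how the case and control counts in Table 4 relate to an odds ratio, the following sketch computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval. This is an illustration only: the function name is hypothetical, and because the reported estimates come from regression models that also include age and principal components, the crude value (about 4.2 for the top PRS ME category) is not expected to match the 4.50 given in the table.

```python
import math


def crude_odds_ratio(cases, controls, ref_cases, ref_controls, z=1.96):
    """Unadjusted odds ratio versus the reference category, with a Wald CI."""
    odds_ratio = (cases * ref_controls) / (controls * ref_cases)
    # Standard error of log(OR) from the four cell counts of the 2x2 table.
    se_log = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from Table 4, PRS ME column: 90% - 100% category (297 cases, 172 controls)
# against the 40% - 60% reference category (140 cases, 343 controls).
or_top, lo, hi = crude_odds_ratio(297, 172, 140, 343)
print(f"{or_top:.2f} ({lo:.2f} - {hi:.2f})")
```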
Table 5: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC AFR Participants
PRS Decile  PRS ME              N (Case | Control)  PRS AFR             N (Case | Control)  PRS Ref             N (Case | Control)  PRS Cutoff          N (Case | Control)
0% - 10%    0.36 (0.25 – 0.51)  45 | 290            0.41 (0.29 – 0.59)  46 | 290            0.23 (0.11 – 0.47)  9 | 73              0.29 (0.10 – 0.83)  4 | 37
10% - 20%   0.69 (0.51 – 0.92)  83 | 290            0.57 (0.42 – 0.79)  62 | 290            0.37 (0.28 – 0.49)  74 | 381            0.40 (0.27 – 0.60)  33 | 217
20% - 30%   0.65 (0.49 – 0.88)  81 | 289            0.73 (0.54 – 0.99)  80 | 289            0.59 (0.46 – 0.75)  126 | 416           0.72 (0.55 – 0.94)  101 | 369
30% - 40%   0.61 (0.46 – 0.83)  75 | 290            0.74 (0.55 – 0.99)  82 | 290            0.54 (0.43 – 0.69)  135 | 480           0.69 (0.53 – 0.89)  103 | 380
40% - 60%   1.00 (ref.)         240 | 579           1.00 (ref.)         217 | 579           1.00 (ref.)         361 | 703           1.00 (ref.)         298 | 783
60% - 70%   1.32 (1.03 – 1.69)  160 | 289           1.66 (1.30 – 2.12)  184 | 289           1.42 (1.13 – 1.78)  196 | 264           1.63 (1.30 – 2.05)  190 | 305
70% - 80%   1.76 (1.39 – 2.23)  217 | 290           1.90 (1.50 – 2.42)  209 | 290           1.81 (1.47 – 2.24)  277 | 300           2.12 (1.72 – 2.05)  280 | 344
80% - 90%   2.22 (1.77 – 2.79)  267 | 289           2.55 (2.02 – 3.22)  272 | 289           2.62 (2.05 – 3.35)  210 | 162           2.74 (2.20 – 3.40)  282 | 273
90% - 100%  3.83 (3.08 – 4.75)  443 | 290           4.32 (3.47 – 5.38)  459 | 290           3.90 (3.00 – 5.08)  223 | 117           4.68 (3.72 – 5.88)  320 | 188
Table 6: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC EAS Participants
PRS Decile  PRS ME              N (Case | Control)  PRS EAS             N (Case | Control)  PRS Ref             N (Case | Control)  PRS Cutoff          N (Case | Control)
0% - 10%    0.22 (0.14 – 0.34)  23 | 391            0.41 (0.29 – 0.50)  20 | 391            0.09 (0.03 – 0.30)  3 | 134             0.09 (0.03 – 0.29)  3 | 139
10% - 20%   0.53 (0.38 – 0.67)  54 | 391            0.57 (0.42 – 0.79)  41 | 391            0.30 (0.19 – 0.47)  24 | 321            0.29 (0.18 – 0.45)  24 | 336
20% - 30%   0.47 (0.33 – 0.67)  47 | 390            0.73 (0.54 – 0.99)  54 | 390            0.61 (0.45 – 0.82)  71 | 490            0.65 (0.48 – 0.88)  71 | 490
30% - 40%   0.65 (0.47 – 0.90)  62 | 391            0.74 (0.55 – 0.99)  54 | 391            0.65 (0.47 – 0.89)  60 | 404            0.65 (0.47 – 0.89)  60 | 404
40% - 60%   1.00 (ref.)         193 | 781           1.00 (ref.)         161 | 781           1.00 (ref.)         216 | 975           1.00 (ref.)         216 | 975
60% - 70%   1.10 (0.83 – 1.44)  110 | 391           1.66 (1.30 – 2.12)  116 | 391           1.23 (0.97 – 1.58)  153 | 523           1.31 (1.03 – 1.67)  153 | 523
70% - 80%   1.76 (1.37 – 2.28)  160 | 390           1.90 (1.50 – 2.42)  160 | 390           2.41 (1.92 – 3.01)  244 | 447           2.39 (1.91 – 2.99)  244 | 447
80% - 90%   2.44 (1.92 – 3.11)  222 | 391           2.55 (2.02 – 3.22)  229 | 391           3.05 (2.40 – 3.88)  208 | 332           3.14 (2.47 – 3.99)  208 | 332
90% - 100%  4.93 (3.94 – 6.15)  424 | 391           4.32 (3.47 – 5.38)  460 | 391           6.72 (5.30 – 8.52)  316 | 231           7.06 (5.55 – 8.98)  316 | 231
Table 7: Odds Ratios Generated from Varying PRS Calculation Methodologies within MEC HIS Participants
PRS Decile  PRS ME              N (Case | Control)  PRS HIS             N (Case | Control)  PRS Ref             N (Case | Control)  PRS Cutoff          N (Case | Control)
0% - 10%    0.39 (0.28 – 0.56)  46 | 424            0.35 (0.24 – 0.50)  40 | 424            0.69 (0.31 – 1.50)  8 | 106             0.17 (0.08 – 0.40)  6 | 117
10% - 20%   0.47 (0.34 – 0.65)  53 | 423            0.48 (0.35 – 0.67)  55 | 423            0.76 (0.38 – 1.51)  12 | 131            0.39 (0.29 – 0.52)  64 | 501
20% - 30%   0.70 (0.53 – 0.94)  77 | 424            0.58 (0.43 – 0.79)  65 | 424            0.95 (0.53 – 1.68)  20 | 170            0.50 (0.39 – 0.64)  97 | 649
30% - 40%   0.65 (0.48 – 0.87)  71 | 423            0.66 (0.49 – 0.88)  75 | 423            1.32 (0.82 – 2.12)  34 | 220            0.62 (0.49 – 0.78)  123 | 672
40% - 60%   1.00 (ref.)         212 | 847           1.00 (ref.)         219 | 847           1.00 (ref.)         50 | 445            1.00 (ref.)         309 | 1010
60% - 70%   1.23 (0.96 – 1.58)  142 | 423           1.31 (1.02 – 1.68)  136 | 423           1.84 (1.25 – 2.70)  81 | 398            1.40 (1.13 – 1.73)  207 | 500
70% - 80%   1.65 (1.30 – 2.10)  175 | 424           1.86 (1.47 – 2.34)  199 | 424           1.66 (1.13 – 2.45)  74 | 405            1.82 (1.46 – 2.27)  195 | 369
80% - 90%   2.11 (1.68 – 2.64)  226 | 423           1.86 (1.47 – 2.34)  193 | 423           2.49 (1.77 – 3.50)  177 | 660           2.58 (2.06 – 3.21)  226 | 295
90% - 100%  3.60 (2.92 – 4.44)  388 | 424           3.89 (3.16 – 4.78)  417 | 424           5.04 (3.71 – 6.84)  944 | 1700          4.75 (3.62 – 6.23)  173 | 122
Table 8: Area Under the Curve (AUC), estimating predictivity of PRS methodologies for each self-identified
superpopulation group. The base model included Age and first 10 PCs, which was also included in each PRS model.
       Base                   PRS ME                 PRS Ethnic-Specific    PRS Reference          PRS Cutoffs
EUR    0.593 (0.570 – 0.616)  0.720 (0.699 – 0.740)  0.720 (0.700 – 0.741)  0.721 (0.700 – 0.741)  0.721 (0.701 – 0.741)
AFR    0.539 (0.521 – 0.557)  0.690 (0.674 – 0.706)  0.699 (0.683 – 0.715)  0.691 (0.675 – 0.707)  0.691 (0.675 – 0.707)
EAS    0.642 (0.625 – 0.658)  0.764 (0.750 – 0.779)  0.779 (0.765 – 0.793)  0.767 (0.753 – 0.781)  0.768 (0.753 – 0.782)
HIS    0.592 (0.575 – 0.609)  0.705 (0.689 – 0.721)  0.713 (0.697 – 0.728)  0.692 (0.676 – 0.707)  0.709 (0.693 – 0.724)
Chapter 4: Discussion
Here, we discuss a new methodology to construct and interpret PRS in admixed
populations, utilizing effect estimates derived from a multiethnic PCa GWAS [7] and genetic
ancestry estimates. After developing a reference panel of participants with homogeneous genetic
ancestry reflecting African, European, East Asian, or Amerindian populations using the 1000
Genomes Project and PAGE Global Reference Panel, we calculated the reference PRS
distributions using multi-ethnic weights. We used this reference panel and ADMIXTURE to calculate
ancestry proportions for men from the MEC Study who self-identified as one of the four
superpopulation groups of interest. Using these reference PRS distributions and ancestry
proportions, we calculated a unique PRS distribution for each MEC participant to reflect his genetic
ancestry. This method had improved performance in risk estimation across all populations
evaluated compared to basing PRS distributions on self-reported ancestry using either multiethnic
or population-specific weighted PRS. Compared to using the homogeneous reference PRS
distributions without weighting, the more admixed African and Hispanic populations showed
greater improvement with the new weighted method. As expected, less admixed populations such as
East Asians and Europeans showed approximately the same odds when using the homogeneous
reference PRS distributions with or without weighting by ancestry proportions.
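The final step of this approach, placing a participant's PRS into a risk category using his individualized cutoffs, can be sketched as follows. This is a minimal illustration under stated assumptions: the category labels follow Tables 4-7, but the function name, the rule that a score equal to a cutoff falls into the higher category, and the example cutoff values are hypothetical and are not taken from the study.

```python
from bisect import bisect_right

# Category labels used in Tables 4-7; 40-60% is the reference category.
LABELS = ["0-10%", "10-20%", "20-30%", "30-40%", "40-60%",
          "60-70%", "70-80%", "80-90%", "90-100%"]


def prs_category(prs, interior_cutoffs):
    """Map a PRS to a category given the 8 interior cutoffs, in ascending order
    (10th, 20th, 30th, 40th, 60th, 70th, 80th, and 90th percentiles)."""
    if len(interior_cutoffs) != len(LABELS) - 1:
        raise ValueError("expected 8 interior cutoffs")
    # bisect_right counts the cutoffs that are <= prs, which is the category index.
    return LABELS[bisect_right(interior_cutoffs, prs)]


# Hypothetical individualized cutoffs for one participant (not study values).
cutoffs = [22.9, 23.4, 23.8, 24.1, 24.6, 24.9, 25.3, 25.8]
print(prs_category(24.3, cutoffs))
print(prs_category(26.0, cutoffs))
```

Because each participant carries his own cutoffs, two men with the same PRS can land in different categories when their ancestry proportions differ.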
To our knowledge, this is the first attempt to recategorize the PRS risk category of admixed
individuals by utilizing reference groups whose genetic ancestry is highly reflective of a single
superpopulation. Other recently developed ancestry-informed methods have also been reported to
improve the predictive ability of PRS models [24–26]. These methods typically use ancestral
proportions to weight effect sizes derived from an ethnic-specific GWAS. The predictive power
of these methods typically decreased as individual European ancestry decreased, which was
attributed to the heavy reliance on European-sourced SNPs [24,25]. We attempted to circumvent this
issue by utilizing multiethnic effect sizes, which have been shown to improve PRS accuracy [27]. All
four methods showed no discernible reduction in average ancestry proportions in non-European
groups. Additionally, the Hispanic MEC participants demonstrated a distinction between the final
two methods of PRS construction. Assigning Hispanic participants to categories from the reference
distributions without weighting for admixture produced consistently non-significant estimates across
the lower deciles and a heavier concentration of cases and controls in the uppermost decile categories.
With admixture weighting, however, the lower deciles showed significance and the controls were more
normally distributed. This suggests a possible improvement from admixture weighting
over simple replacement of the underlying distributions used to define cutoff values. Further testing
with an outside data source containing admixed individuals will be required to validate this
hypothesis.
Comparing the predictive power of these genetic-based approaches against current
screening standards, such as PSA, will be important for future clinical applicability of the PRS.
While PSA has limited ability to diagnose PCa [28–30], it has been suggested that PRS could improve
discriminative ability for PCa [31]. This is especially relevant given the deficits of PSA in
population-based screening, such as the increase in PCa mortality from low-intensity, single
screenings [17]. However, this has yet to be evaluated using the recent multi-ancestry PRS [7] and a
diverse population. The approach described here will likely optimize the potential utility and
interpretation of PRS.
References
1. US Cancer Statistics Working Group (June 2020). U.S. Cancer Statistics Data
Visualizations Tool, based on 2019 submission data (1999-2017).
https://gis.cdc.gov/Cancer/USCS/DataViz.html.
2. Mucci, L. A. et al. Familial risk and heritability of cancer among twins in Nordic
countries. JAMA - J. Am. Med. Assoc. (2016) doi:10.1001/jama.2015.17703.
3. Olama, A. A. Al et al. A meta-analysis of 87,040 individuals identifies 23 new
susceptibility loci for prostate cancer. Nat. Genet. 46, 1103–1109 (2014).
4. Conti, D. V. et al. Two Novel Susceptibility Loci for Prostate Cancer in Men of African
Ancestry. J. Natl. Cancer Inst. (2017) doi:10.1093/jnci/djx084.
5. Gudmundsson, J. et al. A study based on whole-genome sequencing yields a rare variant
at 8q24 associated with prostate cancer. Nat. Genet. (2012) doi:10.1038/ng.2437.
6. Hoffmann, T. J. et al. A large multiethnic genome-wide association study of prostate
cancer identifies novel risk variants and substantial ethnic differences. Cancer Discov.
(2015) doi:10.1158/2159-8290.CD-15-0315.
7. Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate
cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet.
(2021) doi:10.1038/s41588-020-00748-0.
8. Freedman, M. L. et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus
in African-American men. Proc. Natl. Acad. Sci. U. S. A. (2006)
doi:10.1073/pnas.0605832103.
9. Darst, B. F. et al. A Germline Variant at 8q24 Contributes to Familial Clustering of
Prostate Cancer in Men of African Ancestry. Eur. Urol. (2020)
doi:10.1016/j.eururo.2020.04.060.
10. Kilpeläinen, T. P. et al. False-positive screening results in the European randomized study
of screening for prostate cancer. Eur. J. Cancer (2011) doi:10.1016/j.ejca.2011.06.055.
11. Loeb, S. et al. Overdiagnosis and overtreatment of prostate cancer. European Urology
(2014) doi:10.1016/j.eururo.2013.12.062.
12. Catalona, W. J. Prostate Cancer Screening. Medical Clinics of North America (2018)
doi:10.1016/j.mcna.2017.11.001.
13. Sandhu, G. S. & Andriole, G. L. Overdiagnosis of prostate cancer. J. Natl. Cancer Inst. -
Monogr. (2012) doi:10.1093/jncimonographs/lgs031.
14. Etzioni, R. et al. Overdiagnosis due to prostate-specific antigen screening: Lessons from
U.S. prostate cancer incidence trends. J. Natl. Cancer Inst. (2002)
doi:10.1093/jnci/94.13.981.
15. Kullo, I. J. et al. Incorporating a genetic risk score into coronary heart disease risk
estimates: Effect on low-density lipoprotein cholesterol levels (the MI-GENES Clinical
Trial). Circulation (2016) doi:10.1161/CIRCULATIONAHA.115.020109.
16. Natarajan, P. et al. Polygenic risk score identifies subgroup with higher burden of
atherosclerosis and greater relative benefit from statin therapy in the primary prevention
setting. Circulation (2017) doi:10.1161/CIRCULATIONAHA.116.024436.
17. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health
disparities. Nat. Genet. (2019) doi:10.1038/s41588-019-0379-x.
18. Kolonel, L. N. et al. A multiethnic cohort in Hawaii and Los Angeles: Baseline
characteristics. Am. J. Epidemiol. (2000) doi:10.1093/oxfordjournals.aje.a010213.
19. Auton, A. et al. A global reference for human genetic variation. Nature (2015)
doi:10.1038/nature15393.
20. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for
complex traits. Nature (2019) doi:10.1038/s41586-019-1310-4.
21. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet.
(2016) doi:10.1038/ng.3656.
22. Browning, S. R. & Browning, B. L. Haplotype phasing: Existing methods and new
developments. Nature Reviews Genetics (2011) doi:10.1038/nrg3054.
23. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in
unrelated individuals. Genome Res. (2009) doi:10.1101/gr.094052.109.
24. Bitarello, B. D. & Mathieson, I. Polygenic scores for height in admixed populations. G3
Genes, Genomes, Genet. (2020) doi:10.1534/g3.120.401658.
25. Cavazos, T. B. & Witte, J. S. Inclusion of variants discovered from diverse populations
improves polygenic risk score transferability. Hum. Genet. Genomics Adv. (2021)
doi:10.1016/j.xhgg.2020.100017.
26. Marnetto, D. et al. Ancestry deconvolution and partial polygenic score can improve
susceptibility predictions in recently admixed individuals. Nat. Commun. (2020)
doi:10.1038/s41467-020-15464-w.
27. Márquez-Luna, C. et al. Multiethnic polygenic risk scores improve risk prediction in
diverse populations. Genet. Epidemiol. (2017) doi:10.1002/gepi.22083.
28. Pinsky, P. F. et al. Extended mortality results for prostate cancer screening in the PLCO
trial with median follow-up of 15 years. Cancer (2017) doi:10.1002/cncr.30474.
29. Schröder, F. H. et al. Screening and Prostate-Cancer Mortality in a Randomized European
Study. N. Engl. J. Med. (2009) doi:10.1056/nejmoa0810084.
30. Schröder, H., Hugosson, J., Roobol, M. & Tammela, T. Prostate Cancer Mortality at 11
years of Follow-up in the ERSPC. N Engl J Med 366, 981–990 (2012).
31. Sipeky, C. et al. Prostate cancer risk prediction using a polygenic risk score. Sci. Rep. 10,
1–7 (2020).
Appendix
Appendix Figure 1: A) PC1xPC2 plot displaying PAGE Global Reference, 1000 Genomes
Phase 3, and Locally Advanced and Metastatic Pancreatic Cancer (LAPC) cohort
participants. Color and shape of points was determined by superpopulation and
subpopulation, respectively. B) PC1xPC2 plot displaying only PAGE and 1000 Genome
Phase 3 participants that remained after filtering for individuals outside of cluster that
deviated from clusters by ± 1 SD of the Amerindian median. Subsequent filters removed
individuals based on 1st degree of relatedness. **Courtesy of Xin Sheng.
Appendix Table 1: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies
within MEC EUR Participants
PRS Decile  PRS ME (Case | Control)  PRS EUR (Case | Control)  PRS Reference (Case | Control)  PRS Cutoff (Case | Control)
0% - 10% 95.28% | 93.93% 95.30% | 93.42% 94.01% | 96.04% 94.01% | 93.92%
10% - 20% 93.82% | 94.31% 89.49% | 93.90% 96.04% | 95.17% 94.85% | 94.53%
20% - 30% 92.39% | 93.38% 91.76% | 92.33% 96.29% | 94.95% 94.93% | 95.07%
30% - 40% 91.72% | 93.20% 92.38% | 94.32% 93.39% | 94.72% 92.95% | 94.09%
40% - 60% 92.43% | 93.64% 92.31% | 93.05% 92.76% | 94.25% 92.01% | 94.04%
60% - 70% 93.28% | 92.81% 92.44% | 92.82% 94.36% | 94.10% 94.12% | 94.75%
70% - 80% 93.09% | 93.28% 94.20% | 93.21% 93.50% | 94.26% 95.66% | 94.64%
80% - 90% 93.69% | 92.69% 93.82% | 94.02% 95.07% | 93.65% 94.62% | 93.80%
90% - 100% 93.23% | 94.12% 93.50% | 94.87% 93.72% | 95.49% 94.71% | 95.51%
Appendix Table 2: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies
within MEC AFR Participants
PRS Decile  PRS ME (Case | Control)  PRS AFR (Case | Control)  PRS Reference (Case | Control)  PRS Cutoff (Case | Control)
0% - 10% 71.01% | 68.07% 72.60% | 70.55% 72.57% | 65.18% 85.38% | 71.03%
10% - 20% 73.59% | 74.59% 76.29% | 74.31% 70.09% | 71.57% 77.52% | 75.63%
20% - 30% 72.38% | 74.33% 75.40% | 75.36% 74.15% | 74.42% 77.28% | 75.63%
30% - 40% 73.51% | 74.26% 77.02% | 74.94% 73.82% | 74.78% 75.98% | 76.62%
40% - 60% 75.18% | 76.30% 74.83% | 76.45% 75.46% | 77.13% 77.71% | 75.99%
60% - 70% 75.38% | 77.30% 74.69% | 77.06% 77.57% | 78.06% 76.05% | 76.11%
70% - 80% 77.13% | 78.01% 76.71% | 76.73% 78.88% | 78.08% 77.38% | 76.66%
80% - 90% 78.74% | 78.01% 79.12% | 77.69% 80.19% | 80.12% 76.91% | 73.56%
90% - 100% 80.25% | 80.11% 78.74% | 77.74% 80.46% | 80.51% 77.26% | 76.00%
Appendix Table 3: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies
within MEC EAS Participants
PRS Decile  PRS ME (Case | Control)  PRS EAS (Case | Control)  PRS Reference (Case | Control)  PRS Cutoff (Case | Control)
0% - 10% 97.37% | 97.23% 97.56% | 96.84% 97.56% | 98.09% 97.56% | 98.04%
10% - 20% 97.50% | 96.89% 97.36% | 97.03% 98.09% | 97.90% 98.08% | 97.53%
20% - 30% 96.05% | 97.05% 95.89% | 97.04% 98.46% | 97.70% 97.36% | 97.72%
30% - 40% 97.54% | 97.21% 97.72% | 96.70% 96.90% | 97.78% 98.20% | 97.90%
40% - 60% 97.57% | 97.15% 97.44% | 97.14% 98.43% | 98.05% 98.41% | 97.92%
60% - 70% 97.29% | 96.70% 97.50% | 96.54% 98.21% | 97.49% 98.20% | 97.30%
70% - 80% 97.44% | 96.25% 97.51% | 97.39% 98.27% | 97.40% 98.27% | 97.44%
80% - 90% 97.29% | 96.96% 97.25% | 95.97% 98.10% | 97.53% 98.08% | 97.37%
90% - 100% 97.21% | 95.29% 97.26% | 96.08% 97.96% | 95.86% 97.97% | 96.95%
Appendix Table 4: Mean Ancestry Proportions Generated from Varying PRS Calculation Methodologies
within MEC HIS Participants
PRS Decile  PRS ME (Case | Control)  PRS HIS (Case | Control)  PRS Reference (Case | Control)  PRS Cutoff (Case | Control)
0% - 10% 33.08% | 37.51% 28.69% | 36.29% 22.80% | 38.61% 41.36% | 42.91%
10% - 20% 39.67% | 38.58% 36.43% | 38.15% 31.63% | 36.11% 38.71% | 42.66%
20% - 30% 36.00% | 38.49% 39.17% | 38.36% 34.16% | 37.37% 42.54% | 39.93%
30% - 40% 38.21% | 38.62% 36.64% | 37.60% 39.97% | 38.91% 41.28% | 40.51%
40% - 60% 38.36% | 39.70% 36.62% | 39.37% 39.20% | 38.59% 39.77% | 39.12%
60% - 70% 36.86% | 38.40% 37.89% | 40.16% 35.97% | 38.54% 38.61% | 37.42%
70% - 80% 38.31% | 39.69% 40.14% | 40.58% 40.01% | 39.90% 37.41% | 37.24%
80% - 90% 39.55% | 40.59% 39.53% | 41.32% 37.85% | 39.54% 35.49% | 35.22%
90% - 100% 38.69% | 40.47% 38.93% | 40.58% 38.66% | 39.87% 35.23% | 35.94%
Abstract
PSA screening for prostate cancer (PCa) has long been affected by overdiagnosis, especially in minority populations that are more at risk. Polygenic risk scores (PRS) have the potential to ameliorate these deficiencies, but effective methods to translate them to admixed populations are lacking. Several studies have shown that incorporating ancestry proportions can help bridge this gap in translatability. Here, we describe a new method to construct individualized ancestry-informed PRS distributions by utilizing reference sets that are representative of a homogeneous sample of superpopulations. Ancestry proportions were estimated using ADMIXTURE in the Multi-Ethnic Cohort (MEC) (N=41,158) using a reference of homogeneous European, African, East Asian, and Amerindian individuals from the 1000 Genomes Project and PAGE Global Reference Panel (N=1,890). A multi-ancestry PCa PRS was calculated in MEC men (N=18,362) and reference individuals, and reference PRS distributions were calculated for each reference superpopulation. For each MEC participant, individualized PRS distributions were calculated as a weighted sum of the reference PRS distributions, utilizing ancestry proportions as weights, and these individualized PRS distributions were used to categorize participants into PRS categories. The PRS was evaluated using logistic regression models, and odds ratios demonstrated an improvement in predictive ability of our new method across diverse populations compared to the standard PRS categorization based on controls of the same sample. This improvement in risk estimates is consistent with other attempts to incorporate ancestry into PRS construction.
Asset Metadata
Creator: Sahimi, Ali (author)
Core Title: Using genetic ancestry to improve between-population transferability of a prostate cancer polygenic risk score
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Applied Biostatistics and Epidemiology
Publication Date: 04/01/2021
Defense Date: 03/18/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: admixed population,ancestry proportions,ethnic-specific estimates,global ancestry,Multi-Ethnic Cohort,OAI-PMH Harvest,polygenic risk score,recategorization of risk
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Conti, David (committee chair), Haiman, Christopher (committee member), Mancuso, Nicholas (committee member)
Creator Email: asahimi@usc.edu,asahimi1998@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-434843
Unique identifier: UC11667840
Identifier: etd-SahimiAli-9380.pdf (filename),usctheses-c89-434843 (legacy record id)
Legacy Identifier: etd-SahimiAli-9380.pdf
Dmrecord: 434843
Document Type: Thesis
Rights: Sahimi, Ali
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA