Genes and environment in prostate cancer risk and prognosis
(USC Thesis Other)
GENES AND ENVIRONMENT IN PROSTATE CANCER RISK AND PROGNOSIS
by
Ahva Shahabi

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (MOLECULAR EPIDEMIOLOGY)

May 2014

Copyright: 2014 Ahva Shahabi

ACKNOWLEDGEMENTS

To my mentor: I would like to especially thank my mentor, Dr. Mariana C. Stern, for her invaluable guidance and support during my years as a graduate student. Her continuous sincere encouragement and belief in my abilities further inspired me to work hard to achieve my goals while allowing me to grow as a person and as a scientist. Her commitment and passion for science are truly inspirational, and I truly appreciate everything I have learned from her. It has been an honor to be her student.

To my committee: I would like to thank my committee members, Dr. Juan Pablo Lewinger, Dr. Sue Ingles, Dr. Kim Siegmund, and Dr. Jacek Pinski, for their incredible guidance and feedback over the years on the projects presented in this dissertation. I am truly appreciative of all their assistance and the time they spent helping me learn and grow.

To my professors and advisers: I would like to thank all my professors who have given me the knowledge to flourish as a scientist. I would like to give a special thanks to Dr. Stanley Azen and to Dr. Joseph Hacia for their much-appreciated advisement and guidance on the projects throughout my time as a graduate student.

To my lab members: I would like to thank my lab members, Roman Corral, Andre Kim, Raj Satkunasivam, Amit Joshi, Chelsea Catsburg, and Derek Gruter, for their much-appreciated assistance and invaluable friendship throughout the years.

To my family and friends: I am truly grateful to my family and friends, near and far, and especially my parents, Azar and Mehrdad Shahabi, and my aunt, Parvin Rezainia, for all their support when I needed it the most. I could not have done this without you.

Table of Contents
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
Abstract
1 INTRODUCTION
1.1 Cancer
1.2 Epidemiology of prostate cancer
1.2.1 Incidence of prostate cancer
1.2.2 Mortality from prostate cancer
1.2.3 Risk factors of prostate cancer
1.2.4 Race/ethnicity as a risk factor in prostate cancer risk and mortality
1.2.4.1 Latinos, genetic ancestry, environment and prostate cancer
1.3 Prostate cancer anatomy and histopathology
1.3.1 Histological changes in the prostate
1.3.2 Gleason grading system
1.3.3 Prostate cancer staging
1.3.3.1 Staging and grading of prostate biopsies
1.3.3.2 Staging and grading at radical prostatectomy
1.3.4 Multifocal prostate cancer and heterogeneity of disease
1.3.5 Localized prostate cancer
1.3.5.1 Early detection and prostate specific antigen (PSA) screening
1.3.5.2 Current treatment standards
1.3.5.3 Impact of PSA screening/early detection on treatment options
1.3.5.4 Clinical prognosis among localized prostate cancer patients
1.3.5.4.1 Biochemical recurrence (PSA recurrence)
1.3.5.4.2 Predictors of prognosis
1.3.5.5 Current prognostic tools: strengths and weaknesses
1.3.5.6 Biomarkers of prostate cancer: development and progression
1.3.5.6.1 Single nucleotide polymorphisms
1.3.5.6.2 Gene expression prognostic signatures
1.3.6 Advanced prostate cancer
1.4 Tobacco and prostate cancer risk and mortality
1.4.1 Tobacco chemical carcinogens
1.4.1.1 Polycyclic aromatic hydrocarbons
1.4.1.2 Heterocyclic amines
1.4.1.3 N-nitroso compounds
1.4.1.4 Aromatic amines
1.4.1.5 Benzene
1.4.1.6 Acetaldehyde
1.4.2 Metabolism of tobacco chemical carcinogens and variation in metabolism genes
2 HYPOTHESES AND SPECIFIC AIMS
3 METHODS
3.1 Study designs and study populations
3.1.1 The California Collaborative Prostate Cancer Study
3.1.1.1 Recruitment of participants in San Francisco Bay Area
3.1.1.2 Recruitment of participants in Los Angeles County
3.1.2 Hospital de Clínicas José de San Martín radical prostatectomy cohort (Buenos Aires, Argentina)
3.1.3 University of Southern California, Institute of Urology, Radical Prostatectomy cohort
3.1.3.1 Nested case-control study on prognostic biomarkers of localized prostate cancer
3.2 Data Collection
3.2.1 The California Collaborative Prostate Cancer Study: Demographics and lifetime exposure data
3.2.2 The California Collaborative Prostate Cancer Study: Follow-up of prostate cancer cases
3.2.3 Hospital de Clínicas José de San Martín cohort: Follow-up of participants
3.2.4 University of Southern California, Institute of Urology, Radical Prostatectomy cohort: Follow-up of participants
3.3 Biomarker data collection
3.3.1 The California Collaborative Prostate Cancer Study
3.3.1.1 Genotyping of carcinogen metabolism polymorphisms
3.3.1.2 Genotyping of ancestry informative markers (AIMs)
3.3.2 Hospital de Clínicas José de San Martín cohort
3.3.2.1 Genotyping of polymorphisms in GST genes
3.3.3 Nested case-control study on prognostic biomarkers of localized prostate cancer
3.3.3.1 Expression profiling of prostate cancer tumors
3.4 Data analysis
3.4.1 The California Collaborative Prostate Cancer Study
3.4.1.1 Tobacco smoking and prostate cancer risk
3.4.1.2 Tobacco smoking, polymorphisms in metabolism enzymes and prostate cancer risk: gene by environment analyses
3.4.1.3 Tobacco smoking, genetic ancestry, and prostate cancer risk among Latino participants
3.4.1.4 Tobacco smoking and prostate cancer survival
3.4.2 Hospital de Clínicas José de San Martín cohort: GST polymorphisms and biochemical recurrence
3.4.3 University of Southern California, Institute of Urology, Radical Prostatectomy Cohort database
3.4.3.1 Clinical predictors of early versus late prostate cancer biochemical recurrence and subsequent clinical recurrence
3.4.3.2 Identification of gene expression profiles associated with cancer recurrence for localized prostate cancer tumors
3.4.3.2.1 Preprocessing of microarray data: normalization, background correction, and batch effect correction
3.4.3.2.2 Differential expression analysis
3.4.3.2.3 Identification of risk predictive models for prostate cancer clinical recurrence
3.4.3.2.4 Validation of identified models in external datasets
3.4.3.2.4.1 External dataset 1: Mayo Clinic data (GenomeDx)
3.4.3.2.4.2 External dataset 2: Memorial Sloan-Kettering Cancer Center data
3.4.3.2.4.3 External dataset 3: Erasmus Medical Center data
4 RESULTS
4.1 Tobacco and prostate cancer risk and mortality in a multi-ethnic population: The California Collaborative Prostate Cancer Study
4.1.1 Tobacco smoking and prostate cancer risk
4.1.2 Tobacco smoking, polymorphisms in metabolism enzymes, and PCa risk
4.1.2.1 Genetic ancestry of Latinos, tobacco smoking, and prostate cancer risk
4.1.3 Tobacco smoking, PCa, and overall survival
4.2 Biomarkers of prostate cancer recurrence: results from the Hospital de Clínicas José de San Martín cohort
4.2.1 Associations between clinical variables and biochemical recurrence
4.2.2 Analyses of genotypes and risk of biochemical relapse
4.3 Predictors of late versus early recurrence following radical prostatectomy for localized prostate cancer treated within the PSA-era: University of Southern California, Institute of Urology, Radical Prostatectomy cohort
4.3.1 Characteristics and follow up of localized prostate cancer radical prostatectomy patients
4.3.2 Clinical predictors of early versus late biochemical recurrence
4.3.3 Time to clinical recurrence based on time of biochemical recurrence
4.4 Biomarkers of recurrence of localized prostate cancer: University of Southern California, Institute of Urology, Radical Prostatectomy cohort
4.4.1 Characteristics of patients included in the discovery/training set
4.4.2 Quality assessment of gene expression data
4.4.3 Differentially expressed genes between tumors
4.4.4 Development of the predictive signature
4.4.5 Validation of predictive model using external datasets and comparisons with existing predictive models
5 DISCUSSION
5.1 The role of tobacco smoking and metabolic enzyme genes in prostate cancer
5.1.1 Strengths and Limitations
5.1.2 Conclusions
5.2 Predicting late versus earlier biochemical recurrence and subsequent clinical recurrence
5.2.1 Strengths and Limitations
5.2.2 Conclusions
5.3 The role of biomarkers of prognosis for localized prostate cancer
5.3.1 Strengths and limitations
5.3.2 Conclusions
5.4 Summary and final remarks
BIBLIOGRAPHY

LIST OF TABLES
Table 1.1: Clinical staging of prostate cancer
Table 1.2: Pathologic staging of prostate cancer
Table 1.3: Regional lymph nodes staging of prostate cancer
Table 1.4: Metastasis staging of prostate cancer
Table 1.5: Current PCa prognostic tools (models and nomograms) using only clinical variables and their predictive ability captured by the area under the curve (AUC).
Table 1.6: List of studies done to discover biomarkers of aggressive PCa
Table 1.7: Chemical carcinogens found in tobacco smoke
Table 1.8: Candidate single nucleotide polymorphisms (SNPs) in metabolism genes involved in the metabolism of chemical carcinogens found in tobacco smoke.
Table 4.1: Socio-demographic and lifestyle characteristics of controls and cases in the California Collaborative Prostate Cancer Study
Table 4.2: Tobacco smoking characteristics by race/ethnicity among cases and controls in the California Collaborative Prostate Cancer Study
Table 4.3: Characteristics of prostate cancer cases by tobacco smoking status (including any cigarette/cigar/pipe smoking) in the California Collaborative Prostate Cancer Study
Table 4.4: Smoking characteristics and risk of localized prostate cancer by race/ethnicity in the California Collaborative Prostate Cancer Study
Table 4.5: Smoking characteristics and risk of advanced prostate cancer by race/ethnicity in the California Collaborative Prostate Cancer Study
Table 4.6: Smoking status, CYP1A2 (rs7662551) genotype, and risk of prostate cancer by stage and race/ethnicity
Table 4.7: Characteristics of Latino male controls and prostate cancer cases from Los Angeles County (N=416)
Table 4.8: Genetic ancestry (African, European, Native American) among Latino controls and cases
Table 4.9: Genetic ancestry and risk of prostate cancer among Latinos
Table 4.10: Native American ancestry as a potential modifier of the association between lifestyle factors and prostate cancer risk among Latino males
Table 4.11: Tobacco smoking and PCa-related mortality and all-cause mortality among PCa cases (n=1,944)
Table 4.12: Characteristics of the Hospital de Clínicas José de San Martín study population
Table 4.13: Genotype distribution of GST genes and comparisons with other populations.
Table 4.14: Univariable and multivariable Cox proportional hazard models for biochemical recurrence based on GST genotypes
Table 4.15: Characteristics of localized prostate cancer patients treated with radical prostatectomy at USC
Table 4.16: Follow-up data on USC radical prostatectomy patients
Table 4.17: Radical prostatectomy patients: characteristics of patients with no recurrence and patients who experienced early or late biochemical recurrence.
Table 4.18: Univariable Cox regression to determine predictors of overall biochemical recurrence.
Table 4.19: Univariable Cox regression to determine predictors of early (≤5 years after surgery) or late (>5 years after surgery) biochemical recurrence
Table 4.20: Multivariable Cox regression to determine predictors of overall biochemical recurrence anytime after radical prostatectomy.
Table 4.21: Multivariable Cox regression to determine predictors of early (≤5 years after surgery) or late (>5 years after surgery) biochemical recurrence
Table 4.22: Characteristics of patients with gene expression profiles available
Table 4.23: Biological processes of the differentially expressed genes between tumors of patients with no recurrence and patients who experienced clinical recurrence leading to metastatic disease.
Table 4.24: Genes differentially expressed between tumors of patients with no recurrence (NED) and patients who experienced clinical metastasis after surgery (CR).
Table 4.25: Differentially expressed genes between tumors with Gleason score 7 and Gleason <7.
Table 4.26: The genes obtained using different frequency thresholds (20% - 80%) across 500 stability selection bootstraps.
Table 4.27: Summary of the 28 genes in the USC/Illumina predictive signature.
Table 4.28: Biological processes of the 28 genes in the USC/Illumina predictive signature
Table 4.29: Area under the curve (AUC) values obtained through repeated 5-fold cross validation (CV) using the Mayo Clinic dataset for validation of predictive signatures.
Table 4.30: Validation of the USC 28-gene model using 3 independent datasets
Table 4.31: Validation of external predictive gene signatures using the 3 datasets with gene expression data and our own statistical methods including repeated 5-fold cross validation.

LIST OF FIGURES
Figure 1.1: Illustration of the prostate, its anatomical location, and its 3 different zones
Figure 1.2: Illustration of the location and structure of the 3 different zones of the prostate
Figure 1.3: Microscopic view of both benign glands (in red boxes; the dark purple cells lining the acini are the basal cells) and malignant glands (example of some in yellow box) spread throughout the stroma and in-between benign glands
Figure 1.4: Illustration of the structure of a normal prostate acinus with its cellular components (top) and the structure and differentiated behavior of malignancy (bottom).
Figure 1.5: Histological progression from normal prostate cells to malignant glands
Figure 1.6: Gleason grading diagram
Figure 1.7: Pathological image of adjacent Gleason grade 3 (G3) and Gleason grade 4 (G4) tumor cells.
Figure 1.8: Illustrations classification of each pathologic stage of prostate cancer based on tumor growth from the prostate.
Figure 1.9: Recommended schematic for initial prostate biopsy using 12 cores that cover the entire peripheral zone where tumors usually occur
Figure 1.10: Two hypothesized ideas contributing to tumor heterogeneity in prostate cancer progression over time.
Figure 1.11: Surveillance, Epidemiology, and End Results (SEER) incidence rates of prostate cancer by age of diagnosis (1973-1995)
Figure 1.12: The role of somatic and germline mutations in the development and progression of prostate cancer.
Figure 1.13: Role of environment, physical absorption, and metabolism of pro-carcinogens.
Figure 3.1: University of Southern California, Institute of Urology, Radical Prostatectomy cohort: breakdown by pathologic stage and recurrence status following radical prostatectomy
Figure 3.2: Arcturus Veritas laser capture microdissection machine used for the study.
Figure 3.3: The inside of the LCM machine
Figure 3.4: LCM laser picking up tissue and method of storing tissue prior to processing
Figure 3.5 (A-E): Microdissection of malignant prostate cells.
Figure 3.6: Measuring the accuracy of a test, such as a biomarker clinical tool (genetic signature) used to predict disease recurrence.
Figure 3.7: Example of an ROC curve.
Figure 3.9: Summary of the methods used for differential expression analysis and predictive model development
Figure 4.1 (a-d): Kaplan-Meier curves for biochemical-recurrence free survival by clinical characteristics.
Figure 4.2: Agarose gel electrophoresis of the multiplex PCR reaction for GSTT1 and GSTM1 genotyping
Figure 4.3: Agarose gel electrophoresis of GSTP1 amplicons digested with Alw26I restriction enzyme.
Figure 4.4 (a-e): Kaplan-Meier curves for biochemical-recurrence free survival by GST genotype.
Figure 4.5: Patients who experienced biochemical recurrence (BCR) ≤5 years or >5 years after radical prostatectomy and clinical recurrence (CR) status
Figure 4.6: Probability of being clinical recurrence-free based on time of BCR (≤5 years or >5 years after surgery)
Figure 4.7: Clinical recurrence-free survival following early or late biochemical recurrence among patients with pT2 prostate cancer.
Figure 4.8: Clinical recurrence-free survival following early or late biochemical recurrence among patients with ≥pT3a prostate cancer.
Figure 4.9: Internal controls used for quality assessment of the samples and the experimental runs when processing the samples.
Figure 4.10: High reproducibility detected using average signal based on one sample run in duplicate
Figure 4.11: Assessment of expression data generated from FFPE samples.
Figure 4.12: Summary of the methods used for differential expression analysis, predictive model development, and validation of the selected predictive model.
Figure 4.13: ROC curves derived from repeated 5-fold cross-validation: 28 gene model versus clinical variables only model.
ABBREVIATIONS

AA = Aromatic amine
AIMs = Ancestry informative markers
APC = Annual percent change
AS = Active surveillance
BCR = Biochemical (PSA) recurrence
BMI = Body mass index
BPH = Benign prostatic hyperplasia
CI = Confidence intervals
CR = Clinical recurrence
CV = Cross validation
DRE = Digital rectal exam
EMC = Erasmus Medical Center
FFPE = Formalin-fixed paraffin embedded
GAM = Generalized additive model
H&E = Hematoxylin and eosin
HCA = Heterocyclic amines
HR = Hazard ratio
IHC = Immunohistochemistry
LCM = Laser capture microdissection
MC = Mayo Clinic
MET = Metastasis
MRI = Magnetic resonance imaging
MSKCC = Memorial Sloan-Kettering Cancer Center
NED = No evidence of disease
NOC = N-nitroso compounds
OR = Odds ratio
PAH = Polycyclic aromatic hydrocarbons
PCa = Prostate cancer
PIA = Proliferative inflammatory atrophy
PIN = Prostatic intraepithelial neoplasia
PSA = Prostate specific antigen
RRP/RP = Retropubic radical prostatectomy/radical prostatectomy
SFBA = San Francisco Bay Area
SES = Socioeconomic status
SNP = Single nucleotide polymorphism
TRUS = Transrectal ultrasound
TURP = Transurethral resection of the prostate

Abstract

Prostate cancer (PCa) is the most commonly diagnosed solid-tumor cancer among men in developed countries and the second leading cause of cancer mortality among men in the United States. An estimated 1 in 6 American men will be diagnosed with PCa in his lifetime and approximately 80% will be diagnosed with localized disease. In 1986 the prostate specific antigen (PSA) screening test was introduced as an early detection tool. This resulted in stage migration toward earlier, more localized PCa cases and younger age at diagnosis. Despite advances in detection, there are still few established risk factors for PCa: older age, family history of PCa, race/ethnicity (with men of African ancestry having the highest risk), and specific genetic risk alleles recently identified in genome-wide association studies.
Even though most PCa cases are currently diagnosed at an early stage and most are indolent, there is still a proportion that can be aggressive and progress to lethal disease. Unfortunately, current prognostic tools, based on clinical variables, are limited in accurately differentiating an aggressive tumor from an indolent tumor. Therefore, the majority of patients with localized PCa choose to undergo definitive treatment, such as radical prostatectomy, which can carry side effects affecting quality of life. Disconcertingly, up to one third of radical prostatectomy patients continue to experience biochemical recurrence and 20-30% of these progress to lethal disease. Therefore, there is an urgent need to identify concrete risk factors for PCa that could be used to prevent disease incidence, and more accurate stratification tools to prevent overtreatment of indolent cases and undertreatment of aggressive disease. In this dissertation I investigated both genetic and environmental factors to better understand PCa risk and prognosis.

A risk factor associated with several cancers is tobacco smoking, but its role in prostate cancer is not clearly established. Using the California Collaborative Prostate Cancer Study, I tested the hypothesis that tobacco smoking may increase the risk of PCa and PCa-related mortality and that certain carcinogen metabolism genes may modify this relationship. The results from this study showed that smoking was associated with an increased risk of primarily localized PCa and that this relationship was modified by a polymorphism in the carcinogen metabolism CYP1A2 gene. Given that approximately a third of patients with localized PCa experience recurrence after radical prostatectomy, we sought to determine clinical characteristics associated with timing of recurrence after surgery using the University of Southern California, Institute of Urology, Radical Prostatectomy cohort.
The results showed that different clinical risk factors are associated with early (5 years or less after surgery) and late (greater than 5 years after surgery) biochemical recurrence (BCR), and specifically that organ confined disease (stage pT2) and extraprostatic extension disease (stage ≥pT3a) have different probabilities of clinical metastatic recurrence following BCR. The next two studies were used to test the hypothesis that malignant prostate glands from localized PCa tumors with a high risk of progression have genes that are differentially expressed due to somatic mutations or inherited polymorphisms, when compared to tumors from localized PCa patients that have low risk of recurrence. The Hospital de Clínicas José de San Martín radical prostatectomy cohort was used to determine that polymorphisms in Glutathione S-transferase (GST) genes are associated with recurrence after surgery. The University of Southern California, Institute of Urology, Radical Prostatectomy cohort was used to identify a gene expression signature that shows improved prediction of aggressive disease over the standard clinical variables currently used in predicting prognosis, which was externally validated in three independent cohorts. The findings from these studies provide novel information on the role of genes and environment associated with risk of PCa and prognosis that will hopefully contribute to improvements in cancer prevention and the advancement of PCa personalized medicine.

1 INTRODUCTION

1.1 Cancer

Cancer consists of a range of diseases involving abnormal growth of tissue arising from abnormal and uncontrolled growth and proliferation of malignant cells. Tumorigenesis (or carcinogenesis) encompasses a series of steps through which normal cells are transformed into cancer cells.
This multi-step process involves alterations (genetic or epigenetic) in the genes responsible for regulating the crucial homeostasis of growth, function, cell structure, and death of cells. With advancements in cancer research, new knowledge has allowed further refinement of the hypothesized elements of tumorigenesis, which include: sustaining proliferative signaling, evading growth suppressors, activating invasion and metastasis, enabling replicative immortality, inducing angiogenesis, resisting cell death, deregulating cellular energetics, and avoiding immune destruction (Hanahan & Weinberg, 2011). It is also known that the tumor microenvironment involves several different cell types that enable tumor growth and progression, consisting of interplay of cancer cells and non-malignant cells adjacent to the tumor (e.g. stromal components) (Hanahan & Weinberg, 2011). Further studies have also indicated clonal expansion of subtypes of malignant cells within tumors (cancer stem cells) that may contribute to the multi-focal nature of certain cancers. A better understanding of the characteristics of tumorigenesis allows for the identification and implementation of better clinical tools that incorporate clinical variables and genomic data to detect the presence of cancer in patients and to more accurately identify potentially aggressive tumors that are more likely to metastasize. A primary focus of the work presented in this dissertation was to contribute to current knowledge of cancer and, ultimately, to ending metastatic disease.

1.2 Epidemiology of prostate cancer

1.2.1 Incidence of prostate cancer

Prostate cancer (PCa) is the second most commonly diagnosed cancer worldwide after skin cancer. Based on age-standardized incidence rates, PCa is the most diagnosed solid tumor cancer among men in developed countries (62.0 per 100,000 men), and sixth among developing countries (12.0 per 100,000 men) (Jemal et al., 2011).
In the US, it is the most commonly diagnosed solid-tumor cancer among men and approximately 1 in 6 men will be diagnosed with PCa in his lifetime. The American Cancer Society estimates that there will be 248,590 newly diagnosed cases of PCa and 29,720 deaths resulting from PCa in the year 2013. As of 2008, the highest age-standardized incidence rates were in Australia/New Zealand (104.2 per 100,000), Western and Northern Europe (94.1 per 100,000), and North America (85.6 per 100,000) (Jemal et al., 2011). The lowest age-standardized incidence was in South Central Asia (4.1 per 100,000).

1.2.2 Mortality from prostate cancer

PCa is the sixth leading cause of death from cancer worldwide, and the third among developed countries (Jemal et al., 2011). As of 2008, the highest age-standardized rates of mortality due to PCa were in the Caribbean (26.3 per 100,000) and Southern and Western Africa (19.3 and 18.3 per 100,000 respectively), whereas all areas of Asia had the lowest rates (ranging from 2.5-7.5 per 100,000) (Jemal et al., 2011). Among US men, it is the second most common cause of cancer death, after lung cancer, with 1 in 36 dying from the disease (“American Cancer Society: Prostate Cancer,” n.d.).

1.2.3 Risk factors of prostate cancer

Established risk factors for PCa include older age, family history of PCa, race/ethnicity (with men of African ancestry having the highest risk), and specific genetic risk alleles recently identified in genome-wide association studies (Amin Al Olama et al., 2013; Eeles et al., 2013; Haas & Sakr, 1997). While multiple environmental factors have been studied in association with PCa risk, results have been inconclusive and no preventable environmental cause of PCa has been confirmed.
Environmental factors studied in association with risk include sunlight exposure and vitamin D (Ingles, Ross, & Mimi, 1997; John, Schwartz, Koo, Van Den Berg, & Ingles, 2005; Schwartz, 2005; Stewart & Weigel, 2004), diet (Catsburg et al., 2012; Giovannucci, Liu, Platz, Stampfer, & Willett, 2007; Joshi et al., 2012a; Joshi, John, Koo, Ingles, & Stern, 2012b), tobacco smoking (Huncharek, Haddock, & Reid, 2010), obesity (Gong et al., 2006), calcium intake, and other lifestyle factors (physical activity, occupation, vasectomy, sexual activity) (Bostwick et al., 2004). In this dissertation, I specifically focused on understanding the potential role of smoking as a risk factor for PCa and PCa-mortality in a multi-ethnic case-control study, which will be discussed in further detail in section 1.4.

1.2.4 Race/ethnicity as a risk factor in prostate cancer risk and mortality

Migrant studies suggest that a combination of environmental and genetic factors may play a role in the development of PCa (Haas & Sakr, 1997). Men from countries with low PCa-related mortality (e.g. Japan) who move to countries with higher mortality rates, such as the US, begin to experience incidence and mortality rates near those experienced by American men, suggesting an environmental component contributing to risk (Haas & Sakr, 1997). However, men of African descent continue to have the highest PCa incidence and PCa-related mortality while Asian men have the lowest, with this trend seen in the US and worldwide (Jemal et al., 2011). This suggests an additional genetic component involved in risk and progression. African-American men are more likely to be diagnosed with advanced stage PCa and more than twice as likely to die from PCa when compared to white men.
Age-adjusted incidence and mortality rates per 100,000 based on 2006-2010 SEER data are as follows (incidence and mortality, respectively): African Americans, 228.5 and 50.9; whites, 144.9 and 21.2; Hispanics, 125.8 and 19.2; and Asians/Pacific Islanders, 81.8 and 10.1 (“SEER Cancer Statistics Factsheets: Prostate Cancer,” n.d.). As previously mentioned, mortality rates are also higher in areas with prevalent African populations such as in the Caribbean and in sub-Saharan Africa (Jemal et al., 2011). Studies show evidence of genetic variants and interactions with environmental factors affecting risk differently among different races/ethnicities that can contribute to these observed disparities in PCa risk and mortality (Haiman et al., 2011; Powell, 2007).

1.2.4.1 Latinos, genetic ancestry, environment and prostate cancer

Latinos have higher PCa incidence and mortality rates than Asians/Pacific Islanders, but lower rates than African Americans and whites. As with other racial/ethnic groups, PCa risk among Hispanics may be explained by a combination of environmental and genetic factors. Identification of these factors among Latinos is challenging given the fact that they are a highly heterogeneous population in terms of socio-demographic characteristics, culture, and admixed genetic background. At least three distinct racial/ethnic groups contribute to the admixed Latino gene pool: European (white), African, and Native American. Admixture proportions vary widely among Latin countries depending on the history of migration patterns of the European and African immigrants and the density of the existing Native American populations at the time of colonization of Latin countries (Burchard et al., 2003; Gonzalez Burchard et al., 2005; Price et al., 2007).
In addition to the variation among Latin countries, immigrant Latinos who differ in their level of acculturation also have additional diversity in their exposure to various environmental factors. Few studies have addressed the role of genetic (Balic et al., 2002; I. Cheng et al., 2012) and environmental risk factors in PCa risk among the Latino population due to the challenges of addressing the various layers of heterogeneity in this population and the scarce availability of epidemiological studies that included this minority population. A key challenge in identifying genetic factors that could be associated with disease is to account for the genetic admixture of each individual. As members of an admixed population of up to three different racial/ethnic groups, each Latino individual inherits different ancestral genetic proportions and therefore different alleles, depending on the parents and migration patterns of his/her country of origin. The disease-causing allele(s) could be more prevalent in one of the ancestral groups that make up the Latino population, and therefore, each Latino individual may have a different risk depending on his/her proportion of inherited genetic ancestry. This “population stratification” can be an issue in genetic association studies, such as case-control studies. If risk of a certain disease of interest varies by ancestry, then any allele that has a high frequency in that particular ancestral population will show an association with disease, even if it is truly not the disease-causing allele (Burchard et al., 2003; Tsai et al., 2005). Ancestry informative markers (AIMs) provide a method to measure the genetic ancestry of each individual, which can then be used to correct for this population stratification in association studies. AIMs are a set of selected single nucleotide polymorphisms (SNPs) picked from across the genome that have large differences in allele frequencies among the selected ancestral populations of interest.
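The confounding described above can be illustrated with a small numerical sketch. The counts below are hypothetical and were chosen only so that the allele has no effect within either ancestral group; they are not data from this dissertation. The example also simplifies ancestry into two discrete strata and uses a Mantel-Haenszel summary odds ratio as a stand-in for adjustment, whereas AIM-based analyses typically adjust for a continuous ancestry proportion in a regression model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Table2x2:
    """Case-control counts for carriers and non-carriers of one allele."""
    case_carrier: int
    case_noncarrier: int
    control_carrier: int
    control_noncarrier: int

    @property
    def total(self) -> int:
        return (self.case_carrier + self.case_noncarrier
                + self.control_carrier + self.control_noncarrier)


def odds_ratio(t: Table2x2) -> float:
    """Odds ratio of disease for carriers versus non-carriers."""
    return (t.case_carrier * t.control_noncarrier) / (
        t.case_noncarrier * t.control_carrier)


def pool(tables: List[Table2x2]) -> Table2x2:
    """Collapse the strata into one table, ignoring ancestry."""
    return Table2x2(
        sum(t.case_carrier for t in tables),
        sum(t.case_noncarrier for t in tables),
        sum(t.control_carrier for t in tables),
        sum(t.control_noncarrier for t in tables),
    )


def mantel_haenszel_or(tables: List[Table2x2]) -> float:
    """Summary odds ratio adjusted for the stratification variable."""
    numerator = sum(
        t.case_carrier * t.control_noncarrier / t.total for t in tables)
    denominator = sum(
        t.case_noncarrier * t.control_carrier / t.total for t in tables)
    return numerator / denominator


# Hypothetical strata: the first ancestral group has both a higher disease
# frequency and a higher allele frequency than the second group.
high_risk_ancestry = Table2x2(80, 20, 40, 10)
low_risk_ancestry = Table2x2(10, 40, 40, 160)
strata = [high_risk_ancestry, low_risk_ancestry]

crude_or = odds_ratio(pool(strata))        # ancestry ignored
adjusted_or = mantel_haenszel_or(strata)   # ancestry accounted for

print(f"Crude OR: {crude_or:.2f}")
print(f"Ancestry-adjusted OR: {adjusted_or:.2f}")
```

In this constructed example the odds ratio is 1.0 within each stratum, yet pooling the strata gives a crude odds ratio of about 3.19. The stratified estimate returns to 1.0, which is the kind of false association that ancestry adjustment is meant to remove.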
Global (or genome-wide) ancestry is the overall genetic ancestry estimated for an individual and is determined using SNPs distributed across autosomal chromosomes (e.g. AIMs), while local genetic ancestry is ancestry at a particular location on a chromosome and defined as having 0, 1, or 2 copies from each of the considered ancestral populations (Seldin, Pasaniuc, & Price, 2011). Global ancestry can be determined using as few as 100 AIMs (Tsai et al., 2005) and is crucial in case-control studies to prevent false associations, while local ancestry requires many more markers and is used to identify a possible genetic locus affecting disease risk (Seldin et al., 2011). For Latinos, a panel consisting of European, African, and Native American markers that have large differences in allele frequencies among these ancestral populations is adequate in determining global genetic ancestry. Choosing the appropriate Native American population for the study population is important as well, since there is also genetic heterogeneity within Native American populations across Latin countries (Galanter et al., 2012; Price et al., 2007). Once a panel of selected genetic markers is established based on the ancestral populations selected, each individual is genotyped and, based on the alleles present or absent, a proportion of each ancestral population is assigned to that individual as a representation of the global ancestry. In addition to estimating the proportion of each ancestral population, AIMs can also be used to determine the risk of disease associated with the proportion of inherited alleles from each independent racial/ethnic group and to adjust for any confounding that may occur when evaluating any other traits with disease.

1.3 Prostate cancer anatomy and histopathology

The prostate is an exocrine gland and one of the male sex accessory tissues of the reproductive system.
It is a structure that has been conserved in all mammals, though its physiological and anatomical function varies by species. Walnut-sized and located right beneath the bladder surrounding the urethra and ejaculatory duct (Figure 1.1), the human prostate produces secretions that become a part of one half to two thirds of the ejaculate volume, and are thought to consist of secretory elements that help promote sperm viability and motility (Humphrey, 2003).

Figure 1.1: Illustration of the prostate, its anatomical location, and its 3 different zones. Source: people.rit.edu

Figure 1.2: Illustration of the location and structure of the 3 different zones of the prostate. Source: Cancer: Principles and Practice of Oncology, 4th ed. Lippincott Williams and Wilkins

Anatomically, the prostate is composed of three different glandular zones (Figure 1.2) and a right and left lobe (halves). The current zonal model in use was developed by John E. McNeal, M.D. and consists of the following: the peripheral zone, central zone, and transition zone. No fibrous boundary is histologically visible among the zones (Humphrey, 2003). The peripheral zone is about 70% of the prostate volume and is the most common zone of prostatic intraepithelial neoplasia (PIN) and carcinoma. The central zone is about 25% of the volume and the fraction of the epithelium to stroma is higher than in the rest of the prostate. The transition zone is only about 5% of the prostate volume but can enlarge due to benign prostatic hyperplasia (BPH) (Bostwick & Dundore, 1997). The entire prostate is further surrounded by a capsule that consists of an inner layer of smooth muscle and a collagenous outer cover; the relative amounts of each vary across the prostate. At the microscopic level, the adult prostate consists of many ducts and acini within dense fibromuscular stroma consisting of smooth muscle cells and fibroblasts.
Differentiating between ducts and acini in the prostate is not always possible, unlike other secretory glands in the body. The ducts and acini are composed of normal epithelium consisting of three main cell types: secretory cells, basal cells, and neuroendocrine cells. The luminal secretory cells produce prostate-specific antigen (PSA), prostatic acid phosphatase (PAP), acidic mucin and other secretory elements that are released into the lumen and into the ducts to combine with seminal fluid. PSA is a glycoprotein that serves as an anticoagulant in seminal fluid. The basal cells are flattened cells that are at the base of the luminal cells and on top of the basement membrane that separates the gland from the stromal elements. In humans the basal to luminal cell ratio is approximately 1:1 while in other species it is seen to be approximately 1:7 (El-Alfy, Pelletier, Hermo, & Labrie, 2000). Basal cells play an important role in both normal and abnormal prostate growth. They are postulated to contain prostatic stem cells, while also being the target of estrogens, androgens, epidermal growth factor, and other growth factors (Bonkhoff, 1996; R. A. Taylor, Toivanen, & Risbridger, 2010b). It is the absence of the basal cell layer around a gland that is a key identifier of malignant growth. The neuroendocrine cells are the least common type in the prostate, with a suspected endocrine-paracrine regulatory role in growth and development. Intraluminal contents of glands also include degenerating epithelial cells and corpora amylacea, which are thickened secretions that can also be calcified. Corpora amylacea are commonly seen in the normal prostate and the numbers increase with a man’s age; calcified corpora amylacea may contribute to forming prostatic calculi, which are prostatic stones that primarily contain calcium phosphate and are often found in inflammation and benign prostatic hyperplasia (Humphrey, 2003).
1.3.1 Histological changes in the prostate

Cellular changes may occur in the prostate that can be cancerous or non-cancerous. Non-cancerous types of cellular changes can include metaplasia, benign prostatic hyperplasia (BPH; one of the most common conditions among older men), and atypical adenomatous hyperplasia (Humphrey, 2003). Epithelial atrophy arising from inflammatory responses to a variety of possible exposures (e.g. infection, hormones, diet, urine reflux, physical trauma) is suspected to play a role as a precursor to prostatic intraepithelial neoplasia (PIN) and cancer and is termed proliferative inflammatory atrophy (PIA) (De Marzo et al., 2007). PIN consists of changes in the prostate that are thought to be pre-malignant lesions, in particular high-grade lesions. High-grade PIN is very common in removed prostates, and is present in 75% of prostates with carcinoma and 35% of prostates without carcinoma (Humphrey, 2003). The majority of cancers occur in the peripheral zone of the prostate, where there is also a high prevalence of PIA and medium-high prevalence of high grade PIN (De Marzo et al., 2007). Fewer tumors occur in the transition zone where BPH is highly prevalent with some PIA, and hardly any occur in the central zone. Prostatic adenocarcinoma, which is cancer originating in the gland, comprises the majority of prostate cancer diagnoses (>90%). The structure of the glands changes greatly, with progressive disappearance of the basal cell layer and lamina, nuclear and nucleolar enlargement, and the glands becoming very irregular (Figures 1.3-1.5). This is in contrast to high-grade PIN, which still maintains the architecture of the glands and ducts, albeit with crowding of proliferating cells, and the presence of the basal cell layer. Figure 1.5 provides histological evaluation of the progression from a normal prostate gland to cancerous glands.
It is believed that disruptions to the structure of the glands allow prostatic components, such as PSA, to infiltrate into the circulatory system. Increased PSA levels in the serum can be an indicator of cellular structural changes in the prostate that can be either cancerous or non-cancerous. For diagnostic purposes at the time prostate tissue is received, such as biopsy tissue, monoclonal antibodies (i.e. 34βE12) are used to stain basal cells to determine if the glands are malignant or not based on presence or absence of basal cells (Humphrey, 2003).

Figure 1.3: Microscopic view of both benign glands (in red boxes; the dark purple cells lining the acini are the basal cells) and malignant glands (example of some in yellow box) spread throughout the stroma and in-between benign glands.

Figure 1.4: Illustration of the structure of a normal prostate acinus with its cellular components (top) and the structure and differentiated behavior of malignancy (bottom).
[Figure 1.4 labels. Normal prostate cells: stroma, basal cells, luminal cells, basal lamina, luminal secretion. Malignant cells: loss of basal cells and basal lamina, glands packed together, large nuclei with prominent nucleoli.]

Figure 1.5: Histological progression from normal prostate cells to malignant glands
A. Normal prostate gland stained with hematoxylin and eosin.
B. The same normal prostate gland depicted in picture A but stained with 34βE12 to detect basal cells surrounding the gland (dark brown in picture). This confirms a benign diagnosis.
C. Although basal cells cannot be seen with the naked eye at this magnification and without immunohistochemical staining, there is evidence of structural changes differing from normal glands. The red arrow points to inflammatory response cells surrounding the gland, indicating a possible prostatic inflammatory atrophy (PIA).
D. High grade prostatic intraepithelial neoplasia (PIN) lesion. Although not visible with the naked eye, basal cells still surround the gland.
High grade PIN lesions may look cancerous but are not; however, they are hypothesized to be precursors to malignant lesions.
E. Prostate cancer of Gleason 3. Clustered glands still maintain glandular form but do not have basal cells.
F. Prostate cancer of Gleason 4. Noticeable loss of structure and fusion of glands when compared to PCa of Gleason 3.
Sources of images A, B, D: webpathology.com
[Figure 1.5 panel groups: Normal; Precursors to malignancy (PIA, PIN); Cancer]

1.3.2 Gleason grading system

Pathologists use the Gleason grading system, developed by Dr. Donald Gleason with colleagues in the 1960s, to assess the extent of glandular differentiation and growth pattern of the cells in a prostate tumor, from biopsy cores and from radical prostatectomy tissue (Humphrey, 2003). The Gleason grades are between 1 and 5, and two grades are provided for each tumor: a primary grade representing the predominant tumor pattern and a secondary grade representing the second most common pattern in the tumor. The sum of these two is called the Gleason score, and the higher the number, the greater the deviation from the normal pattern. The grading system has been an important factor in determining prognosis of localized PCa patients.

Figure 1.6: Gleason grading diagram. Source: prostate-cancer.org

Traditionally, the Gleason score is categorized into groups of ≤6, 7, and 8-10. Gleason 7 tumors are either 3+4 (primary 3, secondary 4) or 4+3 (primary 4, secondary 3), depending on the prominent Gleason pattern seen. There have been studies showing that there is a difference in prognosis among patients with 3+4 and 4+3 tumors, with patients with 4+3 tumors having a worse prognosis than patients with 3+4 tumors (Lavery & Droller, 2012; Makarov, Sanderson, Partin, & Epstein, 2002; Tollefson, Leibovich, Slezak, Zincke, & Blute, 2006; Wright, Salinas, Lin, & Kolb, 2009). Structurally, Gleason 3 tumors show important differences when compared to Gleason 4 tumors.
The basal cell layer is absent from both, but a noticeable difference in Gleason 4 tumors is the disruption of glandular form, with visible signs of possible diffusion of cancerous glandular cells through the basement membrane. This may be an important characteristic to consider when evaluating tumors involving both Gleason grades, since if Gleason 4 tumor cells have the potential to invade their own basement membrane then they have developed mechanisms that would prime them to invade other membranes in the prostate, such as those belonging to lymphatic or blood vessels (Lavery & Droller, 2012). Further discussion on the heterogeneous nature of tumors with differing Gleason grades will be in sections 1.3.3.2 and 1.3.4.

Figure 1.7: Pathological image of adjacent Gleason grade 3 (G3) and Gleason grade 4 (G4) tumor cells.

1.3.3 Prostate cancer staging

The current staging system for prostatic adenocarcinomas follows the TNM (T = primary tumor, N = regional lymph nodes, M = distant metastasis) AJCC/UICC (American Joint Committee on Cancer/Union Internationale Contre le Cancer) staging classification. The stage describes the extent to which the tumor has grown within the prostate and the tumor extension outside of the prostate. Figure 1.8 provides a visual guide on tumor progression and the corresponding stage. Clinical staging is based on findings from physical examination, imaging, and biopsy tissue obtained from the prostate, and pathological staging is based on what is learned about the cancer at radical prostatectomy, which is a surgical treatment option to remove the entire prostate.

Figure 1.8: Illustration of the classification of each pathologic stage of prostate cancer based on tumor growth from the prostate. Source: www.cancer.gov

Tables 1.1-1.4 are obtained from the National Cancer Institute and show the most recent version (revised in 2010) of the TNM staging system by the AJCC.
Table 1.1: Clinical staging of prostate cancer
Table 1.2: Pathologic staging of prostate cancer
Table 1.3: Regional lymph nodes staging of prostate cancer
Table 1.4: Metastasis staging of prostate cancer

1.3.3.1 Staging and grading of prostate biopsies

Clinical staging is estimated based on diagnostic assessments of tumor extension and cellular differentiation. These include results from digital rectal examination (DRE), lab results, imaging, and histologic analysis of biopsy samples obtained from a patient’s prostate. There are variations in guideline recommendations from several countries in regards to the recommended factors that clinicians should consider before deciding to pursue a prostate biopsy on a patient. Factors include results from DRE, levels of serum PSA (cutoffs of high PSA vary), PSA density (serum PSA divided by prostate volume), family history, age, race/ethnicity, and comorbidities (Ukimura et al., 2013). There are also further guideline recommendations for prostate biopsy techniques. In prostate biopsies, 10 to 12 transrectal needle core biopsies are recommended and are the typical systematic biopsies performed to capture a full range of areas from the peripheral zone of the prostate (Figure 1.9) (Ukimura et al., 2013). Additional cores can be taken based on DRE results or through imaging such as transrectal ultrasound (TRUS) and even more recently, yet still limited in clinical application, magnetic resonance imaging (MRI) in conjunction with TRUS, to optimize visualization of potential suspicious areas for sampling (Ukimura et al., 2012). Once a biopsy is taken, a pathologist is able to examine the tissue to determine if cancer is present. If a tumor is detected, the case is staged based on the most recent clinical staging system using all clinical diagnostic results (Table 1.1). Since in some cases the initial biopsy turns out to be negative for cancer but there is other clinical suspicion of cancer (e.g.
DRE results, high PSA, high grade PIN lesions), a repeat biopsy is warranted. Most studies have shown that up to 30% of patients who undergo a repeat biopsy after an initial negative biopsy actually do have cancer, and therefore other clinical variables should be taken into account after a negative initial biopsy (Ukimura et al., 2013). It is also still important to consider repeat biopsies in cases with an initial positive biopsy that are diagnosed as low-risk where the patient decides to undergo “active surveillance” (AS), a process by which patients are actively followed with repeated biopsies instead of receiving definitive treatment. Among patients who were initially diagnosed as low-risk and who underwent an immediate repeat biopsy, 15-20% had upgrading (Berglund et al., 2008). It is estimated that 1 in 3 patients who would be considered candidates for AS may in fact have more advanced disease than initially assessed based on diagnostic biopsy samples (Ukimura et al., 2013).

Figure 1.9: Recommended schematic for initial prostate biopsy using 12 cores that cover the entire peripheral zone where tumors usually occur. Source: (Ukimura et al., 2013)

1.3.3.2 Staging and grading at radical prostatectomy

Once a radical prostatectomy is performed, staging is confirmed based on the tissues resected to determine extent of tumor invasion (Tables 1.2 and 1.3). This is in addition to the pathological assessment of the prostate tumor. A current concern in prostate cancer diagnosis is the fact that approximately 30-50% of tumors with Gleason 6 score (3+3) at biopsy are upgraded to Gleason 7 (3+4 or 4+3) at radical prostatectomy (Berglund et al., 2008; Chun et al., 2006; Cohen et al., 2008; Lavery & Droller, 2012; Truong et al., 2013). This occurrence can be attributed to the heterogeneous nature of the disease where there can be multiple foci of cancer lesions within the prostate, which will be discussed further in section 1.3.4.
The random sampling of tissue during a biopsy may lead to misclassification since only small pieces of the prostate can be assessed, with the potential of capturing only portions of the tumor that may not be representative of the overall tumor aggressiveness. For example, a biopsy evaluation of the cores may indicate a tumor with Gleason 6 (3+3) even though both Gleason patterns 3 and 4 are present in the prostate; the Gleason 4 area may be missed by the biopsy cores but found during pathological evaluation after radical prostatectomy, and the case is then graded as Gleason 7 (3+4 or 4+3). This is concerning since there is evidence that Gleason 4 pattern is associated with worse prognosis and could have inherently different biological attributes than Gleason 3 tumors and below (Lavery & Droller, 2012). Downgrading at RP also occurs in approximately 30-50% of patients, especially those with Gleason 8-10 biopsies or a primary grade of 4 or 5 at biopsy (Cohen et al., 2008; Epstein, Feng, Trock, & Pierorazio, 2012; Whitson et al., 2013). There are several possible explanations for downgrading. One is that the tumor appears intermediate between two Gleason grades at biopsy and is assigned the higher grade. Another is that a biopsy core may capture a high-Gleason focus so small that it is either not seen in the assessment of the primary tumor after RP or not included in the final score because of its small extent. Since studies have shown the importance of considering small high-grade tumors in prognosis, as will be discussed in section 1.3.4, the latter may become less likely as the histological grading system and methods are re-evaluated and modified based on these findings.
The findings of one study, consistent with results from previous studies, indicated that having a smaller proportion of positive cores with high grade tumor and a smaller proportion of total tissue having prostate cancer at biopsy were significantly associated with downgrading (Whitson et al., 2013). Whereas downgrading may lead to overtreatment of men who actually have lower Gleason patterns, upgrading at RP may have serious consequences for patients who are considering pursuing AS at diagnosis. One study showed that 40% of patients with Gleason 8-10 on biopsy were downgraded, while 46% of patients with Gleason 2-6 on biopsy were upgraded (Cohen et al., 2008). This potential misclassification of the diagnostic Gleason score due to tumor heterogeneity may lead many patients categorized as low-risk to pursue AS, thus preventing these patients from receiving the earlier and more aggressive treatment that they need. Importantly, this relatively high rate of misclassification may also discourage many patients diagnosed with low-risk disease from pursuing AS altogether due to the concern that the cancer might be more aggressive than originally diagnosed. Recent studies have examined possible predictors of upgrading such as the number of biopsy cores obtained, number of positive biopsy cores, clinical stage, PSA level, PSA density, DRE results, prostate volume, TRUS results, number of previous biopsies, histological findings such as high grade PIN, and patient characteristics such as age, ethnicity/race, and BMI (Iremashvili, Manoharan, Pelaez, Rosenberg, & Soloway, 2012; Truong et al., 2013).
However, the ability of currently developed nomograms to differentiate patients who experience upgrading from those who do not is still very limited after validation in external cohorts, with an area under the curve (AUC) estimate (a statistical metric estimating predictive ability, further discussed in section 3.4.3.2.3) ranging from approximately 0.54 to 0.67 (Iremashvili et al., 2012; Truong et al., 2013). Therefore, further evaluation of factors associated with upgrading is warranted to increase the accuracy of discriminating between low-grade and high-grade disease.

1.3.4 Multifocal prostate cancer and heterogeneity of disease

Although research in PCa has made progress in the ways clinicians diagnose and treat patients, there is still controversy over which staging and grading system clinicians should use because of concerns over the heterogeneous nature of PCa (Andreoiu & Cheng, 2010). Most prostate tumors, 60-90%, tend to have multiple tumor foci, which suggests multifocal origins with the potential to progress at varying rates (Andreoiu & Cheng, 2010; Lavery & Droller, 2012; Mouraviev, Mayes, & Polascik, 2009). Yet, the biological basis of multifocal heterogeneity within prostate cancer is still not well understood. One hypothesis is that different clonal types of cancer stem cells develop independently at the same time, and the fusion of these small, independently rooted foci across the prostate over time creates a larger and histologically more heterogeneous tumor with different Gleason scores. Some of these foci, such as those with higher Gleason grade, may progress while the others will not. If this is the case, a patient with only Gleason 3 and no Gleason 4 pattern may have a very good prognosis, and no extensive treatment would be needed.
Another hypothesis is that the multifocal state derives from one initial lesion, where the cancer evolves through different Gleason patterns over time, allowing multiple foci in the prostate to evolve into separate tumors with different grades, albeit sharing some underlying genetic alterations across all foci (Andreoiu & Cheng, 2010; Lavery & Droller, 2012) (Figure 1.10). If this hypothesis holds true, then there is potential for low-risk Gleason 3 tumors to evolve into Gleason 4 cancer cells, which would indicate the need for early intervention and appropriate follow-up to prevent further progression (Lavery & Droller, 2012). There are findings that support each of these hypotheses, showing that the natural progression of PCa is not yet fully understood. Lavery and Droller note that Gleason 8-10 tumors decreased only slightly from the pre-PSA to the post-PSA era; if cancer cells did in fact evolve into higher Gleason grades over time, catching these tumors early through PSA screening would have been expected to produce a large downward grade migration (Lavery & Droller, 2012). This observation supports the hypothesis of independent clonal types of cancer stem cells rather than progression from a single clonal type.

Figure 1.10: Two hypothesized ideas contributing to tumor heterogeneity in prostate cancer progression over time. A) Hypothesis 1: Different clonal types of cancer stem cells develop independently. For example, a Gleason 3 (G3) lesion and a Gleason 4 (G4) lesion arise independently of one another. B) Hypothesis 2: Different foci evolve from an initial lesion. For example, a G3 lesion can evolve to become a G4. Derived from: (Andreoiu & Cheng, 2010; Lavery & Droller, 2012)

A key implication of these two hypotheses is that finding potential biomarkers predictive of aggressive disease relies on an understanding of the biology of tumor heterogeneity.
Based on the hypothesis presented in Figure 1.10A, if a biomarker is identified from the sampling of malignant cells that are terminally lower Gleason, one could not use it to predict whether other, more aggressive cells are present, since the lesions arise independently of one another. If the hypothesis pictured in Figure 1.10B holds true, then sampling from foci of lower Gleason may provide clues that can help detect the genetic changes needed to progress into higher-risk malignant lesions. It is possible that a combination of both environmental exposures (i.e. carcinogens) and underlying genetic factors may contribute towards a “field defect,” and subsequently additional carcinogens and inflammatory processes may play a role in forming new, and/or expanding the existing, precursor lesion foci, thus contributing to multifocal tumors (Andreoiu & Cheng, 2010; Lavery & Droller, 2012). Understanding prostate cancer tumor evolution, and which critical alterations are linked to aggressive behavior, is crucial to target the disease at the treatment stage and, optimally, at the diagnosis stage. Re-evaluation of the histological grading system may be important in addressing the heterogeneity in PCa. Gleason patterns can vary throughout the prostate, and only the two most prevalent patterns are combined to provide the overall Gleason score. Yet, studies have also shown that tertiary Gleason patterns, which are less prevalent in the overall tumor assessment, may be strong predictors of prognosis, especially when they are of a higher grade (Harnden, Shelley, Coles, & Staffurth, 2007; Hattab, Koch, Eble, Lin, & Cheng, 2006; Sim, Telesca, Culp, Ellis, & Lange, 2008; Trock et al., 2009).
Therefore, using the index tumor, which can be defined as the largest tumor found in the prostate, to determine prognosis may not always provide accurate clues to the tumor’s behavior (although in many cases the index tumor does acquire high-grade foci). To truly assess the cancer’s behavior, all tumor foci within the prostate would have to be evaluated. The multifocal nature of PCa tumors presents a challenge in the clinic and in research. As mentioned in section 1.3.3, properly staging the disease using clinical fine-needle biopsies that extract only small portions of the prostate affects treatment options in light of the possibility of upgrading. It is also a challenge for developing more conservative treatment methods, such as focal ablation therapy, a technique that ablates only a region of cancer while keeping non-malignant prostate tissue intact (Mouraviev et al., 2009). Lastly, prostate cancer heterogeneity presents a challenge in research studies, given that there are typically limited amounts of prostate tumor tissue available, and the sampling of tissue might not always be representative of the overall tumor landscape. In order to address heterogeneity, underlying genetic profiles of PCa tumors may provide further insight into the characteristics of malignant cells, as it is speculated that molecular alterations may precede visible histological/pathological changes. Differences in genetic profiles of tumors of different Gleason patterns capture the variation in potential aggressive behavior between high-grade and low-grade carcinomas (Lavery & Droller, 2012; Tomlins et al., 2011; True et al., 2006).
Analysis of gene expression profiles from microdissected Gleason 3+3 tumors and 4+4 tumors shows that Gleason pattern 3 tumors tend to overexpress genes that control intracellular metabolism, whereas Gleason pattern 4 tumors tend to overexpress genes involved in cell cycle regulation and DNA replication, with diminished androgen signaling (Lavery & Droller, 2012). More detailed characterization of different Gleason grades, along with a greater understanding of the heterogeneous physical and genetic nature of PCa tumors, could provide more insight into predictors of aggressive tumors that have worse prognosis.

1.3.5 Localized prostate cancer

A tumor that is confined within the prostate without metastasis to outside tissue or organs is identified as localized prostate cancer. Clinical staging of localized PCa, based on biopsy and imaging results, consists of cT1-T2. Pathologic staging (after the removal of the prostate) of localized PCa is pT2 for organ-confined tumors that are within the prostatic capsule (pT2a-pT2c) and pT3 for tumors that extend through the prostatic capsule and may involve seminal vesicles (pT3a-pT3b) but have not invaded other organs outside the capsule.

1.3.5.1 Early detection and prostate specific antigen (PSA) screening

In 1986, the prostate specific antigen (PSA) test was approved by the US Food and Drug Administration (FDA) and was made available as a clinical tool to help detect PCa recurrence following treatment. This test allowed for detection of serum PSA levels, which, if found in high amounts, serve as a biomarker of histological changes in the prostate, such as cancer. Soon it became a cancer screening tool when used in conjunction with a digital rectal exam (DRE) and biopsy of the prostate to detect the presence of a tumor. The introduction of this screening tool led to a stage migration toward more localized PCa cases and cases involving younger patients (R. A. Stephenson & Stanford, 1997).
Based on data from Surveillance, Epidemiology, and End Results (SEER), incidence rates of prostate cancer increased sharply after PSA screening came into clinics. In 1985, there were 115.5 new cases per 100,000 men and in 1992 there were 237.4 new cases per 100,000 men among all ethnicities/races (“SEER Cancer Statistics Factsheets: Prostate Cancer,” n.d.). This was evident among both Caucasian and African-American men, as there was more detection of early disease in all asymptomatic men (Figure 1.11). From 1975 to 1988, the annual percent change (APC) in delay-adjusted incidence (age adjusted to the US standard population in 2000) was 2.8% for Caucasians and 2.1% for African Americans at all ages. From 1988 to 1992, the APC increased to 16.3% and 19.2% for Caucasians and African Americans at all ages, respectively. Furthermore, younger men (<65 years old) in particular had a distinct increase in incidence at the start of PSA screening (Figure 1.11). For men below 65 years old, the APC from 1975 to 1988 was 3.6% and 2.0% for Caucasians and African Americans, respectively. From 1988 to 1992, the APC for men below 65 years old was 22.7% and 26.1% for Caucasians and African Americans, respectively.

Figure 1.11: Surveillance, Epidemiology, and End Results (SEER) incidence rates of prostate cancer by age of diagnosis (1973-1995)

The PSA screening test detects serum PSA, which is seen in very low or undetectable levels in a man with a normal prostate. PSA levels can rise in the serum based upon many types of cellular changes in the prostate, whether benign prostatic hyperplasia or malignancy (Oesterling, 1991). Therefore, PSA is not specific to malignant cells, which can lead to misdiagnosis when it is used as the sole tool for diagnosing disease.
Of the malignant diagnoses detected by initial PSA testing with DRE, and ultimately followed up by a fine needle biopsy of the prostate, many are clinically localized disease (tumor is confined to the prostate). According to SEER data from 2003-2009, approximately 81% of diagnosed PCa cases are localized disease (“SEER Cancer Statistics Factsheets: Prostate Cancer,” n.d.).

1.3.5.2 Current treatment standards

Along with the increase in incidence of localized disease due to screening, there has also been an increase in invasive procedures to treat disease. Since the early 1990s, radical prostatectomy (removal of the entire prostate gland and surrounding seminal vesicles and lymph nodes) has been the most popular choice as the primary treatment of disease. Approximately 60% of men diagnosed with low-risk disease choose to undergo radical prostatectomy as their primary treatment (Matthew R Cooperberg, Broering, Kantoff, & Carroll, 2007). However, RP may carry potential side effects affecting quality of life, such as incontinence and impotence if nerve-sparing surgery is not possible. Brachytherapy and external-beam radiotherapy are also options for treatment, chosen as primary treatment by approximately 15% of low-risk patients (Matthew R Cooperberg et al., 2007; Walsh & DeWeese, 2007). “Active surveillance” and “watchful waiting” are the options least favored by most patients. Although these terms are used interchangeably at times, “active surveillance” refers to more intense monitoring of disease using PSA tests, DREs, and imaging at frequent follow-up intervals.
According to data from the Cancer of the Prostate Strategic Urologic Research Endeavor (CaPSURE) registry on 10,385 men diagnosed with PCa from 31 urology centers across the US, 10.2% of patients diagnosed from 2004 to 2006 chose active surveillance, up from 6.2% in 2000 to 2001 (Matthew R Cooperberg et al., 2007). “Watchful waiting” involves less frequent monitoring and relies on symptoms to determine the next course of treatment, if any is required. Data from 1990-2001 in the Department of Defense Center for Prostate Disease Research Database show that approximately 14% of diagnosed PCa cases chose “watchful waiting” (Wu et al., 2004). Delayed treatment would be desirable for many men with low-risk disease whose tumor may not progress further, in order to reduce the negative impact of side effects on health-related quality of life. Therefore it is crucial to find ways to accurately categorize low-risk patients in order to increase enrollment in observation rather than invasive treatment.

1.3.5.3 Impact of PSA screening/early detection on treatment options

The inability of the PSA test to accurately differentiate an indolent tumor from an aggressive one led to controversy over whether or not it should be recommended as a screening tool. The European Randomized Study of Screening for Prostate Cancer reported that 1,410 men need to be screened and 48 cases need to be treated for PCa in order to prevent one PCa death (Schroder, Hugosson, & Roobol, 2009). Therefore, overdiagnosis and overtreatment of indolent disease have been a rising concern in Europe and the US, as well as in other developed countries, because of PSA screening (Welch & Albertsen, 2009). In 2012, the U.S. Preventive Services Task Force issued a Grade D recommendation, in which it recommends against PSA screening for prostate cancer (Screening for Prostate Cancer, Topic Page. U.S. Preventive Services Task Force., n.d.).
The main argument for this recommendation is that screening provides little benefit for the potentially high cost of overtreating many patients. However, the American Urological Association (AUA) acknowledges the potential benefits of prostate cancer screening and risk assessment, primarily for men between 55 and 69 years old with greater than 10 years of estimated life expectancy, or even younger men who have strong risk factors such as a family history of PCa or being African-American (Carter, 2013). Both sides raise valid concerns, with opponents of PSA screening focused on the issue of overtreatment that may interfere with quality of life and proponents focused on appropriately treating based on proper risk assessment of the case. It is reported that approximately 30% of patients who elect radical prostatectomy have truly low risk of disease recurrence and may benefit more if they opt for “active surveillance” (AS) (Eggener et al., 2013). In order to address these concerns, men classified as high risk for PCa-related mortality should be actively treated for their disease, while all other patients should undergo and remain on AS unless signs of cancer progression call for definitive local therapy. However, the problem is that currently there is no accurate clinical tool that can categorize men into the AS group with assurance that all these men are low-risk and will not progress. As previously discussed in section 1.3.3.2, there is evidence of upgrading among men initially categorized as low-risk, which shows the predicament of using current classification methods to place men into the observation group without treatment.
The PIVOT trial, the first randomized trial comparing men in watchful waiting to men who underwent radical prostatectomy with at least 12 years of follow-up, showed that while only a subgroup of men can benefit from RP, there were no differences seen in risk of metastasis and PCa-related mortality between the groups after 7-9 years of follow-up (Wilt et al., 2012). To identify this subgroup of men, diagnostic variables such as Gleason score, PSA level, and age are important to consider. The results from the Scandinavian Prostate Cancer Group-4, a randomized trial assessing outcomes of RP versus watchful waiting, were in agreement with the finding that only a subgroup of individuals benefited from RP while all others did not need invasive treatment (Bill-Axelson et al., 2011). In addition to these trials, further studies and trials (PRIAS in the Netherlands and ProtecT in the United Kingdom) have been conducted or are still ongoing to determine possible predictors that can accurately identify patients who would benefit more from AS (Bastian et al., 2009; Hajj et al., 2013; Klotz et al., 2010; Lane et al., 2010; Van As et al., 2008). Although clinical variables such as Gleason score at biopsy, patient age, PSA level, PSA kinetics (how quickly PSA rises over time), tumor grade, and tumor volume have been studied as possible predictors, at this point no conclusive predictors of PCa progression have been determined (Cary & Cooperberg, 2013; Klotz et al., 2010; Van As et al., 2008). One study reported that performing a restaging biopsy after the initial biopsy, before starting AS, may have helped minimize potential upgrading and determine eligibility for AS (Eggener et al., 2013). Therefore, discovering predictors, including clinical variables and genetic biomarkers, that could accurately assess risk of progression and set criteria for AS would help prevent overtreatment of patients who have indolent disease while treating patients who are truly at high risk of progression.
Additionally, the use of biomarkers that could help with accurate risk categorization at the time of biopsy would address the controversy over early detection using PSA screening while informing clinicians and their patients of the appropriate treatment steps needed.

1.3.5.4 Clinical prognosis among localized prostate cancer patients

1.3.5.4.1 Biochemical recurrence (PSA recurrence)

Even after a radical prostatectomy, up to one third of patients can experience a biochemical recurrence (BCR) (also called PSA recurrence), when serum PSA levels become detectable again (Lerner et al., 1996; Pound, Partin, & Eisenberger, 1999; Ward, Blute, Slezak, Bergstralh, & Zincke, 2003). Following surgery, the serum PSA level should be undetectable within 6 weeks of the radical prostatectomy (Cookson et al., 2007). A detectable PSA level after this period indicates that prostate epithelial cells are still present in the body and might imply the presence of a micro-metastasis; however, this rise in PSA does not confirm whether a clinical metastasis is present or not. The criteria used for the PSA level threshold determining a “recurrence” have changed throughout the decades, with no clear standardized definition (Cookson et al., 2007; A. J. Stephenson et al., 2006). Studies have shown that rates of progression could differ by 35% depending on the PSA threshold used to define BCR (A. J. Stephenson et al., 2006). Reports show that 18% to 29% of individuals with BCR can progress to metastatic disease (Loeb et al., 2011; Pound et al., 1999), indicating that BCR is suggestive, not definitive, of possible aggressive disease. The ideal BCR threshold defining recurrence would be one that is highly correlated with metastatic disease. Since the biological nature of PCa can vary greatly among different patients, clinical factors other than PSA level should be discussed before treatment is determined.
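To illustrate why the choice of PSA threshold changes who is counted as having a biochemical recurrence, and when, the following minimal sketch applies several cut-offs to one post-prostatectomy PSA series. The PSA values, the cut-offs (0.2, 0.25, and 0.4 ng/mL), the optional confirmation rule, and the function name first_bcr_month are all illustrative assumptions and are not taken from the studies cited above.

```python
def first_bcr_month(psa_series, threshold, confirm=False):
    """Return the follow-up month at which BCR is first declared, or None.

    psa_series: list of (month, psa_ng_per_ml) tuples sorted by month.
    threshold:  PSA cut-off (ng/mL) defining recurrence (hypothetical value).
    confirm:    if True, the next measurement must also reach the threshold.
    """
    for i, (month, psa) in enumerate(psa_series):
        if psa >= threshold:
            if not confirm:
                return month
            # Require a confirmatory value at the following visit.
            if i + 1 < len(psa_series) and psa_series[i + 1][1] >= threshold:
                return month
    return None


# Hypothetical follow-up of a single patient (month after surgery, PSA in ng/mL).
psa_series = [(3, 0.05), (6, 0.12), (12, 0.25), (18, 0.22), (24, 0.45), (30, 0.60)]

for threshold, confirm in [(0.2, False), (0.25, True), (0.4, False)]:
    month = first_bcr_month(psa_series, threshold, confirm)
    print(f"threshold={threshold} ng/mL, confirm={confirm}: BCR at month {month}")
```

With these made-up values the same patient is classified as recurring at month 12 under the lowest cut-off but only at month 24 under the other two definitions, which is the kind of definition-dependent difference the paragraph above describes.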
1.3.5.4.2 Predictors of prognosis

Currently, the most predictive factors for localized PCa recurrence are selected clinical variables. Doubling time of PSA, which is the time elapsed until the PSA level doubles, has been used as an indicator of tumor aggressiveness, with shorter doubling time showing a stronger association with further progression (Roberts, Blute, Bergstralh, Slezak, & Zincke, 2001). However, by the time the PSA level starts doubling rapidly, the patient may already have metastatic disease. Risk of recurrence may additionally vary based on clinical factors such as Gleason score, pre-operative PSA level, pathologic stage, age at diagnosis, tumor volume, and positive surgical margins. Predictors of biochemical recurrence may not necessarily be the same as predictors of metastatic disease, since only a subset of patients with BCR progress. Therefore, the tumor characteristics, which may include accumulated genetic alterations and the tumor microenvironment at the time of diagnosis, may provide clues to the aggressiveness of the tumor’s behavior before the patient experiences any recurrence.

1.3.5.5 Current prognostic tools: strengths and weaknesses

An ideal clinical tool to predict prognosis would be one that accurately discriminates between indolent and aggressive tumors while maintaining simplicity for efficient clinical use. A concern regarding the current tools available for determining prognosis in localized PCa patients is their limited predictive accuracy. These tools include models and nomograms, intended for easy application in the clinic, that use a combination of clinical variables such as biopsy Gleason score, clinical stage, pre-operative PSA level, and in some models data collected at time of surgery.
Examples of models used are the D’Amico risk stratification (D'Amico, Whittington, & Malkowicz, 1998), Partin tables (Partin, Mangold, Lamm, & Walsh, 2001), Kattan nomograms (Kattan & Eastham, 1998; Kattan, Wheeler, & Scardino, 1999; A. J. Stephenson et al., 2006), and the CAPRA score (Matthew R Cooperberg et al., 2005) (Table 1.5). Some models use only pre-operative variables to determine stage of disease or prognosis before surgery, while some modified models include post-operative variables to determine risk of recurrence after radical prostatectomy. Studies validating the current models show that the area under the curve (AUC), a metric capturing predictive accuracy (AUC = 0.5 indicating predictive ability no better than random and 1.0 indicating perfect prediction of recurrence, with further discussion in section 3.4.3.2), of these models ranges from 0.65 to 0.81 (Blute, Bergstralh, Iocca, Scherer, & Zincke, 2001; Boorjian, Karnes, Rangel, Bergstralh, & Blute, 2008; Matthew R Cooperberg et al., 2006; Graefen, Karakiewicz, Cagiannos, Klein, et al., 2002a; Graefen, Karakiewicz, Cagiannos, Quinn, et al., 2002b; Greene et al., 2004; Karnes et al., 2013; May et al., 2007; Mitchell et al., 2005; A. J. Stephenson et al., 2006; Steyerberg et al., 2007; Zhao et al., 2008). Although some models show higher AUCs, there is still variability in prediction when they are used in different validation cohorts, and they are still not accurate enough to guide a patient confidently into a certain treatment course. Specifically, an AUC of 0.65 for a given clinical model means that, if one localized PCa patient who later has a recurrence and one who does not are selected at random, there is a 65% chance that the model ranks the patient who has the recurrence as being at higher risk than the patient who does not.
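The pairwise reading of the AUC can be made concrete with a small sketch. The code below computes the AUC directly as the fraction of (recurrence, no recurrence) patient pairs in which the patient with recurrence received the higher model score, counting ties as half. The risk scores are invented for illustration and do not come from any of the cited models or cohorts; the function name concordance_auc is likewise an assumption.

```python
from itertools import product


def concordance_auc(scores_recurred, scores_not_recurred):
    """AUC as the share of (recurred, not recurred) pairs ranked correctly.

    A pair is ranked correctly when the patient who recurred has the
    higher predicted risk; tied scores contribute one half.
    """
    pairs = list(product(scores_recurred, scores_not_recurred))
    correct = sum(1.0 if r > n else 0.5 if r == n else 0.0 for r, n in pairs)
    return correct / len(pairs)


# Hypothetical predicted risks from some clinical model (not real data).
recurred = [0.9, 0.6, 0.4]           # patients who later had a recurrence
not_recurred = [0.7, 0.5, 0.3, 0.2]  # patients who did not

auc = concordance_auc(recurred, not_recurred)
print(f"AUC = {auc:.2f}")  # 9 of 12 pairs are ordered correctly
```

In this toy example 9 of the 12 possible pairs are ordered correctly, giving an AUC of 0.75; the remaining 3 pairs are cases in which a patient who did not recur was scored as higher risk than one who did.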
Therefore, an AUC of 0.65 implies a 35% chance that the model incorrectly orders a randomly chosen pair of patients, one who has a recurrence and one who does not. In practice, a patient may be told he is at risk of recurrence when in reality he is not (false positive), which might lead to unnecessary treatment, or he may be told he is at low risk of recurrence when in reality he is not (false negative), increasing the chance of dying from aggressive disease. It is hypothesized that, given two patients with identical clinical characteristics (Gleason score, stage, etc.) at the time of diagnosis, the tumors from patients who later experience recurrence might be molecularly distinct from tumors from patients who never have recurrence and have truly indolent disease. Therefore, tumor biomarkers that capture this aggressive behavior, used in combination with current clinical variables, may provide the information needed to improve upon current models attempting to predict aggressive disease (Cooper, Campbell, & Jhavar, 2007). For instance, genetic markers that provide additional understanding of the underlying genetic behavior of a tumor at risk of progression would provide useful information on top of clinical variables to predict clinical outcomes of PCa. In the next section I discuss the progress made to date regarding the discovery of new markers of tumor aggressiveness that could improve upon the predictive accuracy of models using only clinical variables.

Table 1.5: Current PCa prognostic tools (models and nomograms) using only clinical variables and their predictive ability captured by the area under the curve (AUC).
Using pre-operative variables Using post-operative variables Prognostic tool 0.73 (Blute et al., 2000) 0.65 (Cooperberg et al., 2005) 0.8 (Graefen et al, 2002) 0.68 (Greene et al., 2004) 0.75 (Graefen et al, 2002) 0.78 (Zhao et al., 2008) 0.89 (Kattan et al., 1999) 0.76 & 0.79 (Stephenson et al., 2006) 0.81 & 0.79 (Stephenson et al., 2006) 0.79 (Steyerberg et al., 2007) 0.66 (Cooperberg et al., 2005) 0.68 (Cooperberg et al., 2005) 0.76 (Zhao et al., 2008) 0.78 - 0.81 (May et al, 2007) D'Amico risk group stratification (D'Amico et al., 1998) Variables used for risk stratification (low, intermediate, high): Clinical stage Pre-operative PSA level Biopsy Gleason score Abbreviations: Organ confined (OC), extracapsular extension (EC), seminal vesicle invasion (SVI), lymph node positive (LN+), biochemical recurrence (BCR), AUC (area under the curve), prostate specific antigen (PSA) Risk stratification only Studies validating BCR-free probabilities among risk groups: (Hernandez et al., 2007) (Boorjian et al., 2008) (Mitchell et al., 2005) Predictive ability (AUC)* Predictive ability (AUC)* **Additional variables added in the new nomograms are in italics *Area under the curve (AUC) used to assess predictive ability (sensitivity and specificity) of a model. 
Predicting 10-year freedom of BCR after surgery: Predicting indolent cancer: Nomogram for organ confined PCa (Steyerberg et al., 2007) Clinical stage Biopsy 1° Gleason grade Biopsy 2° Gleason grade Pre-operative PSA level Ultrasound prostate volume % positive cores mm noncancerous tissue mm cancerous tissue NA Clinical variables used in current prognostic tools NA NA Predicting 7-year freedom of BCR: Predicting 10-year freedom of BCR after surgery: Clinical stage Pre-operative PSA level Biopsy 1° Gleason grade Biopsy 2° Gleason grade # positive biopsy cores # negative biopsy cores Pre-operative PSA Seminal vesicle invasion Lymph node involvement Surgical margins Extracapsular extension Year of radical prostatectomy 1° Gleason grade (surgery) 2° Gleason grade (surgery) Predicting organ confined disease: Clinical stage Pre-operative PSA level Biopsy Gleason score Pathologic stage Predicting OC: 0.68 Predicting ECE: 0.62 Predicting SVI: 0.74 Predicting LN+: 0.77 (Yu et al., 2010) Predicting BCR after surgery: Predicting 5-year freedom of BCR after surgery: Clinical stage Pre-operative PSA level Biopsy Gleason score Kattan nomograms (Kattan et al., 1998) (Kattan et al., 1999) Gleason score of surgical specimen Pre-operative PSA level Prostatic capsular invasion Surgical margins Seminal vesicle invasion Lymph node involvement Clinical stage Pre-operative PSA level Biopsy Gleason score Percent positive biopsies Age at diagnosis CAPRA score (Cooperberg, 2005) Clinical stage Pre-operative PSA level Biopsy Gleason score Partin tables (Partin et al., 1997) Updated nomograms** (Stephenson et al., 2006)

1.3.5.6 Biomarkers of prostate cancer: development and progression

Uncovering the underlying molecular characteristics of PCa has allowed researchers to gain insight into potential biomarkers of PCa diagnosis and PCa progression.
Determining biomarkers that would help clinicians differentiate between an aggressive tumor and a tumor that is indolent and slow-growing would provide clearer guidance on treatment steps for their patients, helping to avoid overtreatment of disease and allowing more aggressive and earlier treatment of possibly metastatic disease.

1.3.5.6.1 Single nucleotide polymorphisms

Various types of studies, such as candidate gene studies and genome-wide association studies (GWAS) in different populations, have been used to identify single nucleotide polymorphisms (SNPs) that may play a role in both risk of prostate cancer and prognosis. Many studies have examined genes associated with prostate cancer risk; examples include genes in steroid metabolism (AR, SRD5A2, CYP genes, HSD3B, HSD17B, UGT2B15, ER, VDR, PSA), oxidative stress (GSTP1, GSTM1, GSTT1, MnSOD), cell cycle and tumor suppressor genes (p53, TGF-β, cyclin D1), cell adhesion, DNA repair (OGG1, XRCC1, MGMT, NBS1/NBN), angiogenesis (VEGF, HIF1-A, FGF4, ecNOS, endostatin), metabolic markers (IGF-1, IGFBP-3, INS, IRS-1, leptin), common variants in familial prostate cancer genes (ELAC2, RNASEL, MSR1), and other genes from other cancer studies (BRCA2, ATM, CHEK2, PTEN) (Naylor, 2007). The tumor suppressor gene NKX3.1 on chromosome 8p Another polymorphism in the TMPRSS2 gene was also found to be associated with prostate cancer risk, and this gene was further discovered to be part of a fusion with other genes in the ETS gene family (ERG, ETV1, ETV4) through chromosomal rearrangements (“Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer.,” 2005; Shah, Mehra, & Chinnaiyan, 2008). This fusion was also associated with worse prognosis for prostate cancer patients (Perner et al., 2006; Mehra et al., 2008).
GWAS have also implicated prostate cancer susceptibility alleles (Eeles et al., 2013; 2009) and alleles distinguishing aggressive cases from less aggressive cases (Amin Al Olama et al., 2013). Among these alleles, several in the 8q24 region have been reported to associate with prostate cancer risk (Gurel et al., 2008; Haiman et al., 2007; Witte, 2009). Since most studies on SNPs have focused primarily on the progression from normal tissue to prostate tumor and less on prognosis, it remains unclear which genes and polymorphisms can also be used to differentiate between aggressive and indolent disease, if there is any overlap at all. Assessing tumor alterations can provide clues for prognosis by differentiating tumors with potential to metastasize from indolent slow-growing tumors, but finding variants associated with disease aggressiveness would involve large GWAS of PCa patients with extensive follow-up information to determine recurrence or mortality status after treatment. Assessing gene expression of tumors would provide additional information that would help determine genes associated with prognosis. Using an overlap of GWAS SNP data and expression data to determine associations and overlapping genes would also provide a better understanding of which genes are involved in prognosis (Witte, 2009) (Figure 1.12).

Figure 1.12: The role of somatic and germline mutations in the development and progression of prostate cancer. Source: (Witte, 2009)

1.3.5.6.2 Gene expression prognostic signatures

The technical advances in gene expression microarray development over the years provided an enhanced tool to assess expression of genes across the genome simultaneously, improving the possibility of discovering biomarkers associated with aggressive disease (Cooper et al., 2007).
Multiple studies have used gene expression data from prostate tumor tissue to understand the heterogeneous nature of the disease at a genetic level: to determine genes associated with the biological changes involved in the progression from benign to malignant disease, to identify potential prognostic biomarkers associated with aggressive behavior, and to develop a prognostic model that distinguishes aggressive from indolent tumors while improving upon models using clinical variables alone (shown in section 1.3.5.5) (Agell et al., 2012; Bibikova et al., 2007; Cheville et al., 2008; Cuzick et al., 2011; Erho et al., 2013; Glinsky, Glinskii, Stephenson, Hoffman, & Gerald, 2004; Henshall et al., 2003; Irshad et al., 2013; Knezevic et al., 2013; Kosari et al., 2008; Lapointe et al., 2004; Larkin et al., 2011; Nakagawa et al., 2008; Penney et al., 2011; Ramaswamy, Ross, Lander, & Golub, 2002; Sboner et al., 2010; Singh, Febbo, Ross, Jackson, & Manola, 2002; A. J. Stephenson et al., 2008; Talantov, Jatkoe, Böhm, & Zhang, 2010; B. S. Taylor et al., 2010a; Varambally et al., 2005; Y. P. Yu et al., 2004). As technology advanced to allow for accurate whole genome profiling of both fresh tissue and archival tissue, such as formalin fixed paraffin embedded tissue, more studies were able to obtain and analyze large amounts of genomic data on prostate cancer tissue (Table 1.6). With follow-up data available on these patients, biomarkers can be evaluated in association with disease recurrence. An ideal biomarker (or set of biomarkers) would accurately identify aggressive tumors, improve upon the predictive ability of existing clinical variables, and predict prognosis at the diagnostic phase, prior to any invasive treatment. Most of these studies have been done with tumor tissue from radical prostatectomy. Although biopsy tissue would be more informative, as it would also be available from men who do not elect radical prostatectomy, biopsy material presents several limitations.
Among them is the fact that, due to tumor heterogeneity in the prostate, not all biopsies sample the most aggressive part of the tumor (see section 1.3.3.1), and the fact that only minute amounts of tissue are available from biopsy cores. A concern that has been highlighted for the existing studies is the lack of overlap of the identified predictive biomarkers across studies. This could be due to several issues. Among them are the use of gene expression data that were not generated genome-wide, the lack of microdissection (and thus of enrichment for malignant glands), small sample sizes, and the use of inappropriate endpoints that do not always indicate lethal disease (such as biochemical recurrence instead of clinical recurrence/metastatic disease). Additional issues to consider are differences in study design and population and differences in the statistical analyses used. More recent studies have been able to address these limitations and may provide more accurate information on potential biomarkers of prognosis. Table 1.6 lists the studies that have been done to examine the association of potential biomarkers with PCa biology and with aggressive disease.

Table 1.6: List of studies done to discover biomarkers of aggressive PCa

1.3.6 Advanced prostate cancer

Advanced prostate cancer is defined as disease in which the tumor has spread outside of the prostate gland and prostatic capsule into lymph nodes, bones, or other tissue near to or distant from the prostate. Locally advanced disease, whose definition continues to shift, includes stage T3b or higher. However, advanced PCa is defined as stage T4 or any lymph node involvement, which includes any metastasis to local or distal tissue outside the prostate capsule, with any Gleason score. Whereas approximately 81% of diagnosed PCa cases from 2003-2009 were localized, 12% were cases with lymph node involvement and 4% were metastatic.
Metastatic or advanced PCa is currently not curable, but there are treatments and clinical trials available that are used to help control and minimize cancer spread and the symptoms associated with it. Among them are hormone deprivation therapies, in which the androgens in the body are drastically reduced. Most prostate tumors are dependent on androgen stimulation, which is mediated by the androgen receptor (AR), and androgen deprivation drugs have been shown to slow the growth rate of the cancer. Yet, over time, resistance to androgen deprivation therapy does occur, since the tumor itself begins to independently maintain AR activity through genetic modification rather than through androgen ligands, which are modified forms of the testosterone released mostly by the testes and partly by the adrenal glands (Balk, 2002; B. J. Feldman & Feldman, 2001). At this point, androgen deprivation drugs are no longer effective, leading to a lethal form of PCa called castrate-resistant PCa or androgen-independent PCa (AIPC). Currently, the first-line treatment for AIPC cases is chemotherapy using docetaxel plus prednisone (McKeage, 2012; Tannock, de Wit, Berry, & Horti, 2004). In 2010, a new treatment called Provenge (sipuleucel-T; Dendreon), an autologous immunotherapy which uses the patient’s own immunity to attack the prostate cancer cells, was found to be effective for those who no longer respond to hormone deprivation therapy (Cheever & Higano, 2011). Abiraterone acetate (Zytiga), another drug for castrate-resistant PCa that inhibits CYP17A1, an enzyme expressed in testicular, adrenal, and prostatic tumor tissue, was also approved in 2011 for metastatic PCa patients in combination with prednisone (a corticosteroid drug effective as an immunosuppressant) (Ryan et al., 2013). Risk factors specific for advanced prostate cancer have not been clearly identified.
Results from the Health Professionals Follow-up Study showed associations of recent smoking history, taller height, higher BMI, family history of PCa, and high intakes of total energy, calcium, and alpha-linolenic acid with an increased risk of lethal prostate cancer (Giovannucci et al., 2007). They also showed a relationship between statin drugs and decreased risk of advanced PCa (Platz et al., 2006). Being overweight and having a higher intake of animal products and animal fat were also reported to be associated with advanced PCa among both Caucasians and African Americans (Hayes et al., 1999; Snowdon & Phillips, 1984). Smoking was also reported to be associated with an increased risk of fatal prostate cancer, particularly among those who were current smokers at diagnosis (Huncharek et al., 2010). However, due to inconsistent findings and the fact that some studies do not stratify by stage, risk factors for advanced PCa are still not established. It is also possible that some men who are diagnosed with localized disease because of early screening have tumors that harbor characteristics with the potential to progress to advanced PCa, but that screening caught these cases in their early stages of development. Therefore, examining tumor features to identify genes associated with advanced PCa may help identify men who are at higher risk of advanced PCa and determine other risk factors associated with fatal disease. Also, the fact that African American men have the highest rates of PCa-specific mortality suggests a possible association with underlying genetic susceptibility that may produce tumors with innate potential to progress to advanced PCa.

1.4 Tobacco and prostate cancer risk and mortality

Tobacco smoking is associated with an increased risk of several cancers, yet its carcinogenic role in the prostate is not clearly established (Hickey, Do, & Green, 2001; Huncharek et al., 2010).
Findings from several epidemiologic studies do not support an association between tobacco smoking and prostate cancer (Hickey et al., 2001), yet a meta-analysis of 24 prospective cohort studies reported an estimated overall increased risk of PCa and PCa-related mortality associated with tobacco smoking (Huncharek et al., 2010). A 2009 review of the epidemiological literature further supported an association between tobacco smoking and aggressive PCa (Zu & Giovannucci, 2009). Several factors may contribute to the inconsistent results observed in the literature. The first is heterogeneity across study designs, such as the use of both cohort and case-control studies to assess the association and variation in the characteristics of the study populations. The second is variation in smoking status definitions; each study has its own criteria for defining never, former, and current smokers based on smoking history, and its own time cut-points for categorizing a smoker as former or current. The third is the lack of detail on smoking history and cessation, such as quantity smoked, duration, and timing of cessation (how long before or after diagnosis of disease). The fourth is the lack of consideration of stage and tumor grade, which would provide insight into the screening effect and into whether smoking has an impact on the aggressiveness of the tumor. The screening effect can be attributed to the use of PSA screening since the 1980s; individuals who get screened are more likely to be diagnosed with early stage disease when it is asymptomatic. This may introduce lead-time bias, in which survival time appears longer because of the earlier detection from screening, and may interfere with assessing the true aggressiveness of the tumor. If smoking is somehow associated with screening (e.g. if smokers tend to get more PSA screening), then smoking could appear to be more strongly associated with a certain stage of PCa (e.g. localized PCa).
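The lead-time bias described above can be made concrete with a small numerical sketch. The ages and the five-year lead time below are hypothetical values chosen only for illustration and do not come from any of the cited studies; the function and variable names are likewise assumptions. The point is simply that moving the date of diagnosis earlier lengthens survival measured from diagnosis even when the age at death does not change.

```python
def survival_from_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Return years survived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient (illustrative numbers only, not study data).
AGE_AT_DEATH = 75.0         # unchanged by screening in this sketch
AGE_AT_CLINICAL_DX = 68.0   # age at which symptoms would lead to diagnosis
LEAD_TIME = 5.0             # years by which PSA screening advances diagnosis

# Survival measured from a symptom-driven diagnosis.
unscreened_survival = survival_from_diagnosis(AGE_AT_CLINICAL_DX, AGE_AT_DEATH)

# Survival measured from a screen-detected diagnosis of the same tumor.
screened_survival = survival_from_diagnosis(AGE_AT_CLINICAL_DX - LEAD_TIME, AGE_AT_DEATH)

# The apparent gain equals the lead time; the age at death is identical.
apparent_gain = screened_survival - unscreened_survival

print(f"Unscreened: {unscreened_survival} years after diagnosis")
print(f"Screened:   {screened_survival} years after diagnosis")
print(f"Apparent gain attributable to lead time: {apparent_gain} years")
```

In this sketch the whole difference in survival is produced by the earlier diagnosis date, which is why comparisons of survival time that ignore stage at diagnosis or screening history can be misleading.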
Also, independent of screening bias, the lack of stage and tumor grade data can create problems if smoking is biologically truly associated with one stage of PCa and not the other, especially if this is not considered during the study recruitment phase. Moreover, risk of PCa in association with tobacco smoking may also differ by race/ethnicity. Most studies examining the association between PCa risk and tobacco smoking predominantly include non-Hispanic White men, with few focusing on African-Americans (Lavender et al., 2009; Murphy et al., 2013), and even fewer on Latinos (Strom, Yamamura, Flores-Sandoval, Pettaway, & Lopez, 2008). The admixture among the Latino population may confound the association between smoking and PCa risk and therefore should be taken into consideration as well.

1.4.1 Tobacco chemical carcinogens

Tobacco smoke consists of mainstream smoke (leaving the mouth end of the cigarette), side-stream smoke (released during puffs), and exhaled mainstream smoke (leaving the smoker), and all can be sources of environmental pollution and carcinogens (Ding, Trommel, Yan, Ashley, & Watson, 2005; Lu & Zhu, 2007). Tobacco smoke contains approximately 4,800 compounds, with a total of 1,172 substances in both smoke and tobacco, and 69 carcinogens identified by the year 2000 (D. Hoffmann & Hoffmann, 2001; International Agency for Research on Cancer (IARC), 2004) (Table 1.7). IARC classifications for chemical carcinogens include the following, in order of increasing carcinogenicity: Group 2B (possibly carcinogenic to humans), Group 2A (probably carcinogenic to humans), and Group 1 (known to be carcinogenic to humans). In a report reviewing IARC classified carcinogens, the number of carcinogens in cigarette mainstream smoke classified by IARC was updated to 81 compounds (11 Group 1, 14 Group 2A, and 56 Group 2B) and included smoke constituents that result from agricultural chemicals (C. J. Smith, Perfetti, Garg, & Hansch, 2003).
Tobacco smoke has been reported to cause cancer in many organ sites that have been studied, such as oral/nasal, esophagus, pharynx/larynx, lung, pancreas, myeloid organs, bladder and ureter, and uterine cervix (DeMarini, 2004; Vineis, Alavanja, & Buffler, 2004). Associations are less clear across studies examining other cancer types, such as prostate cancer, in some instances due to a lack of sufficient studies. Table 1.7 provides details on the specific carcinogenic compounds found within tobacco smoke for which there is sufficient evidence of carcinogenic behavior in animals.

Table 1.7: Chemical carcinogens found in tobacco smoke. Source: IARC Monographs Volume 83 (2004): Tobacco Smoke and Involuntary Smoking

The total amount of carcinogens in cigarette smoke is approximately 1 to 3 mg, comparable to the amount of nicotine in each cigarette (0.5-1.5 mg) (Hecht, 2003). Within each cigarette, there are chemicals that produce tumors in laboratory animals in low amounts (strong carcinogens), such as polycyclic aromatic hydrocarbons (PAHs), N-nitroso compounds (NOCs), and aromatic amines (AA). These carcinogens are found in relatively small amounts of ~200 ng per cigarette. In contrast, there are other chemicals that are more abundant (~900 μg) but are only carcinogenic at these higher doses (weak carcinogens); among these are aldehydes (Hecht, 2003). Heterocyclic amines (HCAs) were initially heavily examined because of their presence in cooked foodstuffs. However, HCAs, which are highly tumorigenic, were later identified within tobacco smoke (Rodgman & Perfetti, 2013). Therefore, the most potent types of carcinogens in tobacco smoke that have the potential to reach different parts of the body, such as the prostate, include the strong carcinogenic groups of PAHs, NOCs, AA, and HCAs.
However, a possible role for other compounds listed as carcinogenic to humans and present in high proportions within tobacco smoke, such as benzene and aldehydes (specifically acetaldehyde), cannot be ignored, as they may also play a role in prostate carcinogenesis (Hecht, 2003; D. Hoffmann & Hoffmann, 2001).

1.4.1.1 Polycyclic aromatic hydrocarbons

PAHs are environmental pollutants produced by incomplete combustion of organic material, such as fossil fuel combustion (e.g. automobiles, coking plants, production of asphalt), and are present in tobacco smoke, which is also a major indoor pollutant (Ding et al., 2005). They can also form in smoked foods or foods cooked over flames (Menzie, Potocki, & Santodonato, 1992). It is known that certain PAHs (such as benzo[a]pyrene) can cause cancers, such as lung and bladder cancers (G Mastrangelo, 1996). In a non-occupational setting, most of the PAHs a non-smoker receives come from the ingestion of food cooked over flames (as high as 10-20 μg/kg for charred meat) (Phillips, 1999); a tobacco smoker receives additional, higher levels of PAHs that may induce greater DNA damage (Chuang, Lee, Chang, & Sung, 2003). According to the Agency for Toxic Substances and Disease Registry, smoking one cigarette can provide an intake of 20-40 ng of benzo[a]pyrene, with an intake of 0.7 μg/day of benzo[a]pyrene from smoking one pack of unfiltered cigarettes and 0.4 μg/day from smoking a pack of filtered cigarettes (Agency for Toxic Substances and Disease Registry (ATSDR), n.d.). Absorption of PAHs into the body can be through the lungs, gastrointestinal tract, and the skin.
PAHs in their native form are inactive and hydrophobic, yet following bodily absorption, PAHs undergo cellular metabolism and become diol-epoxides and other reactive intermediates (Liang Chen et al., 1996) that covalently bind to macromolecules in the cells, such as DNA, forming DNA adducts (Phillips, 1999; Sims, Grover, Swaisland, Pal, & Hewer, 1974). These adducts can induce DNA replication errors and possible mutations that may lead to tumorigenesis. There is evidence showing that PAHs can induce genotoxic transformation of prostate tissue in mice (Marquardt et al., 1972) and that PAH-DNA adducts are present preceding carcinogenesis in the prostate but decrease in prostate tumor cells as the cells become more poorly differentiated (Rybicki, Rundle, Savera, Sankey, & Tang, 2004).

1.4.1.2 Heterocyclic amines

Another group of carcinogens in tobacco smoke is the heterocyclic amines (HCAs). For the non-smoker, cooking meats at high temperatures is usually the main source of HCAs (NIH, 2011). There is sufficient evidence supporting the role of HCAs in carcinogenesis, with most of the genotoxic activity due to pyridines (e.g. PhIP), quinolines (IQ types), and quinoxalines (IQx types) (Turesky, 2002). Experiments using animal models have shown that HCAs can induce DNA adducts and mutations in several organs such as the prostate, colon, liver, and kidney (“Prostate mutations in rats induced by the suspected human carcinogen 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine.,” 2000; Turteltaub, Mauthe, & Dingley, 1997). Most studies have focused on dietary factors that are sources of HCAs (i.e. cooked meats) in association with cancer risk, yet the HCAs identified in tobacco smoke provide an additional source of exposure.

1.4.1.3 N-nitroso compounds

N-nitroso compounds (NOCs) include nitrosamines, nitrosamides, and nitrosoguanidines. They are created through the nitrosation of secondary amines.
Exposure to NOCs from tobacco products is more intense and prevalent than from any other source, even though NOCs are also obtained through dietary sources (Hecht & Hoffmann, 1991; Preston-Martin & Correa, 1988; Eichholzer & Gutzwiller, 1998). NOC precursors are found in both the mainstream and the side-stream smoke from tobacco smoking (Tricker, Ditrich, & Preussmann, 1991; Adams, O'Mara-Adams, & Hoffmann, 1987). NOCs have also been found in the urine of smokers and are associated with cancers of the oral cavity, esophagus, lung, and pancreas (Hecht & Hoffmann, 1991; Nair et al., 1985; Tricker, 1997).

1.4.1.4 Aromatic amines

Aromatic amines, or arylamines, are potent carcinogens in tobacco smoke, with two of the four present in tobacco smoke shown to be carcinogenic in humans (IARC Group 1). Exposure to arylamines is primarily from occupational sources and from tobacco smoke (Vineis & Pirastu, 1997). Studies over the past few decades have consistently reported an association between arylamines, such as those from tobacco smoking, and risk of bladder cancer (Castelao, Yuan, & Skipper, 2001; Mommsen & Aagaard, 1983; R. K. Ross, Jones, & Yu, 1996; Vineis & Pirastu, 1997; M. C. Yu, Skipper, Tannenbaum, Chan, & Ross, 2002). Since arylamines can be metabolized by the bladder, it is reasonable to hypothesize that the prostate has the ability to metabolize these carcinogens as well.

1.4.1.5 Benzene

Benzene is found in many places in the environment and is an important industrial chemical that is present in gasoline, engine exhausts, wood smoke, and tobacco smoke (Rappaport et al., 2009). Smokers obtain most of their benzene exposure (90%) through mainstream cigarette smoke and have an average benzene body burden approximately 6 to 10 times that of individuals who do not smoke (Wallace, 1996). Furthermore, it is reported that approximately half of environmental benzene pollution originates from smokers.
Non-smokers, however, receive most of their benzene exposure from engine exhaust. Benzene has been associated with an increased risk of leukemia and other lympho-hematopoietic cancers in humans and induces tumors in multiple locations in rodents (Rappaport et al., 2009).

1.4.1.6 Acetaldehyde

Acetaldehyde is a carcinogen that is a significant constituent of cigarette smoke. It is also a metabolite involved in the metabolism of ethanol (Seitz & Becker, 2007). Acetaldehyde is a highly reactive oxidative species (ROS) in itself and does not need further metabolism to produce a carcinogenic effect. It has been reported that acetaldehyde plays a role in creating DNA adducts, that simple inhalation in rodent models induces tumorigenesis in the nasal mucosa and in the larynx, and that it may play a role in colorectal and breast cancers (Seitz & Becker, 2007).

Figure 1.13: Role of environment, physical absorption, and metabolism of pro-carcinogens. Source of body images: Wikipedia

1.4.2 Metabolism of tobacco chemical carcinogens and variation in metabolism genes

Once they enter the body’s circulation, these pro-carcinogens are metabolized into different metabolites that can be more readily excreted from the body (Figure 1.13). The initial steps of this metabolism primarily involve P450 enzymes, which oxygenate the substrate and are part of the phase I metabolic enzymes. Phase I enzymes are usually involved in the activation of the carcinogens and create polar groups in these compounds so that they can serve as substrates for Phase II enzymes, which conjugate them in order to make them less reactive (Jakoby & Ziegler, 1990).
These oxygenated intermediates are electrophiles (electron-deficient species), such as those formed in the metabolism of PAHs (Shimada & Fujii-Kuriyama, 2004), HCAs (Josephy & Novak, 2013), NOCs (Archer, 1989), and arylamines (Lang & Kadlubar, 1991), and are attracted to nucleophilic centers present on intracellular macromolecules, namely DNA and proteins. When they bind to DNA, DNA adducts are formed. While phase II metabolic enzymes can detoxify activated metabolites so that they are excreted from the body, DNA adducts form when activated metabolites escape detoxification. These adducts play a key role in the carcinogenesis process, especially if they affect genes related to cell growth and proliferation. The rate of excretion of carcinogenic compounds is dependent on the action of Phase I and Phase II enzymes, and thus the rate of DNA damage created by the carcinogens will be dependent on these competing reactions. Furthermore, the enzymes that carry out these carcinogen (xenobiotic) metabolism reactions are encoded by genes known to be variable in the population (Grover & Martin, 2002). Therefore, the equilibrium between activation and detoxification varies across the human population, and genetic variation in the metabolism genes should be taken into account along with the amount and duration of exposure to pro-carcinogens that an individual faces over a lifetime. Metabolic genes involved in the metabolism of PAHs, HCAs, NOCs, arylamines, and benzene are listed in Table 1.8. Chemical carcinogens from tobacco smoke can enter the lungs and pass into the circulatory system, through which they can reach the liver as well as other tissues such as the prostate. In these organs, these carcinogens may undergo activation and/or detoxification.
Whereas the majority of these metabolic reactions occur in the liver, experimental studies show that the prostate has the capacity to metabolize many of the tobacco compounds into activated carcinogens (Di Paolo, Teitel, Nowell, Coles, & Kadlubar, 2005; Gustafsson et al., 1981; Martin et al., 2002), suggesting a plausible link between tobacco smoking and prostate carcinogenesis. For example, one study reported associations between tobacco smoking and the presence of PAH-DNA adducts in the prostate, which varied by race and were modified by genetic variants in genes that code for proteins involved in PAH metabolism (Nock et al., 2007). The role of arylamine and HCA metabolism in the prostate is also plausible based on previous studies. Metabolism genes such as N-acetyltransferase-1 and 2 (NAT1 & NAT2) are usually heavily involved in arylamine and HCA metabolism, and genetic variants in these genes have been linked to bladder cancer development (Cartwright, Rogers, & Barham-Hall, 1982; A. T. Chan et al., 2005b; Hirvonen, 1999). Animal models that examined the role of NAT2 fast and slow acetylator genotypes on levels of DNA adducts in the prostate and bladder after exposure to arylamines showed that slow NAT2 acetylators had significantly higher DNA adduct levels in both tissues, which suggests an increased risk of prostate and bladder tumors (Hein et al., 2002). Furthermore, a case-control study examining the role of NAT2 in PCa risk in a Japanese population reported a statistically significant increased risk of PCa among smokers with a NAT2 slow metabolism genotype (Hamasaki, Inatomi, Katoh, & Aono, 2003). Another study reported an increased PCa risk with a NAT2 slow metabolism genotype as well (Hein et al., 2002). Metabolic genes involved in the metabolism of PAHs, HCAs, NOCs, AA, and benzene are listed in Table 1.8.
In this work, I present results from investigations done to assess the association between tobacco and prostate cancer risk and the potential modifying role of genetic variants in metabolic enzyme genes. For the latter, I focused on polymorphisms in genes that play key roles in the metabolism of the main potent carcinogens in tobacco smoke.

Table 1.8: Candidate single nucleotide polymorphisms (SNPs) in metabolism genes involved in the metabolism of chemical carcinogens found in tobacco smoke.

2 HYPOTHESES AND SPECIFIC AIMS

The overall objective of my studies was to identify environmental and genetic factors associated with the risk of development and progression of prostate cancer, taking into account relevant clinical variables. Specifically, I hypothesized that:

1. Tobacco smoking may increase the risk of PCa and PCa-related mortality, and this association will be stronger among carriers of functional polymorphisms in carcinogen metabolism genes. Among Latino men this association might be modified by the proportion of Native American ancestry.

2. Clinical characteristics of localized tumors can differentiate risk of late versus early recurrence in localized PCa patients treated with radical prostatectomy, with earlier recurrence being correlated with faster subsequent timing of a metastatic event.

3. Malignant prostate glands from localized PCa tumors that have a high risk of progression have genes that are differentially expressed, due to somatic mutations and/or inherited polymorphisms, when compared to similar tumors from patients at low risk of recurrence. Thus, these gene expression signatures and/or polymorphisms can be predictive of recurrence.

In order to test these hypotheses, I proposed the following specific aims:

Specific Aim 1: To determine the role of tobacco smoking on prostate cancer risk, prostate cancer related mortality, and all-cause mortality in a multi-ethnic case-control study.
Aim 1a: To determine the main effects of intensity and duration of tobacco smoking on localized and advanced prostate cancer risk among non-Hispanic white, African-American, and Hispanic men.

Aim 1b: To determine if the main and joint effects of each of the genetic ancestry components (European, Native American, African) are associated with prostate cancer risk among Hispanic men and to determine the role of genetic ancestry as a potential modifier of the association between tobacco smoking and PCa risk among Latino men.

Aim 1c: To determine the modifying role of polymorphisms in carcinogen metabolism genes on the association between tobacco smoking and prostate cancer risk.

Aim 1d: To determine the main effects of tobacco smoking on PCa-related mortality and all-cause mortality.

Specific Aim 2: To determine the association of polymorphisms in glutathione S-transferase (GST) carcinogen metabolism genes with the risk of biochemical (PSA) recurrence (BCR) among radical prostatectomy patients in an Argentine patient cohort.

Specific Aim 3: To determine if clinical variables are associated with late (>5 years) versus earlier (≤5 years) biochemical recurrence (BCR) among radical prostatectomy patients and if timing of BCR is correlated with the timing of a possible subsequent metastatic event.

Specific Aim 4: To identify and develop gene expression-based signatures that are predictive of clinical outcomes among localized prostate cancer cases.

Aim 4a: To identify differentially expressed genes from malignant prostate cells of organ-confined localized prostate tumors in the following comparison groups: 1) patients who experienced a metastatic event or clinical recurrence (CR) vs. patients without any type of recurrence; 3) patients with high grade prostate cancer (Gleason 7+) vs. patients with low grade prostate cancer (Gleason <7).
Aim 4b: To develop risk predictive models of aggressive disease that include clinical variables and gene expression profiles and that improve upon the predictive ability of existing clinical tools.

Aim 4c: To validate the identified predictive model(s) in other localized PCa patient cohorts.

3 METHODS

3.1 Study designs and study populations

3.1.1 The California Collaborative Prostate Cancer Study

The California Collaborative Prostate Cancer Study was conducted in Los Angeles County (LAC) and in the San Francisco Bay area (SFBA) and used similar protocols and a common structured questionnaire administered in person. The characteristics of the study population and participation rates have been previously described (John et al., 2005; Joshi et al., 2012a). Briefly, newly diagnosed PCa cases were identified through the Los Angeles County and Greater Bay Area cancer registries. At both study sites, patients with intra-capsular PCa were classified as localized cases and patients with extra-capsular extension of the tumor, and/or extension into adjacent surrounding tissue, or regional lymph nodes, or metastasis to other areas of the body, were classified as advanced cases (SEER 1995 clinical and pathologic extent of disease codes 41-85).

3.1.1.1 Recruitment of participants in San Francisco Bay Area

San Francisco Bay Area (SFBA): Eligible localized cases aged 40-79 years diagnosed from 1997-1998 were randomly sampled among non-Hispanic White men (15% sample) and African-American men (60% sample). Eligible advanced cases aged 40-79 years included all non-Hispanic White men and all African-American men diagnosed from 1997-2000. Controls were identified through random-digit dialing and, for men aged ≥65 years, through random selections from the rosters of beneficiaries of the Health Care Financing Administration, and they were frequency-matched to advanced cases on race/ethnicity and the expected 5-year age distribution of cases.
The in-person interview was completed by 208 localized cases (73 African-Americans and 135 non-Hispanic Whites), 568 advanced cases (118 African-Americans and 450 non-Hispanic Whites) and 545 controls (90 African-Americans and 455 non-Hispanic Whites).

3.1.1.2 Recruitment of participants in Los Angeles County

Los Angeles County (LAC): African-American, Hispanic and non-Hispanic White males diagnosed with PCa from 1999-2003 were identified by rapid case ascertainment through the Los Angeles County Cancer Surveillance Program. Controls were identified through a standard neighborhood walk algorithm [17] and were matched to cases on age (±5 years) and race/ethnicity. The in-person interview was completed by 1,184 cases (351 African-Americans, 333 Hispanics and 500 non-Hispanic Whites), including 631 with advanced PCa and 553 with localized PCa, and 594 controls (163 African-Americans, 122 Hispanics and 309 non-Hispanic Whites).

3.1.2 Hospital de Clínicas José de San Martín radical prostatectomy cohort (Buenos Aires, Argentina)

This section is paraphrased from a published paper written in collaboration with Dr. Javier Cotignola and colleagues (Cotignola et al., 2013).

Patients at the Hospital de Clínicas José de San Martín who underwent retropubic radical prostatectomy (RRP) as their primary therapeutic strategy from December 1998 to July 2010 and were diagnosed with pathologic localized prostate cancer (n=105) were retrospectively recruited into this study. All patients were Argentine citizens and, by definition, of Hispanic ethnicity. Based on what is known about this population, most men are likely to have predominantly European ancestry, although some admixture of Native American and African ancestry could be present (Avena et al., 2012). All who agreed to participate in the study signed a written informed consent.
3.1.3 University of Southern California, Institute of Urology, Radical Prostatectomy cohort

The radical prostatectomy cohort includes 4,063 patients who underwent a retropubic radical prostatectomy (RRP) with routine pelvic lymph node dissection (PLND) at the University of Southern California from 1972 to 2009. Clinical data were prospectively maintained in an Institutional Review Board (IRB) approved database until May 2010, when follow-up was discontinued. For all analyses using this cohort, the following exclusions were made: patients who did not consent (n=85), non-adenocarcinoma cases (n=7), salvage cases (n=23), and laparoscopic cases (n=39). Figure 3.1 provides a visual breakdown of the patients in the cohort who had organ-confined (pT2), extracapsular extension (pT3), or lymph node involvement (LN+) PCa, and of the patients who remained recurrence free with no evidence of disease (NED), patients who experienced a biochemical recurrence (BCR), and patients who experienced further clinical metastatic recurrence (CR).

Figure 3.1: University of Southern California, Institute of Urology, Radical Prostatectomy cohort: breakdown by pathologic stage and recurrence status following radical prostatectomy

3.1.3.1 Nested case-control study on prognostic biomarkers of localized prostate cancer

Patients considered for inclusion in the study were radical prostatectomy patients who were diagnosed with organ-confined disease (pT2) (n=2,646). Further inclusion criteria were treatment between 1988 and 2008 (within the PSA era at the institution), no lymph node involvement, archived prostate tissue available for study purposes as formalin-fixed paraffin-embedded (FFPE) tissue, and available clinical and follow-up data.
There were a total of 2,552 eligible patients: 2,359 patients who did not have any recurrence after surgery and 193 patients who experienced recurrence (BCR only, n=147; CR or metastasis, n=46). For the nested case-control study, selected cases were all eligible patients documented to have BCR after surgery. Controls were selected from the cohort using an incidence density sampling method (Langholz, 2007; Langholz & Borgan, 1997; M. H. Wang, Shugart, Cole, & Platz, 2009). The controls are individuals who were randomly selected from the "risk set," that is, the recurrence-free patients still under follow-up at the time of the case's biochemical recurrence and still at risk of experiencing recurrence. With the assistance of Dr. Bryan Langholz, we randomly seeded and generated a list of five randomly chosen controls within each risk set, which were thus matched to the case with respect to follow-up time. We accounted for the possibility that, even though patients met the eligibility requirement of having tissue available in the database, the actual tissue blocks might not be available for a variety of reasons, including that the block might be depleted, might no longer be on site, or might be missing. To address this issue, we went down the list of controls for each risk set and chose the first control that had the tissue block available for laser capture microdissection. If the first control in the risk set was a control who later became a case himself, the next eligible control in the list was chosen as an additional control. Controls were meant to match cases on operation year, pathologic Gleason score, neoadjuvant hormonal therapy (we used this as an exclusion later in the study so n=22 treated tumors were included), and stage (T2, organ confined). Matching on Gleason score was relaxed in order to obtain eligible controls for each case by using categories of ≤6, 7, 8-10.
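The risk-set sampling described above can be illustrated with a short sketch. This is not the procedure actually run for the study; the field names (`id`, `followup_years`, `bcr`), the seed value, and the toy patient list are all hypothetical, and the further steps of checking tissue availability and matching factors are omitted.

```python
import random

def sample_risk_set_controls(patients, n_controls=5, seed=2014):
    """Draw up to n_controls controls per case from that case's risk set.

    patients: list of dicts with hypothetical keys 'id',
    'followup_years' (time to BCR or to last follow-up) and 'bcr' (bool).
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sampled = {}
    cases = sorted((p for p in patients if p["bcr"]),
                   key=lambda p: p["followup_years"])
    for case in cases:
        t = case["followup_years"]
        # Risk set: patients other than the case who are still recurrence-free
        # and under follow-up after time t. A patient who recurs later than t
        # is still eligible, so a future case can serve as a control.
        risk_set = [p for p in patients
                    if p["id"] != case["id"] and p["followup_years"] > t]
        k = min(n_controls, len(risk_set))
        sampled[case["id"]] = [p["id"] for p in rng.sample(risk_set, k)]
    return sampled

# Toy data (invented for illustration only).
patients = [
    {"id": 1, "followup_years": 2.0, "bcr": True},
    {"id": 2, "followup_years": 5.0, "bcr": True},
    {"id": 3, "followup_years": 1.0, "bcr": False},
    {"id": 4, "followup_years": 3.0, "bcr": False},
    {"id": 5, "followup_years": 6.0, "bcr": False},
    {"id": 6, "followup_years": 7.5, "bcr": False},
    {"id": 7, "followup_years": 8.0, "bcr": False},
    {"id": 8, "followup_years": 9.0, "bcr": False},
    {"id": 9, "followup_years": 10.0, "bcr": False},
    {"id": 10, "followup_years": 12.0, "bcr": False},
]
controls = sample_risk_set_controls(patients)
```

In this toy example, patient 2 (who recurs at 5 years) is eligible as a control for patient 1 (who recurs at 2 years), which mirrors the situation of a control who later becomes a case.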
However, since the endpoint was changed from BCR to clinical recurrence (CR) after all tissue had been obtained, the matching was not considered in the final analyses.

3.2 Data Collection

3.2.1 The California Collaborative Prostate Cancer Study: Demographics and lifetime exposure data

A structured questionnaire, administered at the participant's home, asked about demographic background, medical history, tobacco use and other lifestyle factors. The interviewers also took anthropometric measurements of height and weight. Usual dietary intake during the reference year (calendar year before diagnosis for cases or before selection into the study for controls) was assessed using a 74-item food frequency questionnaire (FFQ) that was adapted from Block's Health History and Habits Questionnaire (Block, Coyle, Hartman, & Scoppa, 1994). An aggregate level socio-economic status (SES) variable was derived from 2000 census data as previously described (Schwartz, John, Rowland, & Ingles, 2010a). Body mass index (BMI) was calculated using the reported weight in the reference year and measured height at the time of the interview. BMI was calculated as weight (in kilograms) divided by height (in meters) squared and categorized as normal weight (BMI <25), overweight (BMI 25-29.9) and obese (BMI ≥30). Underweight men (BMI <18.5, n=15) were grouped with normal-weight men.

Tobacco consumption variables

The questionnaire assessed lifetime histories of smoking (cigarettes, cigars, pipe), tobacco chewing, and use of tobacco snuff. Information was collected on the ages at which men started and stopped tobacco consumption, and years and amount of tobacco consumption (cigarettes per day, cigars per week, pipes per week, chewing tobacco per week, cans of snuff per week).
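As a worked illustration of the derived exposure variables in this subsection, the sketch below computes BMI with the three categories given above and pack-years of cigarette smoking as defined in this subsection (cigarettes per day divided by 20, multiplied by years smoked). The function names are hypothetical, and treating any value below 30 as "overweight" is an assumption about how the 25-29.9 category was applied to unrounded values.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Three-level BMI category; underweight men are grouped with normal weight."""
    if value < 25:
        return "normal"       # includes underweight (BMI < 18.5)
    if value < 30:
        return "overweight"   # assumption: 25 up to, but not including, 30
    return "obese"

def pack_years(cigarettes_per_day, years_smoked, cigarettes_per_pack=20):
    """Packs smoked per day multiplied by the number of years smoked."""
    return cigarettes_per_day / cigarettes_per_pack * years_smoked
```

For example, a man who smoked 10 cigarettes per day for 30 years has accumulated 15 pack-years.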
Ever tobacco smoking (not including tobacco chewing or snuff) was defined as smoking at least one cigarette a day and/or one cigar/pipe a week for 6 months or longer, and former smokers were defined as individuals who quit smoking prior to the reference year. The following variables were evaluated: history of tobacco smoking (ever/never), smoking status (never, former, current), age started to smoke (years), duration of smoking (years), type of tobacco used (cigarettes, cigars, pipes), cigarettes smoked per day (lifetime average), and pack-years of cigarette smoking (the number of cigarettes smoked per day divided by 20, the current number of cigarettes per pack, multiplied by the total number of years smoked). Variables were dichotomized based on the median value among controls who ever smoked tobacco.

3.2.2 The California Collaborative Prostate Cancer Study: Follow-up of prostate cancer cases

Follow-up data on survival of the PCa cases up to January 2012 were retrieved from the Los Angeles County Cancer Surveillance Program and the Greater Bay Area Cancer Registry. After excluding 48 cases with ambiguous tumor staging and 16 with a missing or incorrect interview date, PCa-related mortality and all-cause mortality analyses were based on 1,944 cases.

3.2.3 Hospital de Clínicas José de San Martín cohort: Follow-up of participants

Trained urologists and oncologists at the Hospital de Clínicas José de San Martín performed patient recruitment, follow-up and the maintenance of medical records on all 105 patients recruited from August 2008 to November 2010. The study protocol was approved by the Institutional Ethical Committee of the Hospital, in accordance with the Declaration of Helsinki.
3.2.4 University of Southern California, Institute of Urology, Radical Prostatectomy cohort: Follow-up of participants

All specimens from radical prostatectomies were assessed using consistent pathological reporting, and follow-up at the institution was standardized (clinical examinations and PSA measurements). After radical prostatectomy, patients were followed every 4 to 6 months in year 1, every 6 months in years 2 and 3, and once annually afterward. During the visits, patients received a physical examination, a serum PSA measurement and a chest x-ray. Bone scans were also completed if there were signs of progression, such as an increase in PSA levels. Patients were recommended adjuvant radiotherapy for positive surgical margins and/or ≥pT3 disease, and salvage radiotherapy was offered to those with BCR or CR that had clinical evidence of local disease. BCR was defined as a detectable PSA level based on the era-specific assay's detectability limit, verified by two consecutive increased PSA tests, with 3-4 months in between blood draws. The assay cutoffs used to determine PSA levels over the years were: <0.3 ng/ml for 1988 to July 1994; <0.05 ng/ml for July 1994 to March 2005; and <0.03 ng/ml for March 2005 to current. Clinical recurrence was defined as detected local or distant gross disease after surgery confirmed by clinical imaging (i.e. MRI, CT, chest x-ray).

Time to BCR was calculated as the time from radical prostatectomy to the time when BCR was documented based on the physician's evaluation of the two consecutive PSA tests. Time to clinical recurrence was measured as the time from radical prostatectomy to the time of documented clinical recurrence. Follow-up of patients was completed through routine perusal of patient medical records and physician notes. When necessary, phone calls were made to the patients or the patient's physician if there was a change in physician.
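The era-specific BCR definition above can be sketched as follows. This is only an illustration under stated assumptions, not the rule applied by the treating physicians: the exact day boundaries of each assay era (taken here as the first of the month), the reading of "detectable" as at or above the cutoff, the use of the confirmatory draw as the BCR date, and the function names are all assumptions. The 3-4 month spacing between draws is not enforced.

```python
from datetime import date

# Era-specific PSA detectability limits in ng/ml, from the text.
# Assumption: eras change on the first day of the stated month.
ASSAY_LIMITS = [
    (date(1988, 1, 1), date(1994, 7, 1), 0.3),
    (date(1994, 7, 1), date(2005, 3, 1), 0.05),
    (date(2005, 3, 1), date.max, 0.03),
]

def detectability_limit(draw_date):
    """Return the assay cutoff in force on the date of the blood draw."""
    for start, end, limit in ASSAY_LIMITS:
        if start <= draw_date < end:
            return limit
    raise ValueError("PSA draw predates the PSA era (before 1988)")

def time_to_bcr_years(surgery_date, psa_tests):
    """Years from surgery to BCR, or None if no BCR is found.

    psa_tests: chronologically sorted list of (draw_date, psa_ng_ml).
    Assumption: BCR requires two consecutive detectable values and is
    dated at the second (confirmatory) draw.
    """
    for (d1, v1), (d2, v2) in zip(psa_tests, psa_tests[1:]):
        if v1 >= detectability_limit(d1) and v2 >= detectability_limit(d2):
            return (d2 - surgery_date).days / 365.25
    return None
```

A single detectable value followed by an undetectable one does not count as BCR under this sketch, which reflects the requirement for verification by a second test.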
Patients who underwent a radical prostatectomy from 1972 to 2009 were entered into the Institutional Review Board approved database maintained by the USC Institute of Urology. Last follow-up of patients was completed in May 2010.

3.3 Biomarker data collection

3.3.1 The California Collaborative Prostate Cancer Study

Blood or mouthwash samples were collected for 1,164 advanced cases, 553 localized cases (only LAC), and 1,119 controls.

3.3.1.1 Genotyping of carcinogen metabolism polymorphisms

Genotyping was completed for 11 single nucleotide polymorphisms (SNPs) in 8 genes reported to impact enzyme function: GSTP1 Ile105Val (rs1695) (Strange & Fryer, 1999), PTGS2 -765 G/C (rs20417) (Papafili et al., 2002), CYP1A2 -154 A/C (rs762551) (Sachse, Bhambra, & Smith, 2003), CYP2E1 -1054C>T (rs2031920), EPHX1 Tyr113His (rs1051740) (Hassett, Lin, Carty, Laurenzana, & Omiecinski, 1997), CYP1B1 Leu432Val (rs1056836), UGT1A6 Thr181Ala (rs1105879), and NAT2 Ile114Thr, Arg197Gln, Gly286Glu and Arg64Gln (rs1799930, rs1799931, rs1801279, rs180120) (Hirvonen, 1999), in addition to two genes that had copy number variants, GSTM1 and GSTT1 (Strange & Fryer, 1999). NAT2 haplotypes were constructed using the haplo.stats package in R. NAT2 haplotypes have been characterized for their impact on protein function (A. T. Chan, Tranah, & Giovannucci, 2005a). Consistent with the existing classification (Gertig et al., 1999), we classified carriers of two copies of the fast haplotype as the "fast" phenotype and carriers of all other haplotypes as the "slow" phenotype. All genotypes were obtained using TaqMan assays, available "on demand" from ABI (Applied Biosystems, Foster City, CA), following the manufacturer's instructions. Call rates were >97%. No differences were found between observed genotypic frequencies and those expected assuming Hardy-Weinberg Equilibrium.
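The text does not state which test was used for the Hardy-Weinberg comparison. One common choice, shown here purely as an illustrative sketch, is a 1-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts of a biallelic SNP. The function name and the example counts are hypothetical, and the sketch assumes both alleles are observed (otherwise an expected count is zero).

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against
    Hardy-Weinberg expectations for a biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented counts: 25/50/25 matches the expectation exactly,
# 50/0/50 (no heterozygotes) departs from it strongly.
in_equilibrium = hwe_chi_square(25, 50, 25)
out_of_equilibrium = hwe_chi_square(50, 0, 50)
```

In practice such a check is usually run among controls for each SNP, with small p-values flagging possible genotyping problems.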
3.3.1.2 Genotyping of ancestry informative markers (AIMs)

A panel of 106 biallelic ancestry informative markers that distinguish European, African, and Native American ancestry was used to estimate genetic ancestry for Latinos in the study population. The characteristics of this established panel of ancestry informative markers have been reported in previous studies (Fejerman et al., 2010; Yaeger et al., 2008). These selected markers are widely spaced throughout the genome, distributed well throughout the 22 chromosomes, and have large differences in allele frequencies among the ancestral populations (δ >0.5) and therefore maximize the information acquired for more than one ancestral population pairing (Fejerman et al., 2010). Simulations show that approximately 100 markers are required to provide ancestry estimates that have correlation coefficients of >0.90 when compared with true ancestry (Tsai et al., 2005). Allelic frequencies for the ancestral populations were obtained from a panel that included 42 Europeans (Coriell's North American Caucasian panel), 37 West Africans (non-admixed Africans living in London, United Kingdom and South Carolina) and 30 Native Americans (15 Mayans and 15 Nahuas). Genotyping was performed at the Children's Hospital Oakland Research Institute (Dr. Beckman's laboratory) using a multiplex PCR coupled with single base extension methodology with allele calls using a Sequenom analyzer. One marker was not used due to high failure rate, leaving 105 AIMs to be used for ancestry estimation.

3.3.2 Hospital de Clínicas José de San Martín cohort

This section is paraphrased from a published paper written in collaboration with Dr. Javier Cotignola and colleagues.
(Cotignola et al., 2013)

Germline DNA was extracted from 3-5 ml of each patient's peripheral blood obtained by venipuncture and anticoagulated with EDTA (ethylenediaminetetraacetic acid), using the CTAB protocol (N-Cetyl-N,N,N-trimethyl-ammonium bromide; Merck, Darmstadt, Germany). The DNA concentration and quality were obtained by measuring absorbance at 260 nm and using the A260nm/A280nm ratio, with a GeneQuant™-Pro RNA/DNA calculator (Amersham-Biosciences). Some of the DNA samples were further analyzed using electrophoresis on 1% agarose (Genbiotech SRL, Buenos Aires, Argentina), with the gels dyed with ethidium bromide (Promega, Wisconsin, USA). Samples were stored long term at -80°C, with the working solution (50 ng DNA/ul aliquot) stored at -20°C.

3.3.2.1 Genotyping of polymorphisms in GST genes

Three polymorphisms in 3 GST genes were genotyped: GSTP1 c.313 A>G (NM_000852.3: c.313A>G; p.105 Ile>Val; rs1695), GSTT1 null and GSTM1 null. The PCR-RFLP assay was used for genotyping GSTP1 c.313A>G, and genotypes of GSTT1 and GSTM1 were determined by multiplex PCR. With this method, the null genotype could be determined by the absence of a band for either GSTT1 or GSTM1 in electrophoresis and discriminated from the GSTT1 and GSTM1 present genotypes in the heterozygous and homozygous variations. An internal PCR control was also used to confirm the null genotype. Samples that did not amplify for both genes were repeated twice or three times before being discarded as a PCR failure. These samples that did not amplify were called null for both GSTT1 and GSTM1 only if the following criteria were met: (a) all replications were concordant, (b) other samples within the same PCR using the same PCR mix did amplify (reaction controls), and (c) PCRs for double-null samples showed the specific amplified region for other genes (DNA quality control). Genotyping call rates were 98% for GSTP1, 99% for GSTT1 and 98% for GSTM1.
All PCRs were performed in a DNA Engine Thermocycler (Bio-Rad, Hercules, CA, USA). PCRs and digested products were analyzed by 2% agarose (Genbiotech SRL, Buenos Aires, Argentina) gel electrophoresis in 1 x TAE buffer (0.8M Tris; 0.4M sodium acetate; 0.04M EDTA; pH 8.3) and dyed with ethidium bromide (Promega, Madison, WI, USA). Gels were photographed and analyzed with the G-Box system (Syngene, Frederick, MD, USA) and the Genesnap software (Syngene). Samples that failed were repeated once or twice as needed. Outputs from genotyping were reviewed independently by two laboratory members, and 10-12% of samples, randomly selected and blinded, were re-analyzed as quality control of the experiments. The results were included in the final analyses when there was 100% agreement between the two independent readers and 100% concordance between samples and blinded repeats.

3.3.3 Nested case-control study on prognostic biomarkers of localized prostate cancer

Patients who underwent a radical prostatectomy at USC/Norris Comprehensive Cancer Center were provided a consent form to give permission for their prostate tissue to be used in the Prostate Cancer Prostatectomy Data and Specimen Collection Project (Principal Investigator: Sia Daneshmand, M.D.). After the prostate tissue was resected and an attending pathologist confirmed a diagnosis, the tissue was formalin fixed and paraffin embedded (FFPE) for preservation purposes and stored at the Department of Pathology. The resected tissue was sectioned and preserved in several tissue blocks; therefore each patient may have 10-20 tissue blocks, depending on the size of the prostate resected. The pathologist follows a standard protocol of sectioning and submitting the tissue for processing of the apex, mid, base, and margins of the prostate. With IRB approval, prostate tissue for selected participants in this study was obtained from the Department of Pathology. For the study, our collaborating pathologist (Dr.
Andy Sherrod) reviewed the hematoxylin and eosin (H&E) stained slides of each tissue block and determined the best tissue block to use for microdissection: one with sufficient tumor tissue available that is also the most representative of the highest Gleason grade of the index tumor. Pathology technicians used a microtome to cut ten 5-micron sections of the selected block, along with a cover-slipped H&E slide for clear visualization of the location of the tumor under the microscope. Dr. Sherrod clearly marked the location of the tumor on each H&E slide of the corresponding block for use as a guide during microdissection of the tumor on the other, non-cover-slipped slides.

In order to enrich for malignant glands and avoid contamination with stromal tissue or non-malignant glands, a laser capture microdissection (LCM) microscope (Arcturus® Laser Capture Microdissection, Model Veritas; Applied Biosystems by Life Technologies, Foster City, CA) was used to microdissect malignant prostate glands. For this purpose, slides obtained from the pathology core were de-paraffinized and lightly stained with H&E (no coverslip mounted) prior to microdissection. Appropriate measures were taken to reduce contamination of the tissue and minimize loss of RNA in the tissue (i.e. proper use of lab coats and gloves, use of RNase-free reagents, and routine cleaning of equipment). The LCM microscope is run through a software program that allows the user to view the slides on a computer screen and to select and mark the areas wanted for microdissection. The LCM microscope and its components are pictured in Figures 3.2 and 3.3. Figure 3.4A illustrates how the laser hits through the cap placed on the tissue to "grab" the selected malignant glands from the slide by melting the polymer film of the cap touching the tissue.
Figures 3.4 and 3.5 show the process of selecting malignant regions for microdissection and the tissue obtained on the cap after microdissection for further processing.

After obtaining tissue on the caps from the laser capture microdissection (approximately 4 LCM caps of tissue per case and 4-8 5-micron slides, depending on the size of the tumor area; 3-4 hours per case), the caps with the tissue of interest were placed in lysis buffer (Buffer PKD, provided by Qiagen) and proteinase K in a 0.5 mL tube and temporarily stored at 4°C until further RNA extraction (as seen in Figure 3.4B). RNA extractions were completed using the Qiagen AllPrep® DNA/RNA FFPE kit to have the option of recovering both RNA and DNA from the microdissected tissue (partially extracted DNA samples were stored at -20°C for full extraction at a later time). The samples were vortexed, then incubated at 56°C for 15 minutes, placed on ice for 3 minutes, and centrifuged at full speed for 15 minutes to separate DNA and RNA. Subsequent steps of sample processing were performed according to the kit manual. The samples were quantified using a Nanodrop machine. The isolated RNA samples were stored at 20 ng/uL in RNase-free water at -80°C.

Figure 3.2: Arcturus Veritas laser capture microdissection machine used for the study.

Figure 3.3: The inside of the LCM machine. Labeled components: location where the slides were placed for microdissection along with the sterile caps used to pick up malignant glands; lightly H&E stained non-cover-slipped slides for LCM; H&E slide for guidance, with tumor area marked by pathologist; sterile LCM caps (Arcturus CapSure®); location where finished caps are placed.

Figure 3.4: LCM laser picking up tissue and method of storing tissue prior to processing
A. When in the LCM machine, the laser (illustrated by the red arrow) hits through the cap to melt the bottom polymer-coated layer of the cap to "grab" the selected area of interest. The tissue then sticks to the polymer film.
B.
Once the LCM is done and the cap is removed with the selected areas of tissue, it is placed in a 0.5 mL tube with lysis buffer and proteinase K, briefly stored at 4°C until further RNA processing.

Figure 3.5 (A-E): Microdissection of malignant prostate cells.
A. 5-micron hematoxylin-eosin stained coverslipped section of a radical prostatectomy specimen. The pathologist (Dr. Sherrod) marked in blue the area where malignant glands can be found (Gleason score 3+3).
B. Microscopic (10x) view of the location on the slide. The malignant glands (red arrow) are seen surrounded by non-malignant glands (blue arrow).
C. Marked areas on slide for laser capture microdissection (2x).
D. Areas after laser capture microdissection (2x).
E. Laser capture microdissection cap with tissue of interest for further processing.

3.3.3.1 Expression profiling of prostate cancer tumors

In order to obtain gene expression data of the microdissected malignant prostate glands, the Whole Genome DASL HT assay was used. Samples were shipped to Illumina, Inc., where our collaborators carried out the processing of the samples. This platform was especially developed to obtain genome-wide expression profiles from partially degraded RNA (such as those from archival FFPE tissue) (April et al., 2009). The assay first converts total RNA to complementary DNA (cDNA) using specific primers, then the cDNA is annealed to the DASL Assay Pool (DAP) probe groups. The probes are approximately 50 bases, making it possible to detect partially degraded RNA. The probe sets consist of oligonucleotides (one located at the 5' end and one downstream at the 3' end) that hybridize to the targeted cDNA site.
The upstream oligonucleotide is extended and ligated to the corresponding downstream oligonucleotide to create a PCR template that can be amplified using PCR primers, and each target gene sequence is finally detected by hybridizing to the HumanHT-12 v4 BeadChip (Illumina; Whole-Genome DASL® HT Assay for Expression Profiling in FFPE Samples; Data Sheet: RNA Analysis, 2010). The HumanHT-12 v4 BeadChip, which efficiently processes 12 samples per BeadChip array, was used to detect the following transcripts using the RNA from the tumor samples: 27,253 coding transcripts (well-established annotations), 426 coding transcripts (provisional annotations), 1,580 non-coding transcripts (well-established annotations), and 26 non-coding transcripts (provisional annotations) (Illumina; Whole-Genome DASL® HT Assay for Expression Profiling in FFPE Samples; Data Sheet: RNA Analysis, 2010). We used 50-200 ng of RNA from each tumor to obtain profiles with the DASL platform. Cases and controls were run in pairs on the same chip. For quality control purposes, 20% of samples were included as duplicates.

3.4 Data analysis

3.4.1 The California Collaborative Prostate Cancer Study

3.4.1.1 Tobacco smoking and prostate cancer risk

The analyses of questionnaire data were based on 761 localized cases, 1,199 advanced cases, and 1,139 controls. Analyses of genotype data were based on subjects with DNA from blood, including 535 localized cases, 988 advanced cases, and 800 controls. These individuals did not differ from those without DNA with regard to age, calorie intake, family history, SES and BMI at either study site.
To best correct for differences in race/ethnicity, socioeconomic status (SES) and the case/control ratio across the two study sites, we created a variable that classified men according to study site (SFBA or LAC), SES (5-level variable, as previously described (Schwartz, John, Rowland, & Ingles, 2010b)) and race/ethnicity (non-Hispanic White, African American, Hispanic), and used it to group individuals in conditional logistic regression models that were used to estimate odds ratios (OR) and 95% confidence intervals (CI). SES was collapsed into 3 categories (quintiles 1-2, 3, 4-5) for SFBA subjects and 4 categories (quintiles 1, 2, 3, 4-5) for LAC subjects, leaving 6 SES/race groups from SFBA and 12 from LAC. When evaluating smoking tobacco, models were adjusted for age at diagnosis for cases or selection into the study for controls (in years, modeled as continuous), family history of PCa in first-degree relatives (no, yes), BMI (<25.0, 25.0-29.9, ≥30.0 kg/m²), average lifetime consumption of alcohol (grams/day), use of non-smoking tobacco (snuffing or chewing) (no, yes), cigar or pipe smoking (no, yes) if evaluating only cigarette smoking (cigarettes/day or pack-years), and intake of red meat cooked at high temperature (broiled, pan-fried or grilled, in g/day), which we previously reported to be associated with increased PCa risk and which contributes to carcinogenic exposure (Catsburg et al., 2012; Joshi et al., 2012a). We also considered possible confounding by total vegetable consumption (g/day), total fruit consumption (g/day), and total calorie intake (kcal/day) during the reference year; however, inclusion of these covariates did not change OR estimates by >10%, so they were not included in final models. All analyses were stratified by stage (localized and advanced) and by race/ethnicity.
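The construction of the grouping variable for the conditional logistic regression models can be sketched as below. The labels ("NHW", "AA", "Hispanic") and function names are hypothetical; the collapsing of SES quintiles and the race/ethnicity groups recruited at each site follow the description above. Enumerating the strata reproduces the 6 SFBA and 12 LAC groups.

```python
# Race/ethnicity groups recruited at each site, per the study description.
RACES = {"SFBA": ["NHW", "AA"], "LAC": ["NHW", "AA", "Hispanic"]}

def collapse_ses(site, quintile):
    """Collapse the 5-level SES variable as described for each study site."""
    if site == "SFBA":
        return {1: "1-2", 2: "1-2", 3: "3", 4: "4-5", 5: "4-5"}[quintile]
    return {1: "1", 2: "2", 3: "3", 4: "4-5", 5: "4-5"}[quintile]

def stratum(site, race, ses_quintile):
    """Key used to group individuals in the conditional model."""
    return (site, race, collapse_ses(site, ses_quintile))

# Enumerate every possible stratum to check the group counts in the text.
all_strata = {stratum(site, race, q)
              for site in RACES
              for race in RACES[site]
              for q in range(1, 6)}
n_sfba = sum(1 for s in all_strata if s[0] == "SFBA")
n_lac = sum(1 for s in all_strata if s[0] == "LAC")
```

Conditioning on this key means that cases are compared only with controls from the same site, race/ethnicity group and collapsed SES level.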
Heterogeneity by race within each stage was evaluated using the likelihood ratio test comparing conditional logistic models that were fit with and without interaction terms of smoking variables and race. All hypothesis tests were two-sided and all analyses were done using the statistical software Stata/SE 11.2 (Stata Corporation, College Station, TX).

3.4.1.2 Tobacco smoking, polymorphisms in metabolism enzymes and prostate cancer risk: gene by environment analyses

The potential modifying role of the selected polymorphisms on the associations between tobacco smoking and PCa risk was assessed using both 2 degree of freedom (2-df) interaction tests, treating the three-level tobacco smoking variables as categorical, and 1-df interaction tests, treating these variables as ordinal. For these gene x environment analyses we evaluated 7 SNPs and 2 copy number variants in 9 metabolism genes (GSTP1, PTGS2, CYP1A2, CYP2E1, EPHX1, CYP1B1, UGT1A6, GSTM1 and GSTT1), as well as the predicted phenotype of the NAT2 enzyme as determined by 4 SNPs, as possible modifiers of the associations with the following smoking variables: smoking status (never, former, current), history of smoking tobacco (never, ever), age at start of smoking tobacco (never smoker, ≤18 years, >18 years), smoking duration (never smoker, ≤29 years, >29 years), cigarettes smoked per day (never cigarette smoker, ≤20 cigarettes, >20 cigarettes), cigarette pack-years (never cigarette smoker, ≤22 cigarette pack-years, >22 cigarette pack-years), and years since quitting smoking (never smoker, >21 years, ≤21 years). Gene x smoking interaction models were adjusted for the same covariates used in the models to evaluate main effects of smoking on PCa risk. Associations between these SNPs and PCa risk have been previously reported (Catsburg et al., 2012).
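The likelihood ratio tests used for heterogeneity and for the 1-df and 2-df interaction tests compare the maximized log-likelihoods of nested models. The following is a minimal sketch of that calculation only; the function name and the log-likelihood values are invented, and in practice the log-likelihoods would come from the fitted Stata models.

```python
import math

def lrt_p_value(loglik_reduced, loglik_full, df):
    """Likelihood ratio test for nested models with 1 or 2 extra parameters.

    The statistic 2 * (loglik_full - loglik_reduced) is compared with a
    chi-square distribution with df degrees of freedom.
    """
    stat = max(0.0, 2 * (loglik_full - loglik_reduced))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))   # chi-square survival, 1 df
    elif df == 2:
        p = math.exp(-stat / 2)              # chi-square survival, 2 df
    else:
        raise ValueError("this sketch only covers df = 1 or df = 2")
    return stat, p

# Hypothetical log-likelihoods for models without and with interaction terms.
stat_2df, p_2df = lrt_p_value(loglik_reduced=-100.0, loglik_full=-97.0, df=2)
```

Treating a three-level smoking variable as ordinal spends one degree of freedom on the interaction, while treating it as categorical spends two, which is why both versions of the test are reported.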
All hypothesis tests were two-sided and all analyses were done using the statistical software Stata/SE 11.2 (Stata Corporation, College Station, TX).

3.4.1.3 Tobacco smoking, genetic ancestry, and prostate cancer risk among Latino participants

Using DNA extracted from Latino cases and controls, estimates of European, African, and Native American (Amerindian) genetic ancestry for each man were obtained using 105 ancestry-informative markers and the program STRUCTURE v.2.3.2. We ran STRUCTURE assuming genetic admixture occurring approximately 10 generations back (Price et al., 2007) with a prior that was set to assume independent allele frequencies among populations. To ensure convergence to the posterior distribution, 50,000 Markov Chain Monte Carlo replications were used after a burn-in length of 10,000, as previously described (Shahabi et al., 2013). Ancestry was categorized into tertiles and quartiles based on the distribution of ancestry among controls. Because the proportions of European, Native American, and African ancestry add up to 1, at most two ancestry variables can be used to characterize the genetic background of each man. Furthermore, because European and Native American ancestry were highly inversely correlated (correlation coefficient = -0.89), and African ancestry was more highly correlated with European ancestry (correlation coefficient = -0.29) than with Native American ancestry (correlation coefficient = -0.18), only African and Native American ancestry were included in the models.

Characteristics of Latino cases and controls were compared using the Student's t-test for continuous variables and Fisher's exact tests for categorical variables. Both tests were also used for presenting the characteristics of controls by tertiles of Native American ancestry.
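Two points from this subsection lend themselves to a short sketch: ancestry categories are cut at quantiles of the control distribution, and the third ancestry proportion is fully determined by the other two. The code below is illustrative only; the function names and the toy control values are hypothetical, the quantile interpolation is Python's default rather than Stata's, and assigning boundary values to the lower tertile is an assumption.

```python
from statistics import quantiles

def tertile_cutpoints(control_values):
    """Two cut points dividing the control distribution into thirds."""
    return quantiles(control_values, n=3)

def assign_tertile(value, cutpoints):
    """Return 1, 2 or 3 (assumption: boundary values fall in the lower group)."""
    low, high = cutpoints
    if value <= low:
        return 1
    if value <= high:
        return 2
    return 3

def european_proportion(african, native_american):
    """The three ancestry proportions sum to 1, so the third is implied."""
    return 1.0 - african - native_american

# Toy Native American ancestry proportions among controls (invented values).
controls_na = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cuts = tertile_cutpoints(controls_na)
```

The same cut points, derived from controls, are then applied to cases, so that the categories have a fixed meaning across the two groups.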
Logistic regression models were used first to understand whether: 1) genetic ancestry is associated with risk of prostate cancer; 2) adjusting for genetic ancestry affects the association between smoking and prostate cancer risk (to infer the impact of population stratification on results); and 3) genetic ancestry might act as a potential effect modifier of the association between smoking and prostate cancer risk. To investigate potential non-linear relationships between ancestry and prostate cancer risk, another set of analyses was completed using generalized additive models (GAMs), an approach previously described (Shahabi et al., 2013). GAMs extend generalized linear models by allowing nonlinear functions of the covariates (Hastie & Tibshirani, 1995). The covariates enter the regression model as additive smoothing splines and the degree of smoothness is determined by the data using cross validation. Logistic regression GAMs were completed with ancestry covariates entered as smoothing splines of the ancestry proportions. Additive effects of each ancestry and effects produced by the conjunction of two ancestries were evaluated. This analysis included the same potential confounders used in the unconditional logistic regression models. Crude and adjusted odds ratios and 95% confidence intervals were computed. Statistical analyses were performed using STATA 11 (StataCorp. 2009. Stata Statistical Software: Release 11. College Station, TX: StataCorp LP) and R (V2.11.1; http://www.R-project.org).

3.4.1.4 Tobacco smoking and prostate cancer survival

For PCa cases, analyses of associations between tobacco smoking and risk of mortality after PCa diagnosis were done using Kaplan-Meier survival analysis and Cox regression models. To account for the time lapse between PCa diagnosis and interview dates, we conditioned our analysis on cases not being in the risk set during this time segment, to account for left truncation.
Entry time was set as the time elapsed between diagnosis and interview and exit time was set as the latest follow-up date (death occurred or censored). Cox regression models were adjusted for age, family history of PCa, BMI, total intake of meat cooked at high temperature, non-smoking tobacco use (snuff/chew), use of cigar/pipe for at least 6 months (if evaluating cigarette smokers only), PCa stage (localized/advanced), race/ethnicity (Non-Hispanic White, Hispanic, African-American), SES (5 levels), and study center (SFBA/LAC).

3.4.2 Hospital de Clínicas José de San Martín cohort: GST polymorphisms and biochemical recurrence

The methods for this study are paraphrased from a published paper written in collaboration with Dr. Javier Cotignola and colleagues (Cotignola et al., 2013). To study the association between GST polymorphisms and biochemical recurrence among patients who underwent a radical prostatectomy, univariable and multivariable analyses were performed using Cox proportional hazard models. For follow-up time, entry time was considered time of surgery and failure time was biochemical recurrence; individuals were censored if they were lost to follow-up. Initial multivariable Cox regression models considered the following variables: surgical margin status of the resected prostate (negative/positive), pathologic Gleason score (≤7 (3+4), ≥7 (4+3)), serum PSA level at diagnosis (≤4, >4-10, >10), family history of PCa in first-degree relatives, smoking status (never, former, current), and age at diagnosis. For the smoking variable, former smokers were patients who quit smoking at least 1 year prior to PCa diagnosis and current smokers were patients who smoked at time of diagnosis or patients who quit less than 1 year prior to the diagnosis. Kaplan-Meier survival analyses were used to determine the actuarial biochemical recurrence-free survival estimates stratifying by clinical variables and genotypes.
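As an illustration of how entry and exit times shape the risk set in a Kaplan-Meier estimate, the sketch below computes a product-limit curve with delayed entry (left truncation). It is a simplified stand-in written for this illustration, not the Stata routines used in the analyses; the function name and the follow-up records are hypothetical.

```python
def kaplan_meier_left_truncated(records):
    """Product-limit survival estimate with delayed entry.

    records: list of (entry, exit, event) tuples with entry < exit;
    event is 1 if the endpoint occurred at exit and 0 if censored.
    Returns a list of (time, survival) at each distinct event time.
    """
    event_times = sorted({exit_ for _, exit_, event in records if event})
    survival = 1.0
    curve = []
    for t in event_times:
        # A subject is at risk at t only after entry and up to exit.
        at_risk = sum(1 for entry, exit_, _ in records if entry < t <= exit_)
        events = sum(1 for _, exit_, event in records if event and exit_ == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical records: (years from diagnosis to entry,
# years from diagnosis to last follow-up, endpoint observed?)
cases = [(0.5, 3.0, 1), (3.5, 5.0, 0), (2.0, 4.0, 1), (0.2, 6.0, 1)]
curve = kaplan_meier_left_truncated(cases)
```

In this toy example the second subject enters at 3.5 years and so does not count toward the risk set for the event at 3.0 years. When entry is the time of surgery, as in the radical prostatectomy cohort, every entry time is 0 and the function reduces to the usual Kaplan-Meier estimator.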
Comparisons of survival estimates among groups were completed using the log-rank test.

3.4.3 University of Southern California, Institute of Urology, Radical Prostatectomy Cohort database

3.4.3.1 Clinical predictors of early versus prostate cancer biochemical recurrence and subsequent clinical recurrence

For these analyses, we included all individuals diagnosed with clinically localized disease who were enrolled in this patient database. The database includes 4,063 males who underwent a radical retropubic prostatectomy (RRP) and bilateral pelvic lymph node dissection from 1972 to 2009. For these analyses, we excluded patients who did not consent to be a part of the database (n=85), did not have adenocarcinoma (n=7), were salvage cases (n=23), were laparoscopic cases (n=39), or were treated before July 1, 1998 or after June 30, 2008 (n=376). Further exclusions (not mutually exclusive) included PSA-era cases with clinical stage cTX or cT4 (n=2), lymph node metastasis (n=344), neoadjuvant hormonal therapy (n=703), neoadjuvant chemotherapy (n=7), adjuvant chemotherapy (n=103), adjuvant hormone therapy (n=93), and patients classified as never having “No Evidence of Disease” (NED; n=110). The study population used for the analyses included 2,646 patients with localized PCa, stages pT2 (n=2,040), pT3a (n=482), pT3b (n=120), and pT4a (n=4). Patient characteristics were evaluated using the Fisher’s exact test for categorical variables and the Wilcoxon rank sum test for continuous variables. Kaplan-Meier analyses were performed to determine actuarial survival estimates. Statistical differences of survival between groups were determined using the log-rank test. Univariable and multivariable Cox proportional hazards regression analyses were performed to evaluate the association between the clinical variables and BCR.
Cox regression multivariable analyses were used to determine predictors of BCR, taking into account in the models all clinical variables that were significant in the univariable analyses. Final models included age at surgery, pre-operative PSA, radical prostatectomy Gleason score, surgical margin status, pathologic stage, and adjuvant radiation therapy. Race/ethnicity was excluded from the final models since it did not meet the significance criterion (p<0.2) in the univariable analyses. All analyses were performed using Stata/SE (Stata 12.0, StataCorp LP, College Station, TX).

3.4.3.2 Identification of gene expression profiles associated with cancer recurrence for localized prostate cancer tumors

For this study, whole-genome gene expression profiles were obtained for 293 patients with organ-confined PCa (stage pT2) who underwent radical prostatectomy at the University of Southern California. Among these patients, 154 experienced no recurrence following radical prostatectomy, or had “No evidence of disease (NED)”, 106 experienced only BCR, and 33 experienced clinical recurrence, or metastasis of disease (CR). Univariable analyses to identify differentially expressed genes were performed within the following groups: NED vs. BCR, NED vs. CR, and BCR vs. CR. In order to develop a signature that would be predictive of true metastatic disease, only the NED and CR groups were used in the analyses for model development.

3.4.3.2.1 Preprocessing of microarray data: normalization, background correction, and batch effect correction

Raw microarray data files were generated from all samples after they were run on the whole-genome DASL HT platform at Illumina, Inc.
We used GenomeStudio to output a text sample probe file and control probe file with the following data: summarized expression level rounded to 8 decimal points (AVG signal), standard error of the bead replicates (BEAD STDERR), average number of beads (Avg_NBEADS), and the detection p-value giving the probability of a target gene being detected above background (Detection Pval). All subsequent analyses were performed using R and Bioconductor (Gentleman et al., 2004). Pre-processing of large amounts of microarray data includes important steps, prior to any further analysis, to eliminate different sources of “noise” that may interfere with the true gene expression measurement. Control probes and sample probes were used to pre-process the data (normalization and background correction) and to assess quality control using Bioconductor’s lumi and limma packages. A specific pre-processing function (neqc, in the limma package) allowed for non-parametric background correction followed by quantile normalization using both control and sample probes (Shi, Oshlack, & Smyth, 2010). This method provides the optimal compromise between precision and bias that occurs when using algorithms in preprocessing. Batch effects, or variation due to technical and non-biological elements of the experiment (e.g. processing of samples at different times on separate chips), are a problem when combining different batches of expression data from RNA samples that were submitted for processing. In this study, we defined batch by chip array during the microarray processing. Using ComBat, an empirical Bayes method for removing batch effect (C. Chen, Grennan, Badner, Zhang, & Gershon, 2011), the data were further adjusted for chip effect, which also took into account shipment effect since each shipment comprised several chip arrays.
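The quantile normalization step mentioned above can be illustrated with a short sketch. It shows only the generic idea of forcing every array to share one empirical distribution; it is not the neqc procedure itself, which also performs background correction using control probes. The function name and the intensity values are hypothetical.

```python
def quantile_normalize(columns):
    """Generic quantile normalization.

    columns: list of equal-length lists, one list of probe intensities per
    array. Returns new lists in which every array has the same set of
    values, while each probe keeps its within-array rank.
    """
    n_probes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_probes)
    ]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays measured on three probes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

Tied intensities are assigned ranks in their original order here; production implementations usually average the reference values across ties.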
3.4.3.2.2 Differential expression analysis

Empirical Bayes moderated t-tests were used to assess differential gene expression between cases and controls. Models were adjusted for age, pre-operative PSA level, pathologic Gleason score, neoadjuvant hormone therapy, operation year, and surgical margin status. To adjust for multiple testing, we obtained q-values using the False Discovery Rate (FDR) based method of Storey and Tibshirani (Storey & Tibshirani, 2003). All the univariable analyses were performed in R (http://www.R-project.org/) using the limma package (Smyth, 2004; 2005). Separate differential expression analyses were performed using the following assigned cases and controls: 1) PSA recurrence cases (n=139) versus controls with no recurrence (n=154); 2) clinical recurrence cases (n=33) versus controls with no recurrence (n=154); 3) PSA recurrence only (no further progression) cases (n=106) versus clinical recurrence cases (n=33); 4) patients with Gleason 7 versus patients with Gleason <7.

3.4.3.2.3 Identification of risk predictive models for prostate cancer clinical recurrence

Only the comparison of clinical recurrence cases versus controls with no recurrence was used for these analyses to develop a predictive model for aggressive PCa. Using biochemical recurrence as an endpoint may not be sufficient to provide information on aggressive PCa, since only a subset of these individuals actually progress to clinical recurrence. We speculated that comparing a homogeneous group of individuals who experienced clinical recurrence to a group that did not experience any kind of recurrence might yield more robust information about gene expression profiles associated with true metastatic disease. Similar to other studies using microarray data and high-throughput technologies, this dataset has more possible predictors (~29,000) than independent subjects (n = 187).
This is commonly referred to as the large p, small n problem (p>>n) in statistics and machine learning and is an issue in model selection (Hastie & Tibshirani, 2003). Traditional regression methods such as logistic regression cannot be used since they require p<n. One way to approach model selection with p>>n in order to determine a multivariable model that predicts patient outcomes is to limit the search to the top-ranked genes (based on p-value significance) derived from the univariable differential expression analysis. However, poor predictive performance may result from selecting genes from the univariable analysis, since this procedure does not consider the correlation between genes and the joint effect of genes on the risk of recurrence. To avoid this problem, we considered classification tools capable of dealing with large numbers of features (genes/targets) jointly in order to create a multivariable risk prediction model for PCa clinical recurrence. The targets included in the final model were selected using stability selection with elastic net regularized logistic regression. Stability selection is an improvement on previous selection techniques that combines a subsampling approach with a selection algorithm, in this case elastic net regression, to select the most predictive features for the final model while providing error control (Meinshausen & Bühlmann, 2010). Stability selection resamples the subjects without replacement, performs elastic net on each of the random subsamples of size m ≤ n, and determines the features that are selected most frequently across all the subsamples. The purpose of this subsampling, rather than directly using a selection algorithm such as elastic net, is to better control the number of false positive variables selected (this improves on the elastic net and the LASSO, which do not have this type of error control).
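The subsampling logic of stability selection can be sketched in a few lines. In this illustration the base selector is a deliberately simple stand-in (it keeps the k features whose class means differ most), not elastic net, and the data are simulated; all function names, parameters, and values are assumptions made for the example.

```python
import random

def top_k_by_mean_difference(X, y, k):
    """Stand-in selector (NOT elastic net): indices of the k features whose
    means differ most between cases (1) and controls (0)."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    if not cases or not controls:
        return set()  # a one-class subsample cannot rank features
    scores = []
    for j in range(len(X[0])):
        case_mean = sum(row[j] for row in cases) / len(cases)
        control_mean = sum(row[j] for row in controls) / len(controls)
        scores.append(abs(case_mean - control_mean))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])

def stability_selection(X, y, selector, n_subsamples=500, threshold=0.5, seed=0):
    """Run selector on random half-samples drawn without replacement and keep
    the features selected in at least `threshold` of the subsamples."""
    rng = random.Random(seed)
    n = len(X)
    counts = [0] * len(X[0])
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        for j in selector([X[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    return [j for j, f in enumerate(freqs) if f >= threshold], freqs

# Simulated data: feature 0 separates the classes, features 1-4 are noise.
data_rng = random.Random(1)
y = [1] * 10 + [0] * 10
X = [[10 * label + data_rng.uniform(-1, 1)]
     + [data_rng.uniform(-1, 1) for _ in range(4)] for label in y]

selected, freqs = stability_selection(
    X, y, lambda A, b: top_k_by_mean_difference(A, b, k=1), n_subsamples=200
)
```

Any selector that returns a set of feature indices can be passed in, so an elastic net fit on each subsample would slot into the same loop.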
Therefore, given a frequency threshold, stability selection selects the variables that appear in at least a certain proportion of subsamples (i.e. a threshold of 50% would allow only features that are selected in at least 50% of subsamples to be included in the final model). Increasing the threshold imposes a stricter criterion and reduces the number of selected variables. The opposite is true for a lower threshold, with more features selected into the model. Implemented on each subsample in stability selection, elastic net regression combines the penalty of the LASSO (least absolute shrinkage and selection operator) with the penalty of ridge regression; the overall strength of the penalty is set by λ and the balance between the two penalties by the mixing parameter α. Both LASSO and ridge regression are shrinkage methods that shrink regression coefficients toward zero relative to un-penalized regression. LASSO used alone has limitations when p>>n: it tends to select arbitrarily among correlated predictors, so the ridge component of the elastic net penalty helps prevent exclusion of groups of highly correlated genes (Zou & Hastie, 2003). Ridge regression does not set coefficients to zero, so it includes all possible features in the model. Elastic net is therefore the combination of the two shrinkage methods and uses an α between 1.0 (pure LASSO) and 0 (pure ridge regression), allowing only important variables to be left in the final model. It has been shown to outperform LASSO alone or ridge regression in simulations and real data analyses, yielding a lower prediction error (Zou & Hastie, 2003; 2005). Sensitivity and specificity are used to measure the predictive ability of the genetic signature. In this case, the sensitivity shows how well the genetic signature detects patients who are likely to have a clinical recurrence. High sensitivity is preferred in order to minimize false negatives, in which the signature fails to detect patients at risk of clinical recurrence.
Specificity captures how well the genetic signature can detect patients who will not experience recurrence. Higher specificity is also preferred to exclude false positives, preventing true low-risk patients from undergoing unnecessary aggressive treatment (Figure 3.6). The receiver operating characteristic curve, or simply the ROC curve, is a graph that plots sensitivity (the true positive rate) versus 1-specificity (the false positive rate) as a function of the score or the probability cutoff, thus showing the achievable trade-off between sensitivity and specificity. Each possible cutoff of the predicted probability estimated by elastic net produces one value of sensitivity and one value of specificity (e.g. non-recurrence is predicted below 40% probability, and recurrence above 40%). The ROC curve is the set of (1-specificity, sensitivity) points obtained as the cutoff changes. A perfect prediction showing 100% sensitivity and 100% specificity (no false positives) corresponds to an AUC=1.0, while a predictive model showing no predictive ability (i.e. not better than random) would lie along a diagonal line across the plot (AUC=0.5) (Figure 3.7).

Figure 3.6: Measuring the accuracy of a test, such as a biomarker clinical tool (genetic signature) used to predict disease recurrence. Measures of how well a test/biomarker model discriminates between patients who have recurrence and patients without recurrence:
Positive predictive value = TP/(TP+FP)
Negative predictive value = TN/(TN+FN)
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)

Figure 3.7: Example of an ROC curve. An ROC curve is plotted with sensitivity versus (1-specificity), or the true positive rate versus the false positive rate. The red line shows perfect predictive ability (AUC=1.0), the blue line shows predictive ability between AUC=0.5 and AUC=1.0, and any ROC curve on the dotted line shows no predictive ability greater than chance (AUC=0.5).
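The relationship between a probability cutoff, sensitivity, specificity, and the area under the ROC curve can be made concrete with a small sketch. The predicted probabilities and outcomes below are hypothetical, and the function names are chosen for this illustration only.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as predicted recurrence; labels are 1/0."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs as the cutoff is lowered."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        points.append((1.0 - spec, sens))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical predicted probabilities and observed outcomes (1 = recurrence).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

sens_40, spec_40 = sensitivity_specificity(scores, labels, cutoff=0.4)
area = auc(roc_points(scores, labels))
```

With these made-up values, the 40% cutoff catches every recurrence but misclassifies two of the three non-recurrent patients, which is the kind of trade-off the ROC curve summarizes across all cutoffs.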
The first step in implementing stability selection with elastic net regression analyses was calibrating the optimal tuning parameter α (using the R package glmnet), by using elastic net with repeated 10-fold cross-validation (using the R package caret). Cross validation (CV) is a statistical model validation method that estimates how well a selected model would perform in an independent dataset. Using the same dataset from which the model was developed, this method partitions the dataset into “training” and “test” sets. If using 10 folds, as in this case, 10 sub-samples are created, of which 9 are used as the training data and 1 is used as the test set (P. Zhang, 1993). This partitioning process occurs 10 times, with elastic net run on each partition, and the results from each run are averaged to provide one estimate of prediction. In order to reduce variability, repeated cross validation is done to obtain an average of estimations over a certain number of CV runs (J. H. Kim, 2009). The tuning parameter that provided the best prediction based on the resulting AUC metric was α=0.3; but since the intention was to include as many features as possible while maintaining good prediction, α=0.2 was used for the final model selection using stability selection. Results from stability selection would not be expected to be sensitive to the choice of α made on the basis of AUC in this elastic net run. Therefore, using α=0.2, stability selection was implemented using 500 subsamples, each subsample having half of the sample size, in order to identify the most robust predictors for the final model.

Figure 3.7 (plot axes): 1 − Specificity (horizontal axis) versus Sensitivity (vertical axis), each ranging from 0.0 to 1.0.
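The partitioning behind repeated 10-fold cross-validation can be sketched as below. This is a plain, unstratified illustration and not the caret implementation; the function names are hypothetical, the scoring function is a placeholder, and n = 187 is the combined number of NED and CR patients given in the text.

```python
import random

def repeated_kfold_indices(n, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.
    Each repeat reshuffles the subjects and splits them into k folds."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx

def repeated_cv_estimate(n, score_fn, k=10, repeats=3, seed=0):
    """Average a per-fold score (for example, test-set AUC) over all folds
    of all repeats. score_fn(train_idx, test_idx) must return a number."""
    scores = [
        score_fn(train, test)
        for _, _, train, test in repeated_kfold_indices(n, k, repeats, seed)
    ]
    return sum(scores) / len(scores)

n_subjects = 187  # 154 NED + 33 CR patients
splits = list(repeated_kfold_indices(n_subjects, k=10, repeats=3))

# Placeholder score: the fraction of subjects held out in each fold.
held_out = repeated_cv_estimate(n_subjects, lambda train, test: len(test) / n_subjects)
```

In practice the scoring function would fit the model on the training indices and return a performance metric on the test indices; with few cases, stratifying the folds by outcome is a common refinement that this sketch omits.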
Standardization of the covariates by their standard deviation (the default in glmnet) places all gene features on the same scale and gives them equal footing. Because the standard deviation captures the variability of expression across samples, and this variability may be biologically important, standardization was not performed. Features were evaluated and ranked using frequency thresholds (Π) ranging from 20% to 80% to determine the most predictive features. The different models developed by fitting elastic net on each set of features selected by stability selection were evaluated using cross validation to determine the model with the best AUC.

Figure 3.8: Plot showing features (in red) selected by elastic net, for at least one value of λ, in at least 50% (the frequency threshold) of the 500 subsamples run using stability selection. The plot shows the lambdas selected during each run of elastic net against the frequency threshold (Π). Features in grey did not reach the frequency threshold of 50% at any provided lambda. The constant lines at Π = 1.0 are the variables that were forced into the model and appear 100% of the time across all subsamples.

We considered models obtained by stability selection and elastic net regression with frequency thresholds from 20% to 80%. An example of one run selecting at a 50% frequency threshold across subsamples, with forced inclusion of clinical variables (Gleason, PSA, operation year, age), is shown in Figure 3.8. Forced inclusion exempts these variables from variable selection: their parameters are not penalized by the elastic net penalty, so they remain in the model.
Since the current standard of care and clinical tools already require clinical data, such as Gleason score and PSA level, to be determined for patients, selecting genes that are merely associated with these clinical variables would not make efficient use of data already at hand for prediction; we therefore selected models that force-included these clinical variables.

3.4.3.2.4 Validation of identified models in external datasets

For validation of the identified models, external datasets were used from 3 different studies that used whole-genome gene expression of PCa tumors (Boormans et al., 2013; Erho et al., 2013; B. S. Taylor et al., 2010a). Genomic and clinical data for these studies were obtained through a database repository for genomic data, Gene Expression Omnibus (GEO) (GSE46691, GSE21032, GSE41410). All three studies used the Affymetrix Human Exon 1.0 ST array to obtain gene expression data. This array consists of ~1.4 million probe sets, with approximately 4 probes per exon and about 40 probes per gene. Partek® (Partek Inc., St. Louis, MO, USA), software designed for managing and processing microarray and sequencing data, was used to extract the raw data (Affymetrix CEL files) from GEO, and the data were normalized using the standard Robust Multi-array Average (RMA) method with background correction for Affymetrix arrays. The exon array has three types of annotations available, in decreasing order of reliability: core (using RefSeq, full-length mRNAs), extended (adding expressed sequence tags (ESTs), syntenic rat and mouse mRNAs), and full (adding ab-initio predictions) (Robinson & Speed, 2007). In order to ensure that all possible probes with good reliability were included in the validation, probes from the extended and full annotations were obtained for all genes in the model.
Since literature on Affymetrix arrays shows that the probe intensity distributions among extended and full probes are almost indistinguishable (Robinson & Speed, 2007), the probes from the full probeset annotation were used for validation purposes. Using the corresponding expression data and the patient population from each of the studies, repeated 5-fold cross-validation (CV) using elastic net (α set at 0.2, and no standardization of the probe variables) was performed for validation (CV as described in section 3.4.3.2.3). To determine the best prediction of a parsimonious model, the λ (LASSO penalty parameter) one standard error above the detected minimum λ (with the lowest CV error) was used to obtain the average AUC across all CV runs. Genes for all possible predictive models (frequency threshold from stability selection 20% - 80% as described in section 3.4.3.2.3) were assessed through cross validation using all data that were available for each dataset. A visual summary of the overall statistical analyses undertaken is in Figure 3.9.

3.4.3.2.4.1 External dataset 1: Mayo Clinic data (GenomeDx)

Patients from the Mayo Clinic Radical Prostatectomy Registry were included in a nested case-control design also used in a previous study (Nakagawa et al., 2008) to determine predictors of aggressive disease. Three groups were selected: 213 patients with no evidence of disease (NED), 213 patients with biochemical recurrence and no detectable clinical metastasis, and 213 patients who experienced BCR and developed clinical metastasis. For their whole-genome profiling using the Affymetrix Human Exon 1.0 ST array, they included the 545 patients with RNA available from the original 639 in their previous study, with a median of 16.9 years of follow-up. For their study, NED patients and BCR patients were combined into the control group, based on their description that they found very limited differential expression between the groups.
They also split their dataset into training (n=359; 230 controls and 129 metastatic cases) and validation (n=186; 123 controls and 63 metastatic cases) datasets (Erho et al., 2013). To validate our final models, probes for all the included genes were obtained for all 545 patients. Validation was performed using the entire dataset (control group including combined NED and BCR patients versus metastatic cases) and also by randomly splitting their data into training and validation sets, using proportions of cases and controls similar to those in their training and validation sets. In this dataset, only Gleason score was publicly available, and it was therefore the only clinical variable used for validation.

3.4.3.2.4.2 External dataset 2: Memorial Sloan-Kettering Cancer Center data

This study, done at Memorial Sloan-Kettering Cancer Center, took a comprehensive approach to analyzing genes involved in PCa and included various types of data. For validation, we used the gene expression data they obtained for 131 primary prostate tumors (9 with progression to metastasis) and 19 metastatic tissue samples from different patients (7 from nodes, 3 from spine, 3 from brain, 2 from bone, 1 from lung, 1 from neck, 1 extent bladder locally progressive, and 1 extent colon locally progressive). The clinical variables available for these patients without any missing data include age at diagnosis, race/ethnicity, neoadjuvant treatment, and adjuvant treatment status (B. S. Taylor et al., 2010a).

3.4.3.2.4.3 External dataset 3: Erasmus Medical Center data

Patients in this study were treated with radical prostatectomy at Erasmus Medical Center (Rotterdam, The Netherlands). For validation, we obtained their expression data from 41 primary tumors without metastasis and 9 primary tumors that progressed to metastasis (Boormans et al., 2013). Gleason score and pathologic T stage were the clinical variables available for our validation (Boormans et al., 2013).
Figure 3.9: Summary of the methods used for differential expression analysis and predictive model development. Analysis steps include pre-processing of the expression data, performing differential expression analysis among patients with no recurrence (NED) and patients with clinical metastatic recurrence (CR), developing a predictive signature of metastatic recurrence, and validation of the predictive signature in independent datasets with whole-genome profiling.
Data pre-processing: raw gene expression data (~29,000 features); background correction and normalization (Bioconductor limma package: neqc; ~29,000 features); batch adjustment by array chip (Bioconductor sva package: ComBat; ~29,000 features).
Differential expression of genes: NED (n=154) versus METs (n=33); 184 features (172 unique genes).
Developing a predictive genetic signature: stability selection with elastic net regression (sample half of training data, cases (METs) and controls (NED); elastic net regression on each subsample, ~29,000 features; repeat 500 times); frequency thresholds across subsamples from 20% (163 features selected) to 80% (3 features selected); elastic net regression with repeated cross-validation using features selected from stability selection.

4 RESULTS

4.1 Tobacco and prostate cancer risk and mortality in a multi-ethnic population: The California Collaborative Prostate Cancer Study

When compared to controls, localized and advanced cases were more likely to report a family history of PCa (Table 4.1). Localized cases were of lower socio-economic status (SES) than controls. Among controls, 67% had ever smoked tobacco and 18% were current smokers during the reference year. Controls who smoked consumed, on average, about a pack of cigarettes a day and smoked for an average of 28.2 years. No substantial differences in tobacco smoking characteristics were seen among races/ethnicities when also stratifying by PCa stage (Table 4.2).
Among controls, 65% of non-Hispanic Whites were ever tobacco smokers and smoked 30.7 pack-years compared to 73% of African Americans who smoked 27.4 pack-years and 70% of Hispanics who smoked 24.6 pack-years. Characteristics of PCa cases by smoking status (never smoker, quit >21 years ago, quit ≤21 years ago, current smoker) are presented in Table 4.3. When compared to never and former smokers, current smokers had a lower BMI (p = 0.002), lower SES (p = 0.001), were more likely to be non-Hispanic White or African American (p<0.001), were of younger age at PCa diagnosis (p<0.001), had higher levels of alcohol consumption during their lifetime (p<0.001), consumed more meat cooked at high temperature (p<0.001), had lower total vegetable consumption (p<0.001) and lower total fruit intake (p<0.001). When compared to former smokers, current smokers were more likely to smoke a pack or less (p<0.001) and more likely to smoke for >29 years (p<0.001).

Table 4.1: Socio-demographic and lifestyle characteristics of controls and cases in the California Collaborative Prostate Cancer Study

Characteristics | Controls (N = 1,139), n (%) | Localized PCa cases (N = 761), n (%) | Advanced PCa cases (N = 1,199), n (%)
Age at diagnosis (years)
  <50 | 59 (5) | 21 (3) | 48 (4)
  50-59 | 322 (29) | 122 (16) | 333 (28)
  60-69 | 450 (40) | 283 (38) | 499 (42)
  70+ | 293 (26) | 323 (43) | 310 (26)
  N | 1,135 | 754 | 1,195
  Mean (SD) | 63.7 (9.1) | 67.5 (8.8) | 63.9 (8.5)
Family history of PCa
  No | 993 (88) | 597 (79) | 964 (81)
  Yes | 139 (12) | 155 (21) | 228 (19)
Body Mass Index (kg/m²)
  <25 | 290 (26) | 199 (27) | 294 (25)
  25-29 | 514 (46) | 374 (50) | 579 (49)
  ≥30 | 320 (28) | 176 (23) | 317 (26)
Socio-economic status
  1 (Low) | 124 (11) | 161 (21) | 161 (13)
  2 | 142 (13) | 136 (18) | 150 (13)
  3 | 206 (18) | 127 (17) | 217 (18)
  4 | 278 (24) | 138 (18) | 235 (20)
  5 (High) | 385 (34) | 192 (26) | 432 (36)
Race/ethnicity
  Non-Hispanic White | 764 (67) | 343 (45) | 741 (62)
  African-American | 249 (22) | 277 (37) | 255 (21)
  Hispanic | 122 (11) | 134 (18) | 199 (17)
Center
  SFBA | 594 (52) | 553 (73) | 631 (53)
  LAC | 545 (48) | 208 (27) | 568 (47)
Ever smoked any tobacco
  Yes | 763 (67) | 560 (74) | 839 (70)
Smoked cigarettes for at least 6 months
  Yes | 707 (62) | 531 (70) | 782 (65)
Smoked cigars for at least 6 months
  Yes | 148 (13) | 109 (14) | 159 (13)
Smoked pipes for at least 6 months
  Yes | 198 (17) | 127 (17) | 220 (18)
Ever chewed tobacco
  Yes | 21 (2) | 18 (2) | 28 (2)
Ever snuffed tobacco
  Yes | 6 (1) | 5 (1) | 11 (1)
Tobacco smoking status (cigarettes/cigars/pipes)
  Never | 369 (33) | 197 (26) | 357 (30)
  Former | 550 (49) | 409 (55) | 608 (51)
  Current | 209 (18) | 143 (19) | 228 (19)
Age start of smoking tobacco (years)
  N | 759 | 552 | 835
  Mean (SD) | 18.5 (5.7) | 18.4 (5.5) | 18.3 (5.8)
Duration of smoking tobacco (years)
  N | 759 | 552 | 835
  Mean (SD) | 28.2 (14.8) | 32.0 (15.8) | 28.9 (15.6)
Years passed since smoking cessation (former smokers only)
  N | 549 | 407 | 603
  Mean (SD) | 21.2 (12.3) | 22.1 (13.9) | 22.8 (12.9)
Cigarettes smoked (per day)
  N | 706 | 531 | 782
  Mean (SD) | 20.9 (14.5) | 20.2 (14.9) | 20.4 (14.8)
Cigarettes smoked (pack-years)
  N | 701 | 525 | 778
  Mean (SD) | 29.1 (26.4) | 32.3 (31.0) | 29.2 (28.1)

Table 4.2: Tobacco smoking characteristics by race/ethnicity among cases and controls in the California Collaborative Prostate Cancer Study

Column groups: Controls, Localized, Advanced; within each group: Non-Hispanic Whites (NHW), African-Americans (AA), Hispanics (H). Cells are n (%) unless noted.

Characteristic | Controls NHW | Controls AA | Controls H | Localized NHW | Localized AA | Localized H | Advanced NHW | Advanced AA | Advanced H
Smoked cigarettes for at least 6 months
  No | 313 (41) | 75 (30) | 36 (30) | 113 (33) | 80 (29) | 31 (23) | 278 (38) | 73 (29) | 62 (31)
  Yes | 449 (59) | 172 (70) | 83 (70) | 228 (67) | 197 (71) | 101 (77) | 461 (62) | 182 (71) | 136 (69)
Smoked cigars for at least 6 months
  No | 648 (85) | 219 (89) | 114 (96) | 286 (84) | 232 (84) | 124 (94) | 621 (84) | 224 (88) | 190 (96)
  Yes | 114 (15) | 28 (11) | 5 (5) | 55 (16) | 45 (16) | 8 (6) | 118 (16) | 31 (12) | 8 (4)
Smoked pipes for at least 6 months
  No | 599 (79) | 215 (87) | 116 (97) | 260 (76) | 234 (85) | 129 (98) | 562 (76) | 217 (85) | 194 (98)
  Yes | 163 (21) | 32 (13) | 3 (3) | 81 (24) | 43 (15) | 3 (2) | 177 (24) | 38 (15) | 4 (2)
Ever chewed tobacco
  No | 750 (98) | 241 (98) | 117 (98) | 336 (99) | 266 (96) | 130 (98) | 719 (97) | 247 (97) | 198 (100)
  Yes | 12 (2) | 6 (2) | 2 (2) | 5 (1) | 11 (4) | 2 (2) | 20 (3) | 8 (3) | 0 (0)
Ever snuffed tobacco
  No | 758 (99) | 245 (99) | 119 (100) | 338 (99) | 275 (99) | 132 (100) | 729 (99) | 254 (99) | 198 (100)
  Yes | 4 (1) | 2 (1) | 0 (0) | 3 (1) | 2 (1) | 0 (0) | 10 (1) | 1 (1) | 0 (0)
Ever smoked any tobacco
  No | 266 (35) | 66 (27) | 36 (30) | 93 (27) | 72 (26) | 31 (23) | 228 (31) | 68 (27) | 60 (30)
  Yes | 496 (65) | 181 (73) | 83 (70) | 248 (73) | 205 (74) | 101 (77) | 551 (69) | 187 (73) | 138 (70)
Smoking status
  Never | 266 (35) | 66 (27) | 36 (30) | 93 (27) | 72 (26) | 31 (24) | 228 (31) | 68 (27) | 60 (30)
  Former | 385 (51) | 112 (45) | 53 (45) | 200 (59) | 132 (48) | 77 (49) | 381 (52) | 111 (43) | 116 (59)
  Current | 110 (14) | 69 (28) | 30 (25) | 47 (14) | 73 (26) | 23 (17) | 130 (57) | 76 (30) | 22 (11)
Age start of smoking tobacco (years of age)
  N | 495 | 181 | 83 | 247 | 205 | 100 | 511 | 187 | 137
  Mean (SD) | 18.6 (6.1) | 18.5 (4.9) | 17.6 (5.3) | 18.7 (5.3) | 18.4 (6.1) | 17.8 (4.9) | 18.5 (5.6) | 17.9 (6.0) | 18.0 (6.4)
Duration of smoking tobacco (years)
  N | 495 | 181 | 83 | 247 | 205 | 100 | 511 | 187 | 137
  Mean (SD) | 27.0 (14.9) | 30.8 (14.4) | 29.4 (14.3) | 29.5 (15.5) | 34.1 (15.6) | 34.1 (16.2) | 28.1 (15.4) | 32.8 (14.7) | 26.3 (16.7)
Years passed since smoking cessation (former smokers only)
  N | 385 | 111 | 53 | 199 | 132 | 76 | 378 | 109 | 116
  Mean (SD) | 22.0 (12.0) | 20.1 (12.3) | 18.2 (13.0) | 24.2 (13.7) | 19.8 (13.2) | 20.4 (14.9) | 23.7 (12.9) | 18.9 (11.1) | 23.5 (13.8)
Cigarettes smoked (pack-years)
  N | 448 | 172 | 83 | 228 | 196 | 101 | 461 | 181 | 136
  Mean (SD) | 30.7 (27.1) | 27.4 (25.4) | 24.6 (24.4) | 35.2 (34.6) | 29.5 (26.0) | 31.1 (31.0) | 33.4 (29.6) | 26.1 (25.3) | 19.2 (23.3)
Cigarettes smoked (per day)
  N | 448 | 172 | 83 | 228 | 197 | 101 | 461 | 182 | 136
  Mean (SD) | 23.2 (14.5) | 17.7 (13.9) | 15.6 (12.9) | 24.2 (16.5) | 17.0 (12.1) | 17.7 (14.2) | 23.6 (14.7) | 17.1 (14.8) | 14.2 (12.3)

Table 4.3: Characteristics of prostate cancer cases by tobacco smoking status (including any cigarette/cigar/pipe smoking) in the California Collaborative Prostate Cancer Study

Characteristic | Never smoker, n (%) | Former smoker, quit >21 years ago, n (%) | Former smoker, quit ≤21 years ago, n (%) | Current smoker, n (%) | p-value
Age at diagnosis (years)
  <50 | 27 (5) | 5 (1) | 14 (3) | 23 (6) | <0.001
  50-59 | 137 (25) | 74 (14) | 121 (24) | 123 (34) |
  60-69 | 217 (39) | 214 (42) | 201 (41) | 146 (39) |
  70+ | 170 (31) | 222 (43) | 158 (32) | 78 (21) |
  N | 552 | 515 | 495 | 371 | <0.001
  Mean (SD) | 65 (9) | 68 (8) | 65 (8) | 62 (8) |
Stage (p = 0.273)
  Localized | 197 (36) | 202 (39) | 205 (41) | 143 (39) |
  Advanced | 357 (64) | 313 (61) | 290 (59) | 228 (62) |
Family history of PCa
  No | 447 (81) | 405 (79) | 390 (79) | 309 (83) | 0.278
  Yes | 105 (19) | 110 (21) | 105 (21) | 62 (17) |
Body Mass Index (kg/m²)
  <25 | 133 (24) | 141 (27) | 100 (20) | 118 (32) | 0.002
  25-29 | 269 (49) | 256 (50) | 251 (51) | 175 (47) |
  ≥30 | 152 (27) | 118 (23) | 143 (29) | 77 (21) |
Socio-economic status
  1 (Low) | 86 (16) | 62 (12) | 96 (19) | 74 (20) | 0.001
  2 | 62 (11) | 76 (15) | 82 (17) | 63 (17) |
  3 | 89 (16) | 91 (18) | 89 (18) | 73 (20) |
  4 | 124 (21) | 99 (19) | 85 (17) | 63 (17) |
  5 (High) | 191 (36) | 187 (36) | 143 (29) | 98 (26) |
Race/ethnicity
  Non-Hispanic White | 321 (58) | 318 (62) | 259 (52) | 177 (48) | <0.001
  African-American | 140 (25) | 97 (19) | 144 (29) | 149 (40) |
  Hispanic | 91 (17) | 100 (19) | 92 (19) | 45 (12) |
Center
  SFBA | 348 (62) | 303 (59) | 302 (61) | 218 (59) | 0.567
  LAC | 209 (38) | 212 (41) | 193 (39) | 153 (41) |
Age start smoking tobacco (years)
  >18 | n/a | 209 (41) | 188 (38) | 129 (35) | 0.204
  ≤18 | n/a | 305 (59) | 307 (62) | 242 (65) |
Duration of smoking tobacco (years)
  ≤29 | n/a | 464 (90) | 145 (29) | 35 (9) | <0.001
  >29 | n/a | 50 (10) | 350 (71) | 336 (91) |
Cigarettes smoked (per day)
  ≤20 | n/a | 369 (77) | 312 (67) | 294 (84) | <0.001
  >20 | n/a | 108 (23) | 156 (33) | 58 (16) |
Cigarette pack-years
  ≤22 | n/a | 343 (72) | 179 (38) | 110 (31) | <0.001
  >22 | n/a | 134 (28) | 288 (62) | 241 (69) |
Alcohol intake (g/day)
  N | 551 | 514 | 494 | 369 | <0.001
  Mean (SD) | 7 (16) | 12 (22) | 14 (28) | 20 (30) |
Meat cooked at high temperature intake (g/day)
  N | 552 | 514 | 494 | 371 | 0.002
  Mean (SD) | 57 (52) | 51 (51) | 60 (56) | 65 (55) |
Vegetable intake (g/day)
  N | 551 | 514 | 494 | 370 | <0.001
  Mean (SD) | 150 (191) | 144 (171) | 142 (191) | 104 (133) |
Fruit intake (g/day)
  N | 551 | 514 | 494 | 370 | <0.001
  Mean (SD) | 118 (167) | 122 (177) | 114 (184) | 69 (126) |

4.1.1 Tobacco smoking and prostate cancer risk

We observed differences in the associations between tobacco smoking variables and risk of localized PCa by race/ethnicity (Table 4.4).
Among African Americans, there was no evidence of associations between localized PCa and any of the smoking variables. Among Hispanics and non-Hispanic Whites, ORs were generally elevated but were statistically significant only among non-Hispanic Whites. Among non-Hispanic Whites, risk of localized PCa was increased by 50% for ever smokers (OR = 1.5, 95%CI = 1.1-2.1, p = 0.007), former smokers (OR = 1.5, 95%CI = 1.1-2.1, p = 0.008), and current smokers (OR = 1.5, 95%CI = 0.9-2.4, p = 0.07) compared to never smokers, although the last comparison was not statistically significant, likely because of the relatively small number of current smokers. OR estimates did not increase with increasing duration or intensity of smoking. Among former smokers, estimates were similar for men who quit >21 years (OR = 1.5, 95%CI = 1.0-2.1) vs. ≤21 years (OR = 1.6, 95%CI = 1.1-2.3) prior to the reference year.

The association between smoking variables and risk of advanced PCa was assessed stratifying by race/ethnicity (Table 4.5). Among non-Hispanic Whites, current smoking was associated with an increased risk of advanced PCa when compared to never smokers (OR = 1.4, 95%CI = 1.0-1.9, p = 0.037). No such association was observed among African Americans (OR = 0.8, 95%CI = 0.5-1.3) or Hispanics (OR = 0.5, 95%CI = 0.2-1.0; p of heterogeneity test = 0.004). OR estimates did not increase with increasing duration or intensity of smoking. Among Hispanics, compared to never smokers, we observed some borderline statistically significant associations for smokers with longer time since quitting and an inverse association with current smoking, although the number of current smokers was relatively small.

When restricting our analyses to ever smokers, we examined whether the age at first tobacco use modified the associations between tobacco smoking duration, cigarette pack-years, and smoking status (quit >21 years ago, quit ≤21 years ago, current smoking) and PCa risk.
There was no evidence of effect modification for either localized or advanced disease among the variables considered.

Table 4.4: Smoking characteristics and risk of localized prostate cancer by race/ethnicity in the California Collaborative Prostate Cancer Study

Column groups, left to right: All races/ethnicities; Non-Hispanic Whites; African-Americans; Hispanics. Each group reports Co/Ca (controls/cases), OR a, 95%CI, and p-value. The final column is Heterog p b.

Characteristic | Co/Ca | OR a | 95%CI | p-value | Co/Ca | OR a | 95%CI | p-value | Co/Ca | OR a | 95%CI | p-value | Co/Ca | OR a | 95%CI | p-value | Heterog p b
Smoking Status (any smoking tobacco)
Never smoker | 365/196 | 1.0 | REF | | 265/93 | 1.0 | REF | | 65/72 | 1.0 | REF | | 35/31 | 1.0 | REF | |
Former smoker | 549/409 | 1.3 | 1.0-1.6 | 0.033 | 385/200 | 1.5 | 1.1-2.1 | 0.008 | 111/132 | 0.8 | 0.5-1.3 | 0.483 | 53/77 | 1.6 | 0.9-3.2 | 0.174 | 0.073
Current smoker | 206/142 | 1.1 | 0.8-1.5 | 0.462 | 108/47 | 1.5 | 0.9-2.4 | 0.070 | 68/72 | 0.7 | 0.4-1.2 | 0.204 | 30/23 | 1.1 | 0.5-2.3 | 0.840 |
Use of smoking tobacco
No | 365/196 | 1.0 | REF | | 265/93 | 1.0 | REF | | 65/72 | 1.0 | REF | | 35/31 | 1.0 | REF | |
Yes | 756/553 | 1.3 | 1.0-1.6 | 0.051 | 494/248 | 1.5 | 1.1-2.1 | 0.007 | 179/204 | 0.8 | 0.5-1.2 | 0.311 | 83/101 | 1.4 | 0.8-2.8 | 0.267 | 0.057
Age at first tobacco use (years)
Never smoker | 365/196 | 1.0 | REF | | 265/93 | 1.0 | REF | | 65/72 | 1.0 | REF | | 35/31 | 1.0 | REF | |
>18 | 292/217 | 1.3 | 1.0-1.7 | 0.093 | 187/100 | 1.5 | 1.0-2.0 | 0.030 | 78/82 | 0.7 | 0.4-1.2 | 0.212 | 27/35 | 1.4 | 0.6-3.2 | 0.384 |
≤18 | 463/334 | 1.2 | 1.0-1.6 | 0.084 | 306/147 | 1.6 | 1.1-2.4 | 0.008 | 101/122 | 0.9 | 0.5-1.4 | 0.52 | 56/65 | 1.4 | 0.7-2.9 | 0.314 | 0.132
p-trend | | | | 0.106 | | | | 0.040 | | | | 0.64 | | | | 0.354 |
Smoking duration (years)
Never smoker | 365/196 | 1.0 | REF | | 265/93 | 1.0 | REF | | 65/72 | 1.0 | REF | | 35/31 | 1.0 | REF | |
≤29 | 386/243 | 1.2 | 1.0-1.6 | 0.107 | 272/129 | 1.5 | 1.1-2.2 | 0.013 | 78/77 | 0.7 | 0.5-1.3 | 0.307 | 36/37 | 1.4 | 0.6-2.9 | 0.443 | 0.293
>29 | 369/308 | 1.3 | 1.0-1.6 | 0.074 | 221/118 | 1.5 | 1.1-2.8 | 0.022 | 101/127 | 0.8 | 0.5-1.3 | 0.428 | 47/63 | 1.5 | 0.7-3.0 | 0.278 |
p-trend | | | | 0.081 | | | | 0.022 | | | | 0.476 | | | | 0.293 |
Cigarettes smoked per day
Never cig. smoker | 421/224 | 1.0 | REF | | 312/113 | 1.0 | REF | | 74/80 | 1.0 | REF | | 35/31 | 1.0 | REF | | 0.088
≤20 | 503/404 | 1.2 | 0.9-1.6 | 0.067 | 292/152 | 1.5 | 1.1-2.0 | 0.016 | 140/169 | 0.9 | 0.6-1.3 | 0.559 | 71/83 | 1.3 | 0.7-2.5 | 0.472 |
>20 | 196/121 | 1.2 | 0.9-1.6 | 0.321 | 154/76 | 1.5 | 1.0-2.3 | 0.030 | 30/27 | 0.6 | 0.3-1.1 | 0.087 | 12/18 | 2.1 | 0.7-5.9 | 0.161 |
p-trend | | | | 0.187 | | | | 0.017 | | | | 0.123 | | | | 0.174 |
Cigarette Pack-years
Never cig. smoker | 421/224 | 1.0 | REF | | 312/113 | 1.0 | REF | | 74/80 | 1.0 | REF | | 35/31 | 1.0 | REF | | 0.161
≤22 | 357/250 | 1.2 | 0.9-1.5 | 0.169 | 210/105 | 1.5 | 1.1-2.2 | 0.016 | 98/91 | 0.7 | 0.5-1.2 | 0.186 | 49/54 | 1.4 | 0.7-2.8 | 0.395 |
>22 | 342/275 | 1.3 | 1.0-1.7 | 0.039 | 236/123 | 1.5 | 1.1-2.1 | 0.023 | 72/105 | 1.0 | 0.6-1.6 | 0.959 | 34/47 | 1.6 | 0.7-3.3 | 0.246 |
p-trend | | | | 0.039 | | | | 0.019 | | | | 0.998 | | | | 0.253 |
Years since quitting smoking tobacco
Never smoker | 365/196 | 1.0 | REF | | 265/93 | 1.0 | REF | | 65 | 1.0 | REF | | 35/31 | 1.0 | REF | | 0.314
Quit >21 years ago | 274/192 | 1.3 | 1.0-1.7 | 0.105 | 202/108 | 1.5 | 1.0-2.1 | 0.032 | 52 | 0.8 | 0.5-1.4 | 0.523 | 20/30 | 1.3 | 0.5-3.2 | 0.56 |
Quit ≤21 years ago | 274/215 | 1.3 | 1.0-1.7 | 0.043 | 183/91 | 1.6 | 1.1-2.3 | 0.018 | 58 | 0.9 | 0.5-1.5 | 0.629 | 33/46 | 1.8 | 0.8-3.8 | 0.136 |
Current smoker | 206/142 | 1.1 | 0.8-1.5 | 0.466 | 108/47 | 1.5 | 1.0-2.4 | 0.069 | 68 | 0.7 | 0.4-1.2 | 0.202 | 30/23 | 1.1 | 0.5-2.5 | 0.837 |
p-trend | | | | 0.242 | | | | 0.023 | | | | 0.249 | | | | 0.568 |

a Adjusted for age at diagnosis (years), family history of PCa, body mass index, alcohol consumption (g/day), total intake of meat cooked at high temperature (g/day), any lifetime use of non-smoking tobacco snuff/chew, use of cigar/pipe for at least 6 months if evaluating cigarette smoking (per day and pack-years).
b Test of heterogeneity of ORs by race/ethnicity.
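The ORs in Table 4.4 are adjusted estimates from models that are not reproduced in this section. As a rough cross-check only, a crude (unadjusted) odds ratio with a Wald 95% confidence interval can be computed directly from the Co/Ca counts in any row. The sketch below is an illustration under that assumption, not the analysis used in the study; the function name `crude_odds_ratio` is hypothetical.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_controls,
                     unexposed_cases, unexposed_controls, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the Wald (Woolf) interval.

    This is an unadjusted estimate; it ignores every covariate listed in
    footnote a of Table 4.4.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / unexposed_cases + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Non-Hispanic Whites, localized PCa (Table 4.4):
# former smokers 385 controls / 200 cases, never smokers 265 controls / 93 cases.
or_, lo, hi = crude_odds_ratio(200, 385, 93, 265)
print(f"crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The crude estimate lands close to the adjusted OR of 1.5 reported for this row, which suggests that adjustment changed this particular estimate only modestly.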
Non-Hispanic Whites African-Americans Heterog p b All races/ethnicities 105 Table 4.5: Smoking characteristics and risk of advanced prostate cancer by race/ethnicity in the California Collaborative Prostate Cancer Study Controls Cases OR a 95%C I p-value Controls Cases OR a 95%CI p-value Controls Cases OR a 95%CI p-value Controls Cases OR a 95%CI p-value n n n n n n n n Smoking Status (any smoking tobacco) Never smoker 365 355 1.0 REF 265 228 1.0 REF 65 67 1.0 REF 35 60 1.0 REF Former smoker 549 606 1.1 0.9-1.3 0.37 385 381 1.1 0.9-1.4 0.342 111 110 0.8 0.5-1.2 0.375 53 115 1.3 0.8-2.3 0.378 0.004 Current smoker 206 227 1.1 0.9-1.4 0.386 108 130 1.4 1.0-1.9 0.037 68 75 0.8 0.5-1.3 0.384 30 22 0.5 0.2-1.0 0.044 Use of smoking tobacco No 365 355 1.0 REF 265 228 1.0 REF 65 67 1.0 REF 35 60 1.0 REF Yes 756 833 1.1 0.9-1.3 0.322 494 511 1.2 0.9-1.5 0.147 179 185 0.8 0.5-1.2 0.322 83 137 1.0 0.6-1.7 0.969 0.359 Age at first tobacco use (years) Never smoker 365 355 1.0 REF 265 228 1.0 REF 65 67 1.0 REF 35 60 1.0 REF >18 years 292 309 1.1 0.8-1.3 0.600 187 199 1.2 0.9-1.6 0.191 78 69 0.7 0.4-1.1 0.126 27 41 0.9 0.5-1.8 0.789 ≤18 463 523 1.1 0.9-1.4 0.263 306 312 1.2 0.9-1.5 0.205 101 116 0.9 0.6-1.4 0.692 56 95 1.0 0.6-1.8 0.957 0.474 p-trend 0.262 0.178 0.884 0.929 Smoking duration (years) Never smoker 365 355 1.0 REF 265 228 1.0 REF 65 67 1.0 REF 35 60 1.0 REF ≤29 years 386 404 1.1 0.9-1.3 0.501 272 255 1.1 0.9-1.4 0.467 78 72 0.8 0.5-1.3 0.333 36 77 1.3 0.7-2.5 0.334 0.063 >29 years 369 428 1.1 0.9-1.4 0.275 221 256 1.3 1.0-1.7 0.053 101 113 0.8 0.5-1.3 0.431 47 59 0.7 0.4-1.3 0.234 p-trend 0.275 0.054 0.464 0.225 Cigarettes smoked per day Never cig. 
smoker 421 412 1.0 REF 312 278 1.0 REF 74 72 1.0 REF 35 62 1.0 REF ≤20 503 576 1.1 0.9-1.3 0.442 292 304 1.1 0.9-1.4 0.368 140 153 0.9 0.6-1.4 0.636 71 119 1.0 0.6-1.7 0.887 0.801 >20 196 200 1 0.8-1.3 0.913 154 157 1.1 0.8-1.5 0.583 30 27 0.7 0.4-1.4 0.323 12 16 0.8 0.3-2.0 0.579 p-trend 0.763 0.496 0.344 0.652 Cigarette Pack-years Never cig. smoker 421 412 1.0 REF 312 278 1.0 REF 74 72 1.0 REF 35 62 1.0 REF ≤22 357 386 1 0.8-1.2 0.861 210 194 1.0 0.8-1.3 0.974 98 100 0.9 0.5-1.4 0.537 49 92 1.1 0.6-1.9 0.744 0.256 >22 342 390 1.1 0.9-1.4 0.345 236 267 1.2 0.9-1.6 0.133 72 80 0.9 0.5-1.4 0.586 34 43 0.7 0.4-1.3 0.258 p-trend 0.351 0.142 0.594 0.303 Years since quitting smoking tobacco Never smoker 365 368 1.0 REF 265 228 1.0 REF 65 74 1.0 REF 35 66 1.0 REF Quit >21 years ago 274 318 1.1 0.9-1.4 0.28 202 207 1.2 0.9-1.5 0.229 52 42 0.6 0.4-1.1 0.124 20 69 2.0 1.0-4.0 0.055 <0.001 Quit ≤21 years ago 274 305 1 0.8-1.3 0.808 183 171 1.0 0.8-1.4 0.814 58 74 0.9 0.5-1.5 0.683 33 60 1.0 0.5-1.8 0.896 Current smoker 206 239 1.1 0.9-1.4 0.371 108 130 1.4 1.0-1.9 0.041 68 85 0.8 0.5-1.3 0.414 30 24 0.5 0.2-1.0 0.045 p-trend 0.508 0.114 0.616 0.044 All races/ethnicities a Adjusted for age at diagnosis (years), family history of PCa, body mass index, alcohol consumption (g/day), total intake of meat cooked at high temperature (g/day), any lifetime use of non-smoking tobacco snuff/chew, use of cigar/pipe for at least 6 months if evaluating cigarette smoking (per day and pack-years); b Test of heterogeneity of ORs by race/ethnicity. Non-Hispanic Whites African-Americans Hispanics Heterog p b 106 4.1.2 Tobacco smoking, polymorphisms in metabolism enzymes, and PCA risk Interactions between each of the nine polymorphisms (GSTP1, PTGS2, CYP1A2, CYP2E1, EPHX1, CYP1B1, UGT1A6, GSTM1 and GSTT1) and NAT2 predicted phenotype and tobacco smoking variables were evaluated. 
We only observed evidence of effect modification of CYP1A2 -154A>C (rs762551) on smoking status (never, former, current) (Table 4.6). Among carriers of the CC genotype, current smoking was associated with increased risk of PCa overall (OR = 2.2; 95% CI = 1.2 - 4.3, p-interaction = 0.008), localized PCa (OR = 2.8; 95% CI = 1.2-6.9, p- interaction = 0.012), and advanced PCa (OR = 1.9; 95% CI = 1.0-3.8, p-interaction = 0.043). These associations were not present among carriers of the AA genotype. Analyses considering other smoking variables (smoking duration, cigarette pack-years, and age at first tobacco use) showed similar findings as those for smoking status; however, none reached statistical significance. Similar interaction ORs were observed when stratifying by race/ethnicity and including both localized and advanced PCa for non-Hispanic Whites and African Americans. This pattern was not observed among Hispanics, although the number of Hispanics was small. No evidence of interaction was observed for any of the other polymorphisms or NAT2 predicted phenotype. 
107 Table 4.6: Smoking status, CYP1A2 (rs7662551) genotype, and risk of prostate cancer by stage and race/ethnicity CYP1A2 p- interaction Smoking Status Co Ca OR* 95%CI Co Ca OR* 95%CI Co Ca OR* 95%CI OR* 95%CI Never 126 224 1.0 REF 102 159 1.0 REF 23 40 1.0 REF 0.9 0.7-1.1 Former 170 382 1.2 0.9-1.6 164 319 1.0 0.8-1.3 42 63 1.0 0.6-1.5 0.8 0.7-1.0 Current 88 136 0.9 0.6-1.2 52 121 1.4 1.0-2.0 10 36 2.2 1.2-4.3 1.4 1.0-2.0 0.008 p-trend 0.601 0.066 0.031 Smoking Status Co Ca OR* 95%CI Co Ca OR* 95%CI Co Ca OR* 95%CI OR* 95%CI Never 96 139 1.0 REF 66 78 1.0 REF 13 20 1.0 REF 0.9 0.6-1.2 Former 124 216 1.2 0.8-1.7 102 158 1.1 0.8-1.5 26 25 1 0.5-1.8 0.8 0.6-1.0 Current 47 69 1.2 0.7-1.8 24 52 1.7 1.0-2.7 3 6 2.4 0.9-6.6 1.3 0.8-2.1 0.268 p-trend 0.433 0.079 0.194 Smoking Status Co Ca OR* 95%CI Co Ca OR* 95%CI Co Ca OR* 95%CI OR* 95%CI Never 19 48 1.0 REF 24 44 1.0 REF 7 15 1.0 REF 1 0.6-1.6 Former 23 69 1.1 0.6-2.2 43 101 1 0.6-1.6 13 32 0.8 0.3-2.0 0.8 0.5-1.2 Current 23 44 0.6 0.3-1.2 23 57 1.1 0.6-2.0 4 30 2.2 0.8-6.2 1.8 1.1-3.0 0.034 p-trend 0.158 0.675 0.135 Smoking Status Co Ca OR* 95%CI Co Ca OR* 95%CI Co Ca OR* 95%CI OR* 95%CI Never 11 43 1.0 REF 12 40 1.0 REF 3 6 1.0 REF 0.7 0.4-1.5 Former 23 110 1.3 0.6-2.9 19 65 1.1 0.5-2.2 3 6 0.7 0.3-1.7 0.6 0.3-1.1 Current 18 29 0.7 0.3-1.6 5 16 0.5 0.2-1.3 3 1 0.9 0.2-3.8 0.6 0.2-1.4 0.899 p-trend 0.352 0.243 0.068 Smoking Status Co Ca OR* 95%CI Co Ca OR* 95%CI Co Ca OR* 95%CI OR* 95%CI Never 126 70 1.0 REF 102 55 1.0 REF 23 13 1.0 REF 0.9 0.6-1.3 0.016 Former 170 131 1.4 1.0-2.2 164 120 1.2 0.9-1.7 42 25 1.0 0.5-2.2 0.8 0.6-1.1 Current 88 39 0.8 0.5-1.4 52 49 1.5 0.9-2.4 10 16 2.8 1.1-6.9 1.7 1.1-2.7 p-trend 0.71 0.075 0.038 Smoking Status Co Ca OR 95%CI Co Ca OR 95%CI Co Ca OR 95%CI OR 95%CI Never 126 154 1.0 REF 102 104 1.0 REF 23 27 1.0 REF 0.9 0.7-1.2 0.043 Former 170 251 1.1 0.8-1.5 164 199 1.0 0.7-1.3 42 38 0.9 0.5-1.5 0.8 0.6-0.9 Current 88 97 0.9 0.8-1.5 52 72 1.3 0.9-1.9 10 20 1.9 1.0-3.8 1.3 0.9-1.9 
p-trend 0.62 0.238 0.148
*Odds ratios derived from a common baseline model that includes the genotype, smoking status, and interaction terms between genotype and smoking status. ORs are adjusted for age (years), family history of PCa, body mass index, alcohol (gm/day), total intake of meat cooked at high temperature (gm/day), any lifetime use of non-smoking tobacco snuff/chew
Advanced cases Localized cases A/A A/C C/C Per C-allele Hispanics African-Americans Non-Hispanic whites All

4.1.2.1 Genetic ancestry of Latinos, tobacco smoking, and prostate cancer risk

Table 4.7 presents the characteristics of Latino male controls and prostate cancer cases with genetic ancestry data from Los Angeles County (N=416). Individuals who migrated from non-Latin countries were excluded (Philippines, Portugal, England, Nigeria; 2 controls, 5 cases). The majority of Latinos in this study were from Mexico or the USA (approximately 45% and 35%, respectively, among both cases and controls). With regard to smoking status, a larger proportion of controls than cases were current smokers (27% versus 15%, respectively; p=0.014). Smoking status (never, former, current) was the only assessment of smoking that showed a difference between cases and controls.

There were no significant differences in genetic ancestry (African, European, and Native American) between Latino cases and controls (Table 4.8). In this population, European ancestry was the largest genetic ancestry component (54% and 51% among cases and controls, respectively), while African ancestry was the smallest (9% among both cases and controls).

Logistic regression models were used to evaluate the association between genetic ancestry and risk of PCa (Table 4.9). The models used either quartiles or tertiles of Native American and African ancestries. Unadjusted models included only genetic ancestry; adjusted models further included age, BMI, family history, socioeconomic status (SES), and country of birth/origin.
In both the unadjusted and adjusted models with ancestry as quartiles, Native American ancestry of >26.4%-41.3% was statistically significantly associated with an increased risk of PCa compared with the lowest quartile of Native American ancestry (unadjusted model: OR=1.92, 95%CI: 1.01-3.63; adjusted model: OR=3.06, 95%CI: 1.48-6.32). Using ancestry as tertiles in the adjusted model, African ancestry of >4.65%-9.40% and Native American ancestry of >31.45%-48.0% were statistically significantly associated with an increased risk of PCa (OR=2.01, 95%CI: 1.09-3.54 and OR=1.93, 95%CI: 1.05-3.54, respectively). However, no significant trend was seen across quartiles of either Native American or African ancestry in the adjusted or unadjusted models.

To assess the possibility of confounding by genetic ancestry of the relationship between tobacco smoking and PCa risk, African and Native American ancestry were included in the conditional logistic regression models as continuous variables and were also evaluated as tertiles. The risk estimates for smoking did not change by more than 10% in either case. Therefore, genetic ancestry did not appear to confound the relationship between tobacco smoking and PCa risk among Latino men in this study.

Interaction analyses were performed to assess genetic ancestry as a modifier of the association between potential risk factors and PCa risk (Table 4.10). Native American ancestry was dichotomized at the median (41.3%) for these analyses, adjusting for African ancestry, age, BMI, family history, and SES. Because smoking status was the only smoking variable that differed significantly between Latino cases and controls, it was the smoking measure evaluated as a potential risk factor in these analyses. Native American genetic ancestry did not modify the association between smoking status and PCa risk among Latino men (p-interaction=0.582).
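The 10% change-in-estimate check described above can be sketched as follows. The two odds ratios used here are hypothetical placeholders chosen for illustration, not values from this study, and the helper names are assumptions; some analysts apply the same rule on the log-OR scale instead.

```python
def percent_change_in_estimate(or_without, or_with):
    """Relative change (in percent) in an OR after adding a covariate to the model."""
    return abs(or_with - or_without) / or_without * 100.0


def is_confounder(or_without, or_with, threshold=10.0):
    """Flag confounding when the estimate shifts by more than `threshold` percent."""
    return percent_change_in_estimate(or_without, or_with) > threshold


# Hypothetical ORs for current smoking, before and after adding genetic ancestry.
or_smoking_only = 0.50
or_smoking_plus_ancestry = 0.52

change = percent_change_in_estimate(or_smoking_only, or_smoking_plus_ancestry)
flagged = is_confounder(or_smoking_only, or_smoking_plus_ancestry)
print(f"change = {change:.1f}%, confounding flagged: {flagged}")
```

With a shift this small the covariate would not be retained as a confounder under the 10% rule, which mirrors the conclusion drawn for genetic ancestry in the text.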
110 Table 4.7: Characteristics of Latino male controls and prostate cancer cases from Los Angeles County (N=416) Controls N=99 Cases N=317 p-value* n (%) n (%) BMI, n(%) ≤25 21 (22) 54 (17) 0.505 >25 to ≤30 45 (47) 165 (53) >30 30 (31) 95 (30) Family History No 90 (92) 255 (81) 0.011 Yes 8 (8) 60 (19) SES 1 (Low) 47 (48) 135 (43) 0.238 2 27 (27) 82 (26) 3 17 (17) 51 (16) 4 7 (7) 27 (8) 5 (High) 1 (1) 22 (7) Country/location of birth Mexico 44 (45) 145 (45) 0.423 USA 35 (35) 109 (34) Central America 11 (11) 31 (10) South America 7 (7) 16 (5) Caribbean 1 (1) 15 (5) Unknown 1 (1) 1 (1) Stage of PCa Localized . 124 (39) Advanced . 177 (56) Missing 16 White fish consumed Never/rarely 29 (36) 101 (36) 0.758 Low 37 (46) 118 (43) High 14 (18) 58 (21) Dark fish consumed Never/rarely 29 (36) 122 (44) 0.421 Low 39 (49) 114 (41) High 12 (15) 41 (15) Age Mean(SD) 61.3 (9.0) 65.9 (8.5) <0.001 Total calcium (mg/day) Mean (SD) 1,577 (886) 1,424 (802) 0.113 Total vitamin D (IU/day) Mean (SD) 696 (46) 604 (22) 0.054 Total high temperature cooked meat (g/day) Mean (SD) 87.6 (6.3) 92.9 (3.7) 0.483 *Fisher's Exact p-value for categorical variables and Student's t-test p-value for continuous variables 111 Table 4.7 continued (smoking variables) Controls N=99 Cases N=317 p-value* n (%) n (%) Smoking status Never 27 (28) 88 (28) 0.014 Former 43 (45) 179 (57) Current 26 (27) 46 (15) Use of smoking tobacco No 27 (28) 88 (28) 0.985 Yes 69 (72) 226 (72) Age at first tobacco use (years) Never smoker 27 (28) 88 (28) 0.945 >18 years 21 (22) 73 (23) ≤18 48 (50) 151 (48) Smoking duration (years) Never smoker 27 (28) 88 (28) 0.779 ≤29 years 29 (30) 105 (34) >29 years 40 (42) 119 (38) Cigarettes smoked per day Never cig. smoker 27 (28) 90 (29) 0.939 ≤20 58 (60) 192 (61) >20 11 (12) 32 (10) Cigarette Pack-years Never cig. 
smoker 27 (28) 90 (29) 0.987 ≤22 41 (42) 135 (43) >22 28 (29) 89 (28) Years since quitting smoking tobacco Never smoker 27 (28) 88 (28) 0.017 Quit >21 years ago 17 (18) 92 (29) Quit ≤21 years ago 26 (27) 86 (28) Current smoker 26 (27) 46 (15) 112 Table 4.8: Genetic ancestry (African, European, Native American) among Latino controls and cases Controls N=99 Cases N=317 p-value* Genetic Ancestry African ancestry Mean(SD) 0.09 (0.08) 0.09 (0.07) 0.983 European ancestry Mean (SD) 0.51 (0.20) 0.54 (0.17) 0.193 Native American ancestry Mean (SD) 0.40 (0.19) 0.38 (0.16) 0.187 *Student's t-test p-value 113 Table 4.9: Genetic ancestry and risk of prostate cancer among Latinos Co/Ca OR* 95%CI p-value OR** 95%CI p-value Genetic ancestry African ancestry 0 - 4.1% 23/66 1.0 REF 1.0 REF >4.1 - 6.5% 27/67 0.83 0.43-1.60 0.57 0.94 0.46-1.93 0.867 >6.5 - 10.7% 23/116 1.63 0.84-3.16 0.151 1.95 0.95-4.00 0.068 >10.7% 26/68 0.84 0.43-1.64 0.613 1.16 0.55-2.45 0.694 p-trend 0.647 0.285 Native American ancestry 0 - 26.4% 25/63 1.0 REF 1.0 REF >26.4% - 41.3% 25/127 1.92 1.01-3.63 0.045 3.06 1.48-6.32 0.003 >41.3 - 51.6% 25/69 1.01 0.52-1.96 0.976 1.61 0.76-3.40 0.214 >51.6% 24/58 0.95 0.48-1.85 0.875 1.99 0.90-4.37 0.088 p-trend 0.414 0.494 African ancestry 0 - 4.65% 33/81 1.0 REF 1.0 REF >4.65% - 9.4% 34/141 1.64 0.94-2.86 0.079 2.01 1.09-3.54 0.025 >9.4% 32/95 1.15 0.65-2.04 0.632 1.5 0.79-2.85 0.215 p-trend 0.545 0.225 Native American ancestry 0 - 31.45% 33/99 1.0 REF 1.0 REF >31.45% - 48.0% 33/137 1.34 0.77-2.32 0.298 1.93 1.05-3.54 0.034 >48.0% 33/81 0.8 0.45-1.41 0.441 1.29 0.68-2.45 0.436 p-trend 0.53 0.445 *Unadjusted odds ratio with only genetic ancestry in the model **Adjusted for age, body mass index, family history of PCa, socioeconomic status, country/location of birth ANCESTRY AS TERTILES 114 Table 4.10: Native American ancestry as a potential modifier of the association between lifestyle factors and prostate cancer risk among Latino males OR* 95%CI p-value OR* 95%CI p-value 
p-interaction Age ≤60 1.0 REF 1.0 REF 0.729 >60 5.07 1.96-13.10 0.001 4.16 2.29-7.56 <0.001 BMI, n(%) ≤25 1.0 REF 1.0 REF 0.414 >25 to ≤30 2.77 0.90-8.54 0.076 1.28 0.57-2.83 0.55 >30 1.37 0.40-4.69 0.62 1.19 0.50-2.85 0.695 Family History No 1.0 REF 1.0 REF Yes 12.72 1.61-100.32 0.016 1.71 0.68-4.26 0.251 0.04 Smoking status Never 1.0 REF 1.0 REF Former 1.01 0.37-2.71 0.99 1.18 0.59-2.38 0.642 0.582 Current 0.87 0.24-3.18 0.831 0.5 0.22-1.11 0.088 Total Calcium ≤1,367.56 1.0 REF 1.0 REF 0.926 >1,367.56 0.73 0.31-1.72 0.476 0.7 0.39-1.26 0.235 Total vitamin D ≤617.77 1.0 REF 1.0 REF >617.77 0.95 0.40-2.21 0.898 0.52 0.29-0.95 0.032 0.262 Total high temperature cooked meat ≤78.07 1.0 REF 1.0 REF >78.07 1.69 0.74-3.90 0.216 0.58 0.21-1.61 0.296 0.295 White fish consumed Never/rarely 1.0 REF 1.0 REF Low 0.68 0.24-1.89 0.457 0.86 0.42-1.73 0.665 0.935 High 1.1 0.32-3.76 0.876 1.25 0.49-3.23 0.641 Dark fish consumed Never/rarely 1.0 REF 1.0 REF Low 0.5 0.18-1.34 0.166 0.72 0.35-1.45 0.356 0.402 High 1.32 0.34-5.14 0.687 0.63 0.24-1.66 0.345 *Adjusted for African ancestry, age, BMI, family history, and socioeconomic status ≤41.3% >41.3% Native American ancestry 115 4.1.3 Tobacco smoking, PCa, and overall survival Until January 2012, among cases in our study with available survival data (N = 1,944) there were a total of 715 deaths (37%), with 253 deaths (13%) due to PCa as identified through the ICD-0-3 (International Classification of Disease for Oncology, codes C610-619) and 164 (8%) due to cardiovascular diseases. Of the deaths due to PCa (N = 253), 210 (83%) belonged to men who were diagnosed with advanced stage PCa. Among localized cases with available survival data, Non-Hispanic Whites (N = 340) experienced 17 PCa related deaths (5%), Hispanics (N = 133) 7 deaths (5%), and African-Americans (N = 276) 19 deaths (7%). 
Among advanced cases with available survival data, Non-Hispanic Whites (N = 736) experienced 120 PCa-related deaths (16%), Hispanics (N = 194) 31 deaths (16%), and African-Americans (N = 254) 56 deaths (22%). The median follow-up time was 9.1 years (interquartile range, IQR = 7.3-10.1 years) for men who did not die of PCa and 3.8 years (IQR = 1.5-6.5 years) for men who did. No statistically significant associations were observed between tobacco smoking variables and risk of PCa-related mortality (Table 4.11).

For localized PCa cases, the survival estimates at 5 years of follow-up were 97% for Non-Hispanic Whites, 97% for Hispanics, and 96% for African-Americans (log-rank p=0.51). Among advanced PCa cases only, the survival estimates at 5 years of follow-up were 91% for Non-Hispanic Whites, 89% for Hispanics, and 81% for African-Americans (log-rank p=0.05).

When considering all-cause mortality, statistically significant associations were seen with most smoking variables (Table 4.11). Compared to never smokers, overall mortality was associated with current smoking (HR = 1.6, 95%CI: 1.2-2.0, p<0.001), young age (≤18 years old) at smoking initiation (HR = 1.2, 95%CI: 1.0-1.5, p=0.029), longer tobacco smoking duration (>29 years; HR = 1.3, 95%CI: 1.1-1.6, p=0.009), greater cigarette smoking (>22 pack-years; HR = 1.4, 95%CI: 1.1-1.6, p<0.001), and more than one pack of cigarettes smoked per day (HR = 1.3, 95%CI: 1.1-1.6, p=0.016).
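The 5-year survival estimates quoted above are the kind of quantity a Kaplan-Meier (product-limit) estimator produces. The sketch below shows the calculation on a small made-up set of follow-up times; the data are hypothetical and the function names are assumptions, so it illustrates the technique rather than reproducing the study's estimates.

```python
def kaplan_meier(times, events):
    """Product-limit survival curve.

    times  : follow-up time for each subject (years)
    events : 1 if the subject died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Survival probability at time t, read off the step function."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical follow-up data for eight patients (not from this study).
times = [1.5, 3.8, 4.2, 6.0, 7.3, 9.1, 9.1, 10.1]
events = [1, 1, 0, 1, 0, 0, 0, 0]

curve = kaplan_meier(times, events)
print(f"estimated 5-year survival: {survival_at(curve, 5.0):.2f}")
```

Censored subjects leave the risk set without lowering the curve, which is why patients still alive at last follow-up contribute information up to that point.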
When we restricted our analysis to advanced PCa cases only, we saw an association of current smoking with PCa-related mortality that did not reach statistical significance (HR = 1.4, 95%CI = 1.0-2.2, p = 0.09) and marginally statistically significant associations with ≤22 cigarette pack-years and ≤20 cigarettes smoked per day when compared to never smokers (HR = 1.4, 95%CI: 1.0-2.0, p = 0.048; HR = 1.4, 95%CI = 1.0-2.0, p = 0.046, respectively; data not shown).

Table 4.11: Tobacco smoking and PCa-related mortality and all-cause mortality among PCa cases (n=1,944)

Characteristic | PCa cases* | PCa deaths | % of cases | HR** | 95% CI | p-value | All-cause mortalities | % of cases | HR** | 95% CI | p-value
Total | 1,944 | 253 | 13 | | | | 715 | 37 | | |
Smoking Status
Never | 550 | 64 | 12 | 1.0 | REF | | 176 | 25 | 1.0 | REF |
Former | 1,012 | 131 | 13 | 1.1 | 0.8-1.4 | 0.752 | 385 | 54 | 1.1 | 0.9-1.3 | 0.351
Current | 368 | 55 | 15 | 1.3 | 0.9-1.9 | 0.194 | 147 | 21 | 1.6 | 1.3-2.0 | <0.001
Age start smoking tobacco (years old)
Never smoker | 550 | 64 | 12 | 1.0 | REF | | 176 | 25 | 1.0 | REF |
>18 | 855 | 117 | 14 | 1.1 | 0.8-1.6 | 0.580 | 334 | 47 | 1.1 | 0.9-1.4 | 0.352
≤18 | 524 | 69 | 13 | 1.1 | 0.8-1.5 | 0.501 | 198 | 28 | 1.2 | 1.0-1.5 | 0.024
Ever use of smoking tobacco for at least 6 months
No | 550 | 64 | 12 | 1.0 | REF | | 174 | 25 | 1.0 | REF |
Yes | 1,391 | 189 | 14 | 1.1 | 0.8-1.5 | 0.489 | 534 | 75 | 1.2 | 1.0-1.4 | 0.051
Smoking duration (years)
Never smoker | 550 | 64 | 12 | 1.0 | REF | | 176 | 25 | 1.0 | REF |
≤29 years | 645 | 83 | 13 | 1.1 | 0.8-1.6 | 0.462 | 217 | 31 | 1.1 | 0.9-1.3 | 0.529
>29 years | 734 | 103 | 14 | 1.1 | 0.8-1.5 | 0.602 | 315 | 44 | 1.3 | 1.1-1.6 | 0.009
Years passed since stopped smoking tobacco
Never smoker | 550 | 64 | 12 | 1.0 | REF | | 176 | 25 | 1.0 | REF |
>20 years ago | 513 | 66 | 13 | 1.1 | 0.7-1.5 | 0.702 | 191 | 27 | 1.0 | 0.8-1.2 | 0.949
≤20 years ago | 492 | 65 | 13 | 1.1 | 0.8-1.5 | 0.652 | 192 | 27 | 1.2 | 1.0-1.5 | 0.56
Current smoker | 368 | 55 | 15 | 1.3 | 0.9-1.9 | 0.188 | 147 | 21 | 1.6 | 1.3-2.0 | <0.001
Cigarette Pack-years
Never cigarette smoker | 636 | 74 | 12 | 1.0 | REF | | 207 | 29 | 1.0 | REF |
≤22 | 632 | 87 | 14 | 1.2 | 0.8-1.6 | 0.386 | 208 | 29 | 1.0 | 0.8-1.2 | 0.968
>22 | 663 | 89 | 13 | 1.2 | 0.8-1.6 | 0.389 | 294 | 42 | 1.4 | 1.1-1.7 | 0.001
Cigarettes smoked per day
Never cigarette smoker | 636 | 74 | 12 | 1.0 | REF | | 207 | 29 | 1.0 | REF |
≤20 | 983 | 143 | 15 | 1.2 | 0.9-1.6 | 0.200 | 373 | 52 | 1.2 | 1.0-1.4 | 0.119
>20 | 322 | 36 | 11 | 0.9 | 0.6-1.4 | 0.757 | 135 | 19 | 1.3 | 1.1-1.7 | 0.013

*2,008 cases completed the interview; excludes cases with ambiguous tumor staging (n=48) and incomplete follow-up data (n=16).
**Hazard ratios adjusted for age (years), family history of PCa, body mass index, total intake of meat cooked at high temperature (gm/day), any lifetime use of non-smoking tobacco snuff/chew, use of cigar/pipe for at least 6 months if evaluating cigarette smoking (per day and pack-years), PCa stage (localized/advanced), socioeconomic status, race/ethnicity, and center (Northern/Southern).

4.2 Biomarkers of prostate cancer recurrence: results from the Hospital de Clínicas José de San Martín cohort

This section is paraphrased from a published paper written in collaboration with Dr. Javier Cotignola and colleagues. Tables and figures are shown as published (Cotignola et al., 2013).

Of the patients diagnosed with PCa, 12 (11.4%) had normal serum PSA levels (≤4 ng/ml). Almost 50% presented with a Gleason score <7, and among patients with combined Gleason score 7, 66% were classified as (3+4). Most had negative surgical margins (77.5%), and 50.5% of patients were diagnosed with pT2-stage tumors. One-third of patients experienced biochemical recurrence (BCR) following surgery. The median follow-up was 84 months (8-156) for patients without BCR and 36 months (3-132) for patients with BCR (Table 4.12).

4.2.1 Associations between clinical variables and biochemical recurrence

Kaplan-Meier curves were used to study biochemical recurrence-free survival (BRFS) across Gleason score categories, pathologic T stage, and surgical margins, which are known risk factors for biochemical recurrence. Gleason score was categorized as: ≤6, 7 (3+4), 7 (4+3), and ≥8 (Figure 4.1a).
Since survival of patients with a Gleason score of 7 (3+4) was similar to survival of patients with a Gleason score ≤6, these two groups were combined and categorized as a low-risk Gleason score. Gleason scores 7 (4+3) and ≥8 were then combined into a high-risk Gleason score. Analysis of the low- and high-risk Gleason scores showed a statistically significant difference in BRFS (Figure 4.1b), with the high-risk group having approximately 2.5 times the risk of biochemical recurrence of the low-risk group (HR=2.45, 95% CI=1.18-5.09, p=0.016). Advanced cases (stage pT3) also had a greater risk of developing a biochemical recurrence (HR=2.20, 95% CI=1.07-4.52, p=0.032) and had lower BRFS compared to localized PCa cases (stage pT2) (Figure 4.1c). BRFS was also significantly different between negative and positive surgical margins (Figure 4.1d). Positive surgical margins were associated with more than 3 times the risk of biochemical recurrence (HR=3.33, 95% CI=1.67-6.62, p=0.001).

Results from this study show that only high-risk patients (PSA>20 ng/ml, stage pT3, or Gleason score ≥8; based on D'Amico risk stratification (D'Amico et al., 1998)) with positive surgical margins had lower BRFS (log-rank p=0.026, data not shown). However, no association was seen in low- and intermediate-risk groups, as seen in a previous study (Corcoran et al., 2012), which could be attributed to the small numbers.
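As a consistency check on the hazard ratios quoted above, a two-sided p-value can be approximately recovered from a ratio estimate and its 95% confidence interval by working on the log scale. This is a sketch that assumes a Wald-type interval (estimate ± 1.96 standard errors on the log scale), which is the usual default for Cox model output but is not stated in the text; the function name `p_from_ci` is hypothetical.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided p-value for a ratio estimate.

    Assumes the interval is symmetric on the log scale (Wald interval).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# High-risk vs. low-risk Gleason score: HR=2.45, 95% CI=1.18-5.09.
z, p = p_from_ci(2.45, 1.18, 5.09)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the reported HR and interval limits are rounded, the recovered p-value will only agree with the published one to about two significant figures.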
Table 4.12: Characteristics of the Hospital de Clínicas José de San Martín study population

Characteristic | N | %
Total cases | 105 | 100.0
Age at diagnosis (years) (median, range) | 65 | 49 - 74
PSA level at diagnosis (ng/ml) (median, range) | 6.87 | 0.77 - 28.90
  ≤4 | 12 | 11.4
  >4 - 10 | 63 | 60.0
  >10 | 30 | 28.6
Smoking status
  Never smoker | 29 | 27.6
  Former smoker | 56 | 53.3
  Current smoker | 20 | 19.1
Family history of PCa
  No | 88 | 83.8
  Yes | 17 | 16.2
Pathological Gleason score
  5 | 6 | 5.7
  6 | 46 | 43.8
  7 (3+4) | 35 | 33.3
  7 (4+3) | 13 | 12.4
  8 | 5 | 4.8
Pathological T stage
  pT2 | 50 | 50.5
  pT3a | 47 | 47.5
  pT3b | 2 | 2.0
  Missing | 6 |
Risk group for biochemical recurrence a
  Low | 26 | 26.3
  Intermediate | 21 | 21.2
  High | 52 | 52.5
  Missing | 6 |
Weight of resected prostate (g) (median, range) | 45 | 10 - 143
  Missing (n) | 14 |
Tumor volume (mm3) (median, range) | 1900 | 12 - 18000
  Missing (n) | 74 |
Margin involvement of the resected prostate
  No | 79 | 77.5
  Yes | 23 | 22.5
  Missing | 3 |
Biochemical recurrence
  No | 70 | 66.7
  Yes | 35 | 33.3

a Based on D'Amico risk group stratification, as follows: Low risk (PSA<10 ng/ml, pT2 stage and Gleason score ≤6), Intermediate risk (PSA=10-20 ng/ml and pT2 stage and/or Gleason score 7), High risk (PSA>20 ng/ml or pT3 stage or Gleason score ≥8)

Figure 4.1(a-d): Kaplan-Meier curves for biochemical-recurrence free survival by clinical characteristics. a) Pathological Gleason score (≤6, 3+4, 4+3, 8-10) (log rank p=0.016) b) Pathological Gleason score (≤7, >7) (log rank p=0.011) c) Pathological stage (T2, T3) (log rank p=0.024) d) Surgical margin status (log rank p<0.001)

4.2.2 Analyses of genotypes and risk of biochemical relapse

GSTT1 and GSTM1 genotypes were assessed using multiplex PCR; Figure 4.2 shows an example of an agarose gel electrophoresis for the multiplex PCR reaction.

Figure 4.2: Agarose gel electrophoresis of the multiplex PCR reaction for GSTT1 and GSTM1 genotyping.
The figure depicts an example of a 2% agarose gel stained with ethidium bromide, used to determine the GSTT1 and GSTM1 genotypes by multiplex PCR. Lanes 1, 3, 7, 8 and 9: GSTT1 present/GSTM1 null; lane 2: GSTT1 null/GSTM1 null; lanes 4, 5, 6 and 10: GSTT1 present/GSTM1 present; lane 11: GSTT1 null/GSTM1 present; M: 100 bp marker (Productos Bio-Lógicos, Buenos Aires, Argentina). Samples that did not amplify for both genes were repeated 2-3 times before being discarded as PCR failures.

Figure 4.3 shows an agarose gel electrophoresis for the enzymatic digestion of the GSTP1 amplicon. The genotype distribution and allelic frequencies are similar in both Caucasian and US Hispanic populations (Table 4.13).

Figure 4.3: Agarose gel electrophoresis of GSTP1 amplicons digested with Alw26I restriction enzyme. The figure shows an example of a 2% agarose gel stained with ethidium bromide, used to determine the GSTP1 genotype by PCR-RFLP. Lanes 1 and 4: c.313 GG; lanes 2 and 3: c.313 AG; lanes 5 and 6: c.313 AA; M: 50 bp marker (Genbiotech, Buenos Aires, Argentina).

The genotype distributions of the GST genes vary across different races/ethnicities (Table 4.13). The distributions were similar between Caucasians and US Hispanics with some overlap.

Table 4.13: Genotype distribution of GST genes and comparisons with other populations.

GST genotype                                  N     %       Reported % in US Hispanics a   Reported % in Hispanics from Chile b   Reported % in Caucasians c
Total cases                                   105   100.0   -                              -                                      -
GSTT1 null
  Present                                     82    78.8    86                             89-94                                  71-85
  Null                                        22    21.2    14                             6-11                                   15-29
  Missing data                                1
GSTM1 null
  Present                                     57    55.3    57                             64-77                                  52-55
  Null                                        46    44.7    43                             23-36                                  45-48
  Missing data                                2
GSTP1 c.313 A>G (p.105 Ile>Val)
  AA (Ile/Ile)                                51    49.5    52                             NA                                     48-50
  AG (Ile/Val)                                41    39.8    35                             NA                                     30-42
  GG (Val/Val)                                11    10.7    13                             NA                                     10-22
  A (Ile)                                     143   69.4    NA                             NA                                     NA
  G (Val)                                     63    30.6    NA                             NA                                     NA
  Missing data                                2
GSTT1 null + GSTM1 null + GSTP1 c.313 A>G d
  0 risk allele                               42    42.0    NA                             NA                                     NA
  1 risk allele                               40    40.0    NA                             NA                                     NA
  2 risk alleles                              17    17.0    NA                             NA                                     NA
  3 risk alleles                              1     1.0     NA                             NA                                     NA
  Missing data                                5

a Hispanic genotype frequency ranges were obtained from dbSNP (accessed in August 2012; http://www.ncbi.nlm.nih.gov/snp) for HISP1 population and a report by Block et al. 19 .
b Hispanic from Chile genotype frequency ranges reported by Acevedo et al. 13 , and Caceres et al. 14 .
c Genotype frequency ranges for Caucasians with Spanish and Italian ancestry were obtained from dbSNP (accessed in August 2012; http://www.ncbi.nlm.nih.gov/snp) for CAUC1 and AFD_EUR_PANEL populations, and reports by To-Figueras et al. 20 , Ladero et al. 21 , Raimondi et al. 22 , and Agudo et al. 23 .
d For GSTP1 c.313 A>G the recessive model was considered
Abbreviations: NA, not available

Carriers of the GST null genotypes had slightly lower BRFS than carriers of the non-null genotypes, although these differences were not statistically significant (Figures 4.4a and 4.4b for GSTT1 and GSTM1, respectively). No significant associations were found between these polymorphisms and biochemical recurrence risk in the unadjusted and multivariate Cox models (Table 4.14). For the analysis of the GSTP1 SNP, an additive model was first considered, using the genotype with the highest enzyme activity, c.313 AA (p.105 Ile/Ile), as the reference, followed by the heterozygous genotype, which has intermediate activity, and the c.313 GG genotype (p.105 Val/Val), which has the lowest activity. Individuals with the c.313 GG genotype (p.105 Val/Val) had lower BRFS when compared to patients with the c.313 AA genotype (p.105 Ile/Ile) and c.313 AG genotype (p.105 Ile/Val) (Figure 4.4c). A recessive model (AA+AG vs. GG) was further considered, which showed that patients with the GG genotype had lower BRFS than patients with the AA or AG genotypes combined (Figure 4.4d). The GG genotype was associated with a 3-fold higher risk for recurrence than the AA+AG genotypes in the unadjusted model (Table 4.14).
This association remained statistically significant when the model was further adjusted for margin status, Gleason score, pT stage, PSA level at diagnosis, family history of PCa, smoking status and age at diagnosis (Table 4.14). The estimates did not significantly change when further adjusting for GSTT1 and GSTM1 genotypes.

Because GSTs are enzymes that typically participate in the same metabolism pathways with overlapping substrate specificity, we considered an additive combined score that captures information on the genotypes of the three GSTs. The genotypes were categorized as follows: 0-risk-allele genotype (GSTT1 present, GSTM1 present, and GSTP1 c.313 AA+AG), 1-risk-allele genotype (GSTT1 null, GSTM1 present, and GSTP1 c.313 AA+AG; or GSTT1 present, GSTM1 null, and GSTP1 c.313 AA+AG; or GSTT1 present, GSTM1 present, and GSTP1 c.313 GG), 2-risk-allele genotype (GSTT1 null, GSTM1 null, and GSTP1 c.313 AA+AG; or GSTT1 null, GSTM1 present, and GSTP1 c.313 GG; or GSTT1 present, GSTM1 null, and GSTP1 c.313 GG), and 3-risk-allele genotype (GSTT1 null, GSTM1 null, and GSTP1 c.313 GG). Only one patient presented the 3-risk-allele genotype; therefore, he was pooled with the 2-risk-allele group. We found that patients who carried 2 or more (2+) GST risk alleles had lower BRFS compared to patients with 0 or 1 risk allele (Figure 4.4e). The unadjusted proportional hazards model showed that patients with 2+ risk alleles had a nearly 3-fold increased risk for biochemical recurrence compared to patients with 0 risk alleles (Table 4.14). This association remained statistically significant after adjustment for other potential risk factors in multivariable models (Table 4.14).

Figure 4.4 (a-e): Kaplan-Meier curves for biochemical-recurrence free survival by GST genotype.
a) GSTT1 (log-rank p=0.1480)
b) GSTM1 (log-rank p=0.4901)
c) GSTP1 co-dominant model (log-rank p<0.010)
d) GSTP1 recessive model (log-rank p=0.003)
e) GST combined genotype using the GSTP1 recessive model (log-rank p=0.010)

Table 4.14: Univariable and multivariable Cox proportional hazard models for biochemical recurrence based on GST genotypes

Statistically significant associations are in bold.
a GSTP1 analysis adjusted for GSTT1 and GSTM1 phenotypes
b adjusted for margin
c adjusted for margin, Gleason score (low-risk vs. high-risk) and pathological T stage (pT2 vs. pT3)
d adjusted for margin, Gleason score (low-risk vs. high-risk), pathological T stage (pT2 vs. pT3), PSA level at diagnosis (≤4 vs. >4-10 vs. >10), family history of PCa, smoking status (never vs. former vs. current) and age at diagnosis (continuous variable)
e adjusted for margin, Gleason score (low-risk vs. high-risk), PSA level at diagnosis (≤4 vs. >4-10 vs. >10), family history of PCa, smoking status (never vs. former vs. current) and age at diagnosis (continuous variable)
f GSTP1 analysis adjusted for GSTT1 and GSTM1 genotypes, margin, Gleason score (low-risk vs. high-risk), PSA level at diagnosis (≤4 vs. >4-10 vs. >10), family history of PCa, smoking status (never vs. former vs. current), and age at diagnosis
g for GSTP1 c.313 A>G a recessive model was considered

4.3 Predictors of late versus early recurrence following radical prostatectomy for localized prostate cancer treated within the PSA-era: University of Southern California, Institute of Urology, Radical Prostatectomy cohort

4.3.1 Characteristics and follow up of localized prostate cancer radical prostatectomy patients

Characteristics of all study patients are presented in Table 4.15, stratified by organ confined (pT2a-pT2c) tumors versus tumors that had extracapsular extension (≥pT3a).
All clinical variables were statistically significantly different across these two groups: age, pre-operative PSA, clinical T stage, D'Amico risk classification group, prostatectomy year, pathologic Gleason score, adjuvant therapy, and surgical margin status. The distribution of race/ethnicity did not differ between these two groups. The median follow-up after RP was 6.45 years (IQR=3.49-10.16) for the entire study population (n=2,646), 6.50 years (IQR=3.62-10.20) for organ-confined patients, and 6.03 years (IQR=3.12-9.85) for extracapsular extension patients (Table 4.16). A total of 270 (10.2%) patients experienced disease recurrence: 204 experienced only BCR, 64 experienced BCR followed by CR, and 2 demonstrated CR without prior detection of BCR. The median follow-up among the 2,376 patients who did not develop any evidence of disease recurrence was 7.00 years (IQR=4.06-10.44), 6.88 years (IQR=4.05-10.41) among those with organ-confined disease, and 7.32 years (IQR=4.11-11.40) among those with extracapsular extension. The median time to BCR (n=268) was 2.96 years (IQR=1.69-4.71) and the median follow-up from BCR to CR (n=64) was 2.23 years (IQR=1.02-5.10). Among the 268 patients who had BCR, 210 (78%) recurred within 5 years after surgery, 45 (17%) recurred between 5 and 10 years after surgery, and 13 (5%) recurred more than 10 years after surgery. Among organ-confined patients, 107 (76%) experienced BCR within 5 years after surgery, 25 (18%) between 5 and 10 years after surgery, and 8 (6%) more than 10 years after surgery. Among extracapsular extension patients, 103 (80%) experienced BCR within 5 years after surgery, 20 (16%) between 5 and 10 years after surgery, and 5 (4%) more than 10 years after surgery.
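The grouping of time to BCR into the three intervals described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names and the example follow-up times are hypothetical, and the boundary handling (exactly 5 years counted as early, exactly 10 years counted in the middle interval) is an assumption based on the "≤5" and ">5-10" labels used in the text and tables.

```python
from collections import Counter

def bcr_timing_category(years_to_bcr):
    """Assign a time to BCR (years after surgery) to one of three intervals."""
    if years_to_bcr <= 5:
        return "<=5 years"
    if years_to_bcr <= 10:
        return ">5-10 years"
    return ">10 years"

def percent_distribution(counts):
    """Convert category counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.items()}

# Hypothetical follow-up times, used only to show the classification step.
example_times = [0.8, 2.96, 5.0, 5.1, 9.9, 12.3]
example_counts = Counter(bcr_timing_category(t) for t in example_times)

# Counts reported in the text for the 268 patients with BCR.
reported_counts = {"<=5 years": 210, ">5-10 years": 45, ">10 years": 13}
reported_percent = percent_distribution(reported_counts)

# "Late" BCR (>5 years) pools the two later intervals.
late_bcr_total = reported_counts[">5-10 years"] + reported_counts[">10 years"]
print(example_counts, reported_percent, late_bcr_total)
```

Applied to the reported counts, the percentages round to 78%, 17% and 5%, and pooling the two later intervals gives the 58 late-BCR patients analyzed in the following sections.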
Table 4.15: Characteristics of localized prostate cancer patients treated with radical prostatectomy at USC

                                                                    Pathologic T stage
                                                      All           Organ confined (pT2a - pT2c)   Extracapsular extension (≥pT3a)   p-value*
Total (N)                                             2,646         2,040 (77)                     606 (23)
Age (years)
  <60                                                 923 (35)      764 (37)                       158 (26)                          <0.001
  60-64                                               600 (23)      460 (23)                       140 (23)
  65-69                                               591 (22)      454 (22)                       137 (23)
  70+                                                 533 (20)      362 (18)                       171 (28)
Race/ethnicity
  Non-Hispanic White                                  2,286 (87)    1,760 (87)                     526 (87)                          0.886
  Hispanic                                            136 (5)       108 (5)                        28 (5)
  African-American                                    104 (4)       80 (4)                         24 (4)
  Asian/PI                                            88 (4)        66 (4)                         22 (4)
  Missing                                             31            25                             6
PSA before surgery (ng/ml)
  ≤4                                                  395 (15)      347 (17)                       48 (8)                            <0.001
  >4-10                                               1,665 (63)    1,312 (64)                     343 (57)
  >10-20                                              447 (17)      306 (15)                       141 (23)
  >20                                                 143 (5)       73 (4)                         70 (12)
  Missing                                             6             2                              4
Clinical stage
  cT1                                                 1,896 (72)    1,530 (75)                     366 (61)                          <0.001
  cT2                                                 734 (28)      506 (25)                       228 (37)
  cT3                                                 16 (1)        4 (1)                          12 (2)
D'Amico Risk Classification
  Low                                                 1,115 (48)    969 (55)                       146 (27)                          <0.001
  Intermediate                                        822 (36)      571 (33)                       251 (46)
  High                                                356 (16)      207 (12)                       149 (27)
Prostatectomy year
  07/1988 - 07/1994                                   460 (17)      309 (15)                       151 (25)                          <0.001
  07/1994 - 03/2005                                   1,639 (62)    1,292 (63)                     347 (57)
  03/2005 - 06/2008                                   546 (21)      438 (22)                       108 (18)
Pathologic Gleason score
  ≤6                                                  1,203 (46)    1,067 (53)                     136 (22)                          <0.001
  (3+4) or (2+5)                                      921 (35)      694 (34)                       227 (37)
  (4+3) or (5+2)                                      285 (11)      169 (8)                        116 (19)
  8-10                                                226 (9)       99 (5)                         127 (22)
  Missing                                             11            11                             0
Adjuvant therapy
  None                                                2,263 (85)    1,901 (85)                     362 (60)                          <0.001
  Radiation Therapy                                   370 (14)      138 (14)                       232 (38)
  Radiation Therapy + Androgen Deprivation Therapy    13 (1)        1 (1)                          12 (2)
Surgical margin status
  Negative                                            2,011 (76)    1,683 (83)                     328 (54)                          <0.001
  Apex only                                           265 (10)      192 (9)                        73 (12)
  Other with +/- apex                                 319 (12)      130 (6)                        189 (31)
  Positive but missing location                       51 (2)        35 (2)                         16 (3)

*Chi-square p-value

Table 4.16: Follow-up data on USC radical prostatectomy patients

                                                              Pathologic T stage
                                        All                   Organ confined pT2a - pT2c   Extracapsular extension ≥pT3a   p-value*
Follow-up for all patients (years)
  N                                     2,646                 2,040                        606                             0.0396
  Median (IQR)                          6.45 (3.49-10.16)     6.50 (3.62-10.20)            6.03 (3.12-9.85)
Recurrence status
  No Recurrence                         2,376 (90)            1,899 (93)                   477 (79)                        <0.001
  Only BCR                              204 (8)               109 (5)                      95 (15)
  BCR and CR                            64 (2)                31 (1)                       33 (6)
  CR without prior BCR                  2 (<1)                1 (<1)                       1 (<1)
Follow-up for patients without any recurrence following surgery (years)
  N                                     2,376                 1,899                        477                             0.0575
  Median (IQR)                          7.00 (4.06-10.44)     6.88 (4.05-10.41)            7.32 (4.11-11.40)
Follow-up until BCR following surgery (years)
  N                                     268                   140                          128
  Median (IQR)                          2.96 (1.69-4.71)      2.96 (1.74-4.78)             2.85 (1.63-4.71)
Biochemical recurrence (years after surgery)
  ≤5 years                              210 (78)              107 (76)                     103 (80)                        0.674
  >5-10 years                           45 (17)               25 (18)                      20 (16)
  >10 years                             13 (5)                8 (6)                        5 (4)
PCa-related mortality after BCR and CR
  N                                     24                    7                            17
  Early BCR (≤5 years after surgery)    22 (92)               7 (100)                      15 (88)
  Late BCR (>5 years after surgery)     2 (8)                 0                            2 (12)

*P-value from Chi-square test (categorical variables) or Wilcoxon rank sum (continuous variables)
Abbreviations: Inter-quartile range (IQR), biochemical recurrence (BCR), clinical metastatic recurrence (CR)

For this study, classification of "early" and "late" BCR was based on the cut-point of 5 years after radical prostatectomy. The National Comprehensive Cancer Network (NCCN) Guidelines recommend that PSA monitoring for localized PCa patients after initial definitive therapy, such as radical prostatectomy, be done at least every 6-12 months, or more frequently for high-risk men, for 5 years (NCCN Clinical Practice Guidelines in Oncology (NCCN Guideline®): Prostate Cancer (Version 4.2013), n.d.). At 5 years, if PSA remains undetectable, follow-up is no longer recommended and the patient could be considered disease-free. Studies additionally report that the majority of recurrences occur within the first 5 years after surgery (Bolton et al., 2013; Dillioglugil, Leibman, & Kattan, 1997; Klotz et al., 2010; Loeb et al., 2011; A. J. Stephenson et al., 2005).
However, the heterogeneous nature of progression seen in PCa continues to be evident, as some patients do have BCR after 5 years (Amling, Blute, & Bergstralh, 2000). Although patients with late BCR are shown to have better prognosis than patients with earlier BCR, the biological nature of the disease after BCR may be different for each patient, regardless of when it occurs after initial treatment. Therefore, since additional follow-up is not suggested in the national guidelines for localized PCa patients who are BCR-free at 5 years after surgery, it is important to recognize which patients are still at risk for BCR after 5 years in order to maintain follow-up for these patients.

Next we investigated possible differences in key characteristics among patients who had no evidence of recurrence, early BCR (≤5 years after surgery) or late BCR (>5 years after surgery) (Table 4.17). Among patients with BCR, there was a significant difference in the proportion of patients <65 years old between early and late BCR patients (49% among early BCR vs. 60% among late BCR, p=0.044), but the difference in median age was not statistically significant (median age=65 and 63 years, respectively, Wilcoxon p-value=0.154). Patients experiencing late BCR were more likely to have pathologic Gleason score ≤6 than patients with earlier BCR (45% versus 25% for earlier BCR, p<0.001). Variables that were statistically significantly different between patients with no recurrence and patients with recurrence at any time after surgery include age, pathologic Gleason score, PSA before surgery, D'Amico risk classification, adjuvant therapy, surgical margin status, clinical stage, and pathologic stage.
Patients with no recurrence were more likely to have lower pathologic Gleason score, lower PSA before surgery, low-risk D'Amico classification, negative surgical margins, clinical stage T1 at diagnosis, and lower pathologic stage after surgery (pT2) than patients with BCR at any time after surgery (all p<0.001).

4.3.2 Clinical predictors of early versus late biochemical recurrence

Univariable analysis was performed to identify predictors of overall BCR, and of early and late BCR separately. All clinical variables, including age, pathologic Gleason score, pre-operative PSA level, surgical margin status, pathologic T stage, and adjuvant therapy, were associated with overall BCR risk, except for race/ethnicity (Table 4.18). These variables were also associated with early BCR (Table 4.19). Variables associated with experiencing late BCR included age at surgery (60-64 years), higher pathologic Gleason score (8-10), pre-operative PSA >10 ng/ml, positive surgical margin, pathologic stage (ECE), and adjuvant radiotherapy (Table 4.19). In multivariable Cox regression, age at time of surgery, pre-operative PSA, Gleason score, positive surgical margin, and ≥pT3 pathologic stage remained associated with an increased risk of overall BCR at any time after RP (Table 4.20). Multivariable Cox regression further showed that the risk of early BCR was associated with pathologic Gleason score higher than 6 (Gleason 8-10 HR=3.07, 95%CI=1.99-4.74, p<0.001), pre-operative PSA >20 (HR=2.14, 95%CI=1.15-3.98, p=0.016), positive surgical margin (HR=1.93, 95%CI=1.32-2.82, p=0.001), and ≥pT3 pathologic stage (HR=2.10, 95%CI=1.53-2.87, p<0.001). Adjuvant radiation therapy was not significantly protective from early BCR.
A specific age category of 60-64 years at time of surgery compared to <60 years old (HR=2.56, 95%CI=1.26-5.18, p=0.009), pre-operative PSA level in the range of 10-20 ng/ml (HR=4.00, 95%CI=1.16-13.71, p=0.027), and ≥pT3 pathologic stage (HR=1.94, 95%CI=1.05-3.58, p=0.033) were predictors of late BCR. Adjuvant radiation therapy was not significantly protective from late BCR (Table 4.21).

Table 4.17: Radical prostatectomy patients: characteristics of patients with no recurrence and patients who experienced early or late biochemical recurrence.

                                                  Time of biochemical recurrence
                                  No recurrence   ≤5 years after RP   >5 years after RP   p-value*   p-value**
Total (N)                         2,376           210                 58
Age (years)
  <60                             854 (36)        54 (26)             12 (21)             0.001      0.044
  60-64                           529 (22)        48 (23)             23 (39)
  65-69                           523 (22)        53 (25)             15 (25)
  70+                             470 (20)        55 (26)             8 (15)
Race/ethnicity
  Non-Hispanic White              2,048 (87)      184 (88)            52 (90)             0.918      0.729
  Hispanic                        122 (5)         10 (5)              4 (6)
  African-American                96 (4)          7 (3)               1 (2)
  Asian/PI                        78 (3)          9 (4)               1 (2)
Pathologic Gleason score
  ≤6                              1,124 (48)      52 (25)             26 (45)             <0.001     0.029
  (3+4) or (2+5)                  828 (35)        76 (36)             16 (27)
  (4+3) or (5+2)                  244 (10)        36 (17)             5 (9)
  8-10                            169 (7)         46 (22)             11 (19)
PSA before surgery (ng/ml)
  ≤4                              374 (16)        18 (9)              3 (5)               <0.001     0.416
  >4-10                           1,512 (63)      114 (54)            28 (49)
  >10-20                          377 (16)        50 (24)             20 (34)
  >20                             107 (5)         28 (13)             7 (12)
D'Amico Risk Classification
  Low                             1,064 (51)      43 (24)             8 (18)              <0.001     0.542
  Intermediate                    721 (35)        77 (43)             23 (52)
  High                            281 (14)        61 (34)             13 (30)
Adjuvant therapy
  None                            2,074 (87)      150 (71)            38 (66)             <0.001     0.543
  Radiation only                  290 (12)        59 (28)             20 (34)
  Both hormone and radiation      12 (1)          1 (1)               0 (0)
Surgical margin status
  Negative                        1,857 (78)      115 (55)            37 (64)             <0.001     0.362
  Apex only                       222 (9)         37 (17)             6 (10)
  Other with +/- apex             246 (10)        58 (28)             15 (26)
  Positive but missing location   51 (3)          0 (0)               0 (0)
Clinical stage
  cT1                             1,734 (73)      133 (63)            29 (50)             <0.001     0.133
  cT2                             630 (26)        74 (35)             28 (48)
  cT3                             12 (1)          3 (2)               1 (2)
Prostatectomy year
  07/1988 - 07/1994               361 (15)        62 (29)             36 (62)             <0.001     <0.001
  07/1994 - 03/2005               1,476 (62)      140 (67)            22 (38)
  03/2005 - 06/2008               538 (23)        8 (4)               0 (0)
Pathologic T stage
  pT2a-pT2c                       1,899 (80)      107 (51)            33 (57)             <0.001     0.422
  pT3a-pT4a                       477 (20)        103 (49)            25 (43)
Clinical recurrence
  No                              -               155 (74)            49 (85)             -          0.117
  Yes                             -               55 (26)             9 (15)

*Fisher's exact p-value evaluating differences among all groups
**Fisher's exact p-value evaluating differences between only BCR groups

Table 4.18: Univariable Cox regression to determine predictors of overall biochemical recurrence.

                                                                    Biochemical recurrence anytime after prostatectomy
                                          No BCR (n)   BCR (n)   HR     95%CI        p-value
                                          2,376        268
Age at surgery (years)
  <60                                     854          66        1.0    REF
  60-64                                   529          71        1.66   1.19-2.32    0.003
  65-69                                   523          68        1.65   1.19-2.32    0.004
  70+                                     470          63        1.7    1.21-2.40    0.002
  p-trend                                                                            0.002
Pathologic Gleason score
  ≤6                                      1,124        78        1.0    REF
  (3+4) or (2+5)                          828          92        1.84   1.36-2.49    <0.001
  (4+3) or (5+2)                          244          41        2.82   1.93-4.13    <0.001
  8-10                                    169          57        4.78   3.40-6.72    <0.001
  p-trend                                                                            <0.001
PSA before surgery (ng/ml)
  ≤4                                      376          21        1.0    REF
  >4-10                                   1,512        142       1.64   1.04-2.60    0.034
  >10-20                                  377          70        2.98   1.83-4.86    <0.001
  >20                                     107          35        4.61   2.68-7.92    <0.001
  p-trend                                                                            <0.001
Surgical margin status
  Negative                                1,857        152       1.0    REF
  Positive: Apex only                     222          43        2.06   1.47-2.90    <0.001
  Positive other location with +/- apex   246          73        2.96   2.24-3.91    <0.001
  Positive but missing location           51           0         .      .            .
Pathologic T stage
  Organ Confined                          1,899        140       1.0    REF
  Extracapsular                           477          128       3.23   2.55-4.11    <0.001
Adjuvant therapy
  None                                    2,074        188       1.0    REF
  Radiation only                          290          79        2.3    1.77-3.00    <0.001
  Both hormone and radiation              12           1         1.43   0.20-10.18   0.723
Race/ethnicity
  Non-Hispanic White                      2,048        236       1.0    REF
  Hispanic                                122          14        1.1    0.64-1.88    0.738
  African-American                        96           8         0.9    0.45-1.82    0.773
  Asian/PI                                78           10        1.23   0.66-2.32    0.514

Table 4.19: Univariable Cox regression to determine predictors of early (≤5 years after surgery) or late (>5 years after surgery) biochemical recurrence.
                                                       ≤5 years after prostatectomy              >5 years after prostatectomy
                                          No BCR (n)   BCR (n)   HR     95%CI        p-value   BCR (n)   HR     95%CI        p-value
                                          2,376        210                                     58
Age at surgery (years)
  <60                                     854          54        1.0    REF                    12        1.0    REF
  60-64                                   529          48        1.39   0.94-2.05    0.099     23        2.86   1.42-5.76    0.003
  65-69                                   523          53        1.61   1.10-2.35    0.014     15        1.85   0.87-3.97    0.111
  70+                                     470          55        1.81   1.24-2.63    0.002     8         1.27   0.54-3.02    0.587
  p-trend                                                                            0.001                                   0.675
Pathologic Gleason score
  ≤6                                      1,124        52        1.0    REF                    26        1.0    REF
  (3+4) or (2+5)                          828          76        2.07   1.46-2.95    <0.001    16        1.31   0.70-2.45    0.405
  (4+3) or (5+2)                          244          36        3.33   2.18-5.10    <0.001    5         1.51   0.58-3.95    0.398
  8-10                                    169          46        5.41   3.63-8.04    <0.001    11        3.47   1.75-6.88    <0.001
  p-trend                                                                            <0.001                                  0.001
PSA before surgery (ng/ml)
  ≤4                                      376          18        1.0    REF                    3         1.0    REF
  >4-10                                   1,512        114       1.52   0.92-2.50    0.098     28        2.37   0.72-7.80    0.154
  >10-20                                  377          50        2.56   1.49-4.39    0.001     20        5.39   1.60-18.14   0.007
  >20                                     107          28        4.52   2.50-8.18    <0.001    7         5.44   1.41-21.07   0.014
  p-trend                                                                            <0.001                                  <0.001
Surgical margin status
  Negative                                1,857        115       1.0    REF                    37        1.0    REF
  Positive: Apex only                     222          37        2.37   1.64-3.43    <0.001    6         1.14   0.48-2.71    0.761
  Positive other location with +/- apex   246          58        3.13   2.28-4.30    <0.001    15        2.42   1.33-4.41    0.004
  Positive but missing location           51           0         .      .            .         0         .      .            .
Pathologic T stage
  Organ Confined                          1,899        107       1.0    REF                    33        1.0    REF
  Extracapsular                           477          103       3.37   2.57-4.42    <0.001    25        2.78   1.66-4.66    <0.001
Adjuvant therapy
  None                                    2,074        150       1.0    REF                    38        1.0    REF
  Radiation only                          290          59        2.31   1.71-3.12    <0.001    20        2.29   1.33-3.94    0.003
  Both hormone and radiation              12           1         1.63   0.23-11.62   0.628     0         .      .            .
Race/ethnicity
  Non-Hispanic White                      2,048        184       1.0    REF                    52        1.0    REF
  Hispanic                                122          10        0.97   0.51-1.84    0.93      4         1.61   0.58-4.45    0.358
  African-American                        96           7         0.93   0.44-1.98    0.854     1         0.73   0.10-5.32    0.76
  Asian/PI                                78           9         1.4    0.71-2.72    0.329     1         0.61   0.08-4.38    0.62

Table 4.20: Multivariable Cox regression to determine predictors of overall biochemical recurrence anytime after radical prostatectomy.
                                          Biochemical recurrence anytime after prostatectomy
                                          HR     95%CI        p-value
Age at surgery (years)
  <60                                     1.0    REF
  60-64                                   1.47   1.05-2.06    0.025
  65-69                                   1.44   1.02-2.02    0.036
  70+                                     1.23   0.87-1.75    0.247
Pathologic Gleason score
  ≤6                                      1.0    REF
  (3+4) or (2+5)                          1.55   1.14-2.11    0.005
  (4+3) or (5+2)                          1.92   1.30-2.84    0.001
  8-10                                    2.75   1.90-4.00    <0.001
PSA before surgery (ng/ml)
  ≤4                                      1.0    REF
  >4-10                                   1.36   0.85-2.15    0.196
  >10-20                                  1.98   1.21-3.26    0.007
  >20                                     2.24   1.27-3.95    0.005
Surgical margin status
  Negative                                1.0    REF
  Positive: Apex only                     1.80   1.25-2.61    0.002
  Positive: other location with +/- apex  1.79   1.27-2.51    0.001
Pathologic T stage
  Organ Confined                          1.0    REF
  Extracapsular                           2.1    1.59-2.78    <0.001
Adjuvant therapy
  None                                    1.0    REF
  Radiation only                          0.78   0.55-1.09    0.148
  Both hormone and radiation              0.52   0.07-3.78    0.518

Table 4.21: Multivariable Cox regression to determine predictors of early (≤5 years after surgery) or late (>5 years after surgery) biochemical recurrence.

                                          ≤5 years after prostatectomy    >5 years after prostatectomy
                                          HR     95%CI        p-value     HR     95%CI        p-value
Age at surgery (years)
  <60                                     1.0    REF                      1.0    REF
  60-64                                   1.22   0.82-1.80    0.32        2.56   1.26-5.18    0.009
  65-69                                   1.38   0.94-2.02    0.097       1.65   0.77-3.55    0.198
  70+                                     1.33   0.91-1.96    0.14        0.80   0.33-1.99    0.639
Pathologic Gleason score
  ≤6                                      1.0    REF                      1.0    REF
  (3+4) or (2+5)                          1.72   1.20-2.47    0.003       1.22   0.65-2.31    0.533
  (4+3) or (5+2)                          2.28   1.47-3.54    <0.001      1.01   0.38-2.68    0.989
  8-10                                    3.07   1.99-4.74    <0.001      1.91   0.89-4.12    0.097
PSA before surgery (ng/ml)
  ≤4                                      1.0    REF                      1.0    REF
  >4-10                                   1.26   0.76-2.07    0.369       1.98   0.60-6.54    0.265
  >10-20                                  1.67   0.97-2.89    0.067       4.00   1.17-13.71   0.027
  >20                                     2.14   1.15-3.98    0.016       2.95   0.72-12.13   0.134
Surgical margin status
  Negative                                1.0    REF                      1.0    REF
  Positive: Apex only                     2.12   1.42-3.15    <0.001      0.78   0.29-2.10    0.629
  Positive: other location with +/- apex  1.93   1.32-2.82    0.001       1.33   0.61-2.91    0.474
Pathologic T stage
  Organ Confined                          1.0    REF                      1.0    REF
  Extracapsular                           2.10   1.53-2.87    <0.001      1.94   1.05-3.58    0.033
Adjuvant therapy
  None                                    1.0    REF                      1.0    REF
  Radiation only                          0.73   0.50-1.07    0.11        1.15   0.52-2.55    0.726
  Both hormone and radiation              0.56   0.08-4.09    0.567       .      .            .

4.3.3 Time to clinical recurrence based on time of biochemical recurrence

The relationship between the timing of BCR (≤5 years versus >5 years) and the timing of subsequent CR (≤2 years versus >2 years following BCR) is presented in Figure 4.5. There were 210 patients who experienced early BCR (≤5 years after surgery) and 58 with late BCR (>5 years after surgery). Among patients with early BCR, 55 (26%) progressed to CR, with 25 (45%) occurring within 2 years after BCR and 30 (55%) occurring more than 2 years after BCR. Among patients with late BCR, 9 (16%) progressed to CR, with 2 (22%) occurring within 2 years after BCR and 7 (78%) occurring more than 2 years after BCR. The median time elapsed from BCR to CR among patients with early BCR was 2.24 years (IQR=0.63-4.85), which was not statistically different from the median time among patients with late BCR, 2.21 years (IQR=2.00-6.35) (Wilcoxon rank sum p-value=0.3489). The rate of CR did not differ significantly between early and late BCR patients (55 of 210, 26%, versus 9 of 58, 16%; Fisher's exact p-value=0.117).

Kaplan-Meier analysis of clinical recurrence-free survival based on the time elapsed from BCR to CR for all BCR patients revealed no significant difference between early and late BCR (log-rank p-value=0.169) (Figure 4.6). However, when stratified by pathologic stage, a significant difference in CR-free survival between early and late BCR was seen among organ-confined (OC) patients (log-rank p-value=0.023) but not among patients with ≥pT3a pathologic stage (log-rank p-value=0.636) (Figures 4.7 and 4.8). Time to BCR after surgery was not a predictor of time to CR after adjusting for age, pathologic Gleason score, pre-operative PSA level, surgical margin status, and pathologic stage in Cox regression analysis.
Patients who experienced late BCR were 43% less likely to experience a clinical recurrence than individuals who experienced early BCR, although this did not reach significance (HR=0.57, 95%CI=0.27-1.20, p=0.139; data not shown). Among the 55 of 210 patients (26%) who had early BCR and subsequent CR, 31 died (56%): 22 from PCa, 1 from another type of cancer, 2 from non-cancer related causes, and 6 from unknown causes other than PCa. Among the 9 of 58 patients (16%) with late BCR and subsequent CR, 3 died (33%): 2 from PCa and 1 from a non-PCa related cause. There was no significant difference in PCa-related mortality-free survival based on time elapsed from BCR to death between early and late BCR patients (log-rank p-value=0.139).

Figure 4.5: Patients who experienced biochemical recurrence (BCR) ≤5 years or >5 years after radical prostatectomy and clinical recurrence (CR) status

Figure 4.6: Probability of being clinical recurrence-free based on time of BCR (≤5 years or >5 years after surgery)
Log-rank p=0.1688
For early BCR, % CR-free by years after BCR event: 85% at 3 years, 79% at 5 years, 76% at 7 years, 68% at 10 years
For late BCR, % CR-free by years after BCR event: 91% at 3 years, 89% at 5 years, 82% at 7 years, 75% at 10 years

Figure 4.7: Clinical recurrence-free survival following early or late biochemical recurrence among patients with pT2 prostate cancer.
Log-rank p=0.0225
For early BCR, % CR-free by years after BCR event: 86% at 3 years, 80% at 5 years, 78% at 7 years, 64% at 10 years
For late BCR, % CR-free by years after BCR event: 97% at 3 years, 97% at 5 years, 91% at 7 years, 91% at 10 years

Figure 4.8: Clinical recurrence-free survival following early or late biochemical recurrence among patients with ≥pT3a prostate cancer.
Log-rank p=0.636
For early BCR, % CR-free by years after BCR event: 85% at 3 years, 79% at 5 years, 75% at 7 years, 72% at 10 years
For late BCR, % CR-free by years after BCR event: 83% at 3 years, 77% at 5 years, 69% at 7 years, 55% at 10 years

4.4 Biomarkers of recurrence of localized prostate cancer: University of Southern California, Institute of Urology, Radical Prostatectomy cohort

4.4.1 Characteristics of patients included in the discovery/training set

Gene expression profiles were generated for a total of 293 organ-confined PCa patients who underwent radical prostatectomy at the University of Southern California. Of these patients, 154 had no evidence of disease (NED) following surgery, indicating no recurrence of disease, 106 experienced biochemical recurrence (BCR) only with no further progression, and 33 experienced clinical recurrence of disease (CR), in which local or distant metastasis was detected. Comparing the characteristics of NED and CR patients, CR patients were older (age 70+, 39% CR versus 23% NED), had higher Gleason scores (Gleason 8-10, 36% CR versus 16% NED, p=0.01), and were more likely to have received neo-adjuvant hormonal therapy prior to surgery (24% CR versus 4% NED). CR patients were also more likely to be classified as high-risk according to the D'Amico risk classification using available diagnostic data prior to surgery (Table 4.22). When comparing BCR patients with CR patients, BCR-only patients were younger (<60 years old, 32% BCR versus 12% CR), had lower pathologic Gleason scores (Gleason 6 or less, 35% BCR versus 15% CR), were diagnosed with lower clinical stage (cT1, 74% BCR versus 52% CR), were more likely to be classified as low-risk according to the D'Amico risk classification (30% BCR versus 8% CR), and were less likely to receive neo-adjuvant hormonal therapy (8% BCR versus 24% CR).
The median follow-up time was 9.55 years for NEDs (controls), 3.12 years for BCR-only patients, and 5.83 years for patients who experienced metastatic recurrence of disease.

Table 4.22: Characteristics of patients with gene expression profiles available

                                                    Recurrence cases                        p-value*
                                Controls NED        BCR only            CR                  NED vs. CR   BCR vs. CR
                                n=154               n=106               n=33
Age
  <60                           54 (35)             34 (32)             4 (12)              0.028        0.02
  60-64                         26 (25)             26 (25)             8 (24)
  65-69                         39 (25)             29 (27)             8 (24)
  70+                           35 (23)             17 (16)             13 (39)
PSA before surgery (ng/ml)
  ≤4                            30 (19)             5 (5)               3 (9)               0.215        0.571
  >4-10                         84 (55)             65 (61)             18 (55)
  >10-20                        33 (21)             28 (26)             8 (24)
  >20                           7 (5)               8 (8)               4 (12)
Pathologic Gleason score
  ≤6                            56 (36)             37 (35)             5 (15)              0.01         0.012
  (3+4) or (2+5)                60 (39)             40 (38)             11 (33)
  (4+3) or (5+2)                14 (9)              16 (15)             5 (15)
  8-10                          24 (16)             13 (12)             12 (36)
Surgical margin status
  Negative                      119 (77)            64 (60)             24 (73)             0.651        0.221
  Positive                      35 (23)             42 (40)             9 (27)
Race/ethnicity
  Non-Hispanic White            137 (89)            90 (85)             29 (88)             0.255        0.685
  Hispanic                      12 (8)              7 (7)               1 (3)
  African-American              4 (3)               3 (3)               2 (6)
  Asian/PI                      1 (1)               6 (6)               1 (3)
Clinical stage
  cT1                           105 (68)            78 (74)             17 (52)             0.089        0.035
  cT2                           48 (31)             27 (25)             15 (45)
  cT3                           1 (1)               1 (1)               1 (3)
Pathologic stage
  T2a                           10 (6)              12 (11)             3 (9)               0.819        0.849
  T2b                           9 (6)               8 (8)               1 (3)
  T2c                           134 (87)            86 (81)             29 (88)
  T2 with unknown laterality    1                   0                   0
Prostatectomy year
  07/1988 - 07/1994             56 (37)             34 (32)             18 (55)             0.093        0.039
  07/1994 - 03/2005             90 (59)             67 (63)             13 (39)
  03/2005 - 06/2008             6 (4)               5 (5)               2 (6)
D'Amico risk groups (those with available clinical data: Gleason, stage, PSA)
  Low                           50 (40)             26 (30)             2 (8)               <0.001       0.039
  Intermediate                  60 (48)             45 (52)             11 (42)
  High                          15 (12)             16 (18)             13 (50)
Neoadjuvant hormonal therapy
  No                            148 (96)            98 (92)             25 (76)             0.001        0.024
  Yes                           6 (4)               8 (8)               8 (24)
Radiation therapy
  No                            135 (88)            91 (86)             26 (79)             0.177        0.412
  Yes                           19 (12)             15 (14)             7 (21)
Adjuvant hormone therapy
  No                            151 (98)            105 (99)            33 (100)            .            .
  Yes                           3 (2)               1 (1)               0 (0)
Median follow-up time (IQR)     9.55 (6.61-15.25)   3.12 (1.78-5.79)    5.83 (4.18-8.69)

*Fisher's Exact p-value
Abbreviations: No evidence of disease (NED), biochemical recurrence cases (BCR), clinical metastatic recurrence (CR)

4.4.2 Quality assessment of gene expression data

Quality assessment of the data is an important step prior to performing expression analyses. Assessing quality allows for identification of outlier samples and of characteristics of the data that are influenced by the technical experimental process and may create variance that interferes with the interpretation of the final results. A previous concern regarding the use of formalin-fixed paraffin-embedded (FFPE) samples was that formalin fixation of tissue can make adenosine residues prone to chemical modifications that degrade RNA into fragments, making it difficult to obtain accurate gene expression data. Gene expression profiling of FFPE tissue is possible using the whole genome DASL HT assay, which overcomes the technical limitations of using microarrays with these samples (April et al., 2009). Using short target sequences of about 50 nucleotides allows the assay to accurately assess degraded RNA samples, with a performance that is highly comparable to results obtained using fresh-frozen tissue (April et al., 2009). To confirm good quality data, the data generated from our RNA samples of PCa tumors were assessed for the overall detectable levels of genes and the average signal obtained across all samples. Methods used to assess the quality of the data during processing of the samples by Illumina include: 1) internal control probes and 2) a technical replicate during the experimental preparation process. Internal control probe performance indicates whether a problem was created during the experimental process or is related to the sample.
There could be variations due to chip type or system setup (Gene Expression Microarray Data Quality Control; Technical Note: RNA Analysis, Illumina, Inc.). Figure 4.9 shows the raw intensity of the internal controls on the arrays processed. The boxed area marks samples that were run on a separate array and showed greater variation than all other arrays. After further assessment, this array was excluded from further analyses.
Figure 4.9: Internal controls used for quality assessment of the samples and the experimental runs during processing of the samples.
A technical replicate is a sample that is divided prior to processing with the WG DASL HT assay. The divided samples are mixed into two different hybridization mixtures and applied onto different arrays in order to measure the variation induced by the processing of the samples. A technical replicate was run for our samples and showed very good reproducibility, indicating that little variation was introduced during the processing of the samples (Figure 4.10).
Figure 4.10: High reproducibility detected using average signal based on one sample run in duplicate (R² = 0.97).
Figure 4.11: Assessment of expression data generated from FFPE samples. The top plot presents the average signal intensity obtained from each sample (x-axis) for each shipment of samples (May 2012, August 2011, March 2013). The bottom plot presents the number of detected probes for each sample based on the detection p-value thresholds of 0.01 and 0.05. For each sample, each gene is assigned a detection p-value that represents the confidence that a probe is expressed above the background defined by the negative control probes. Three samples (circled in red) had low sensitivities (~2,000-4,000 genes detected) based on the p-value thresholds and were therefore excluded from further analyses.
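The detection-based screen described above can be illustrated with a minimal sketch. It assumes each sample is available as a list of per-probe detection p-values; the function names, the toy data, and the `min_detected` cutoff are hypothetical, since the text reports only that excluded samples had roughly 2,000-4,000 genes detected and does not state a formal cutoff.

```python
def count_detected(detection_pvalues, threshold):
    """Number of probes whose detection p-value falls below the threshold."""
    return sum(1 for p in detection_pvalues if p < threshold)


def low_sensitivity_samples(samples, threshold=0.01, min_detected=5000):
    """Names of samples with fewer than min_detected probes detected.

    min_detected is an illustrative cutoff, not a value taken from the text.
    """
    return sorted(
        name
        for name, pvalues in samples.items()
        if count_detected(pvalues, threshold) < min_detected
    )


# Toy data: two samples with three probes each (illustrative values only).
toy_samples = {
    "s1": [0.001, 0.02, 0.2],
    "s2": [0.5, 0.6, 0.03],
}

detected_s1_strict = count_detected(toy_samples["s1"], 0.01)
detected_s1_lenient = count_detected(toy_samples["s1"], 0.05)
flagged = low_sensitivity_samples(toy_samples, threshold=0.05, min_detected=2)
```

Counting at both thresholds (0.01 and 0.05), as in Figure 4.11, shows how sensitive the exclusion decision is to the chosen detection level.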
4.4.3 Differentially expressed genes between tumors
After quality assessment of the data, analysis was performed to determine differentially expressed genes between NEDs (n=154) and CR (n=33). The empirical Bayes moderated t-test was used on the entire set of ~29,000 features, adjusting for age, pre-operative PSA level, pathologic Gleason score, neo-adjuvant hormone therapy, operation year, and surgical margin status. The fold change is the ratio of the average expression of a gene among CR compared to NEDs; a positive value indicates overexpression and a negative value indicates under-expression of the gene in the recurrence group relative to the no-recurrence group (e.g., a fold change of 4 comparing CR to NEDs indicates a 4-fold higher expression of that gene in the CR group compared to NEDs). The 184 differentially expressed features corresponded to 172 unique differentially expressed genes (Table 4.24). The biological processes of these genes were determined using the PANTHER Classification system, a curated database that allows for inference of gene and protein function (Mi, Muruganujan, & Thomas, 2013). Of the 172 genes, 156 were identified with biological functions, and the top three were involvement in metabolic process (49.4%), cellular process (33.3%), and cell communication (23.1%) (Table 4.23). Based on Gene Ontology definitions provided by AmiGO (Carbon et al., 2009), metabolic processes are the pathways that allow organisms to transform chemical substances, such as cellular anabolism and catabolism, but can also include processes like DNA repair and replication and protein synthesis and degradation. Cellular process is any process at the cellular level and can include cell communication, reproduction, structure organization, and phenotype switching. Cell communication is any process that facilitates the interaction between a cell and another cell, the extracellular matrix, or any other part of its environment.
The genes in the differential expression list that had multiple probes significantly differentially expressed include: ABLIM1, ANO7, BAIAP2, FAM3B, LAMA3, MYBPC1, REPS2, and YIPF1. These genes are also mostly involved in cellular process and cell communication. Using the entire set of 293 tumors profiled, a list of differentially expressed genes between tumors of Gleason score ≥7 (n=49) versus Gleason score <7 (n=244) was generated to capture genes associated with high-grade versus low-grade PCa (Table 4.25). The gene AZGP1, ranked 6th in this list, is also in the top 10 of the differentially expressed genes between metastatic cases and non-recurrence cases (Table 4.24).

Table 4.23: Biological processes of the differentially expressed genes between tumors of patients with no recurrence and patients who experienced clinical recurrence leading to metastatic disease.
Rank | Biological Process (Gene Ontology Accession #) | # of genes involved | % of differentially expressed genes
1 | Metabolic process (GO:0008152) | 77 | 49.40%
2 | Cellular process (GO:0009987) | 52 | 33.30%
3 | Cell communication (GO:0007154) | 36 | 23.10%
4 | Developmental process (GO:0032502) | 31 | 19.90%
5 | Immune system process (GO:0002376) | 27 | 17.30%
6 | Transport (GO:0006810) | 23 | 14.70%
7 | Response to stimulus (GO:0050896) | 23 | 14.70%
8 | System process (GO:0003008) | 21 | 13.50%
9 | Cell cycle (GO:0007049) | 17 | 10.90%
10 | Cellular component organization (GO:0016043) | 14 | 9.00%
11 | Cell adhesion (GO:0007155) | 14 | 9.00%
12 | Apoptosis (GO:0006915) | 9 | 5.80%
13 | Reproduction (GO:0000003) | 6 | 3.80%
14 | Generation of precursor metabolites and energy (GO:0006091) | 6 | 3.80%
15 | Homeostatic process (GO:0042592) | 2 | 1.30%
16 | Localization (GO:0051179) | 1 | 0.60%

Table 4.24: Genes differentially expressed between tumors of patients with no recurrence (NED) and patients who experienced clinical metastasis after surgery (CR).
RANK GENE FOLD CHANGE (CR:NED) LOG(Fold Change) Crude p-value FDR adjusted p-value Illumina Probe ID RefSeq # 1 ZYG11A 4.314 2.109 9.77E-09 0.000180 ILMN_1723439 XM_001133615.1 2 ADI1 -1.242 -0.313 1.23E-08 0.000180 ILMN_1813975 NM_018269.1 3 AZGP1 -1.514 -0.598 4.23E-08 0.000393 ILMN_1797154 NM_001185.2 4 RAB33B 1.524 0.608 5.36E-08 0.000393 ILMN_1727738 NM_031296.1 5 MMP11 3.422 1.775 9.48E-08 0.000556 ILMN_1655915 NM_005940.3 6 ZIC3 2.040 1.028 2.30E-07 0.001127 ILMN_1681805 NM_003413.2 7 SLCO1A2 2.213 1.146 2.74E-07 0.001148 ILMN_1720727 NM_005075.2 8 FAM3B -2.547 -1.349 5.40E-07 0.001803 ILMN_2355486 NM_206964.1 9 DNAJB14 1.207 0.271 5.53E-07 0.001803 ILMN_2288915 NM_024920.3 10 TXLNB -2.859 -1.515 6.44E-07 0.001889 ILMN_2107482 NM_153235.2 11 MYBPC1 -2.754 -1.461 7.38E-07 0.001967 ILMN_2330170 NM_206821.1 12 ANO7 -2.510 -1.328 1.02E-06 0.002494 ILMN_1732948 NM_001001666.3 13 PACSIN3 -1.199 -0.262 1.51E-06 0.003387 ILMN_1682957 NM_016223.3 14 REPS2 -1.481 -0.566 1.62E-06 0.003387 ILMN_1766425 NM_004726.2 15 BAIAP2 -1.470 -0.555 2.57E-06 0.005028 ILMN_1705922 NM_006340.1 16 BMPR2 1.416 0.501 2.86E-06 0.005252 ILMN_2070896 NM_001204.5 17 DYRK2 1.216 0.282 4.13E-06 0.007124 ILMN_2374244 NM_003583.2 18 GPR137B 1.756 0.813 5.11E-06 0.008332 ILMN_2121816 NM_003272.1 19 NCAPD3 -1.725 -0.787 5.79E-06 0.008578 ILMN_1683441 NM_015261.2 20 DUOX1 3.143 1.652 5.96E-06 0.008578 ILMN_1690289 NM_017434.3 21 CDK10 -1.548 -0.631 6.38E-06 0.008578 ILMN_3306950 NM_001098533.1 22 ANO7 -1.684 -0.752 6.43E-06 0.008578 ILMN_1683824 NM_001001891.3 23 SUZ12 1.199 0.262 7.64E-06 0.009042 ILMN_1797813 NM_015355.1 24 LIPH -1.454 -0.540 7.67E-06 0.009042 ILMN_1729433 NM_139248.2 25 YIPF1 -1.263 -0.337 7.82E-06 0.009042 ILMN_2052163 NM_018982.3 26 LARS -1.150 -0.202 8.01E-06 0.009042 ILMN_1757317 NM_020117.9 27 TUBB2A -2.094 -1.066 8.40E-06 0.009124 ILMN_2038775 NM_001069.2 28 OSBPL8 1.142 0.192 9.45E-06 0.009798 ILMN_2405078 NM_020841.4 29 CDKN2B 2.174 1.120 1.03E-05 0.009798 ILMN_2376723 
NM_078487.2 30 LRRC58 -1.661 -0.732 1.04E-05 0.009798 ILMN_3251467 NM_001099678.1 31 ABLIM1 -3.258 -1.704 1.04E-05 0.009798 ILMN_1731610 NM_006720.3 32 SMPDL3B -2.622 -1.391 1.07E-05 0.009798 ILMN_1789913 NM_014474.2 33 MAT2B 1.207 0.272 1.22E-05 0.010694 ILMN_1811367 NM_013283.3 34 NOMO1 -1.231 -0.300 1.30E-05 0.010694 ILMN_2126957 NM_014287.3 35 SMOX -1.528 -0.612 1.31E-05 0.010694 ILMN_1775380 NM_175842.1 36 CCDC148 1.713 0.776 1.31E-05 0.010694 ILMN_1663816 NM_138803.2 37 ACADSB -1.703 -0.768 1.43E-05 0.011337 ILMN_1740920 NM_001609.3 38 UTS2D 2.641 1.401 1.53E-05 0.011788 ILMN_2180232 NM_198152.2 39 FMO5 -2.286 -1.193 1.71E-05 0.012843 ILMN_1811632 NM_001461.1 40 STC2 2.055 1.039 1.77E-05 0.012955 ILMN_1691884 NM_003714.2 41 EDARADD 3.013 1.591 1.99E-05 0.013606 ILMN_1761820 NM_145861.2 42 WDR74 -1.155 -0.208 1.99E-05 0.013606 ILMN_2057836 NR_002716.1 43 KIAA0319L -1.124 -0.168 1.99E-05 0.013606 ILMN_1683277 NM_024874.3 44 MGC16703 -1.180 -0.239 2.16E-05 0.014003 ILMN_2050434 NM_145042.2 45 ALDH1A2 -2.636 -1.398 2.18E-05 0.014003 ILMN_2383707 NM_170696.1 46 UBE2MP1 -1.214 -0.280 2.20E-05 0.014003 ILMN_1695790 NR_002837.1 47 BOK -1.644 -0.717 2.27E-05 0.014144 ILMN_1730032 NM_032515.3 48 RASSF6 -1.906 -0.931 2.89E-05 0.017688 ILMN_2352245 NM_177532.3 49 GLB1L3 -2.529 -1.338 2.99E-05 0.017899 ILMN_1731912 XM_001131501.1 50 PGC -3.937 -1.977 3.13E-05 0.018125 ILMN_1795484 NM_002630.2 155 Table 4.24 continued: RANK GENE FOLD CHANGE (CR:NED) LOG(Fold Change) Crude p-value FDR adjusted p-value Illumina Probe ID RefSeq # 51 SPAG1 1.404 0.489 3.15E-05 0.018125 ILMN_2367681 NM_003114.3 52 LDLRAD3 -2.211 -1.145 3.22E-05 0.018187 ILMN_2129563 NM_174902.2 53 NADSYN1 -2.452 -1.294 3.33E-05 0.018409 ILMN_1779034 NM_018161.4 54 SNORD22 -1.278 -0.354 3.39E-05 0.018433 ILMN_1724884 NR_000008.1 55 PAIP2 -1.312 -0.392 3.47E-05 0.018503 ILMN_1782094 NM_016480.3 56 UPF2 1.150 0.201 3.69E-05 0.019257 ILMN_2383693 NM_080599.1 57 RPS15 -1.171 -0.228 3.86E-05 0.019257 ILMN_2219134 
NM_001018.3 58 C17orf76 2.085 1.060 3.89E-05 0.019257 ILMN_2221784 NM_207387.1 59 BAIAP2 -1.824 -0.867 3.94E-05 0.019257 ILMN_2258749 NM_017450.1 60 SCAMP4 -1.967 -0.976 4.04E-05 0.019257 ILMN_1800843 NM_079834.2 61 GPR81 -2.851 -1.512 4.06E-05 0.019257 ILMN_2161848 NM_032554.3 62 EFCAB4A -1.397 -0.482 4.10E-05 0.019257 ILMN_1745623 NM_173584.3 63 NCSTN -1.213 -0.279 4.14E-05 0.019257 ILMN_1735180 NM_015331.2 64 C17orf37 -1.491 -0.577 4.45E-05 0.020388 ILMN_1727078 NM_032339.3 65 PDCD7 -1.117 -0.160 4.61E-05 0.020800 ILMN_2148290 NM_005707.1 66 MYBPC1 -2.206 -1.142 4.76E-05 0.021025 ILMN_1708957 NM_206820.1 67 CORO1B -1.261 -0.335 4.90E-05 0.021025 ILMN_2377019 NM_020441.2 68 PALMD -2.385 -1.254 4.92E-05 0.021025 ILMN_3307906 NM_017734.4 69 MAB21L2 2.302 1.203 4.95E-05 0.021025 ILMN_1798663 XM_001130278.1 70 DMC1 -1.274 -0.350 5.05E-05 0.021147 ILMN_2162367 NM_007068.2 71 ERGIC1 -1.221 -0.288 5.37E-05 0.022192 ILMN_1778377 NM_020462.1 72 C1orf150 -2.157 -1.109 5.64E-05 0.022995 ILMN_1762204 NM_145278.2 73 CHPT1 1.301 0.380 5.90E-05 0.023725 ILMN_1729112 NM_020244.2 74 NKX2-1 4.279 2.097 6.32E-05 0.024824 ILMN_2394841 NM_003317.3 75 ABLIM1 -2.576 -1.365 6.35E-05 0.024824 ILMN_2396672 NM_001003407.1 76 CCT3 -1.206 -0.270 6.84E-05 0.025783 ILMN_1651828 NM_001008800.1 77 MFSD4 1.831 0.873 6.84E-05 0.025783 ILMN_1729734 NM_181644.3 78 RAB25 -1.422 -0.508 6.99E-05 0.025783 ILMN_1791826 NM_020387.2 79 ZFP36 -1.820 -0.864 7.02E-05 0.025783 ILMN_1720829 NM_003407.2 80 CCDC88B -1.922 -0.943 7.06E-05 0.025783 ILMN_1772208 NM_032251.4 81 ANO7 -1.607 -0.684 7.14E-05 0.025783 ILMN_1675423 NM_001001891.3 82 KIF21B 2.120 1.084 7.21E-05 0.025783 ILMN_1692060 XM_935437.1 83 OTUD5 -1.352 -0.435 7.44E-05 0.025995 ILMN_2088847 NM_017602.2 84 CTBP2 -1.587 -0.666 7.44E-05 0.025995 ILMN_3250209 NM_001329.2 85 VPS37B -1.359 -0.443 7.60E-05 0.026031 ILMN_1710427 NM_024667.1 86 MYH9 1.112 0.153 7.96E-05 0.026031 ILMN_1722872 NM_002473.3 87 CCDC64B -2.236 -1.161 8.02E-05 0.026031 ILMN_3235171 
NM_001103175.1 88 SRM -1.224 -0.292 8.06E-05 0.026031 ILMN_1661337 NM_003132.2 89 PPP4R4 1.740 0.799 8.09E-05 0.026031 ILMN_2370019 NM_020958.2 90 ATRN -1.531 -0.615 8.13E-05 0.026031 ILMN_1772124 NM_139321.1 91 CACNB1 1.934 0.952 8.14E-05 0.026031 ILMN_1763604 NM_199247.1 92 SNORD12B -1.279 -0.356 8.16E-05 0.026031 ILMN_3240002 NR_003695.1 93 TMSB4X -1.290 -0.367 8.30E-05 0.026183 ILMN_3240316 NM_183049.2 94 RNF5 -1.815 -0.860 8.97E-05 0.028005 ILMN_2052863 NR_003129.1 95 RHBDF2 1.682 0.750 9.21E-05 0.028428 ILMN_2373062 NM_024599.3 96 BCAM -1.196 -0.258 1.00E-04 0.030634 ILMN_1790455 NM_005581.3 97 LAMA3 2.213 1.146 1.02E-04 0.030890 ILMN_2293594 NM_198129.1 98 PASK -1.812 -0.857 1.04E-04 0.030890 ILMN_1667022 NM_015148.2 99 PLEKHM3 2.691 1.428 1.04E-04 0.030890 ILMN_2327625 NM_001080475.1 100 SETD8 -1.238 -0.308 1.05E-04 0.030890 ILMN_1651936 NM_020382.3 156 Table 4.24 continued: RANK GENE FOLD CHANGE (CR:NED) LOG(Fold Change) Crude p-value FDR adjusted p-value Illumina Probe ID RefSeq # 101 CYB5A -1.385 -0.470 1.07E-04 0.030974 ILMN_1714167 NM_001914.2 102 ABCC11 2.387 1.255 1.09E-04 0.031076 ILMN_2358714 NM_145186.2 103 C3orf25 -2.014 -1.010 1.09E-04 0.031076 ILMN_1795514 NM_207307.1 104 GMNN 2.599 1.378 1.11E-04 0.031215 ILMN_1720114 NM_015895.3 105 STAP2 -1.463 -0.549 1.12E-04 0.031269 ILMN_2281529 NM_017720.2 106 LRP11 1.374 0.458 1.13E-04 0.031269 ILMN_1676197 NM_032832.4 107 DCXR -1.297 -0.375 1.15E-04 0.031412 ILMN_1681437 NM_016286.2 108 AK027322 -1.103 -0.141 1.16E-04 0.031519 ILMN_3243593 NR_003286.1 109 PODXL2 -1.846 -0.885 1.22E-04 0.032771 ILMN_1657347 NM_015720.1 110 SELT 1.358 0.441 1.25E-04 0.033286 ILMN_2227368 NM_016275.3 111 C2orf43 -2.623 -1.391 1.28E-04 0.033847 ILMN_1660275 NM_021925.2 112 YIPF1 -1.365 -0.449 1.35E-04 0.035120 ILMN_1803564 NM_018982.3 113 TSHZ1 -1.220 -0.287 1.35E-04 0.035120 ILMN_1718907 NM_005786.4 114 FKBP11 -1.559 -0.641 1.38E-04 0.035227 ILMN_1787345 NM_016594.1 115 CGGBP1 1.154 0.207 1.39E-04 0.035227 ILMN_1752631 
NM_001008390.1 116 NRP1 1.724 0.786 1.39E-04 0.035227 ILMN_1699574 NM_003873.4 117 FLJ40504 -1.314 -0.394 1.41E-04 0.035404 ILMN_2041222 NM_173624.1 118 DECR1 -1.163 -0.218 1.43E-04 0.035459 ILMN_1720838 NM_001359.1 119 TARSL2 2.136 1.095 1.49E-04 0.036241 ILMN_1720267 NM_152334.2 120 SIAH2 1.190 0.251 1.52E-04 0.036241 ILMN_1801313 NM_005067.5 121 REPS2 -1.464 -0.550 1.52E-04 0.036241 ILMN_2405797 NM_001080975.1 122 TBCCD1 -1.819 -0.863 1.52E-04 0.036241 ILMN_2076415 NM_018138.1 123 GNPNAT1 -1.290 -0.368 1.53E-04 0.036241 ILMN_2117809 NR_002220.1 124 PNPLA2 -1.642 -0.715 1.53E-04 0.036241 ILMN_1787923 NM_020376.2 125 CTNND1 -1.373 -0.457 1.58E-04 0.036974 ILMN_2293511 NM_001331.1 126 TMEM50A -1.698 -0.764 1.61E-04 0.037307 ILMN_1745368 NM_014313.2 127 SCUBE2 2.762 1.466 1.62E-04 0.037307 ILMN_1779416 NM_020974.1 128 MAP7D1 -1.456 -0.542 1.65E-04 0.037739 ILMN_3234997 NM_018067.3 129 ANKRD36B -1.915 -0.937 1.69E-04 0.038526 ILMN_2189614 NM_025190.2 130 USP8 1.145 0.195 1.71E-04 0.038657 ILMN_2094587 NM_005154.2 131 ABLIM1 -2.023 -1.017 1.76E-04 0.038849 ILMN_1785424 NM_006720.3 132 FAM3B -2.706 -1.436 1.76E-04 0.038849 ILMN_1685767 NM_058186.3 133 FKBP1A 1.482 0.567 1.76E-04 0.038849 ILMN_1702237 NM_054014.1 134 TP53I11 -1.224 -0.292 1.77E-04 0.038849 ILMN_1715669 NM_006034.2 135 PIAS4 1.099 0.136 1.79E-04 0.038928 ILMN_1802905 NM_015897.2 136 ARHGEF6 1.956 0.968 1.81E-04 0.038991 ILMN_1803423 NM_004840.2 137 DSE -1.118 -0.161 1.83E-04 0.039054 ILMN_1779014 NM_003309.2 138 ME2 -1.495 -0.580 1.84E-04 0.039054 ILMN_2141650 NR_002819.1 139 CNN2 -1.646 -0.719 1.86E-04 0.039157 ILMN_1770290 NM_201277.1 140 RASL11B 2.258 1.175 1.87E-04 0.039157 ILMN_2148469 NM_023940.2 141 CD99 -1.411 -0.497 1.88E-04 0.039157 ILMN_2056032 NM_002414.3 142 TNPO1 -2.067 -1.047 1.91E-04 0.039410 ILMN_1763627 NM_153188.2 143 CSRP2 1.551 0.633 1.96E-04 0.040171 ILMN_1660806 NM_001321.1 144 LENG9 -1.512 -0.596 1.98E-04 0.040201 ILMN_1732720 NM_198988.1 145 SIGLEC9 2.047 1.033 1.99E-04 0.040201 
ILMN_1795236 NM_014441.1 146 GRHPR -1.202 -0.265 2.01E-04 0.040201 ILMN_1664798 NM_012203.1 147 TAPBPL -1.557 -0.639 2.01E-04 0.040201 ILMN_1805449 NM_018009.3 148 LAMA3 2.378 1.249 2.08E-04 0.041238 ILMN_1688892 NM_198129.1 149 ESM1 1.892 0.920 2.12E-04 0.041820 ILMN_1773262 NM_007036.2 150 CDC42BPA 1.456 0.542 2.17E-04 0.042347 ILMN_1781472 NM_003607.3 157 Table 4.24 continued: RANK GENE FOLD CHANGE (CR:NED) LOG(Fold Change) Crude p-value FDR adjusted p-value Illumina Probe ID RefSeq # 151 DUOXA1 2.511 1.328 2.18E-04 0.042347 ILMN_1710622 NM_144565.2 152 GLDC 3.147 1.654 2.24E-04 0.043043 ILMN_1806754 NM_000170.2 153 ATR -1.655 -0.727 2.24E-04 0.043043 ILMN_2376771 NM_001184.2 154 HNRNPA1 1.916 0.938 2.29E-04 0.043592 ILMN_2191759 NR_002943.1 155 CSTF1 -2.240 -1.163 2.31E-04 0.043592 ILMN_1758339 NM_001033521.1 156 GRP94c -1.770 -0.824 2.32E-04 0.043592 ILMN_2092516 NR_003130.1 157 REPS2 -1.237 -0.306 2.35E-04 0.043821 ILMN_1724668 NM_004726.2 158 ANKRD29 2.434 1.283 2.40E-04 0.044130 ILMN_1749338 NM_173505.2 159 TM2D1 -1.284 -0.361 2.41E-04 0.044130 ILMN_1763670 NM_032027.2 160 CLEC4F 2.962 1.567 2.43E-04 0.044130 ILMN_1723115 NM_173535.2 161 DOK6 1.649 0.721 2.44E-04 0.044130 ILMN_1785290 NM_152721.3 162 AGPAT2 2.235 1.160 2.46E-04 0.044130 ILMN_2377430 NM_001012727.1 163 TAOK3 -1.312 -0.392 2.47E-04 0.044130 ILMN_3307863 NM_016281.3 164 MDC1 1.366 0.450 2.48E-04 0.044130 ILMN_1814122 NM_014641.1 165 CREB3L1 -1.252 -0.324 2.48E-04 0.044130 ILMN_1749032 NM_052854.2 166 GPX4 -1.453 -0.539 2.51E-04 0.044203 ILMN_1734353 NM_001039847.1 167 RER1 -1.349 -0.432 2.52E-04 0.044203 ILMN_1812067 NM_007033.3 168 BHLHA15 -1.702 -0.767 2.61E-04 0.045295 ILMN_1673590 NM_177455.3 169 CLCN7 1.097 0.134 2.62E-04 0.045295 ILMN_1694731 NM_001287.3 170 ZNF655 -1.725 -0.786 2.62E-04 0.045295 ILMN_1769673 NM_001009958.1 171 DLX5 2.533 1.341 2.65E-04 0.045384 ILMN_1759598 NM_005221.5 172 LAT 1.653 0.725 2.67E-04 0.045518 ILMN_1691539 NM_001014987.1 173 SLC35F4 -1.386 -0.470 2.72E-04 
0.046079 ILMN_1733864 XM_936857.2 174 SNORD21 -1.219 -0.286 2.74E-04 0.046161 ILMN_1774973 NR_000006.8 175 COL1A2 1.299 0.378 2.81E-04 0.047015 ILMN_2104356 NM_000089.3 176 FAM167A 1.767 0.821 2.82E-04 0.047015 ILMN_1687213 NM_053279.1 177 RPSA -1.263 -0.337 2.87E-04 0.047382 ILMN_2411723 NM_002295.4 178 SYNJ1 1.148 0.199 2.89E-04 0.047382 ILMN_1701991 NM_203446.1 179 RNMT -1.788 -0.839 2.89E-04 0.047382 ILMN_1769637 NM_003799.1 180 VWA1 -1.181 -0.239 2.99E-04 0.048323 ILMN_1781636 NM_022834.3 181 GMPR -1.393 -0.479 3.00E-04 0.048323 ILMN_1729487 NM_006877.2 182 RPLP0L2 -1.500 -0.585 3.03E-04 0.048323 ILMN_2208491 NR_002775.1 183 REPS2 -1.303 -0.382 3.03E-04 0.048323 ILMN_3250972 NM_004726.2 184 DHX36 1.245 0.316 3.03E-04 0.048323 ILMN_2100000 NM_020865.1 158 Table 4.25: Differentially expressed genes between tumors with Gleason score 7 and Gleason <7. RANK GENE FOLD CHANGE (GS ≥7 : GS<7) LOG(Fold Change) Crude p-value FDR adjusted p-value Illumina Probe ID RefSeq # 1 CCDC68 -1.469468318 -0.5552943 3.96E-07 0.0116237 ILMN_2203876 NM_025214.1 2 ZNF718 -1.403532406 -0.4890624 2.36E-06 0.02820086 ILMN_2131926 NM_001039127.2 3 KRT36 -1.308978847 -0.3884418 2.88E-06 0.02820086 ILMN_1790252 NM_003771.4 4 EFHC2 1.730437371 0.79113673 4.94E-06 0.03058625 ILMN_1791302 NM_025184.3 5 FCRL5 -1.490596861 -0.5758901 5.21E-06 0.03058625 ILMN_2105223 NM_031281.1 6 AZGP1 -1.276048183 -0.3516828 8.08E-06 0.03845227 ILMN_1797154 NM_001185.2 7 GRXCR2 -1.833760132 -0.8748049 9.18E-06 0.03845227 ILMN_3236728 NM_001080516.1 8 CCDC88C -1.379305238 -0.4639418 1.39E-05 0.04209978 ILMN_3248352 NM_001080414.2 9 CORO2B 1.885279743 0.91477861 1.52E-05 0.04209978 ILMN_2084836 NM_006091.2 10 CEPT1 -1.178398237 -0.2368272 1.55E-05 0.04209978 ILMN_2266309 NM_006090.3 11 CHI3L1 -1.5254867 -0.6092696 1.72E-05 0.04209978 ILMN_3307868 NM_001276.2 12 HOXC4 1.897664561 0.924225 1.84E-05 0.04209978 ILMN_2396039 NM_153693.3 13 SERPINA1 -1.477805323 -0.5634562 1.98E-05 0.04209978 ILMN_2338452 NM_001002236.1 
14 PTK2B -1.300769823 -0.3793657 2.01E-05 0.04209978 ILMN_2330966 NM_173176.1 15 NLRP1 1.928270616 0.94730754 2.19E-05 0.04277852 ILMN_2313079 NM_033007.3 16 ST6GALNAC5 2.253916589 1.17243413 2.39E-05 0.0437416 ILMN_1773959 NM_030965.1 17 PDZD2 1.85029839 0.88775795 2.65E-05 0.04574034 ILMN_1729095 NM_178140.2 18 IMPDH1 -1.356022567 -0.4393812 3.02E-05 0.04929834 ILMN_2388363 NM_183243.1
4.4.4 Development of the predictive signature
After pre-processing of the gene expression data, a predictive signature of metastatic disease was developed using stability selection with elastic net regression. Only NED and CR patients were used to develop this predictive signature, in order to find a genetic signature that could truly discriminate between indolent and aggressive disease. Elastic net regression was applied to each of the 500 bootstraps, which resampled the training data using all ~29,000 features each time. After resampling was completed, gene models were built at frequency thresholds from 20% to 80% and evaluated using elastic net regression with repeated cross-validation. A frequency threshold of 20% was the most liberal: it included all genes that were selected in at least 20% of the bootstraps and therefore had a higher potential of including false positive markers. A frequency threshold of 80% was the most stringent criterion, retaining only genes that were selected in at least 80% of the bootstraps. All stability selection runs force-included clinical variables (Gleason score, operation year, pre-operative PSA level, and age at surgery). The number of genes in the models ranged from 163 (20% frequency threshold) to 3 (80% threshold) (Table 4.26). For the 50% threshold, two models were developed: one that force-included clinical variables (28-gene model) and another that also included genes that were above the 50% threshold when clinical variables were not included (28-gene model + 12 extra genes = 40-gene model).
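The frequency-threshold step of stability selection described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes half of the training data is drawn in each resample (as indicated in the Figure 4.12 summary), `select_features` is a hypothetical stand-in for the elastic net fit on one resample, and the toy selector and gene names are invented for the example.

```python
import random
from collections import Counter


def selection_frequencies(n_samples, select_features, n_resamples=500, seed=0):
    """Fraction of resamples in which each feature is selected."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    counts = Counter()
    for _ in range(n_resamples):
        # Draw half of the training samples without replacement.
        subsample = rng.sample(indices, n_samples // 2)
        # Count each feature at most once per resample.
        counts.update(set(select_features(subsample)))
    return {feature: c / n_resamples for feature, c in counts.items()}


def features_at_threshold(frequencies, threshold):
    """Features selected in at least `threshold` of the resamples."""
    return sorted(f for f, freq in frequencies.items() if freq >= threshold)


def toy_selector(subsample):
    """Hypothetical selector standing in for an elastic net fit."""
    chosen = ["geneA"]            # selected in every resample
    if 0 in subsample:
        chosen.append("geneB")    # selected only when sample 0 is drawn
    return chosen


freqs = selection_frequencies(20, toy_selector)
liberal_model = features_at_threshold(freqs, 0.20)
stringent_model = features_at_threshold(freqs, 0.80)
```

Because a feature that clears a high threshold also clears every lower one, the models are nested, which matches the pattern in Table 4.26 where the 80% genes are a subset of the 20% genes.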
Figure 4.12: Summary of the methods used for differential expression analysis, predictive model development, and validation of the selected predictive model.
[Figure 4.12 flowchart content: (1) Data pre-processing of raw gene expression data (~29,000 features): background correction and normalization (Bioconductor limma package: neqc), then batch adjustment by array chip (Bioconductor sva package: ComBat). (2) Differential expression of genes, NED (n=154) versus METs (n=33): 184 features (172 unique genes). (3) Developing a predictive genetic signature from cases (METs) and controls (NED): stability selection with elastic net regression (sample half of training data; elastic net regression on each subsample; repeat 500 times); frequency thresholds across subsamples from 20% (163 features selected) to 80% (3 features selected); elastic net regression with repeated 5-fold cross-validation using features selected from stability selection. (4) Final model: 50% frequency threshold, 28 genes + clinical variables. (5) Validation in 3 independent datasets (MC, MSKCC, EMC).]
The next step would be to apply the models to a test set in order to determine predictive ability based on AUC. However, since our training set of 154 NED and 33 CR was not large enough to split into training and validation sets, we minimized the overoptimistic bias due to fitting and estimating the model AUC in the same data by using elastic net with repeated 5-fold cross-validation on the entire training data. Each gene model at each threshold was evaluated to determine predictive ability by determining the average AUC across 10 cross-validations. The model at 50% frequency threshold with 28 genes including clinical variables (Gleason score, operation year, pre-operative PSA level, and age) showed the best prediction in the cross-validation.
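The evaluation by repeated 5-fold cross-validation can be sketched as below. This is an illustration under stated assumptions rather than the study's procedure: `fit` is a hypothetical stand-in for the elastic net model, the folds are not stratified, out-of-fold scores are pooled within each repeat before the AUC is computed, and the single-feature toy data are invented.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the chance that a random case outscores a random control
    (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_cv_auc(x, y, fit, n_folds=5, n_repeats=10, seed=0):
    """Average AUC of out-of-fold predictions over repeated k-fold CV."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        order = list(range(len(y)))
        rng.shuffle(order)
        scores = [None] * len(y)
        for k in range(n_folds):
            test = order[k::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            # Fit on the training folds only, then score the held-out fold.
            predict = fit([x[i] for i in train], [y[i] for i in train])
            for i in test:
                scores[i] = predict(x[i])
        aucs.append(auc(scores, y))
    return mean(aucs)


def fit_mean_difference(x_train, y_train):
    """Toy model: score by the feature, signed by the case-control difference."""
    case_mean = mean(v for v, label in zip(x_train, y_train) if label == 1)
    control_mean = mean(v for v, label in zip(x_train, y_train) if label == 0)
    direction = 1.0 if case_mean >= control_mean else -1.0
    return lambda v: direction * v


# Toy data: 20 controls and 10 cases with partly overlapping values.
labels = [0] * 20 + [1] * 10
values = [float(i) for i in range(20)] + [float(i) for i in range(15, 25)]
cv_auc = repeated_cv_auc(values, labels, fit_mean_difference)
```

Fitting inside each fold is what keeps the estimate from being computed on the same data used to train the model, which is the source of the overoptimistic bias mentioned above.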
The ROC plot comparing the ROC curves of the 28-gene model and of clinical variables alone (Gleason score, PSA level, age) shows the improvement in prediction when using the genetic signature (Figure 4.13). The 28 genes in the predictive model are listed in Table 4.27 with their biological processes in Table 4.28.
Figure 4.13: ROC curves derived from repeated 5-fold cross-validation: 28 gene model versus clinical variables only model. The gene signature (solid line) has almost perfect predictive ability (AUC=0.99) and shows a major improvement over the model with just clinical variables. The ROC curve of the model with only clinical variables (Gleason score, pre-operative PSA level, and age) (dashed line) has an AUC=0.66. Axes: 1 − Specificity; Sensitivity.
Table 4.26: The genes obtained using different frequency thresholds (20% - 80%) across 500 stability selection bootstraps.
Rank Illumina Probe ID on WG-DASL HT platform Gene 20% 25% 30% 35% 40% 50% (40 gene model) 50% (28 gene model) 60% 70% 80% 1 ILMN_2394841 NKX2-1 1 1 1 1 1 1 1 1 1 1 2 ILMN_1655637 UPK1A 2 2 2 2 2 2 2 2 2 2 3 ILMN_1733963 ADRA2C 3 3 3 3 3 3 3 3 3 3 4 ILMN_2358714 ABCC11 4 4 4 4 4 4 4 4 4 - 5 ILMN_1655915 MMP11 5 5 5 5 5 5 5 5 5 - 6 ILMN_2400759 CPVL 6 6 6 6 6 6 6 6 6 - 7 ILMN_1723439 ZYG11A 7 7 7 7 7 7 7 7 7 - 8 ILMN_1723115 CLEC4F 8 8 8 8 8 8 8 8 8 - 9 ILMN_1709333 OAS2 9 9 9 9 9 9 9 9 9 - 10 ILMN_1795484 PGC 10 10 10 10 10 10 10 10 - - 11 ILMN_2264177 UPK3B 11 11 11 11 11 11 11 11 - - 12 ILMN_1687216 PCBP3 12 12 12 12 12 12 12 12 - - 13 ILMN_2396672 ABLIM1 13 13 13 13 13 13 13 13 - - 14 ILMN_1761820 EDARADD 14 14 14 14 14 14 14 - - - 15 ILMN_2161848 GPR81 15 15 15 15 15 15 15 - - - 16 ILMN_2330170 MYBPC1 16 16 16 16 16 16 16 - - - 17 ILMN_1670708 F10 17 17 17 17 17 17 17 - - - 18 ILMN_1702604 KCNA3 18 18 18 18 18 18 18 - - - 19 ILMN_1806754 GLDC 19 19 19 19 19 19 19 - - - 20 ILMN_1666776 KCNQ2 20 20 20 20 20 20 20 - - - 21 ILMN_1678799 RAPGEF1 21
21 21 21 21 21 21 - - - 22 ILMN_1680874 TUBB2B 22 22 22 22 22 22 22 - - - 23 ILMN_1766334 MB 23 23 23 23 23 23 23 - - - 24 ILMN_1710622 DUOXA1 24 24 24 24 24 24 24 - - - 25 ILMN_1660275 C2orf43 25 25 25 25 25 25 25 - - - 26 ILMN_1690289 DUOX1 26 26 26 26 26 26 26 - - - 27 ILMN_3239648 PCA3 27 27 27 27 27 27 27 - - - 28 ILMN_1665033 NPR3 28 28 28 28 28 28 28 - - - 29 ILMN_2107482 TXLNB 29 29 29 29 29 29 - - - - 30 ILMN_2383707 ALDH1A2 30 30 30 30 30 30 - - - - 31 ILMN_2395214 FMNL3 31 31 31 31 31 - - - - - 32 ILMN_1797425 DDX55 32 32 32 32 32 - - - - - 33 ILMN_2180232 UTS2D 33 33 33 33 33 31 - - - - 34 ILMN_1669474 PCDHA10 34 34 34 34 34 32 - - - - 35 ILMN_1718932 MTRR 35 35 35 35 35 33 - - - - 36 ILMN_1731610 ABLIM1 36 36 36 36 36 - - - - - 37 ILMN_1711573 CD96 37 37 37 37 37 34 - - - - 38 ILMN_1789913 SMPDL3B 38 38 38 38 38 - - - - - 39 ILMN_1709434 VIT 39 39 39 39 39 - - - - - 40 ILMN_1677652 PREX2 40 40 40 40 40 - - - - - 41 ILMN_1685767 FAM3B 41 41 41 41 41 - - - - - 42 ILMN_2401352 UHRF1 42 42 42 42 42 - - - - - 43 ILMN_1652650 SH3KBP1 43 43 43 43 43 - - - - - 44 ILMN_2276688 DIXDC1 44 44 44 44 44 - - - - - 45 ILMN_2148469 RASL11B 45 45 45 45 45 35 - - - - 46 ILMN_1655347 SCGB1A1 46 46 46 46 46 - - - - - 47 ILMN_1692156 KCNC2 47 47 47 47 47 - - - - - 48 ILMN_2327625 PLEKHM3 48 48 48 48 48 36 - - - - 49 ILMN_1680424 CTSG 49 49 49 49 - - - - - - 50 ILMN_1725773 DNAJC12 50 50 50 50 - - - - - - Threshold across stability selection bootstraps 163 Table 4.26 continued: Rank Illumina Probe ID on WG-DASL HT platform Gene 20% 25% 30% 35% 40% 50% (40 gene model) 50% (28 gene model) 60% 70% 80% 51 ILMN_1738268 IHPK3 51 51 51 51 - 37 - - - - 52 ILMN_2075065 FADS2 52 52 52 52 - - - - - - 53 ILMN_3236804 SNORD126 53 53 53 53 - - - - - - 54 ILMN_2233180 RFXAP 54 54 54 54 - 38 - - - - 55 ILMN_1753665 PRR4 55 55 55 55 - - - - - - 56 ILMN_1806607 SFN 56 56 56 56 - - - - - - 57 ILMN_1798663 MAB21L2 57 57 57 57 - - - - - - 58 ILMN_2099315 TRPM8 58 58 58 58 - - - - - - 59 
ILMN_1767253 RRP12 59 59 59 59 - - - - - -
60 ILMN_2093720 THG1L 60 60 60 60 - - - - - -
61 ILMN_1708580 PDZK1IP1 61 61 61 61 - - - - - -
62 ILMN_1801216 S100P 62 62 62 62 - - - - - -
63 ILMN_1685616 EFNA5 63 63 63 63 - - - - - -
64 ILMN_2399588 THSD1 64 64 64 64 - - - - - -
65 ILMN_3235335 NCRNA00093 65 65 65 - - - - - - -
66 ILMN_2121272 PDE10A 66 66 66 - - - - - - -
67 ILMN_1681805 ZIC3 67 67 67 - - - - - - -
68 ILMN_1798582 GRB7 68 68 68 - - - - - - -
69 ILMN_1719677 MOBKL2C 69 69 69 - - - - - - -
70 ILMN_1706497 P2RX2 70 70 70 - - - - - - -
71 ILMN_2290893 PCDHA3 71 71 71 - - - - - - -
72 ILMN_1669239 P2RX2 72 72 72 - - - - - - -
73 ILMN_1676173 PARVB 73 73 73 - - - - - - -
74 ILMN_2351269 TTYH1 74 74 74 - - 39 - - - -
75 ILMN_2293594 LAMA3 75 75 75 - - - - - - -
76 ILMN_3307906 PALMD 76 76 76 - - - - - - -
77 ILMN_2114720 SLPI 77 77 77 - - - - - - -
78 ILMN_1763491 CKMT1B 78 78 78 - - - - - - -
79 ILMN_3235176 C2orf89 79 79 79 - - - - - - -
80 ILMN_1731237 CLSTN2 80 80 80 - - - - - - -
81 ILMN_2355486 FAM3B 81 81 81 - - - - - - -
82 ILMN_1688721 FLJ44048 82 82 82 - - - - - - -
83 ILMN_1813530 AGT 83 83 - - - - - - - -
84 ILMN_1763627 TNPO1 84 84 - - - - - - - -
85 ILMN_1683232 C9orf128 85 85 - - - - - - - -
86 ILMN_3306362 PCOTH 86 86 - - - - - - - -
87 ILMN_2322446 PABPC1L2B 87 87 - - - - - - - -
88 ILMN_1770678 CBX2 88 88 - - - - - - - -
89 ILMN_1720113 PTPRO 89 89 - - - - - - - -
90 ILMN_1668345 OAF 90 90 - - - - - - - -
91 ILMN_1740450 DLGAP1 91 91 - - - - - - - -
92 ILMN_1724407 TACC3 92 92 - - - - - - - -
93 ILMN_1688932 NR5A2 93 93 - - - - - - - -
94 ILMN_1779034 NADSYN1 94 94 - - - - - - - -
95 ILMN_1692340 ZNF662 95 95 - - - - - - - -
96 ILMN_1763837 ANPEP 96 96 - - - - - - - -
97 ILMN_1680927 CASC2 97 97 - - - - - - - -
98 ILMN_2165753 HLA-A29.1 98 98 - - - - - - - -
99 ILMN_2124816 ZNF34 99 99 - - - - - - - -
100 ILMN_1693124 SDHAF1 100 100 - - - 40 - - - -
Table 4.26 continued. Columns: Rank, Illumina Probe ID on WG-DASL HT platform, Gene, then the threshold across stability selection bootstraps: 20%, 25%, 30%, 35%, 40%, 50% (40 gene model), 50% (28 gene model), 60%, 70%, 80%
101 ILMN_1716674 VWA2 101 101 - - - - - - - -
102 ILMN_1657148 C19orf23 102 102 - - - - - - - -
103 ILMN_1761903 KCNS1 103 103 - - - - - - - -
104 ILMN_1732948 ANO7 104 104 - - - - - - - -
105 ILMN_1730740 VSIG8 105 105 - - - - - - - -
106 ILMN_1799329 TTLL10 106 106 - - - - - - - -
107 ILMN_1802151 OSBPL5 107 - - - - - - - - -
108 ILMN_1699357 SLC22A5 108 - - - - - - - - -
109 ILMN_1749493 ADARB2 109 - - - - - - - - -
110 ILMN_1735438 GPM6B 110 - - - - - - - - -
111 ILMN_2376723 CDKN2B 111 - - - - - - - - -
112 ILMN_2388272 MED24 112 - - - - - - - - -
113 ILMN_2175236 PRLR 113 - - - - - - - - -
114 ILMN_3236551 WDFY4 114 - - - - - - - - -
115 ILMN_2342651 OR11H12 115 - - - - - - - - -
116 ILMN_1715452 FREM1 116 - - - - - - - - -
117 ILMN_1720727 SLCO1A2 117 - - - - - - - - -
118 ILMN_1738494 AQP7 118 - - - - - - - - -
119 ILMN_1694917 SERPINB11 119 - - - - - - - - -
120 ILMN_1686097 TOP2A 120 - - - - - - - - -
121 ILMN_1662795 CA2 121 - - - - - - - - -
122 ILMN_1659086 NEFL 122 - - - - - - - - -
123 ILMN_1728714 SSSCA1 123 - - - - - - - - -
124 ILMN_2090641 FAM110C 124 - - - - - - - - -
125 ILMN_2142353 GRTP1 125 - - - - - - - - -
126 ILMN_1804662 NRG4 126 - - - - - - - - -
127 ILMN_1685057 SLC22A4 127 - - - - - - - - -
128 ILMN_1663390 CDC20 128 - - - - - - - - -
129 ILMN_2048477 SLC22A3 129 - - - - - - - - -
130 ILMN_1657606 ZFHX4 130 - - - - - - - - -
131 ILMN_1663327 SOX6 131 - - - - - - - - -
132 ILMN_1720235 ADSSL1 132 - - - - - - - - -
133 ILMN_2068093 NEK5 133 - - - - - - - - -
134 ILMN_1736527 TFCP2L1 134 - - - - - - - - -
135 ILMN_2346316 PTPN20A 135 - - - - - - - - -
136 ILMN_1809311 APOF 136 - - - - - - - - -
137 ILMN_1682270 FGFR2 137 - - - - - - - - -
138 ILMN_1727778 NTNG1 138 - - - - - - - - -
139 ILMN_1653292 PFKFB4 139 - - - - - - - - -
140 ILMN_1810191 PLA2G4C 140 - - - - - - - - -
141 ILMN_1697338 SRPK3 141 - - - - - - - - -
142 ILMN_1653120 ST6GAL1 142 - - - - - - - - -
143 ILMN_1730497 TRIM39 143 - - - - - - - - -
144 ILMN_3243190 EMR4P 144 - - - - - - - - -
145 ILMN_1724265 PART1 145 - - - - - - - - -
146 ILMN_1696659 THSD1 146 - - - - - - - - -
147 ILMN_1796928 BPIL1 147 - - - - - - - - -
148 ILMN_2060086 ADAM23 148 - - - - - - - - -
149 ILMN_1696532 RBBP5 149 - - - - - - - - -
150 ILMN_1660536 USP50 150 - - - - - - - - -
151 ILMN_2264205 ZBTB8A 151 - - - - - - - - -
152 ILMN_1773963 GNA15 152 - - - - - - - - -
153 ILMN_1808114 LYVE1 153 - - - - - - - - -
154 ILMN_3310481 MIR658 154 - - - - - - - - -
155 ILMN_1749848 SLC35F1 155 - - - - - - - - -
156 ILMN_3235723 TMPRSS11E2 156 - - - - - - - - -
157 ILMN_2078592 C6orf105 157 - - - - - - - - -
158 ILMN_2414848 TBRG4 158 - - - - - - - - -
159 ILMN_1809384 CYP4F12 159 - - - - - - - - -
160 ILMN_1697499 HLA-DRB5 160 - - - - - - - - -
161 ILMN_1779416 SCUBE2 161 - - - - - - - - -
162 ILMN_2201413 SLC37A2 162 - - - - - - - - -
163 ILMN_1721712 SYNGR1 163 - - - - - - - - -

Table 4.27: Summary of the 28 genes in the USC/Illumina predictive signature.
Rank in stability selection | Illumina Probe ID on WG-DASL HT platform | Gene symbol | Gene name | Entrez Gene cytogenetic band | Expression changes involved in the following cancers* | Rank in differential gene expression list (CR:NED) | Direction of expression (CR:NED) | Log(Fold Change) (CR:NED) | Fold Change (CR:NED) | FDR adjusted p-value
1 | ILMN_2394841 | NKX2-1 (alias TTF-1) | NK2 homeobox 1 | 14q13 | Lung, thyroid, T-cell lymphoma | 74 | Up | 2.097138195 | 4.279 | 0.02482428
2 | ILMN_1655637 | UPK1A | Uroplakin 1A | 19q13.3 | Bladder, esophagus, pancreas | 335 | Up | 1.367118028 | 2.580 | 0.09276792
3 | ILMN_1733963 | ADRA2C | Alpha-2-adrenergic receptor | 4p16.3 | Cervical, ovarian, melanoma, sarcoma, prostate, colorectal | 332 | Down | -1.32342195 | -2.503 | 0.09276792
4 | ILMN_2358714 | ABCC11 (alias MRP8) | ATP-binding cassette transporter, sub-family C, member 11 | 16q12.1 | Breast, colorectal, leukemia | 102 | Up | 1.255228482 | 2.387 | 0.03107589
5 | ILMN_1655915 | MMP11 | Matrix metalloproteinase-11 | 22q11.23 | Bladder, breast, colorectal, esophageal, gastric, kidney, lung, melanoma, ovarian | 5 | Up | 1.774658928 | 3.422 | 0.00055604
6 | ILMN_2400759 | CPVL | Carboxypeptidase, vitellogenic-like | 7p15.1 | Breast, leukemia, bladder, melanoma, sarcoma, lymphoma, brain and CNS | 1204 | Up | 1.146299134 | 2.213 | 0.21967049
7 | ILMN_1723439 | ZYG11A | Zyg-11 family member A, cell cycle regulator | 1p32.3 | Lymphoma | 1 | Up | 2.109061778 | 4.314 | 0.00018013
8 | ILMN_1723115 | CLEC4F | C-type lectin domain family 4, member F | 2p13.3 | Liver, pancreas | 160 | Up | 1.566593662 | 2.962 | 0.04413006
9 | ILMN_1709333 | OAS2 | 2'-5'-oligoadenylate synthetase 2 | 12q24.2 | Breast, colorectal, kidney, leukemia, ovarian, sarcoma, liver, brain/CNS | 355 | Down | -1.24411778 | -2.369 | 0.0971359
10 | ILMN_1795484 | PGC | Progastricsin (pepsinogen C) | 6p21.1 | Gastric, colorectal, leukemia, lung, sarcoma | 50 | Down | -1.976939165 | -3.937 | 0.01812475
11 | ILMN_2264177 | UPK3B | Uroplakin 3B | 7q11.2 | Bladder, ovarian, pancreatic | 289 | Up | 1.166539906 | 2.245 | 0.08109885
12 | ILMN_1687216 | PCBP3 | Poly(rC) binding protein 3 | 21q22.3 | Bladder, lymphoma, ovarian, pancreatic | 615 | Up | 1.065290345 | 2.093 | 0.14324993
13 | ILMN_2396672 | ABLIM1 | Actin binding LIM protein 1 | 10q25 | Bladder, brain/CNS, breast, colorectal, esophageal, gastric, head/neck, kidney, leukemia, lung, lymphoma, melanoma, ovarian, prostate, sarcoma | 75 | Down | -1.365001697 | -2.576 | 0.02482428
14 | ILMN_1761820 | EDARADD | EDAR-associated death domain | 1q24.3 | Bladder, lung, ovarian | 41 | Up | 1.591036763 | 3.013 | 0.01360583
15 | ILMN_2161848 | GPR81 | G protein coupled receptor-81 | 12q24.31 | Breast, esophageal, gastric, kidney, lung, sarcoma | 61 | Down | -1.511619914 | -2.851 | 0.01925723
16 | ILMN_2330170 | MYBPC1 | Myosin binding protein C | 12q23.2 | Breast, esophageal, gastric, kidney, lung, sarcoma | 11 | Down | -1.46140955 | -2.754 | 0.00196701
17 | ILMN_1670708 | F10 | Coagulation factor X | 13q34 | Bladder, breast, lung, prostate, sarcoma, head and neck, cervical, colorectal | 1268 | Down | -1.069739574 | -2.099 | 0.22829813
18 | ILMN_1702604 | KCNA3 | Potassium voltage-gated channel, shaker-related subfamily, member 3 | 1p13.3 | Kidney, leukemia, lymphoma, myeloma, sarcoma | 279 | Up | 1.247770778 | 2.375 | 0.0773619
19 | ILMN_1806754 | GLDC | Glycine dehydrogenase | 9p22 | Bladder, ovarian, kidney, breast, leukemia, cervical | 152 | Up | 1.653750684 | 3.147 | 0.04304306
20 | ILMN_1666776 | KCNQ2 | Potassium voltage-gated channel, KQT-like subfamily, member 2 | 20q13.3 | Brain/CNS, kidney, leukemia, melanoma, myeloma, sarcoma | 467 | Down | -1.310903547 | -2.481 | 0.12018206
21 | ILMN_1678799 | RAPGEF1 | Rap guanine nucleotide exchange factor (GEF) 1 | 9q34.3 | Kidney, melanoma, sarcoma, leukemia | 698 | Up | 1.1691765 | 2.249 | 0.15361774
22 | ILMN_1680874 | TUBB2B | Tubulin, beta 2B class IIb | 6p25 | Bladder, brain/CNS, gastric, kidney, lung, lymphoma, melanoma, sarcoma | 569 | Up | 1.125230914 | 2.181 | 0.13538244
23 | ILMN_1766334 | MB | Myoglobin | 22q13.1 | Colorectal, head/neck, kidney, lung, lymphoma, melanoma | 930 | Down | -1.104522307 | -2.150 | 0.18210562
24 | ILMN_1710622 | DUOXA1 | Dual oxidase maturation factor 1 | 15q21.1 | Bladder, cervical, head/neck, lung | 151 | Up | 1.328313442 | 2.511 | 0.0423467
25 | ILMN_1660275 | C2orf43 | Chromosome 2 open reading frame 43 | 2p24.1 | Kidney, brain/CNS | 111 | Down | -1.391279742 | -2.623 | 0.03384711
26 | ILMN_1690289 | DUOX1 | Dual oxidase 1 | 15q15.3 | Bladder, cervical, esophageal, head/neck, kidney, lung, melanoma | 20 | Up | 1.652094321 | 3.143 | 0.00857791
27 | ILMN_3239648 | PCA3 | Prostate cancer antigen 3 (non-protein coding) | 9q21.2 | Prostate (overexpression) | 1064 | Down | -1.013961742 | -2.019 | 0.20264524
28 | ILMN_1665033 | NPR3 | Natriuretic peptide receptor C/guanylate cyclase C | 5p14-p13 | Breast, colorectal, esophageal, head/neck, kidney, leukemia, lung, melanoma, sarcoma | 601 | Down | -1.340571803 | -2.533 | 0.14103924
*Data from Oncomine™; includes studies on cancers that had at least a 2-fold change in the specific gene, with the gene in the top 10% of their differentially expressed gene lists and changing in the same direction as found in our data.
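The relationship between the Log(Fold Change) and Fold Change columns of Table 4.27 can be sketched as follows. This is a minimal illustration, assuming the logarithm is base 2 and that down-regulated genes are reported as the negative reciprocal ratio; both assumptions are inferred from the tabulated values rather than stated in the text, and the function name is hypothetical.

```python
def signed_fold_change(log2_fc):
    """Convert a log2 fold change (CR:NED) into a signed fold change.

    Assumption: base-2 logarithm, with down-regulation reported as a
    negative ratio (e.g. -2.5 means 2.5-fold lower in CR than in NED).
    """
    ratio = 2.0 ** abs(log2_fc)
    return ratio if log2_fc >= 0 else -ratio


# Values copied from Table 4.27 (NKX2-1, ADRA2C, ZYG11A)
examples = {"NKX2-1": 2.097138195, "ADRA2C": -1.32342195, "ZYG11A": 2.109061778}
for gene, log2_fc in examples.items():
    print(gene, round(signed_fold_change(log2_fc), 3))
```

Under these assumptions the converted values agree with the tabulated fold changes to three decimals, which also shows that the sign of the log fold change carries the direction of expression.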
Table 4.28: Biological processes of the 28 genes in the USC/Illumina predictive signature
Biological process (Gene Ontology accession #) | # of genes involved | % of the 28 genes involved | Genes in the USC 28-gene model corresponding to their biological process
Metabolic process (GO:0008152) | 13 | 52.00% | ADRA2C C2orf43 MYBPC1 OAS2 PCBP3 NKX2-1 GLDC PGC NPR3 F10 ABCC11 MMP11 CPVL
Cellular process (GO:0009987) | 12 | 48.00% | ADRA2C KCNQ2 CLEC4F MYBPC1 TUBB2B PCBP3 ABLIM1 KCNA3 RAPGEF1 F10 UPK1A CPVL
Cell communication (GO:0007154) | 9 | 36.00% | ADRA2C KCNQ2 CLEC4F MYBPC1 PCBP3 KCNA3 RAPGEF1 UPK1A CPVL
Transport (GO:0006810) | 8 | 32.00% | ADRA2C KCNQ2 MB CLEC4F TUBB2B PCBP3 KCNA3 ABCC11
System process (GO:0003008) | 8 | 32.00% | ADRA2C KCNQ2 MB MYBPC1 PCBP3 KCNA3 UPK1A ABCC11
Response to stimulus (GO:0050896) | 7 | 28.00% | ADRA2C CLEC4F OAS2 PGC F10 UPK1A ABCC11
Immune system process (GO:0002376) | 7 | 28.00% | ADRA2C CLEC4F OAS2 F10 DUOX1 UPK1A ABCC11
Developmental process (GO:0032502) | 4 | 16.00% | MYBPC1 TUBB2B ABLIM1 NKX2-1
Cell cycle (GO:0007049) | 3 | 12.00% | TUBB2B PCBP3 RAPGEF1
Cell adhesion (GO:0007155) | 3 | 12.00% | CLEC4F MYBPC1 UPK1A
Cellular component organization (GO:0016043) | 2 | 8.00% | TUBB2B ABLIM1
Apoptosis (GO:0006915) | 2 | 8.00% | ADRA2C PCBP3
Reproduction (GO:0000003) | 2 | 8.00% | F10 UPK1A
Regulation of biological process (GO:0050789) | 1 | 4.00% | ADRA2C
Generation of precursor metabolites and energy (GO:0006091) | 1 | 4.00% | DUOX1

4.4.5 Validation of predictive model using external datasets and comparisons with existing predictive models
Three independent datasets were used to validate the gene signature predictive of recurrence: one from the Mayo Clinic (MC), one from Memorial Sloan-Kettering Cancer Center (MSKCC), and one from Erasmus Medical Center (EMC) (Boormans et al., 2013; Erho et al., 2013; B. S. Taylor et al., 2010a). All three studies performed whole genome profiling on tumor samples using the Affymetrix Human Exon 1.0 ST array, which contains 1.4 million probe sets.
This array differs from the HumanHT-12 v4 BeadChip used with the WG-DASL HT assay in that at least one probe set (approximately 4 probes) is designed for each exon across the entire genome. In order to use these data to validate our findings, all Affymetrix probes corresponding to each gene in our predictive models were identified (see section 3.4.3.2.4 for details). Since the Mayo Clinic dataset included a large number of patients and had a study design similar to ours (see section 3.4.3.2.5), it was used as the primary validation dataset to assess our potential predictive models. A drawback of this dataset is that the only clinical variable reported in the GEO database was Gleason score. Therefore, we were unable to validate our model with all the clinical variables we included in our own final predictive model. Models derived from stability selection were first validated using the entire Mayo Clinic dataset (n = 545). Repeated 5-fold cross validation was performed on all 10 possible predictive models with different percent thresholds (Table 4.29), each including Gleason score, and the AUCs were compared to the AUC of a model that included only Gleason score. The model with only Gleason score had an AUC = 0.72; after all models were evaluated using repeated cross validation, the highest AUC obtained was 0.75 (Table 4.29). The 28-gene model at the 50% frequency threshold in stability selection reached this highest AUC with the fewest genes. The AUC stabilized at this model: lowering the frequency threshold further added genes without improving predictive ability. Therefore, we locked the model with the 28-gene signature.

Table 4.29: Area under the curve (AUC) values obtained through repeated 5-fold cross validation (CV) using the Mayo Clinic dataset for validation of predictive signatures.
USC model based on frequency threshold from stability selection | AUC from repeated CV (validation using Mayo Clinic data)
Gleason score (GS) only | 0.72
80% threshold + GS | 0.73
70% threshold + GS | 0.72
60% threshold + GS | 0.72
55% threshold + GS | 0.74
50% threshold (28 gene) + GS | 0.75
50% threshold (40 gene) + GS | 0.75
40% threshold + GS | 0.75
30% threshold + GS | 0.75
25% threshold + GS | 0.75
20% threshold + GS | 0.75

Validation of the USC 28-gene model was done in 3 separate datasets. As seen in Table 4.30, when using the Mayo Clinic dataset, the 28-gene model with Gleason score yielded an AUC = 0.75, an absolute increase of 0.03 over the AUC = 0.72 of the model with only Gleason score. Using the MSKCC expression data, the 28-gene model with clinical variables obtained an AUC = 0.90, an absolute improvement of 0.04 over clinical variables alone (AUC = 0.86). With the EMC dataset, the 28-gene model with clinical variables yielded an AUC = 0.82, an absolute improvement of 0.06 over clinical variables only (AUC = 0.76) (Table 4.30).

Table 4.30: Validation of the USC 28-gene model using 3 independent datasets
 | MC (Erho et al., 2013) | MSKCC (Taylor et al., 2010) | EMC (Boormans et al., 2013)
Tissue used for gene expression and clinical outcomes | 333 PT (NED + BCR) vs. 212 PT (CR) | 131 PT vs. 19 tissue from MET lesions | 39 PT (non-CR) vs. 9 PT (CR)
USC 28 gene model + clinical variables | 0.75 (0.72-0.77) | 0.90 (0.86-0.94) | 0.82 (0.74-0.91)
Clinical variables only* | 0.72 (0.70-0.74) | 0.86 (0.82-0.91) | 0.76 (0.67-0.85)
Abbreviations: Primary tumors (PT), No evidence of disease (no recurrence patients) (NED), clinical recurrence (CR), metastasis tissue (MET); Mayo Clinic (MC), Memorial Sloan-Kettering Cancer Center (MSKCC), Erasmus Medical Center (EMC)
*Clinical variables in model: MC - Gleason score only; MSKCC - age at diagnosis, race/ethnicity, neo-adjuvant treatment and adjuvant treatment for all patients (no missing data); EMC - Gleason score (no missing data).
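To make the AUC comparisons in Table 4.30 concrete, the sketch below computes an AUC as the probability that a randomly chosen recurrence case receives a higher risk score than a randomly chosen non-recurrence case, and then derives the absolute AUC gains from the tabulated values. The risk scores are hypothetical and serve only to illustrate the definition; the function and variable names are assumptions, and this is not the repeated 5-fold cross-validation procedure used in the analysis.

```python
def auc(scores_pos, scores_neg):
    """Pairwise (Mann-Whitney) AUC: P(positive score > negative score), ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores (NOT data from the thesis)
cr_scores = [0.9, 0.8, 0.55, 0.4]        # patients with clinical recurrence
ned_scores = [0.7, 0.5, 0.3, 0.2, 0.1]   # patients without recurrence
print("toy AUC:", auc(cr_scores, ned_scores))

# AUCs copied from Table 4.30: (28-gene model + clinical variables, clinical variables only)
reported = {"MC": (0.75, 0.72), "MSKCC": (0.90, 0.86), "EMC": (0.82, 0.76)}
gains = {site: round(full - clinical, 2) for site, (full, clinical) in reported.items()}
print(gains)
```

The gains are differences between two AUCs on the 0 to 1 scale, that is, absolute changes rather than relative percentage changes.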
Next, we compared the performance of our predictive model side by side with other existing predictive models using the datasets from the Mayo Clinic, Memorial Sloan-Kettering Cancer Center, and Erasmus Medical Center. Each model was fitted with our own methods, using the genes reported for that model. We compared our model to those from Myriad Genetics (Prolaris®) (Cuzick et al., 2011), GenomeDx (Decipher™, developed with the Mayo Clinic; its validation AUC was obtained with the same Mayo Clinic set) (Erho et al., 2013), Genomic Health (Oncotype DX® Genomic Prostate Score) (Knezevic et al., 2013), VA San Diego/Illumina (Bibikova et al., 2007), and the Swedish Watchful Waiting Cohort (Sboner et al., 2010). In the Mayo Clinic dataset, the GenomeDx model seemed to outperform all models with an AUC = 0.79, followed by the VA San Diego/Illumina 16-gene model (AUC = 0.77) (Table 4.31). However, we note that the GenomeDx model was expected to have a higher AUC since the validation set was the same set from which the model was derived (Mayo Clinic); therefore, the estimated AUC is likely an overestimate of the true AUC.

Table 4.31: Validation of external predictive gene signatures using the 3 datasets with gene expression data and our own statistical methods including repeated 5-fold cross validation. Values are AUC from repeated cross-validation.
Predictive model | Mayo Clinic dataset (Erho et al.): 333 PT (NED + BCR) vs. 212 PT (CR) | MSKCC dataset (Taylor et al.): 131 PT vs. 19 tissue from MET lesions | MSKCC dataset (Taylor et al.): 122 PT (non-CR) vs. 9 PT (CR) | EMC dataset (Boormans et al.): 39 PT (non-CR) vs. 9 PT (CR)
Clinical variables (CV) only* | 0.72 (0.70-0.74) | 0.87 (0.83-0.91) | 0.88 (0.82-0.94) | 0.76 (0.67-0.85)
Prolaris® (Myriad Genetics): 31-gene model + CV | 0.73 (0.70-0.76) | 0.90 (0.86-0.94) | 0.74 (0.63-0.85) | 0.62 (0.51-0.72)
Swedish Watchful Waiting Cohort: 18-gene model + CV | 0.74 (0.72-0.77) | 0.96 (0.93-0.98) | 0.78 (0.71-0.84) | 0.67 (0.56-0.79)
USC/Illumina: 28-gene model + CV | 0.75 (0.72-0.77) | 0.90 (0.86-0.94) | 0.83 (0.77-0.89) | 0.82 (0.73-0.90)
Oncotype DX® GPS (Genomic Health): 17-gene model + CV | 0.75 (0.73-0.77) | 0.999 (0.999-1.00) | 0.76 (0.68-0.84) | 0.68 (0.57-0.80)
VA San Diego/Illumina: 16-gene model + CV | 0.77 (0.74-0.79) | 0.99 (0.99-1.00) | 0.65 (0.54-0.76) | 0.66 (0.55-0.76)
Decipher™ (Mayo Clinic/GenomeDx): 17 of 22 genes + CV | 0.79 (0.77-0.81) | 0.97 (0.96-0.99) | 0.82 (0.76-0.88) | 0.62 (0.51-0.74)
*Clinical variables in model: MC - Gleason score only; MSKCC - age at diagnosis, race/ethnicity, neo-adjuvant treatment and adjuvant treatment for all patients (no missing data); EMC - Gleason score (no missing data).
PT = primary tumor, NED = no evidence of disease following surgery, BCR = biochemical recurrence, CR = clinical metastatic recurrence, MET = metastasis

5 DISCUSSION
5.1 The role of tobacco smoking and metabolic enzyme genes in prostate cancer
In this study, smoking status was found to be associated with localized PCa risk, particularly among non-Hispanic Whites. Current smoking was also associated with risk of advanced PCa among non-Hispanic White men. For both localized and advanced PCa, the association with smoking was modified by a polymorphism in the carcinogen metabolism gene CYP1A2. Overall, our findings lend support to a role for tobacco smoking in PCa risk after taking into account both PCa stage and race/ethnicity in the analyses. Consistent with our findings, a population-based case-control study in the U.S.
reported that current cigarette smoking was associated with an increased risk of PCa when compared to non-smoking (Plaskon, Penson, Vaughan, & Stanford, 2003). PCa risk also increased with increasing pack-years of cigarette smoking. Yet, in contrast with our findings, a stronger association between pack-years and PCa was observed among men with more aggressive PCa, and quitting smoking was associated with reduced PCa risk. However, our findings that former smokers had an increased risk of localized PCa and that non-Hispanic White current smokers had an increased risk of advanced PCa were similar to those of a recent meta-analysis of 24 prospective cohort studies, which showed that both former and current smokers had an increased risk of incident PCa, although stage and race/ethnicity were not accounted for in the meta-analysis (Huncharek et al., 2010). A large cohort study including data from 10 European countries (EPIC), which considered stage and grade, reported an inverse association between smoking and localized and low-grade prostate cancer, in contrast with the positive association in our results, and reported no significant association with smoking among advanced and high-grade cases (Rohrmann et al., 2012). The meta-analysis of 24 prospective cohort studies further showed that current smokers had an increased risk of PCa-related mortality (Huncharek et al., 2010). In our study, we saw a similar pattern of increased PCa-related mortality among current smokers compared to never smokers (~30%), although it did not reach statistical significance. The estimate we obtained is similar to the ~30% increase in overall mortality found in previous studies (Hickey et al., 2001; Zu & Giovannucci, 2009). Two additional studies not included in the meta-analysis support the finding that current smokers have an increased risk of fatal PCa (Kenfield, Stampfer, & Chan, 2011; Watters, Park, Hollenbeck, Schatzkin, & Albanes, 2009).
A possible explanation proposed for the increased PCa-related mortality among current smokers is that smoking may exacerbate the progression of the disease, leading to more aggressive behavior of PCa (Daniell, 1995; Kobrinsky, 2003). Our results on overall mortality show a strong positive association with tobacco smoking, in concordance with previous findings (Z. M. Chen, Xu, Collins, & Li, 1997; Doll, Peto, Wheatley, Gray, & Sutherland, 1994; Jacobs, Adachi, & Mulder, 1999; J. M. Weir & Dunn, 1970). As with other tobacco-related cancers, a possible mechanism by which tobacco smoking might influence PCa risk is the action of tobacco-related carcinogens that could induce DNA damage in the prostate. These mutagenic carcinogens can be endogenously metabolized to their active forms, which upon reaching the target tissues can bind to DNA. Alternatively, they can be detoxified to less active forms that can be readily excreted from the body. Carcinogen metabolism enzymes are responsible for the detoxification or activation of mutagenic carcinogens and are coded by genes known to be variable in the population (Grover & Martin, 2002). In this study, we found that the association between smoking status and PCa risk was modified by CYP1A2 -154A>C (rs762551), a polymorphism in a gene that codes for an enzyme that plays a key role in the activation of various tobacco carcinogens, including HCAs, PAHs, NOCs, and AAs (Bartsch et al., 2000; Boobis, Lynch, Murray, & la Torre, 1994; F. F. Kadlubar, 1992; D. Kim & Guengerich, 2005). We observed that the association between current smoking status and PCa risk was restricted to carriers of one or two copies of the C allele. CYP1A2 is an inducible phase I metabolizing enzyme highly active in the liver, where it plays a predominant role in the activation of HCAs and AAs (D. Kim & Guengerich, 2005), such as those found in tobacco, to reactive species that can undergo further activation in the liver or detoxification.
CYP1A2 mRNA is also expressed in prostate tissue (Finnström et al., 2001; Sterling & Cutroneo, 2004; Williams, 2000). CYP1A2 expression is variable in the general population, and the CYP1A2 -154A>C polymorphism may partially explain the observed variability in CYP1A2 inducibility, with the A allele being correlated with higher enzymatic activity than the C allele (Sachse, Brockmöller, & Bauer, 1999). We have previously reported that the association between well-done red meat, known to accumulate HCAs, and colorectal cancer was stronger among carriers of the C allele (J. Wang et al., 2012b). A meta-analysis of 19 case-control studies further showed that the CC genotype is associated with an increased risk of various types of cancer combined (including breast, colorectal, lung, pancreatic, ovarian, stomach, and bladder) and with a significant increase in risk among Caucasians (H. Wang et al., 2012a). Other studies have shown that low activity of CYP1A2 was associated with risk of testicular cancer [46] and PCa (Murata, Watanabe, Yamanaka, Kubota, & Ito, 2001). A possible explanation for these findings is that slower activation of HCAs in the liver by CYP1A2 may allow HCAs to remain in the body's circulation, which could result in prolonged exposure to HCAs in other organs and tissues, such as the prostate. We cannot exclude, however, the possibility that our finding might be a false positive; Bonferroni adjustment of the CYP1A2 -154A>C by smoking interaction p-value by the number of SNPs/phenotypes tested (n = 10) would render an interaction p-value of borderline significance (p = 0.08). Screening bias has been identified as a possible limitation in previous studies, and could be present if tobacco smoking is in fact correlated with PCa screening, specifically prostate specific antigen (PSA) screening (Zu & Giovannucci, 2009).
We evaluated potential confounding by PSA screening during the five years prior to the reference year among the cases and controls from the SFBA study site, for whom we had data on PSA screening. Including PSA screening in the model with other covariates did not change the OR estimates for any of the smoking variables by >10%, so PSA screening was not included in the final model. Furthermore, there was no statistical difference in PSA screening when comparing cases with controls: 76% of controls, 80% of localized cases and 71% of advanced cases reported previous PSA screening.

5.1.1 Strengths and Limitations
The overall strengths of this study include the utilization of a population-based study design with cases obtained from two SEER cancer registries, a large sample size of cases and controls, a multi-ethnic population that includes non-Hispanic White, African-American, and Hispanic men, oversampling of advanced cases, and detailed information on lifestyle and other characteristics. Another strength is the consideration of genetic variation in tobacco carcinogen metabolism enzymes to examine gene by environment interactions. Among the limitations of this study is the inclusion of only a few selected functional SNPs for each candidate gene, which did not allow us to comprehensively consider all their genetic variation. Other limitations include the lack of data on environmental tobacco exposure, which may have introduced some exposure misclassification, and the small sample size when stratifying the analyses by multiple factors (stage, race/ethnicity, age at diagnosis).

5.1.2 Conclusions
In summary, our findings support a role for tobacco smoking in risk of PCa, primarily risk of localized PCa. Moreover, our gene-environment analyses support a role for tobacco carcinogens in PCa risk, further strengthening the association between tobacco smoke and PCa risk.
There was no statistically significant association between tobacco smoking and PCa-related mortality, but there was a significant association with overall mortality.

5.2 Predicting late versus earlier biochemical recurrence and subsequent clinical recurrence
In this study, we sought to identify unique factors that are predictive of early or late BCR and to understand the effect of the timing of BCR on CR and PCa-related mortality. Our results show that while high pre-operative PSA, Gleason score > 6 and a positive surgical margin are predictive of early BCR, predictors that clearly discriminate risk of late BCR are elusive. Among patients who experienced BCR, 22% (58/268) had late BCR (>5 years after surgery), which is comparable to what has been previously reported [7]. The fact that approximately a quarter of BCR cases occur after the suggested follow-up time further supports ongoing follow-up for patients. Furthermore, patients with ≥T3a disease had a higher risk of experiencing BCR at any time after surgery and, in particular, remained at a higher risk of CR after late BCR when compared to patients with OC disease. Patients with ≥T3a disease therefore represent a group that requires longer follow-up. Importantly, while the rapidity with which BCR occurs does not predict the risk or rapidity of CR occurrence, the subset of patients with OC disease behaved differently than those with ECE disease. Patients with OC disease displaying rapid biochemical recurrence may represent a unique subset with aggressive tumor biology that warrants early, aggressive intervention, including salvage radiotherapy that may not be typically used in this cohort. Our results can be compared to previous studies with analyses of predictors of early and/or late BCR (Ahove et al., 2010; Amling et al., 2000; Bolton et al., 2013; Caire et al., 2009; Klotz et al., 2010; Loeb et al., 2011; Pound et al., 1999; Walz et al., 2009; Ward et al., 2003).
Multivariable analyses in another study also showed that Gleason score of >7, positive surgical margin and extracapsular extension (stage T3) were independently associated with earlier recurrence (Walz et al., 2009). Ward et al. also found that Gleason score, pre-operative PSA, and positive surgical margin were associated with an increased risk of BCR within 5 years after RP (Ward et al., 2003). Yet, in their results, Gleason score and pre-operative PSA were also associated with late BCR. Although our findings on these variables did not reach statistical significance, the observed risk estimates had a consistent direction. In agreement with our results, positive surgical margin was not associated with late BCR in their study. Our results showed pathologic stage to be a strong predictor of both early and late BCR, supporting findings from previous reports. Ahove et al. also found pathologic stage T3 to be associated with an increased risk of late BCR (>5 years after RP) (Ahove et al., 2010). Amling et al. reported that cases with extracapsular extension, stage pT3b, PSA >10 ng/ml, and Gleason 8-10 were more likely to experience an earlier BCR, while organ-confined patients had a constant low risk throughout follow-up (Amling et al., 2000). Loeb et al. additionally showed that the actuarial probability of BCR and CR among men who were BCR-free at 10 years after surgery increased with Gleason score, particularly among patients with stage pT3 PCa. Differences in the study populations should be noted, which may contribute to the differences in findings. Ward et al. included cases with lymph node involvement and excluded clinical stage T3 cases, while Ahove et al. excluded patients with adjuvant radiation. In this study, the majority of BCR occurred within 5 years after surgery (78%). Importantly, CR-free survival after BCR did not differ between early and late BCR in our entire cohort; however, patients with OC disease displayed an increased risk of CR following early BCR.
Comparable CR-free survival irrespective of timing of BCR for all BCR patients is consistent with two previous cohort studies (Caire et al., 2009; Ward et al., 2003). However, our findings contradict the results of another study of 304 BCR cases that showed CR to be dependent on timing of BCR, although late BCR in that study was defined as BCR occurring more than 2 years after surgery (Pound et al., 1999). After stratifying by pathologic stage, our results are in agreement with a previous study that suggests that low-risk patients, such as patients with organ-confined disease (pT2) without high Gleason scores or high PSA levels, may not need annual lifelong PSA follow-up (Tollefson et al., 2010). However, patients with ECE (≥pT3a) disease are at risk for both early and late BCR, and therefore, appropriate follow-up should be maintained beyond 5 years from surgery for these patients. In this study, we did not find a difference in PCa-related mortality between patients with early and late BCR (log-rank p-value = 0.139). This is in agreement with Pound et al., confirming their finding that the timing of BCR is independent of time until death (Pound et al., 1999), although other reports have shown conflicting results (Caire et al., 2009; Freedland et al., 2006; Eggener et al., 2011). Cases with aggressive clinical characteristics such as higher grade and stage have been reported to have a higher risk of PCa-related mortality regardless of timing of BCR, while organ-confined cases have low mortality rates throughout follow-up [19,20]. In our study, the 2 PCa-related deaths that occurred after late BCR (>5 years after surgery) were both stage pT3a; however, further extrapolation is limited by the low overall number of events.

5.2.1 Strengths and Limitations
There are several strengths of this study. The first is the use of a large multi-ethnic cohort of radical prostatectomy patients treated at a single institution by experienced surgeons.
The second is the active follow-up method used to maintain updated information on each patient enrolled in the cohort. Patients were contacted at least once a year to determine their status and any new information regarding their health. A limitation of this study is the small number of recurrences we were able to assess for overall recurrence and especially late recurrence. However, since our findings are consistent with results from studies with larger cohorts, this may be a minimal limitation. Further studies using larger cohorts with extensive follow-up and detailed clinical information on variables that could be associated with recurrence would be beneficial to confirm predictors of timing of BCR and subsequent CR.

5.2.2 Conclusions
Clinical and pathologic differences in patients who experience earlier or later BCR after RP support risk-adapted surveillance. Follow-up should be maintained beyond the 5-year period, especially for high-stage patients. Organ-confined patients are less likely to experience CR beyond 5 years; however, those with early BCR may require early adjuvant therapy since they appear to be at higher risk for clinical recurrence.

5.3 The role of biomarkers of prognosis for localized prostate cancer
Using whole genome gene expression from 187 tumors of radical prostatectomy patients diagnosed with organ-confined prostate cancer with extensive follow-up (median 8.9 years), this study aimed to identify genes associated with aggressive prostate cancer and to develop a gene expression-based signature predictive of clinical metastatic recurrence. Based on our results, a 28-gene signature was developed that improves predictive ability upon a model with clinical variables alone in both the original dataset and in validation using three independent datasets.
The model was obtained after thorough quality assessment of the gene expression data and using the latest statistical techniques meant to capture a set of genes that best predict recurrence while maximizing error control. Our results on single gene associations include 184 features (172 unique genes) that were statistically significantly differentially expressed between tumors without any recurrence and tumors that progressed to metastatic disease. Approximately 50% of these genes are involved in cellular metabolism, followed by genes involved in cellular processes (~33%) and cell communication (~23%). Furthermore, previous studies have reported a role in PCa progression and castrate-resistant PCa for 5 genes that are among the top 10 differentially expressed genes in our dataset: acireductone dioxygenase 1 (ADI1) (Oram, Ai, Pagani, & Hitchens, 2007), alpha-2-glycoprotein 1, zinc-binding (AZGP1 or ZAG) (Agell et al., 2012; Bibikova et al., 2007; Chandran et al., 2007; Cima et al., 2011; Lapointe et al., 2004), anoctamin 7 (ANO7 or NGEP) (Erho et al., 2013), solute carrier organic anion transporter family, member 1A2 (SLCO1A2) (Wright et al., 2011), and matrix metallopeptidase 11 (MMP11) (Escaff, Fernandez, Gonzalez, & Suarez, 2010). Both ADI1 and ANO7 are androgen-responsive genes that can have an influence on progression. RALBP1 associated Eps domain containing 2 (REPS2), which appeared in the differential expression list multiple times, has also been shown to be correlated with prostate cancer progression and castrate-resistant PCa (Henshall et al., 2003; Oosterhoff, 2004; Oosterhoff, Penninkhof, & Brinkmann, 2003; A. E. Ross et al., 2011). In our data, the genes differentially expressed between tumors of Gleason 7 and greater versus tumors of Gleason 6 and below also include AZGP1 within the top 10, showing that it is statistically significantly under-expressed in more poorly differentiated tumors.
Within the 28-gene model, approximately 52% of the genes are involved in cellular metabolism, followed by cellular processes and cell communication. The model also includes genes previously studied in association with prostate cancer, such as prostate cancer antigen 3 (PCA3), ATP-binding cassette, sub-family C (CFTR/MRP), member 11 (ABCC11), matrix metalloproteinase 11 (MMP11), and myosin binding protein C, slow type (MYBPC1). Two other previously reported gene expression signatures associated with aggressive PCa also include MYBPC1 (Agell et al., 2012; Erho et al., 2013). PCA3 has also been studied as a potential biomarker for early diagnosis of prostate cancer since, unlike PSA, PCA3 is specific for prostate cancer, can be detected in urine samples, and was found to be associated with disease volume and biopsy progression (Deras, Aubin, Blase, Day, & Koo, 2008; Goode, Marshall, Duff, Chevli, & Chevli, 2013; Hessels & Schalken, 2009; Prensner, Rubin, Wei, & Chinnaiyan, 2012; Tosoian, Loeb, Kettermann, & Landis, 2010). ABCC11 (also known as MRP8) is involved in multi-drug resistance and chemoresistance to 5-fluorouracil, and is associated with reduced survival among cancer patients (Cai, Omwancha, & Hsieh, 2006; Oguri et al., 2007; Szakács et al., 2004). Based on structure and substrate specificity, it also closely resembles the gene MRP4, which is an androgen receptor regulated gene and has been associated with multi-drug resistance in prostate cancer cells (Cai et al., 2006). The gene MMP11 is associated with tumor cell invasion and metastasis. Overexpression, as seen in our comparison of CR versus NED patients, could inhibit tumor cell apoptosis, but it may also be capable of increasing apoptosis during development. It is also associated with inhibiting immune infiltrates in tumors, which are mediators in immunotherapy (Egeblad & Werb, 2002).
Other genes in this list that are associated with progression in other cancers include NK2 homeobox 1 (NKX2-1), which acts primarily as a lineage-survival oncogene in lung adenocarcinoma but also has suppressive roles in tumor progression (B. A. Weir et al., 2007; Yamaguchi, Hosono, Yanagisawa, & Takahashi, 2013), and uroplakin 1A (UPK1A), which is a tumor suppressor gene associated with metastasis and progression in esophageal cancer patients (Kong et al., 2010).

The predictive performance of our 28-gene model was validated in three separate datasets with whole-genome profiling obtained using a different platform from the one we used. In addition, we compared the performance of this novel signature to that of other previously reported signatures by validating each within one of the datasets. Despite the differences in platforms, validation of our model in the Mayo Clinic dataset showed an AUC = 0.75 (0.72-0.77), which was identical to the one reported by investigators from the Mayo Clinic and GenomeDx, which was AUC=0.75 (0.68-0.83). When comparing side-by-side the performance of four different classifiers and ours using data from the Mayo Clinic dataset, the Mayo Clinic GenomeDx genomic classifier (Decipher™) performed the best (AUC=0.79) in their own dataset, followed by a 16-gene model + Gleason reported by Bibikova et al. (AUC=0.77) (Bibikova et al., 2007), then our USC/Illumina 28-gene model + Gleason (AUC=0.75) along with the 17-gene classifier from Genomic Health (Oncotype DX®) (AUC=0.75), then an 18-gene model reported from the Swedish Watchful Waiting Cohort (AUC=0.74) (Sboner et al., 2010), then the 31-gene model based on cell cycle proliferation genes by Myriad Genetics (Prolaris®) (Cuzick et al., 2011) versus a model with only Gleason score (AUC=0.72). All models improved upon the model with just Gleason; however, the last two showed minimal improvement.
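For readers less familiar with the AUC values compared above, the following is a minimal sketch of how an AUC can be computed from risk scores. It uses the pairwise (Mann-Whitney) definition: the probability that a randomly chosen patient with metastatic recurrence receives a higher score than a randomly chosen patient without recurrence, with ties counted as one half. The function name `auc` and all score values are hypothetical illustrations, not data or code from this study.

```python
from itertools import product


def auc(case_scores, control_scores):
    """Pairwise AUC: fraction of (case, control) pairs in which the case
    has the higher score; tied pairs contribute one half."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy scores (NOT data from this study).
# "Gleason only": the score is simply the Gleason sum.
gleason_cases = [7, 7, 8, 6]
gleason_controls = [6, 7, 6, 6]

# "Signature + Gleason": a hypothetical predicted probability of recurrence.
combined_cases = [0.8, 0.6, 0.9, 0.4]
combined_controls = [0.3, 0.5, 0.2, 0.1]

auc_gleason = auc(gleason_cases, gleason_controls)
auc_combined = auc(combined_cases, combined_controls)

print(f"Gleason only:        AUC = {auc_gleason:.3f}")
print(f"Signature + Gleason: AUC = {auc_combined:.3f}")
```

In this toy example the combined score separates cases from controls better than the Gleason sum alone, which is the kind of improvement the model comparison above is assessing. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination.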
5.3.1 Strengths and limitations

The strengths of this study include 1) the use of microdissected tissue to enrich the isolation of the most malignant-looking glands, in order to obtain gene expression profiles representative of the most aggressive part of the tumor and not muddled with non-malignant tissue; 2) the extensive and active follow-up on each of the patients in the cohort; 3) the use of novel and state-of-the-art statistical methods to minimize variability in order to capture genes truly predictive of metastatic disease; and 4) the use of only organ-confined cancer patients of stage pT2 in both cases and controls, which allowed for a better comparison of homogeneous groups and for obtaining gene profiles of truly early PCa. The limitations of this study include 1) the possibility that tumor heterogeneity within the prostate may not have allowed for proper sampling of the foci most representative of the tumor's potential to progress and 2) the relatively small number of metastatic cases (n=33). Since we used only organ-confined patients, the clinical metastatic recurrence rate in this group is naturally lower than in patients diagnosed with higher stages of disease. We sought to address this by validating our model in a dataset with higher numbers of clinical metastatic patients and obtained promising results that compared to and even improved upon other existing models; however, additional studies should be done in order to determine if our signature can provide additional predictive benefit for clinical use.

5.3.2 Conclusions

We identified a 28-gene model that, used in conjunction with clinical variables, may aid clinicians in risk assessment of their patients diagnosed with early-stage PCa to determine a course of treatment that will reduce overtreatment of indolent cases and might help identify patients who would benefit from earlier and more aggressive treatments.
The performance of our model is comparable, and in some instances superior, to that of recently launched commercial products. Thus, this newly identified signature is promising, and with further validation and qualification it might contribute to improving the stratification of patients diagnosed with localized PCa.

5.4 Summary and final remarks

The studies reported in this dissertation evaluated the role of genes and environment in prostate cancer development and progression. Tobacco smoking was shown to be associated with prostate cancer risk and overall mortality in a multi-ethnic case-control study, while a polymorphism in the metabolism enzyme gene CYP1A2, which is involved in the metabolism of several carcinogens found in tobacco smoke (PAHs, HCAs, NOCs, and AAs), additionally modified the association between smoking and prostate cancer risk. In evaluating prostate cancer progression, the GST gene variants were shown to be associated with biochemical recurrence (BCR) after radical prostatectomy in an Argentine population, while accounting for clinical variables associated with prostate cancer progression. In a US cohort of over 4,000 radical prostatectomy patients, a retrospective cohort study determined clinical variables in association with early or late biochemical recurrence and timing of subsequent clinical metastatic recurrence. The results showed that a subsequent metastatic event does not depend on the timing of biochemical recurrence, supporting the maintenance of follow-up beyond 5 years after radical prostatectomy for patients who could potentially still be at risk for a late biochemical recurrence. These results emphasize the need to identify additional predictors of recurrence, as clinical predictors are limited in their ability to prognosticate recurrence and time to recurrence.
Using the same cohort, we implemented state-of-the-art laboratory and statistical methods that allowed us to discover a 28-gene signature predictive of aggressive recurrence among organ-confined tumor cases. The model can predict as accurately as, or better than, existing signatures.

The findings of these studies can have both public health and translational implications. As with most epidemiological studies, our results will need to be replicated in other studies in order to fully assess their validity and determine if our findings are relevant factors in prostate cancer development and progression. With a concrete understanding of the role of environmental factors, such as tobacco smoking, and modifying genes, such as metabolism enzyme genes or markers of progression, in association with PCa risk and mortality, the public can be sufficiently informed of their possible risks and of disease prevention based on personalized lifestyle and genetic assessment. Also important, our gene and environment findings offer support for the plausibility of a relationship between an environmental exposure, such as smoking, and cancer risk. By capturing the true risk factors of disease and of disease progression, clinicians can be better able to assess risk and to inform patients with confidence of the steps needed either to prevent disease or to prevent further progression of their existing disease. The role of biomarkers in personalized medicine to predict aggressive disease has large translational implications that could potentially save many lives. Our research findings provide insights into the biological behavior of prostate tumors and the genes involved in discriminating tumors that are likely to metastasize into a lethal form of disease from tumors that will remain indolent.
While there are still limitations that need to be addressed in future studies, such as the role of tumor heterogeneity and multi-focal disease, our findings act as proof-of-principle that the chosen approaches can be successful in improving our ability to distinguish men with indolent or aggressive disease by incorporating tumor biomarkers.

BIBLIOGRAPHY

Adams, J. D., O'Mara-Adams, K. J., & Hoffmann, D. (1987). Toxic and carcinogenic agents in undiluted mainstream smoke and sidestream smoke of different types of cigarettes. Carcinogenesis, 8(5), 729–731. doi:10.1093/carcin/8.5.729
Agell, L., Hernández, S., Nonell, L., Lorenzo, M., Puigdecanet, E., de Muga, S., et al. (2012). A 12-gene expression signature is associated with aggressive histological in prostate cancer: SEC14L1 and TCEB1 genes are potential markers of progression. The American journal of pathology, 181(5), 1585–1594. doi:10.1016/j.ajpath.2012.08.005
Agency for Toxic Substances and Disease Registry (ATSDR). (n.d.). Agency for Toxic Substances and Disease Registry (ATSDR). 2300 N Street, NW, Suite 800, Washington DC 20037 United States: CQ Press. doi:10.4135/9781452240121.n18
Ahove, D. A., Hoffman, K. E., Hu, J. C., Choueiri, T. K., D'Amico, A. V., & Nguyen, P. L. (2010). Which Patients With Undetectable PSA Levels 5 Years After Radical Prostatectomy Are Still at Risk of Recurrence? Implications for a Risk-adapted Follow-up Strategy. Urology, 76(5), 1201–1205. doi:10.1016/j.urology.2010.03.092
American Cancer Society: Prostate Cancer. (n.d.). American Cancer Society: Prostate Cancer. Retrieved December 10, 2013, from http://www.cancer.org/cancer/prostatecancer/
Amin Al Olama, A., Kote-Jarai, Z., Schumacher, F. R., Wiklund, F., Berndt, S. I., Benlloch, S., et al. (2013). A meta-analysis of genome-wide association studies to identify prostate cancer susceptibility loci associated with aggressive and non-aggressive disease. Human molecular genetics, 22(2), 408–415. doi:10.1093/hmg/dds425
Amling, C.
L., Blute, M. L., & Bergstralh, E. J. (2000). Long-term hazard of progression after radical prostatectomy for clinically localized prostate cancer: continued risk of biochemical failure after 5 years. The Journal of … .
Andreoiu, M., & Cheng, L. (2010). Multifocal prostate cancer: biologic, prognostic, and therapeutic implications. Human Pathology, 41(6), 781–793. doi:10.1016/j.humpath.2010.02.011
April, C., Klotzle, B., Royce, T., Wickham-Garcia, E., Boyaniwsky, T., Izzo, J., et al. (2009). Whole-Genome Gene Expression Profiling of Formalin-Fixed, Paraffin-Embedded Tissue Samples. PLoS ONE, 4(12), e8162. doi:10.1371/journal.pone.0008162
Archer, M. C. (1989). Mechanisms of action of N-nitroso compounds. Cancer surveys, 8(2), 241–250.
Avena, S., Via, M., Ziv, E., Pérez-Stable, E. J., Gignoux, C. R., Dejean, C., et al. (2012). Heterogeneity in Genetic Admixture across Different Regions of Argentina. PLoS ONE, 7(4), e34695. doi:10.1371/journal.pone.0034695
Balic, I., Graham, S. T., Troyer, D. A., Higgins, B. A., Pollock, B. H., Johnson-Pais, T. L., et al. (2002). Androgen receptor length polymorphism associated with prostate cancer risk in Hispanic men. JURO, 168(5), 2245–2248. doi:10.1097/01.ju.0000031244.48584.69
Balk, S. P. (2002). Androgen receptor as a target in androgen-independent prostate cancer. Urology, 60(3), 132–138. doi:10.1016/S0090-4295(02)01593-5
Bartsch, H., Nair, U., Risch, A., Rojas, M., Wikman, H., & Alexandrov, K. (2000). Genetic Polymorphism of CYP Genes, Alone or in Combination, as a Risk Modifier of Tobacco-related Cancers. Cancer Epidemiology Biomarkers & Prevention, 9(1), 3–28.
Bastian, P. J., Carter, B. H., Bjartell, A., Seitz, M., et al. (2009). Insignificant Prostate Cancer and Active Surveillance: From Definition to Clinical Implications. European Urology, 55(6), 1321–1332. doi:10.1016/j.eururo.2009.02.028
Berglund, R. K., Masterson, T. A., Vora, K. C., Eggener, S. E., Eastham, J.
A., & Guillonneau, B. D. (2008). Pathological Upgrading and Up Staging With Immediate Repeat Biopsy in Patients Eligible for Active Surveillance. The Journal of Urology, 180(5), 1964–1968. doi:10.1016/j.juro.2008.07.051
Bibikova, M., Chudin, E., Arsanjani, A., Zhou, L., Garcia, E. W., Modder, J., et al. (2007). Expression signatures that correlated with Gleason score and relapse in prostate cancer. Genomics, 89(6), 666–672. doi:10.1016/j.ygeno.2007.02.005
Bill-Axelson, A., Holmberg, L., Ruutu, M., Garmo, H., Stark, J. R., Busch, C., et al. (2011). Radical prostatectomy versus watchful waiting in early prostate cancer. New England Journal of Medicine, 364(18), 1708–1717. doi:10.1056/NEJMoa1011967
Block, G., Coyle, L. M., Hartman, A. M., & Scoppa, S. M. (1994). Revision of dietary analysis software for the Health Habits and History Questionnaire. American journal of epidemiology, 139(12), 1190–1196.
Blute, M. L., Bergstralh, E. J., Iocca, A., Scherer, B., & Zincke, H. (2001). Use of Gleason score, prostate specific antigen, seminal vesicle and margin status to predict biochemical failure after radical prostatectomy. JURO, 165(1), 119–125. doi:10.1097/00005392-200101000-00030
Bolton, D. M., Ta, A., Bagnato, M., Muller, D., Lawrentschuk, N. L., Severi, G., et al. (2013). Interval to biochemical recurrence following radical prostatectomy does not affect survival in men with low-risk prostate cancer. World Journal of Urology. doi:10.1007/s00345-013-1125-0
Bonkhoff, H. (1996). Role of the basal cells in premalignant changes of the human prostate: a stem cell concept for the development of prostate cancer. European Urology.
Boobis, A. R., Lynch, A. M., Murray, S., & la Torre, de, R. (1994). CYP1A2-catalyzed conversion of dietary heterocyclic amines to their proximate carcinogens is their major route of metabolism in humans. Cancer Research.
Boorjian, S. A., Karnes, R. J., Rangel, L. J., Bergstralh, E. J., & Blute, M. L. (2008).
Mayo Clinic validation of the D'Amico risk group classification for predicting survival following radical prostatectomy. J Urol, 179(4), 1354–60, discussion 1360–1. doi:10.1016/j.juro.2007.11.061
Boormans, J. L., Korsten, H., Ziel-van der Made, A. J. C., van Leenders, G. J. L. H., de Vos, C. V., Jenster, G., & Trapman, J. (2013). Identification of TDRD1 as a direct target gene of ERG in primary prostate cancer. International Journal of Cancer, 133(2), 335–345. doi:10.1002/ijc.28025
Bostwick, D. G., & Dundore, P. A. (1997). Biopsy Pathology of Prostate. Chapman & Hall.
Bostwick, D. G., Burke, H. B., Djakiew, D., Euling, S., Ho, S. M., Landolph, J., et al. (2004). Human prostate cancer risk factors. (Vol. 101, pp. 2371–2490). Presented at the Cancer. doi:10.1002/cncr.20408
Burchard, E. G., Ziv, E., Coyle, N., Gomez, S. L., Tang, H., Karter, A. J., et al. (2003). The Importance of Race and Ethnic Background in Biomedical Research and Clinical Practice. New England Journal of Medicine, 348(12), 1170–1175. doi:10.1056/NEJMsb025007
Cai, C., Omwancha, J., & Hsieh, C. L. (2006). Androgen induces expression of the multidrug resistance protein gene MRP4 in prostate cancer cells. … cancer and prostatic … .
Caire, A. A., Sun, L., Ode, O., Stackhouse, D. A., Maloney, K., Donatucci, C., et al. (2009). Delayed Prostate-specific Antigen Recurrence After Radical Prostatectomy: How to Identify and What Are Their Clinical Outcomes? Urology, 74(3), 643–647. doi:10.1016/j.urology.2009.02.049
Carbon, S., Ireland, A., Mungall, C. J., Shu, S., Marshall, B., Lewis, S., et al. (2009). AmiGO: online access to ontology and annotation data. Bioinformatics (Oxford, England), 25(2), 288–289. doi:10.1093/bioinformatics/btn615
Carter, H. B. (2013, September). American Urological Association (AUA) guideline on prostate cancer detection: process and rationale. BJU International. doi:10.1111/bju.12318
Cartwright, R. A., Rogers, H. J., & Barham-Hall, D. (1982).
Role of N-acetyltransferase phenotypes in bladder carcinogenesis: a pharmacogenetic epidemiological approach to bladder cancer. The Lancet.
Cary, K. C., & Cooperberg, M. R. (2013). Biomarkers in prostate cancer surveillance and screening: past, present, and future. Therapeutic advances in urology, 5(6), 318–329. doi:10.1177/1756287213495915
Castelao, J. E., Yuan, J. M., & Skipper, P. L. (2001). Gender-and smoking-related bladder cancer risk. Journal of the … . doi:10.1093/jnci/93.7.538
Catsburg, C., Joshi, A. D., Corral, R., Lewinger, J. P., Koo, J., John, E. M., et al. (2012). Polymorphisms in carcinogen metabolism enzymes, fish intake, and risk of prostate cancer. Carcinogenesis, 33(7), 1352–1359. doi:10.1093/carcin/bgs175
Chan, A. T., Tranah, G. J., & Giovannucci, E. L. (2005a). Prospective study of N-acetyltransferase-2 genotypes, meat intake, smoking and risk of colorectal cancer. … journal of cancer.
Chan, A. T., Tranah, G. J., Giovannucci, E. L., Willett, W. C., Hunter, D. J., & Fuchs, C. S. (2005b). Prospective study of N-acetyltransferase-2 genotypes, meat intake, smoking and risk of colorectal cancer. International Journal of Cancer, 115(4), 648–652.
Chandran, U. R., Ma, C., Dhir, R., Bisceglia, M., Lyons-Weiler, M., Liang, W., et al. (2007). Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process. BMC Cancer, 7(1), 64. doi:10.1186/1471-2407-7-64
Cheever, M. A., & Higano, C. S. (2011). PROVENGE (Sipuleucel-T) in prostate cancer: the first FDA-approved therapeutic cancer vaccine. Clin Cancer Res, 17(11), 3520–3526. doi:10.1158/1078-0432.CCR-10-3126
Chen, C., Grennan, K., Badner, J., Zhang, D., & Gershon, E. (2011). Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS ONE.
Chen, Z. M., Xu, Z., Collins, R., & Li, W. X. (1997). Early health effects of the emerging tobacco epidemic in China.
A 16-year prospective study. JAMA: the journal of the … . doi:10.1001/jama.278.18.1500
Cheng, I., Chen, G. K., Nakagawa, H., He, J., Wan, P., Laurie, C. C., et al. (2012). Evaluating genetic risk for prostate cancer among Japanese and Latinos. Cancer Epidemiology Biomarkers & Prevention, 21(11), 2048–2058. doi:10.1158/1055-9965.EPI-12-0598
Cheville, J. C., Karnes, R. J., Therneau, T. M., Kosari, F., Munz, J. M., Tillmans, L., et al. (2008). Gene Panel Model Predictive of Outcome in Men at High-Risk of Systemic Progression and Death From Prostate Cancer After Radical Retropubic Prostatectomy. Journal of Clinical Oncology, 26(24), 3930–3936. doi:10.1200/JCO.2007.15.6752
Chuang, C.-Y., Lee, C.-C., Chang, Y.-K., & Sung, F.-C. (2003). Oxidative DNA damage estimated by urinary 8-hydroxydeoxyguanosine: influence of taxi driving, smoking and areca chewing. Chemosphere, 52(7), 1163–1171. doi:10.1016/S0045-6535(03)00307-2
Chun, F. K. H., Steuber, T., Erbersdobler, A., Currlin, E., Walz, J., Schlomm, T., et al. (2006). Development and internal validation of a nomogram predicting the probability of prostate cancer Gleason sum upgrading between biopsy and radical prostatectomy pathology. European Urology, 49(5), 820–826. doi:10.1016/j.eururo.2005.11.007
Cima, I., Schiess, R., Wild, P., Kaelin, M., Schuffler, P., Lange, V., et al. (2011). Cancer genetics-guided discovery of serum biomarker signatures for diagnosis and prognosis of prostate cancer. Proceedings of the National Academy of Sciences, 108(8), 3342–3347. doi:10.1073/pnas.1013699108
Cohen, M. S., Hanley, R. S., Kurteva, T., Ruthazer, R., Silverman, M. L., Sorcini, A., et al. (2008). Comparing the Gleason Prostate Biopsy and Gleason Prostatectomy Grading System: The Lahey Clinic Medical Center Experience and an International Meta-Analysis. European Urology, 54(2), 371–381. doi:10.1016/j.eururo.2008.03.049
Cookson, M. S., Aus, G., Burnett, A. L., Canby-Hagino, E. D., D'Amico, A. V., Dmochowski, R. R., et al.
(2007). Variation in the definition of biochemical recurrence in patients treated for localized prostate cancer: the American Urological Association Prostate Guidelines for Localized Prostate Cancer Update Panel report and recommendations for a standard in the reporting of surgical outcomes. JURO, 177(2), 540–545. doi:10.1016/j.juro.2006.10.097
Cooper, C. S., Campbell, C., & Jhavar, S. (2007). Mechanisms of Disease: biomarkers and molecular targets from microarray gene expression studies in prostate cancer. Nature Clinical Practice Urology, 4(12), 677–687. doi:10.1038/ncpuro0946
Cooperberg, M. R., Broering, J. M., Kantoff, P. W., & Carroll, P. R. (2007). Contemporary trends in low risk prostate cancer: risk assessment and treatment. JURO, 178(3 Pt 2), S14–9. doi:10.1016/j.juro.2007.03.135
Cooperberg, M. R., Freedland, S. J., Pasta, D. J., Elkin, E. P., Presti, J. C., Amling, C. L., et al. (2006). Multiinstitutional validation of the UCSF cancer of the prostate risk assessment for prediction of recurrence after radical prostatectomy. Cancer, 107(10), 2384–2391. doi:10.1002/cncr.22262
Cooperberg, M. R., Pasta, D. J., Elkin, E. P., Litwin, M. S., Latini, D. M., Chane, Du, J., & Carroll, P. R. (2005). The University of California, San Francisco Cancer of the Prostate Risk Assessment score: a straightforward and reliable preoperative predictor of disease recurrence after radical prostatectomy. JURO, 173(6), 1938–1942. doi:10.1097/01.ju.0000158155.33890.e7
Corcoran, N. M., Hovens, C. M., Metcalfe, C., Hong, M. K. H., Pedersen, J., Casey, R. G., et al. (2012). Positive surgical margins are a risk factor for significant biochemical recurrence only in intermediate-risk disease. BJU International, 110(6), 821–827. doi:10.1111/j.1464-410X.2011.10868.x
Cotignola, J., Leonardi, D. B., Shahabi, A., Acuna, A. D., Stern, M. C., Navone, N., et al. (2013). Glutathione-S-transferase (GST) polymorphisms are associated with relapse after radical prostatectomy.
Prostate Cancer and Prostatic Diseases, 16(1), 28–34. doi:10.1038/pcan.2012.45
Cuzick, J., Swanson, G. P., Fisher, G., Brothman, A. R., Berney, D. M., Reid, J. E., et al. (2011). Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study. The lancet oncology, 12(3), 245–255. doi:10.1016/S1470-2045(10)70295-3
D'Amico, A. V., Whittington, R., & Malkowicz, S. B. (1998). Biochemical outcome after radical prostatectomy, external beam radiation therapy, or interstitial radiation therapy for clinically localized prostate cancer. JAMA: the journal of … . doi:10.1001/jama.280.11.969
Daniell, H. W. (1995). A worse prognosis for smokers with prostate cancer. The Journal of Urology.
De Marzo, A. M., Platz, E. A., Sutcliffe, S., Xu, J., Gronberg, H., Drake, C. G., et al. (2007). Inflammation in prostate carcinogenesis. Nature Reviews Cancer, 7(4), 256–269. doi:10.1038/nrc2090
DeMarini, D. M. (2004). Genotoxicity of tobacco smoke and tobacco smoke condensate: a review. Mutation Research/Reviews in Mutation Research, 567(2-3), 447–474. doi:10.1016/j.mrrev.2004.02.001
Deras, I. L., Aubin, S., Blase, A., Day, J. R., & Koo, S. (2008). PCA3: a molecular urine assay for predicting prostate biopsy outcome. The Journal of … .
Di Paolo, O. A., Teitel, C. H., Nowell, S., Coles, B. F., & Kadlubar, F. F. (2005). Expression of cytochromes P450 and glutathione S-transferases in human prostate, and the potential for activation of heterocyclic amine carcinogens via acetyl-coA-, PAPS- and ATP-dependent pathways. International Journal of Cancer, 117(1), 8–13. doi:10.1002/ijc.21152
Dillioglugil, Ö., Leibman, B. D., & Kattan, M. W. (1997). Hazard rates for progression after radical prostatectomy for clinically localized prostate cancer. Urology.
Ding, Y. S., Trommel, J. S., Yan, X. J., Ashley, D., & Watson, C. H. (2005).
Determination of 14 Polycyclic Aromatic Hydrocarbons in Mainstream Smoke from Domestic Cigarettes. Environmental Science & Technology, 39(2), 471–478. doi:10.1021/es048690k
Doll, R., Peto, R., Wheatley, K., Gray, R., & Sutherland, I. (1994). Mortality in relation to smoking: 40 years' observations on male British doctors. BMJ.
Eeles, R. A., Kote-Jarai, Z., Olama, Al, A. A., Giles, G. G., Guy, M., Severi, G., et al. (2009). Identification of seven new prostate cancer susceptibility loci through a genome-wide association study. Nature Genetics, 41(10), 1116–1121. doi:10.1038/ng.450
Eeles, R. A., Olama, A. A. A., Benlloch, S., Saunders, E. J., Leongamornlert, D. A., Tymrakiewicz, M., et al. (2013). Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nature Genetics, 45(4), 385–91, 391e1–2. doi:10.1038/ng.2560
Egeblad, M., & Werb, Z. (2002). New functions for the matrix metalloproteinases in cancer progression. Nature Reviews Cancer, 2(3), 161–174. doi:10.1038/nrc745
Eggener, S. E., Mueller, A., Berglund, R. K., Ayyathurai, R., Soloway, C., Soloway, M. S., et al. (2013). A multi-institutional evaluation of active surveillance for low risk prostate cancer. J Urol, 189(1 Suppl), S19–25, discussion S25. doi:10.1016/j.juro.2012.11.023
Eichholzer, M., & Gutzwiller, F. (1998). Dietary Nitrates, Nitrites, and N-Nitroso Compounds and Cancer Risk: A Review of the Epidemiologic Evidence. Nutrition reviews, 56(4), 95–105. doi:10.1111/j.1753-4887.1998.tb01721.x
El-Alfy, M., Pelletier, G., Hermo, L. S., & Labrie, F. (2000). Unique features of the basal cells of human prostate epithelium. Microscopy Research and Technique, 51(5), 436–446. doi:10.1002/1097-0029(20001201)51:5<436::AID-JEMT6>3.0.CO;2-T
Epstein, J. I., Feng, Z., Trock, B. J., & Pierorazio, P. M. (2012).
Upgrading and downgrading of prostate cancer from biopsy to radical prostatectomy: incidence and predictive factors using the modified Gleason grading system and factoring in tertiary grades. European Urology, 61(5), 1019–1024. doi:10.1016/j.eururo.2012.01.050
Erho, N., Crisan, A., Vergara, I. A., Mitra, A. P., Ghadessi, M., Buerki, C., et al. (2013). Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy. PLoS ONE, 8(6), e66855. doi:10.1371/journal.pone.0066855
Escaff, S., Fernandez, J. M., Gonzalez, L. O., & Suarez, A. (2010). Study of matrix metalloproteinases and their inhibitors in prostate cancer. … journal of cancer.
Kadlubar, F. F., Butler, M. A., Kaderlik, K. R., Chou, H. C., & Lang, N. P. (1992). Polymorphisms for aromatic amine metabolism in humans: relevance for human carcinogenesis. Environmental Health Perspectives, 98, 69–74.
Fejerman, L., Romieu, I., John, E. M., Lazcano-Ponce, E., Huntsman, S., Beckman, K. B., et al. (2010). European ancestry is positively associated with breast cancer risk in Mexican women. Cancer Epidemiology Biomarkers & Prevention, 19(4), 1074–1082. doi:10.1158/1055-9965.EPI-09-1193
Feldman, B. J., & Feldman, D. (2001). The development of androgen-independent prostate cancer. Nature Reviews Cancer, 1(1), 34–45. doi:10.1038/35094009
Finnström, N., Bjelfman, C., Söderström, T. G., Smith, G., Egevad, L., Norlén, B. J., et al. (2001). Detection of cytochrome P450 mRNA transcripts in prostate samples by RT-PCR. European Journal of Clinical Investigation, 31(10), 880–886. doi:10.1046/j.1365-2362.2001.00893.x
Freedland, S. J., Humphreys, E. B., Mangold, L. A., et al. (2006). Time to prostate specific antigen recurrence after radical prostatectomy and risk of prostate cancer specific mortality. JURO, 176, 1404–1408.
Mastrangelo, G., Fadda, E., & Marzia, V. (1996). Polycyclic aromatic hydrocarbons and cancer in man. Environmental Health Perspectives, 104(11), 1166–183.
Galanter, J.
M., Fernandez-Lopez, J. C., Gignoux, C. R., Barnholtz-Sloan, J., et al. (2012). Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas. PLoS Genetics, 8(3), e1002554. doi:10.1371/journal.pgen.1002554
Gentleman, R. C., Carey, V. J., Bates, D. M., Ben Bolstad, Dettling, M., Dudoit, S., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics.
Gertig, D. M., Hankinson, S. E., Hough, H., Spiegelman, D., Colditz, G. A., Willett, W. C., et al. (1999). N-acetyl transferase 2 genotypes, meat intake and breast cancer risk. International Journal of Cancer, 80(1), 13–17.
Giovannucci, E., Liu, Y., Platz, E. A., Stampfer, M. J., & Willett, W. C. (2007). Risk factors for prostate cancer incidence and progression in the health professionals follow-up study. International Journal of Cancer, 121(7), 1571–1578. doi:10.1002/ijc.22788
Glinsky, G. V., Glinskii, A. B., Stephenson, A. J., Hoffman, R. M., & Gerald, W. L. (2004). Gene expression profiling predicts clinical outcome of prostate cancer. Journal of Clinical Investigation, 113(6), 913–923. doi:10.1172/JCI200420032
Gong, Z., Neuhouser, M. L., Goodman, P. J., Albanes, D., Chi, C., Hsing, A. W., et al. (2006). Obesity, Diabetes, and Risk of Prostate Cancer: Results from the Prostate Cancer Prevention Trial. Cancer Epidemiology Biomarkers & Prevention, 15(10), 1977–1983. doi:10.1158/1055-9965.EPI-06-0477
Gonzalez Burchard, E., Borrell, L. N., Choudhry, S., Naqvi, M., Tsai, H. J., Rodriguez-Santana, J. R., et al. (2005). Latino populations: a unique opportunity for the study of race, genetics, and social environment in epidemiological research. American journal of public health, 95(12), 2161–2168. doi:10.2105/AJPH.2005.068668
Goode, R. R., Marshall, S. J., Duff, M., Chevli, E., & Chevli, K. K. (2013). Use of PCA3 in detecting prostate cancer in initial and repeat prostate biopsy patients.
The Prostate, 73(1), 48–53. doi:10.1002/pros.22538
Graefen, M., Karakiewicz, P. I., Cagiannos, I., Klein, E., Kupelian, P. A., Quinn, D. I., et al. (2002a). Validation Study of the Accuracy of a Postoperative Nomogram for Recurrence After Radical Prostatectomy for Localized Prostate Cancer. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 20(4), 951–956.
Graefen, M., Karakiewicz, P. I., Cagiannos, I., Quinn, D. I., Henshall, S. M., Grygiel, J. J., et al. (2002b). International Validation of a Preoperative Nomogram for Prostate Cancer Recurrence After Radical Prostatectomy. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 20(15), 3206–3212. doi:10.1200/JCO.2002.12.019
Greene, K. L., Meng, M. V., Elkin, E. P., Cooperberg, M. R., Pasta, D. J., Kattan, M. W., et al. (2004). Validation of the Kattan preoperative nomogram for prostate cancer recurrence using a community based cohort: results from cancer of the prostate strategic urological research endeavor (CaPSURE). JURO, 171(6 Pt 1), 2255–2259.
Grover, P. L., & Martin, F. L. (2002). The initiation of breast and prostate cancer. Carcinogenesis, 23(7), 1095–1102.
Gurel, B., Iwata, T., Koh, C. M., Yegnasubramanian, S., Nelson, W. G., & De Marzo, A. M. (2008). Molecular Alterations in Prostate Cancer as Diagnostic, Prognostic, and Therapeutic Targets. Advances in Anatomic Pathology, 15(6), 319–331. doi:10.1097/PAP.0b013e31818a5c19
Gustafsson, J. A., Soderkvist, P., Haaparanta, T., Busk, L., Pousette, A., Glaumann, H., et al. (1981). Induction of cytochrome P-450 and metabolic activation of mutagens in the rat ventral prostate. Prog Clin Biol Res, 75B, 191–205.
Haas, G. P., & Sakr, W. A. (1997). Epidemiology of prostate cancer. CA: A Cancer Journal for Clinicians.
Haiman, C. A., Chen, G. K., Blot, W. J., Strom, S. S., Berndt, S. I., Kittles, R. A., et al. (2011).
Genome-wide association study of prostate cancer in men of African ancestry identifies a susceptibility locus at 17q21. Nature Genetics, 43(6), 570–573. doi:10.1038/ng.839 Haiman, C. A., Patterson, N., Freedman, M. L., Myers, S. R., Pike, M. C., Waliszewska, A., et al. (2007). Multiple regions within 8q24 independently affect risk for prostate cancer. Nature Genetics, 39(5), 638–644. doi:10.1038/ng2015 Hajj, El, A., Ploussard, G., la Taille, de, A., Allory, Y., Vordos, D., Hoznek, A., et al. (2013). Analysis of outcomes after radical prostatectomy in patients eligible for active surveillance (PRIAS). BJU International, 111(1), 53–59. doi:10.1111/j.1464-410X.2012.11276.x Hamasaki, T., Inatomi, H., Katoh, T., & Aono, H. (2003). N-acetyltransferase-2 gene polymorphism as a possible biomarker for prostate cancer in Japanese men. … journal of urology … . doi:10.1046/j.1442-2042.2003.00586.x Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of Cancer: The Next Generation. Cell, 144(5), 646–674. doi:10.1016/j.cell.2011.02.013 Harnden, P., Shelley, M. D., Coles, B., & Staffurth, J. (2007). Should the Gleason grading system for prostate cancer be modified to account for high-grade tertiary components? A systematic review and meta-analysis. The lancet oncology. Hassett, C., Lin, J., Carty, C. L., Laurenzana, E. M., & Omiecinski, C. J. (1997). Human hepatic microsomal epoxide hydrolase: comparative analysis of polymorphic expression. Arch Biochem Biophys, 337(2), 275–283. Hastie, T., & Tibshirani, R. (1995). Generalized additive models for medical research. Statistical methods in medical research, 4(3), 187–196. Hastie, T., & Tibshirani, R. (2003). Expression arrays and the p ≫ n problem. See 〈http://www-stat.stanford.edu/~hastie/ … . Hattab, E. M., Koch, M. O., Eble, J. N., Lin, H., & Cheng, L. (2006). Tertiary Gleason pattern 5 is a powerful predictor of biochemical relapse in patients with Gleason score 7 prostatic adenocarcinoma.
The Journal of Urology. Hayes, R. B., Ziegler, R. G., Gridley, G., Swanson, C., Greenberg, R. S., Swanson, G. M., et al. (1999). Dietary Factors and Risks for Prostate Cancer among Blacks and Whites in the United States. Cancer Epidemiology Biomarkers & Prevention, 8(1), 25–34. Hecht, S. S. (2003). Tobacco carcinogens, their biomarkers and tobacco-induced cancer. Nature Reviews Cancer, 3(10), 733–744. doi:10.1038/nrc1190 Hecht, S. S., & Hoffmann, D. (1991). N-nitroso compounds and tobacco-induced cancers in man. IARC scientific publications. Hein, D. W., Leff, M. A., Ishibe, N., Sinha, R., Frazier, H. A., Doll, M. A., et al. (2002). Association of prostate cancer with rapid N-acetyltransferase 1 (NAT1*10) in combination with slow N-acetyltransferase 2 acetylator genotypes in a pilot case-control study. Environmental and molecular mutagenesis, 40(3), 161–167. doi:10.1002/em.10103 Henshall, S. M., Afar, D. E. H., Hiller, J., Horvath, L. G., Quinn, D. I., Rasiah, K. K., et al. (2003). Survival Analysis of Genome-Wide Gene Expression Profiles of Prostate Cancers Identifies New Prognostic Targets of Disease Relapse. Cancer Research, 63(14), 4196–4203. Hessels, D., & Schalken, J. A. (2009). The use of PCA3 in the diagnosis of prostate cancer. Nature Reviews Urology, 6(5), 255–261. doi:10.1038/nrurol.2009.40 Hickey, K., Do, K. A., & Green, A. (2001). Smoking and prostate cancer. Epidemiologic reviews. Hirvonen, A. (1999). Polymorphic NATs and cancer predisposition. IARC Sci Publ, (148), 251–270. Hoffmann, D., & Hoffmann, I. (2001). The changing cigarette: chemical studies and bioassays. Smoking and tobacco control … . Humphrey, P. A. (2003). Prostate Pathology. Amer Society of Clinical. Huncharek, M., Haddock, K. S., & Reid, R. (2010). Smoking as a risk factor for prostate cancer: a meta-analysis of 24 prospective cohort studies. Journal … . doi:10.2105/AJPH.2008.150508 Ingles, S. A., Ross, R. K., & Mimi, C. Y. (1997).
Association of prostate cancer risk with genetic polymorphisms in vitamin D receptor and androgen receptor. Journal of the … . doi:10.1093/jnci/89.2.166 International Agency for Research on Cancer (IARC). IARC Monographs on the Evaluation of Carcinogenic Risks to Humans: Tobacco Smoke and Involuntary Smoking (Vol. 83). Iremashvili, V., Manoharan, M., Pelaez, L., Rosenberg, D. L., & Soloway, M. S. (2012). Clinically significant Gleason sum upgrade. Cancer, 118(2), 378–385. doi:10.1002/cncr.26306 Irshad, S., Bansal, M., Castillo-Martin, M., Zheng, T., Aytes, A., Wenske, S., et al. (2013). A molecular signature predictive of indolent prostate cancer. Science Translational Medicine, 5(202), 202ra122. doi:10.1126/scitranslmed.3006408 Jacobs, D. R., Jr, Adachi, H., & Mulder, I. (1999). Cigarette smoking and mortality risk: twenty-five-year follow-up of the Seven Countries Study. Archives of Internal … . Jakoby, W. B., & Ziegler, D. M. (1990). The enzymes of detoxication. J Biol Chem. Jemal, A., Bray, F., Center, M. M., Ferlay, J., Ward, E., & Forman, D. (2011). Global cancer statistics. CA: A Cancer Journal for Clinicians, 61(2), 69–90. doi:10.3322/caac.20107 John, E. M., Schwartz, G. G., Koo, J., Van Den Berg, D., & Ingles, S. A. (2005). Sun exposure, vitamin D receptor gene polymorphisms, and risk of advanced prostate cancer. Cancer Res, 65(12), 5470–5479. Josephy, P. D., & Novak, M. (2013). Reactive electrophilic metabolites of aromatic amine and amide carcinogens. Frontiers in bioscience, 5, 341–359. Joshi, A. D., Corral, R., Catsburg, C., Lewinger, J. P., Koo, J., John, E. M., et al. (2012a). Red meat and poultry, cooking practices, genetic susceptibility and risk of prostate cancer: results from a multiethnic case-control study. Carcinogenesis, 33(11), 2108–2118. doi:10.1093/carcin/bgs242 Joshi, A. D., John, E. M., Koo, J., Ingles, S. A., & Stern, M. C. (2012b).
Fish intake, cooking practices, and risk of prostate cancer: results from a multi-ethnic case-control study. Cancer Causes & Control, 23(3), 405–420. doi:10.1007/s10552-011-9889-2 Karnes, R. J., Bergstralh, E. J., Davicioni, E., Ghadessi, M., Buerki, C., Mitra, A. P., et al. (2013). Validation of a Genomic Classifier that Predicts Metastasis Following Radical Prostatectomy in an At Risk Patient Population. J Urol, 190(6), 2047–2053. doi:10.1016/j.juro.2013.06.017 Kattan, M. W., & Eastham, J. A. (1998). A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. Journal of the … . doi:10.1093/jnci/90.10.766 Kattan, M. W., Wheeler, T. M., & Scardino, P. T. (1999). Postoperative nomogram for disease recurrence after radical prostatectomy for prostate cancer. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 17(5), 1499–1507. Kenfield, S. A., Stampfer, M. J., & Chan, J. M. (2011). Smoking and prostate cancer survival and recurrence. JAMA: the journal of the … . Kim, D., & Guengerich, F. P. (2005). Cytochrome P450 activation of arylamines and heterocyclic amines. Annual review of pharmacology and toxicology, 45(1), 27–49. doi:10.1146/annurev.pharmtox.45.120403.100010 Kim, J. H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics & Data Analysis. Klotz, L., Zhang, L., Lam, A., Nam, R., Mamedov, A., & Loblaw, A. (2010). Clinical results of long-term follow-up of a large, active surveillance cohort with localized prostate cancer. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 28(1), 126–131. doi:10.1200/JCO.2009.24.2180 Knezevic, D., Goddard, A. D., Natraj, N., Cherbavaz, D. B., Clark-Langone, K. M., Snable, J., et al. (2013).
Analytical validation of the Oncotype DX prostate cancer assay - a clinical RT-PCR assay optimized for prostate needle biopsies. BMC genomics, 14, 690. doi:10.1186/1471-2164-14-690 Kobrinsky, N. L. (2003). Impact of Smoking on Cancer Stage at Diagnosis. Journal of Clinical Oncology, 21(5), 907–913. doi:10.1200/JCO.2003.05.110 Kong, K. L., Kwong, D. L., Fu, L., Chan, T. H. M., Chen, L., Liu, H., et al. (2010). Characterization of a candidate tumor suppressor gene uroplakin 1A in esophageal squamous cell carcinoma. Cancer Research, 70(21), 8832–8841. doi:10.1158/0008-5472.CAN-10-0779 Kosari, F., Munz, J. M. A., Savci-Heijink, C. D., Spiro, C., Klee, E. W., Kube, D. M., et al. (2008). Identification of prognostic biomarkers for prostate cancer. Clin Cancer Res, 14(6), 1734–1743. doi:10.1158/1078-0432.CCR-07-1494 Lane, J. A., Hamdy, F. C., Martin, R. M., Turner, E. L., Neal, D. E., & Donovan, J. L. (2010). Latest results from the UK trials evaluating prostate cancer screening and treatment: the CAP and ProtecT studies. European journal of cancer (Oxford, England: 1990), 46(17), 3095–3101. doi:10.1016/j.ejca.2010.09.016 Lang, N. P., & Kadlubar, F. F. (1991). Aromatic and heterocyclic amine metabolism and phenotyping in humans. Prog Clin Biol Res, 372, 33–47. Langholz, B. (2007). Use of Cohort Information in the Design and Analysis of Case-Control Studies. Scandinavian Journal of Statistics, 34(1), 120–136. doi:10.1111/j.1467-9469.2006.00548.x Langholz, B., & Borgan, O. (1997). Estimation of absolute risk from nested case-control data. Biometrics, 53(2), 767–774. Lapointe, J., Li, C., Higgins, J. P., van de Rijn, M., Bair, E., Montgomery, K., et al. (2004). Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proceedings of the National Academy of Sciences, 101(3), 811–816. doi:10.1073/pnas.0304146101 Larkin, S. E. T., Holmes, S., Cree, I. A., Walker, T., Basketter, V., Bickers, B., et al. (2011).
Identification of markers of prostate cancer progression using candidate gene expression. British Journal of Cancer, 106(1), 157–165. doi:10.1038/bjc.2011.490 Lavender, N. A., Benford, M. L., VanCleave, T. T., Brock, G. N., Kittles, R. A., Moore, J. H., et al. (2009). Examination of polymorphic glutathione S-transferase (GST) genes, tobacco smoking and prostate cancer risk among Men of African Descent: A case-control study. BMC Cancer, 9(1), 397. doi:10.1186/1471-2407-9-397 Lavery, H. J., & Droller, M. J. (2012). Do Gleason patterns 3 and 4 prostate cancer represent separate disease states? J Urol, 188(5), 1667–1675. doi:10.1016/j.juro.2012.07.055 Lerner, S. E., Blute, M. L., Bergstralh, E. J., Bostwick, D. G., Eickholt, J. T., & Zincke, H. (1996). Analysis of Risk Factors for Progression in Patients with Pathologically Confined Prostate Cancers After Radical Retropubic Prostatectomy. The Journal of Urology, 156(1), 137–143. doi:10.1016/S0022-5347(01)65967-6 Liang Chen, Prabu D Devanesan, Sheila Higginbotham, Freek Ariese, Ryszard Jankowiak, Gerry J Small, et al. (1996). Expanded Analysis of Benzo[a]pyrene−DNA Adducts Formed in Vitro and in Mouse Skin: Their Significance in Tumor Initiation. Chemical research in … , 9(5), 897–903. doi:10.1021/tx960004a Loeb, S., Feng, Z., Ross, A., Trock, B. J., Humphreys, E. B., & Walsh, P. C. (2011). Can we stop prostate specific antigen testing 10 years after radical prostatectomy? J Urol, 186(2), 500–505. doi:10.1016/j.juro.2011.03.116 Lu, H., & Zhu, L. (2007). Pollution patterns of polycyclic aromatic hydrocarbons in tobacco smoke. Journal of Hazardous Materials, 139(2), 193–198. doi:10.1016/j.jhazmat.2006.06.011 Makarov, D. V., Sanderson, H., Partin, A. W., & Epstein, J. I. (2002). Gleason Score 7 Prostate Cancer on Needle Biopsy: Is the Prognostic Difference in Gleason Scores 4 + 3 and 3 + 4 Independent of the Number of Involved Cores?
The Journal of Urology, 167(6), 2440–2442. doi:10.1016/S0022-5347(05)65000-8 Marquardt, H., Kuroki, T., Huberman, E., Selkirk, J. K., Heidelberger, C., Grover, P. L., & Sims, P. (1972). Malignant Transformation of Cells Derived from Mouse Prostate by Epoxides and Other Derivatives of Polycyclic Hydrocarbons. Cancer Research, 32(4), 716–720. Martin, F. L., Cole, K. J., Muir, G. H., Kooiman, G. G., Williams, J. A., Sherwood, R. A., et al. (2002). Primary cultures of prostate cells and their ability to activate carcinogens. Prostate Cancer and Prostatic Diseases, 5(2), 96–104. doi:10.1038/sj.pcan.4500579 May, M., Knoll, N., Siegsmund, M., Fahlenkamp, D., Vogler, H., Hoschke, B., & Gralla, O. (2007). Validity of the CAPRA score to predict biochemical recurrence-free survival after radical prostatectomy. Results from a European multicenter survey of 1,296 patients. JURO, 178(5), 1957–62; discussion 1962. doi:10.1016/j.juro.2007.07.043 McKeage, K. (2012). Docetaxel: a review of its use for the first-line treatment of advanced castration-resistant prostate cancer. Drugs, 72(11), 1559–1577. doi:10.2165/11209660-000000000-00000 Mehra, R., Tomlins, S. A., Yu, J., Cao, X., Wang, L., Menon, A., et al. (2008). Characterization of TMPRSS2-ETS Gene Aberrations in Androgen-Independent Metastatic Prostate Cancer. Cancer Research, 68(10), 3584–3590. doi:10.1158/0008-5472.CAN-07-6154 Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical … . Menzie, C. A., Potocki, B. B., & Santodonato, J. (1992). Exposure to carcinogenic PAHs in the environment. Environmental Science & Technology, 26(7), 1278–1284. doi:10.1021/es00031a002 Mi, H., Muruganujan, A., & Thomas, P. D. (2013). PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Research, 41(Database issue), D377–86. doi:10.1093/nar/gks1118 Mitchell, J. A., Cooperberg, M. R., Elkin, E.
P., Lubeck, D. P., Mehta, S. S., Kane, C. J., & Carroll, P. R. (2005). Ability of 2 pretreatment risk assessment methods to predict prostate cancer recurrence after radical prostatectomy: data from CaPSURE. JURO, 173(4), 1126–1131. doi:10.1097/01.ju.0000155535.25971.de Mommsen, S., & Aagaard, J. (1983). Tobacco as a risk factor in bladder cancer. Carcinogenesis, 4(3), 335–338. doi:10.1093/carcin/4.3.335 Mouraviev, V., Mayes, J. M., & Polascik, T. J. (2009). Pathologic basis of focal therapy for early-stage prostate cancer. Nature Reviews Urology, 6(4), 205–215. doi:10.1038/nrurol.2009.29 Murata, M., Watanabe, M., Yamanaka, M., Kubota, Y., & Ito, H. (2001). Genetic polymorphisms in cytochrome P450 (CYP) 1A1, CYP1A2, CYP2E1, glutathione S-transferase (GST) M1 and GSTT1 and susceptibility to prostate …. Cancer letters. Murphy, A. B., Akereyeni, F., Nyame, Y. A., Guy, M. C., Martin, I. K., Hollowell, C. M. P., et al. (2013). Smoking and prostate cancer in a multi-ethnic cohort. The Prostate, 73(14), 1518–1528. doi:10.1002/pros.22699 Nair, J., Ohshima, H., Friesen, M., Croisy, A., Bhide, S. V., & Bartsch, H. (1985). Tobacco-specific and betel nut-specific N-nitroso compounds: occurrence in saliva and urine of betel quid chewers and formation in vitro by nitrosation of betel quid. Carcinogenesis, 6(2), 295–303. doi:10.1093/carcin/6.2.295 Nakagawa, T., Kollmeyer, T. M., Morlan, B. W., Anderson, S. K., Bergstralh, E. J., Davis, B. J., et al. (2008). A Tissue Biomarker Panel Predicting Systemic Progression after PSA Recurrence Post-Definitive Prostate Cancer Therapy. PLoS ONE, 3(5), e2318. doi:10.1371/journal.pone.0002318 Naylor, S. L. (2007). SNPs associated with prostate cancer risk and prognosis. Front Biosci. NCCN Clinical Practice Guidelines in Oncology (NCCN Guideline®): Prostate Cancer (Version 4.2013). (n.d.).
Retrieved from http://www.nccn.org/professionals/physician_gls/pdf/prostate.pdf NIH, N. T. P. N. (2011). Report on Carcinogens (12th Ed.). DIANE Publishing. Nock, N. L., Tang, D., Rundle, A., Neslund-Dudas, C., Savera, A. T., Bock, C. H., et al. (2007). Associations between smoking, polymorphisms in polycyclic aromatic hydrocarbon (PAH) metabolism and conjugation genes and PAH-DNA adducts in prostate tumors differ by race. Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology, 16(6), 1236–1245. doi:10.1158/1055-9965.EPI-06-0736 Oesterling, J. E. (1991). Prostate specific antigen: a critical assessment of the most useful tumor marker for adenocarcinoma of the prostate. JURO, 145(5), 907–923. Oguri, T., Bessho, Y., Achiwa, H., Ozasa, H., Maeno, K., Maeda, H., et al. (2007). MRP8/ABCC11 directly confers resistance to 5-fluorouracil. Molecular Cancer Therapeutics, 6(1), 122–127. doi:10.1158/1535-7163.MCT-06-0529 Oosterhoff, J. K. (2004). Mechanisms of androgen-independent Prostate Cancer Progression: Which Way to Go? Oosterhoff, J. K., Penninkhof, F., & Brinkmann, A. O. (2003). REPS2/POB1 is downregulated during human prostate cancer progression and inhibits growth factor signalling in prostate cancer cells. Oncogene. Oram, S. W., Ai, J., Pagani, G. M., & Hitchens, M. R. (2007). Expression and function of the human androgen-responsive gene ADI1 in prostate cancer. … (New York. Papafili, A., Hill, M. R., Brull, D. J., McAnulty, R. J., Marshall, R. P., Humphries, S. E., & Laurent, G. J. (2002). Common promoter variant in cyclooxygenase-2 represses gene expression: evidence of role in acute-phase inflammatory response. Arterioscler Thromb Vasc Biol, 22(10), 1631–1636. Partin, A. W., Mangold, L. A., Lamm, D. M., & Walsh, P. C. (2001). Contemporary update of prostate cancer staging nomograms (Partin Tables) for the new millennium.
Urology, 58(6), 843–848. doi:10.1016/S0090-4295(01)01441-8 Penney, K. L., Sinnott, J. A., Fall, K., Pawitan, Y., Hoshida, Y., Kraft, P., et al. (2011). mRNA Expression Signature of Gleason Grade Predicts Lethal Prostate Cancer. Journal of Clinical Oncology, 29(17), 2391–2396. doi:10.1200/JCO.2010.32.6421 Perner, S., Demichelis, F., Beroukhim, R., Schmidt, F. H., Mosquera, J.-M., Setlur, S., et al. (2006). TMPRSS2:ERG fusion-associated deletions provide insight into the heterogeneity of prostate cancer. Cancer Research, 66(17), 8337–8341. doi:10.1158/0008-5472.CAN-06-1482 Phillips, D. H. (1999). Polycyclic aromatic hydrocarbons in the diet. Mutation Research/Genetic Toxicology and Environmental Mutagenesis, 443(1-2), 139–147. doi:10.1016/S1383-5742(99)00016-2 Plaskon, L. A., Penson, D. F., Vaughan, T. L., & Stanford, J. L. (2003). Cigarette smoking and risk of prostate cancer in middle-aged men. Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology, 12(7), 604–609. Platz, E. A., Leitzmann, M. F., Visvanathan, K., Rimm, E. B., Stampfer, M. J., Willett, W. C., & Giovannucci, E. (2006). Statin drugs and risk of advanced prostate cancer. JNCI Journal of the National Cancer Institute, 98(24), 1819–1825. doi:10.1093/jnci/djj499 Pound, C. R., Partin, A. W., & Eisenberger, M. A. (1999). Natural history of progression after PSA elevation following radical prostatectomy. JAMA: the journal of … . Powell, I. J. (2007). Epidemiology and Pathophysiology of Prostate Cancer in African-American Men. The Journal of Urology, 177(2), 444–449. doi:10.1016/j.juro.2006.09.024 Prensner, J. R., Rubin, M. A., Wei, J. T., & Chinnaiyan, A. M. (2012). Beyond PSA: The Next Generation of Prostate Cancer Biomarkers. Science Translational Medicine, 4(127), 127rv3. doi:10.1126/scitranslmed.3003180 Preston-Martin, S., & Correa, P. (1988).
Epidemiological evidence for the role of nitroso compounds in human cancer. Cancer surveys. Price, A. L., Patterson, N., Yu, F., Cox, D. R., Waliszewska, A., McDonald, G. J., et al. (2007). A genomewide admixture map for Latino populations. American journal of human genetics, 80(6), 1024–1036. doi:10.1086/518313 Prostate mutations in rats induced by the suspected human carcinogen 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine. (2000). Cancer Research, 60(2), 266–268. Ramaswamy, S., Ross, K. N., Lander, E. S., & Golub, T. R. (2002). A molecular signature of metastasis in primary solid tumors. Nature Genetics, 33(1), 49–54. doi:10.1038/ng1060 Rappaport, S. M., Kim, S., Lan, Q., Vermeulen, R., Waidyanatha, S., Zhang, L., et al. (2009). Evidence that humans metabolize benzene via two pathways. Environmental Health Perspectives, 117(6), 946–952. doi:10.1289/ehp.0800510 Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. (2005). Science (New York, N.Y.), 310(5748), 644–648. doi:10.1126/science.1117679 Roberts, S. G., Blute, M. L., Bergstralh, E. J., Slezak, J. M., & Zincke, H. (2001). PSA Doubling Time as a Predictor of Clinical Progression After Biochemical Failure Following Radical Prostatectomy for Prostate Cancer. Mayo Clinic Proceedings, 76(6), 576–581. doi:10.4065/76.6.576 Robinson, M. D., & Speed, T. P. (2007). A comparison of Affymetrix gene expression arrays. BMC bioinformatics, 8(1), 449. doi:10.1186/1471-2105-8-449 Rodgman, A., & Perfetti, T. A. (2013). The Chemical Components of Tobacco and Tobacco Smoke, Second Edition. CRC Press. Rohrmann, S., Linseisen, J., Allen, N., Bueno-de-Mesquita, H. B., Johnsen, N. F., Tjønneland, A., et al. (2012).
Smoking and the risk of prostate cancer in the European Prospective Investigation into Cancer and Nutrition. British Journal of Cancer, 108(3), 708–714. doi:10.1038/bjc.2012.520 Ross, A. E., Marchionni, L., Vuica-Ross, M., Cheadle, C., Fan, J., Berman, D. M., & Schaeffer, E. M. (2011). Gene expression pathways of high grade localized prostate cancer. The Prostate. doi:10.1002/pros.21373 Ross, R. K., Jones, P. A., & Yu, M. C. (1996). Bladder cancer epidemiology and pathogenesis. Seminars in oncology. Ryan, C. J., Smith, M. R., de Bono, J. S., Molina, A., Logothetis, C. J., de Souza, P., et al. (2013). Abiraterone in Metastatic Prostate Cancer without Previous Chemotherapy. New England Journal of Medicine, 368(2), 138–148. doi:10.1056/nejmoa1209096 Rybicki, B. A., Rundle, A., Savera, A. T., Sankey, S. S., & Tang, D. (2004). Polycyclic aromatic hydrocarbon-DNA adducts in prostate cancer. Cancer Research, 64(24), 8854–8859. doi:10.1158/0008-5472.CAN-04-2323 Sachse, C., Bhambra, U., & Smith, G. (2003). Polymorphisms in the cytochrome P450 CYP1A2 gene (CYP1A2) in colorectal cancer patients and controls: allele frequencies, linkage disequilibrium and …. British journal of … . Sachse, C., Brockmöller, J., & Bauer, S. (1999). Functional significance of a C→A polymorphism in intron 1 of the cytochrome P450 CYP1A2 gene tested with caffeine. British journal of clinical … . doi:10.1046/j.1365-2125.1999.00898.x Sboner, A., Demichelis, F., Calza, S., Pawitan, Y., Setlur, S. R., Hoshida, Y., et al. (2010). Molecular sampling of prostate cancer: a dilemma for predicting disease progression. BMC Medical Genomics, 3(1), 8. doi:10.1186/1755-8794-3-8 Schroder, F. H., Hugosson, J., & Roobol, M. J. (2009). Screening and prostate-cancer mortality in a randomized European study. … England Journal of … . Schwartz, G. G. (2005). VITAMIN D IN HEALTH AND DISEASE: Vitamin D and the Epidemiology of Prostate Cancer.
Seminars in dialysis, 18(4), 276–289. doi:10.1111/j.1525-139X.2005.18403.x Schwartz, G. G., John, E. M., Rowland, G., & Ingles, S. A. (2010a). Prostate cancer in African-American men and polymorphism in the calcium-sensing receptor. Cancer biology & therapy, 9(12), 994–999. doi:10.4161/cbt.9.12.11689 Schwartz, G. G., John, E. M., Rowland, G., & Ingles, S. A. (2010b). Prostate cancer in African-American men and polymorphism in the calcium-sensing receptor. Cancer biology & therapy, 9(12), 994–999. Screening for Prostate Cancer, Topic Page. U.S. Preventive Services Task Force. (n.d.). Retrieved from http://www.uspreventiveservicestaskforce.org/prostatecancerscreening.htm SEER Cancer Statistics Factsheets: Prostate Cancer. (n.d.). Bethesda, MD. Retrieved November 8, 2013, from http://seer.cancer.gov/statfacts/html/prost.html Seitz, H. K., & Becker, P. (2007). Alcohol metabolism and cancer risk. Alcohol Research and Health. Seldin, M. F., Pasaniuc, B., & Price, A. L. (2011). New approaches to disease mapping in admixed populations. Nature Reviews Genetics, 12(8), 523–528. doi:10.1038/nrg3002 Shah, R. B., Mehra, R., & Chinnaiyan, A. M. (2008). Role of the TMPRSS2-ERG gene fusion in prostate cancer. … (New York. Shahabi, A., Wilson, M. L., Lewinger, J. P., Goodwin, T. M., Stern, M. C., & Ingles, S. A. (2013). Genetic admixture and risk of hypertensive disorders of pregnancy among Latinas in Los Angeles County. Epidemiology (Cambridge, Mass.), 24(2), 285–294. doi:10.1097/EDE.0b013e31828174cb Shi, W., Oshlack, A., & Smyth, G. K. (2010). Optimizing the noise versus bias trade-off for Illumina whole genome expression BeadChips. Nucleic Acids Research. Shimada, T., & Fujii-Kuriyama, Y. (2004). Metabolic activation of polycyclic aromatic hydrocarbons to carcinogens by cytochromes P450 1A1 and 1B1. Cancer science, 95(1), 1–6.
Sim, H. G., Telesca, D., Culp, S. H., Ellis, W. J., & Lange, P. H. (2008). Tertiary Gleason pattern 5 in Gleason 7 prostate cancer predicts pathological stage and biochemical recurrence. The Journal of … . Sims, P., Grover, P. L., Swaisland, A., Pal, K., & Hewer, A. (1974). Metabolic activation of benzo (a) pyrene proceeds by a diol-epoxide. Nature. Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., & Manola, J. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), 203–209. doi:10.1016/S1535-6108(02)00030-2 Smith, C. J., Perfetti, T. A., Garg, R., & Hansch, C. (2003). IARC carcinogens reported in cigarette mainstream smoke and their calculated log P values. Food and Chemical Toxicology, 41(6), 807–817. doi:10.1016/S0278-6915(03)00021-8 Smyth, G. K. (2004). Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 3, Article3. doi:10.2202/1544-6115.1027 Smyth, G. K. (2005). limma: Linear Models for Microarray Data. Bioinformatics and Computational Biology Solutions Using R and Bioconductor, (Chapter 23), 397–420. doi:10.1007/0-387-29362-0_23 Snowdon, D. A., & Phillips, R. L. (1984). Diet, obesity, and risk of fatal prostate cancer. American Journal of … . Stephenson, A. J., Kattan, M. W., Eastham, J. A., Dotan, Z. A., Bianco, F. J. J., Lilja, H., & Scardino, P. T. (2006). Defining biochemical recurrence of prostate cancer after radical prostatectomy: a proposal for a standardized definition. Journal of Clinical Oncology, 24(24), 3973–3978. doi:10.1200/JCO.2005.04.0756 Stephenson, A. J., Scardino, P. T., Eastham, J. A., Bianco, F. J., Dotan, Z. A., DiBlasio, C. J., et al. (2005). Postoperative nomogram predicting the 10-year probability of prostate cancer recurrence after radical prostatectomy. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 23(28), 7005–7012.
doi:10.1200/JCO.2005.01.867 Stephenson, A. J., Smith, A., Kattan, M. W., Satagopan, J. M., Reuter, V. E., Scardino, P. T., & Gerald, W. L. (2008). Integration of Gene Expression Profiling and Clinical Variables to Predict Prostate Carcinoma Recurrence After Radical Prostatectomy, 1–16. Stephenson, R. A., & Stanford, J. L. (1997). Population-based prostate cancer trends in the United States: patterns of change in the era of prostate-specific antigen. World Journal of Urology, 15(6), 331–335. doi:10.1007/BF01300179 Sterling, K. M., & Cutroneo, K. R. (2004). Constitutive and inducible expression of cytochromes P4501A (CYP1A1 and CYP1A2) in normal prostate and prostate cancer cells. Journal of Cellular Biochemistry, 91(2), 423–429. doi:10.1002/jcb.10753 Stewart, L. V., & Weigel, N. L. (2004). Vitamin D and Prostate Cancer. Experimental Biology and Medicine, 229(4), 277–284. Steyerberg, E. W., Roobol, M. J., Kattan, M. W., Van der Kwast, T. H., de Koning, H. J., & Schroder, F. H. (2007). Prediction of indolent prostate cancer: validation and updating of a prognostic nomogram. JURO, 177(1), 107–12; discussion 112. doi:10.1016/j.juro.2006.08.068 Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, 100(16), 9440–9445. doi:10.1073/pnas.1530509100 Strange, R. C., & Fryer, A. A. (1999). The glutathione S-transferases: influence of polymorphism on cancer susceptibility. IARC Sci Publ, 148, 231–249. Strom, S. S., Yamamura, Y., Flores-Sandoval, F. N., Pettaway, C. A., & Lopez, D. S. (2008). Prostate cancer in Mexican-Americans: Identification of risk factors. The Prostate, 68(5), 563–570. doi:10.1002/pros.20713 Szakács, G., Annereau, J.-P., Lababidi, S., et al. (2004). Predicting drug sensitivity and resistance. Cancer Cell, 6(2), 129–137. doi:10.1016/j.ccr.2004.06.026 Talantov, D., Jatkoe, T.
A., Böhm, M., & Zhang, Y. (2010). Gene based prediction of clinically localized prostate cancer progression after radical prostatectomy. The Journal of … . doi:10.1016/j.juro.2010.05.084 Tannock, I. F., de Wit, R., Berry, W. R., & Horti, J. (2004). Docetaxel plus prednisone or mitoxantrone plus prednisone for advanced prostate cancer. … England Journal of … . Taylor, B. S., Schultz, N., Hieronymus, H., Gopalan, A., Xiao, Y., Carver, B. S., et al. (2010a). Integrative genomic profiling of human prostate cancer. Cancer Cell, 18(1), 11–22. doi:10.1016/j.ccr.2010.05.026 Taylor, R. A., Toivanen, R., & Risbridger, G. P. (2010b). Stem cells in prostate cancer: treating the root of the problem. Endocrine Related Cancer, 17(4), R273–85. doi:10.1677/ERC-10-0145 Tollefson, M. K., Leibovich, B. C., Slezak, J. M., Zincke, H., & Blute, M. L. (2006). Long-Term Prognostic Significance of Primary Gleason Pattern in Patients With Gleason Score 7 Prostate Cancer: Impact on Prostate Cancer Specific Survival. The Journal of Urology, 175(2), 547–551. doi:10.1016/S0022-5347(05)00152-7 Tomlins, S. A., Aubin, S. M. J., Siddiqui, J., Lonigro, R. J., Sefton-Miller, L., Miick, S., et al. (2011). Urine TMPRSS2:ERG Fusion Transcript Stratifies Prostate Cancer Risk in Men with Elevated Serum PSA. Science Translational Medicine, 3(94), 94ra72. doi:10.1126/scitranslmed.3001970 Tosoian, J. J., Loeb, S., Kettermann, A., & Landis, P. (2010). Accuracy of PCA3 measurement in predicting short-term biopsy progression in an active surveillance program. The Journal of …. Tricker, A. R. (1997). N-nitroso compounds and man: sources of exposure, endogenous formation and occurrence in body fluids. European journal of cancer prevention: the official …. Tricker, A. R., Ditrich, C., & Preussmann, R. (1991). N-Nitroso compounds in cigarette tobacco and their occurrence in mainstream tobacco smoke. Carcinogenesis, 12(2), 257–261. doi:10.1093/carcin/12.2.257 Trock, B. J., Guo, C.
C., Gonzalgo, M. L., Magheli, A., Loeb, S., & Epstein, J. I. (2009). Tertiary Gleason Patterns and Biochemical Recurrence After Prostatectomy: Proposal for a Modified Gleason Scoring System. The Journal of Urology, 182(4), 1364–1370. doi:10.1016/j.juro.2009.06.048 True, L., Coleman, I., Hawley, S., Huang, C.-Y., Gifford, D., Coleman, R., et al. (2006). A molecular correlate to the Gleason grading system for prostate adenocarcinoma. Proceedings of the National Academy of Sciences of the United States of America, 103(29), 10991–10996. doi:10.1073/pnas.0603678103 Truong, M., Slezak, J. A., Lin, C. P., Iremashvili, V., Sado, M., Razmaria, A. A., et al. (2013). Development and multi-institutional validation of an upgrading risk tool for Gleason 6 prostate cancer. Cancer, 119(22), 3992–4002. doi:10.1002/cncr.28303 Tsai, H.-J., Choudhry, S., Naqvi, M., Rodriguez-Cintron, W., Burchard, E. G., & Ziv, E. (2005). Comparison of three methods to estimate genetic ancestry and control for stratification in genetic association studies among admixed populations. Human genetics, 118(3-4), 424–433. doi:10.1007/s00439-005-0067-z Turesky, R. J. (2002). Heterocyclic aromatic amine metabolism, DNA adduct formation, mutagenesis, and carcinogenesis. Drug metabolism reviews. Turteltaub, K. W., Mauthe, R. J., & Dingley, K. H. (1997). MeIQx-DNA adduct formation in rodent and human tissues at low doses. Mutation Research/ … . Ukimura, O., Coleman, J. A., la Taille, de, A., Emberton, M., Epstein, J. I., Freedland, S. J., et al. (2013). Contemporary Role of Systematic Prostate Biopsies: Indications, Techniques, and Implications for Patient Care. European Urology, 63(2), 214–230. doi:10.1016/j.eururo.2012.09.033 Ukimura, O., Desai, M. M., Palmer, S., Valencerina, S., Gross, M., Abreu, A. L., et al. (2012).
3- Dimensional Elastic Registration System of Prostate Biopsy Location by Real-Time 3- Dimensional Transrectal Ultrasound Guidance With Magnetic Resonance/Transrectal Ultrasound Image Fusion. The Journal of Urology, 187(3), 1080 –1086. doi:10.1016/j.juro.2011.10.124 Van As, N. J., Norman, A. R., Thomas, K., Khoo, V. S., Thompson, A., Huddart, R. A., et al. (2008). Predicting the probability of deferred radical treatment for localised prostate cancer managed by active surveillance. European Urology, 54(6), 1297 –1305. doi:10.1016/j.eururo.2008.02.039 Varambally, S., Yu, J., Laxman, B., Rhodes, D. R., Mehra, R., Tomlins, S. A., et al. (2005). Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression. Cancer Cell, 8(5), 393 –406. doi:10.1016/j.ccr.2005.10.001 Vineis, P., & Pirastu, R. (1997). Aromatic amines and cancer. Cancer Causes & Control, 8(3), 346–355. doi:10.1023/A:1018453104303 Vineis, P., Alavanja, M., & Buffler, P. (2004). Tobacco and cancer: recent epidemiological evidence. Jour nal o f t he … . doi:10.1093/jnci/djh014 Wallace, L. (1996). Environmental exposure to benzene: an update. Environmental Health Perspectives. Walsh, P. C., & DeWeese, T. L. (2007). Localized prostate cancer. N ew E n gl an d Jour na l o f … . Walz, J., Chun, F. K. H., Klein, E. A., Reuther, A., Saad, F., Graefen, M., et al. (2009). Nomogram predicting the probability of early recurrence after radical prostatectomy for prostate cancer. J Urol, 181(2), 601 –7– discussion 607 –8. doi:10.1016/j.juro.2008.10.033 Wang, H., Zhang, Z., Han, S., Lu, Y., Feng, F., & Yuan, J. (2012a). CYP1A2 rs762551 polymorphism contributes to cancer susceptibility: a meta-analysis from 19 case-control 204 studies. BMC Cancer. Wang, J., Joshi, A. D., Corral, R., Siegmund, K. D., Marchand, L. L., Martinez, M. E., et al. (2012b). Carcinogen metabolism genes, red meat and poultry intake, and colorectal cancer risk. 
International Journal of Cancer, 130(8), 1898 –1907. doi:10.1002/ijc.26199 Wang, M. H., Shugart, Y. Y., Cole, S. R., & Platz, E. A. (2009). A Simulation Study of Control Sampling Methods for Nested Case-Control Studies of Genetic and Molecular Biomarkers and Prostate Cancer Progression. Cancer Epidemiology Biomarkers & Prevention, 18(3), 706 –711. doi:10.1158/1055-9965.EPI-08-0839 Ward, J. F., Blute, M. L., Slezak, J., Bergstralh, E. J., & Zincke, H. (2003). The long-term clinical impact of biochemical recurrence of prostate cancer 5 or more years after radical prostatectomy. JURO, 170(5), 1872 –1876. doi:10.1097/01.ju.0000091876.13656.2e Watters, J. L., Park, Y., Hollenbeck, A., Schatzkin, A., & Albanes, D. (2009). Cigarette Smoking and Prostate Cancer in a Prospective US Cohort Study. Cancer Epidemiology Biomarkers & Prevention, 18(9), 2427 –2435. doi:10.1158/1055-9965.EPI-09-0252 Weir, B. A., Woo, M. S., Getz, G., Perner, S., Ding, L., Beroukhim, R., et al. (2007). Characterizing the cancer genome in lung adenocarcinoma. Nature, 450(7168), 893 –898. doi:10.1038/nature06358 Weir, J. M., & Dunn, J. E. (1970). Smoking and mortality: a prospective study. Cancer, 25(1), 105 –112. doi:10.1002/1097-0142(197001)25:1<105::AID-CNCR2820250115>3.0.CO;2-Z Welch, H. G., & Albertsen, P. C. (2009). Prostate Cancer Diagnosis and Treatment After the Introduction of Prostate-Specific Antigen Screening: 1986 –2005. JNCI Journal of the National Cancer Institute, 101(19), 1325 –1329. doi:10.1093/jnci/djp278 Whitson, J. M., Porten, S. P., Cowan, J. E., Simko, J. P., Cooperberg, M. R., & Carroll, P. R. (2013). Factors associated with downgrading in patients with high grade prostate cancer. Urologic oncology, 31(4), 442 –447. doi:10.1016/j.urolonc.2011.02.010 Williams, J. A. (2000). Metabolic activation of carcinogens and expression of various cytochromes P450 in human prostate tissue. Carcinogenesis, 21(9), 1683 –1689. doi:10.1093/carcin/21.9.1683 Wilt, T. J., Brawer, M. K., Jones, K. 
M., Barry, M. J., Aronson, W. J., Fox, S., et al. (2012). Radical prostatectomy versus observation for localized prostate cancer. New England Journal of Medicine, 367(3), 203 –213. doi:10.1056/NEJMoa1113162 Witte, J. S. (2009). Prostate cancer genomics: towards a new understanding. Nature Reviews Genetics, 10(2), 77 –82. doi:10.1038/nrg2507 Wright, J. L., Kwon, E. M., Ostrander, E. A., Montgomery, R. B., Lin, D. W., Vessella, R., et al. (2011). Expression of SLCO transport genes in castration-resistant prostate cancer and impact of genetic variation in SLCO1B3 and SLCO2B1 on prostate cancer outcomes. Cancer Epidemiology Biomarkers & Prevention, 20(4), 619 –627. doi:10.1158/1055- 9965.EPI-10-1023 Wright, J. L., Salinas, C. A., Lin, D. W., & Kolb, S. (2009). Prostate cancer specific mortality and Gleason 7 disease differences in prostate cancer outcomes between cases with Gleason 4+ 3 and Gle as on 3+ 4 t um or s i n a …. The Jour nal of … . WU, H., SUN, L., MOUL, J. W., Wu, H. Y., McLEOD, D. G., AMLING, C., et al. (2004). Watchful waiting and factors predictive of secondary treatment of localized prostate cancer. JURO, 171(3), 1111 –1116. doi:10.1097/01.ju.0000113300.74132.8b Yaeger, R., Avila-Bront, A., Abdul, K., Nolan, P. C., Grann, V. R., Birchette, M. G., et al. (2008). Comparing genetic ancestry and self-described race in african americans born in the United States and in Africa. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology, 17(6), 1329 –1338. doi:10.1158/1055-9965.EPI-07-2505 Yamaguchi, T., Hosono, Y., Yanagisawa, K., & Takahashi, T. (2013). NKX2-1/TTF-1: An Enigmatic Oncogene that Functions as a Double-Edged Sword for Cancer Cell Survival and 205 Progression. Cancer Cell, 23(6), 718 –723. doi:10.1016/j.ccr.2013.04.002 Yu, M. C., Skipper, P. L., Tannenbaum, S. R., Chan, K. K., & Ross, R. K. (2002). 
Arylamine exposures and bladder cancer risk. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 506-507, 21 –28. doi:10.1016/S0027-5107(02)00148-3 Yu, Y. P., Landsittel, D., Jing, L., Nelson, J., Ren, B., Liu, L., et al. (2004). Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy. Journal of clinical oncology : official journal of the American Society of Clinical Oncology, 22(14), 2790 –2799. doi:10.1200/JCO.2004.05.158 Zhang, P. (1993). Model selection via multifold cross validation. The Annals of Statistics. doi:10.2307/3035592 Zhao, K. H., Hernandez, D. J., Han, M., Humphreys, E. B., Mangold, L. A., & Partin, A. W. (2008). External Validation of University of California, San Francisco, Cancer of the Prostate Risk Assessment Score. Urology, 72(2), 396 –400. doi:10.1016/j.urology.2007.11.165 Zou, H., & Hastie, T. (2003). Regression shrinkage and selection via the elastic net, with applications to microarrays. Jour na l o f t h e Roya l S t at i s t i c al So ci et y : … . Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of t he Ro yal S t a t i st i ca l Soc i e t y: Se ri es … . doi:10.1111/j.1467-9868.2005.00503.x/full Zu, K., & Giovannucci, E. (2009). Smoking and aggressive prostate cancer: a review of the epidemiologic evidence. Cancer Causes & Control, 20(10), 1799–1810. doi:10.1007/s10552-009-9387-y
Asset Metadata
Creator
Shahabi, Ahva (author)
Core Title
Genes and environment in prostate cancer risk and prognosis
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Molecular Epidemiology
Publication Date
02/26/2015
Defense Date
12/16/2013
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
biomarkers, GST genes, metabolic enzyme genes, OAI-PMH Harvest, prognosis, prostate cancer, radical prostatectomy, recurrence, tobacco smoking
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Lewinger, Juan Pablo (committee chair), Stern, Mariana C. (committee chair), Ingles, Sue Ann (committee member), Pinski, Jacek K. (committee member), Siegmund, Kimberly D. (committee member)
Creator Email
ahva.shahabi@med.usc.edu, ahvashah@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-366242
Unique identifier
UC11287974
Identifier
etd-ShahabiAhv-2272.pdf (filename), usctheses-c3-366242 (legacy record id)
Legacy Identifier
etd-ShahabiAhv-2272.pdf
Dmrecord
366242
Document Type
Dissertation
Rights
Shahabi, Ahva
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA