Genetic risk factors in breast cancer susceptibility: The multiethnic cohort
NOTE TO USERS
This reproduction is the best copy available. ® UMI

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

GENETIC RISK FACTORS IN BREAST CANCER SUSCEPTIBILITY: THE MULTIETHNIC COHORT
by
Philip Michael Bretsky

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, THE UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (EPIDEMIOLOGY)

December 2004
Copyright 2004 Philip Michael Bretsky

UMI Number: 3197571

INFORMATION TO USERS
The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3197571. Copyright 2006 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346

DEDICATION
Written mindful of Mimi Coleman, Penny Coleman and Sophie Wittenberg. And further dedicated to other families for whom a ‘genetic predisposition’ to breast cancer has true meaning.
ACKNOWLEDGEMENTS
An endeavor such as a dissertation does not come without considerable support. This thesis is no exception but includes individuals from multiple institutions as well as a personal support network. From the University of Southern California, my thesis committee provided invaluable assistance and helpful critiques. It includes Ronald Ross, Malcolm Pike, Gerhard Coetzee and Brian Henderson. Dr. Henderson functioned as the Chairperson of this committee. His support and graciousness extended much further - and for that I am most appreciative. I wish also to thank Giske Ursin for her input and critical review of this manuscript. Also from USC, I thank Chris Haiman, David Van Den Berg, Sue Ingles and the Epidemiology Seminar class for many valuable discussions and comments. Dan Stram reviewed an early draft of this thesis and provided key input throughout the analysis. His ‘TagSNPs’ program and statistical expertise provide the structure for the haplotype estimation and the case-control analysis of ESR. Larry Kolonel from the University of Hawai’i Cancer Etiology Program provided feedback and support as a Principal Investigator of the Multiethnic Cohort. The ATM work occurred only as a consequence of efforts from Shlomit Gilad, Rami Skaliter and Avital Grossman of QBI Enterprises, Nes Ziona, Israel as well as Shoshana Paglin and Joachim Yahalom of the Memorial Sloan Kettering Cancer Center. The ESR haplotyping work was performed at the Whitehead Institute/MIT Center for Genome Research headed by Eric Lander. I wish to acknowledge the mentorship of David Altshuler and Joel Hirschhorn during my time there. I wish to thank Matt Freedman, Mark Daly, Steve Schaffner, Noel Burtt, and Megan Loomer for their expertise and helpful critiques.
Lastly, thanks Dad for all of your advice, encouragement and for three plus years of journals with yellow ‘stickies.’ And Mare, you kept me sane which was perhaps the most difficult task of the entire process.

TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
ABSTRACT
CHAPTER 1 MISSENSE MUTATIONS IN THE ATAXIA-TELANGIECTASIA MUTATED (ATM) GENE AND BREAST CANCER SUSCEPTIBILITY: A REVIEW
1.1. Summary
1.2. Ataxia-Telangiectasia
1.3. The ATM Gene: Structure and Function
1.3.1. Structure of the ATM Gene
1.3.2. ATM Function: The Maintenance of Genomic Integrity
1.4. Epidemiology of Disease in A-T Patients and ATM Heterozygotes
1.4.1. The Spectrum of Mutations in A-T Patients
1.4.2. Clinical Phenotype of A-T Heterozygotes - Family Studies
1.4.3. Frequency of Truncating ATM Mutations in Patients with Breast Cancer
1.5. Are ATM Missense Mutations a Distinct and Clinically Significant Class of ATM Mutations?
1.6. Conclusions and Study Rationale
CHAPTER 2 THE RELATIONSHIP BETWEEN TWENTY MISSENSE ATM VARIANTS AND BREAST CANCER RISK
2.1. Summary
2.1.1. Objective
2.1.2. Methods
2.1.3. Results
2.1.4. Conclusions
2.2. Introduction
2.3. Materials and Methods
2.3.1. Multiethnic Cohort Study Population
2.3.2. ATM Missense Variant Discovery
2.3.3. Genotyping
2.3.4. Statistical Analysis
2.4. Results
2.5. Discussion
CHAPTER 3 CO-SEGREGATION OF AN ATM MISSENSE VARIANT IN BREAST CANCER PATIENTS WITH A POSITIVE FAMILY HISTORY AND AN EARLIER AGE AT ONSET: THE MULTIETHNIC COHORT
3.1. Summary
3.2. Introduction
3.3. Materials and Methods
3.3.1. Multiethnic Cohort Study Population
3.3.2. Genotyping
3.3.3. Statistical Analysis
3.4. Results
3.5. Discussion
CHAPTER 4 WOMEN CARRYING MULTIPLE MISSENSE MUTATIONS IN THE ATM GENE: IMPLICATIONS FOR RISK OF BREAST CANCER
4.1. Summary
4.2. Introduction
4.3. Materials and Methods
4.3.1. Multiethnic Cohort Study Population
4.3.2. Genotyping
4.3.3. Statistical Analysis
4.4. Results
4.5. Discussion
CHAPTER 5 THE SUITABILITY OF THE TWO ESTROGEN RECEPTOR GENES (ESR1 AND ESR2) AS CANDIDATE LOCI IN SPORADIC BREAST CANCER SUSCEPTIBILITY AND OTHER PHENOTYPES
5.1. Introduction
5.2. Structure and Function of the Estrogen Receptor Genes
5.2.1. Comparative Structures and Tissue Distribution of ERα and ERβ
5.2.2. ER Function: Signal Transduction and Transcriptional Response
5.3. Genetic Variation in the Estrogen Receptor and Breast Cancer Susceptibility
5.3.1. ER Expression and Significance in Breast Cancer
5.3.2. Specific Variant Studies and Breast Cancer Susceptibility
5.4. Conclusions
CHAPTER 6 DENSE LINKAGE DISEQUILIBRIUM MAPS OF ESTROGEN RECEPTORS ALPHA (ESR1) AND BETA (ESR2) AND IMPLICATIONS FOR DISEASE ASSOCIATION STUDIES: THE MULTIETHNIC COHORT
6.1. Summary
6.2. Introduction
6.3. Methods
6.3.1. Human Subjects
6.3.2. Marker Selection and Mapping
6.3.3. Genotyping
6.3.4. Statistical Analysis for Recombination and Linkage Disequilibrium
6.3.5. Fine Structure of Haplotype Blocks
6.3.6. Estimation of Haplotypes and Haplotype Tagging SNP Selection
6.4. Results
6.4.1. ESR1
6.4.2. ESR2
6.5. Discussion
CHAPTER 7 ASSOCIATION BETWEEN HAPLOTYPES OF ESTROGEN RECEPTORS ALPHA (ESR1) AND BETA (ESR2) AND BREAST CANCER RISK IN A PILOT CASE-CONTROL STUDY: THE MULTIETHNIC COHORT
7.1. Summary
7.2. Introduction
7.3. Methods
7.3.1. Multiethnic Cohort Study Population
7.3.2. ESR1 and ESR2 htSNP Evaluation
7.3.3. Genotyping
7.3.4. Statistical Analysis: Tests of Haplotype Association with Phenotype
7.4. Results
7.4.1. Overview
7.4.2. ESR1
7.4.3. ESR2
7.5. Discussion
CHAPTER 8 FUTURE DIRECTIONS IN THE STUDY OF HAPLOTYPIC VARIATION AND THE RISK OF BREAST CANCER: THE MULTIETHNIC COHORT
8.1. Summary
8.2. ATM: Haplotype and Linkage Disequilibrium Architecture
8.3. ESR1 & ESR2: Haplotype-Based Association Study Design
8.4. Conclusions
BIBLIOGRAPHY
APPENDIX
A. VISUAL BASIC PROGRAM CALCULATING D′ AND 95% CONFIDENCE INTERVALS
B. ‘BLOCK FINDER’: A VISUAL BASIC PROGRAM EMPLOYING THE HAPLOTYPE ‘BLOCK’ DEFINITION OF GABRIEL ET AL. (2002)
C. ‘PHASING’: A PROGRAM RESOLVING PHASE OF PROBABILISTIC HAPLOTYPE DATA

LIST OF TABLES
1. Summary of family studies characterizing the observed versus expected cancer incidence or mortality among presumptive A-T heterozygotes.
2. Estimated risks for breast cancer among presumptive A-T heterozygotes.
3. Summary of the frequency of heterozygous germ-line ATM truncation mutations in female breast cancer patients.
4. Summary of the frequency of heterozygous germ-line ATM truncation mutations in female breast cancer patients displaying severe tissue radiation side effects.
5. Proposed phenotype/genotype relationships in the two population model of ATM mutations. Adapted from Gatti et al.
6. Breast cancer staging classification stratified by ethnicity.
7. Breast cancer location by ICD-O code.
8. Descriptive statistics of subjects stratified by case or control status.
9. Ethnic-specific distribution of missense variants in the ATM gene among breast cancer cases and controls.
10. Association between ATM variant carrier status and breast cancer risk by stage of disease. Odds ratios compare high stage to controls.
11. Ethnic-specific distribution of the L546V and F858L missense variants in the ATM gene among breast cancer cases and controls.
12. Association between L546V variant carrier status and breast cancer risk by stage of disease and family history of breast cancer among African-American women only.
13. Descriptive characteristics of subjects carrying multiple ATM missense variants as compared to all participants combined.
14. Characteristics of breast cancer cases (high stage and low stage) and controls carrying missense mutations at multiple ATM sites.
15. Summary of studies evaluating the presence and frequency of germ-line ESR1 variants in female breast cancer patients as compared to controls. Risk estimates compare variant carriers (homozygous or heterozygous) to wild-type homozygotes.
16. Summary of studies evaluating the role of ESR1 variants in susceptibility to phenotypes other than female breast cancer. Risk estimates compare minor allele variant carriers (homozygous or heterozygous) to wild-type homozygotes.
17. Summary of studies evaluating the role of ESR2 variants in susceptibility to various disease phenotypes other than female breast cancer. Risk estimates compare variant carriers (homozygous or heterozygous) to wild-type homozygotes.
18. Tabular representation of two SNPs in full disequilibrium.
19. Tabular representation of two SNPs after free recombination.
20. ESR1 SNP location, distance between sites and genotyping percentage.
21. ESR1 minor allele frequencies for the 114 SNPs in each of five ethnic groups.
22. Common haplotype patterns in each block of ESR1 by ethnicity.
23. Haplotype tagging SNPs (htSNPs) capturing the haplotypes of ESR1 by ethnicity (effective minimum R_h^2 = 0.7).
24. Frequency distribution of haplotype tagging SNPs (htSNPs) capturing the haplotypes of ESR1 by ethnicity.
25. ESR2 SNP location, distance between sites and genotyping percentage.
26. ESR2 minor allele frequencies for the 32 SNPs in each of five ethnic groups.
27. Common haplotype patterns in each block of ESR2 by ethnicity.
28. Haplotype tagging SNPs (htSNPs) capturing the haplotypes of ESR2 stratified by ethnicity.
29. Frequency distribution of haplotype tagging SNPs (htSNPs) capturing the haplotypes of ESR2 by ethnicity.
30. Descriptive statistics of subjects stratified by case or control status.
31. Predicted and observed haplotype frequencies in each block of ESR1.
32. Ethnic-specific distribution of 37 variants in the ESR1 gene among breast cancer cases and controls.
33. SNP by SNP association between ESR1 variants and breast cancer risk.
34. Association between ESR1 ‘phased’ haplotype dose and breast cancer.
35. Risk estimates using ‘phased’ data for each common haplotype (>5%).
36. Association between ESR1 Block #1 ‘risk’ haplotypes (h2412 and h4412 carriers) and breast cancer.
37. Association between ESR1 haplotype dose fitting both dominant and recessive risk models.
38. Association between ESR1 haplotypes using an indicator function.
39. Predicted and observed haplotype frequencies in each block of ESR2.
40. Ethnic-specific distribution of 15 variants in the ESR2 gene among breast cancer cases and controls.
41. SNP by SNP association between ESR2 variants and breast cancer risk by stage of disease.
42. Association between ESR2 ‘phased’ haplotype dose and breast cancer risk.
43. Risk estimates using ‘phased’ data for each common haplotype (>5%).
44. Association between ESR2 haplotype dose fitting both dominant and recessive models of risk.
45. Association between ESR2 haplotypes using an indicator function.
46. ATM SNP location, and distance between sites: LD mapping project.
47. Descriptive statistics of subjects (n = 3936) stratified by ethnicity.

LIST OF FIGURES
1. The array of malignancies reported among ataxia-telangiectasia patients.
2. Schematic of the ATM gene demonstrating position and size of the 66 exons.
3. Linkage of ATM to cell cycle arrest regulation by Chk-2 protein kinase.
4. Phosphorylation sites of activated ATM responding to ds DNA breaks.
5. Schematic pathway by which ATM mutations lead to carcinogenesis.
6. Spectrum of ATM Mutations among A-T Patients.
7. Declining proportion of truncating mutations reported among A-T patients.
8. Proportion of truncating mutations, exon skipping variants, and missense mutations as reported among unrelated A-T patients.
9. Prevalence of ATM truncating mutations in breast cancer.
10. A family with a valine to glycine missense substitution associated with a mild A-T phenotype, radiosensitivity and predisposition to breast cancer.
11. Exonic organization of ATM with missense variants of interest annotated.
12. Established hormonal risk modifiers for breast cancer.
13. Genetic component of sporadic breast cancer which may account for observed excess familial risk.
14. Rates of invasive breast cancer: Tamoxifen versus Placebo.
15. Functional domains of estrogen receptor (ER).
16. The signal transduction pathways that are available to estrogen in initiation of gene transcription.
17. ESR1 SNP assay results: 114 SNPs retained for LD and haplotype analysis.
18. Intronic and exonic organization of ESR1 with SNP assay results.
19. ESR1 D′ statistic plots for each ethnicity (Figures 19a-e).
20. ESR2 SNP assay results: 32 SNPs retained for LD and haplotype analysis.
21. Intronic and exonic organization of ESR2 with SNP assay results denoted.
22. ESR2 D′ statistic plots for each ethnicity.
23. Relationship between allele and/or haplotype frequency and sample-size requirements for population-based association studies.

ABBREVIATIONS
95%CI: 95% Confidence Interval
A-T: Ataxia-Telangiectasia
ATM: Ataxia Telangiectasia Mutated (gene)
CDCV: Common Disease / Common Variant Hypothesis
ESR1: Estrogen Receptor 1 (alpha, α) gene
ESR2: Estrogen Receptor 2 (beta, β) gene
htSNPs: haplotype tagging SNPs
ICD-O: International Classification of Diseases - Oncology
LD: Linkage Disequilibrium
SEER: Surveillance, Epidemiology and End Results
SNPs: Single-Nucleotide Polymorphisms
TSC: The SNP Consortium

ABSTRACT
Genetic factors in breast cancer are suggested by the strong familial tendency noted as early as 1866 by Paul Broca, commenting on his own family’s experience. Studies of ataxia-telangiectasia (A-T) families have suggested an increased risk of breast cancer among obligate female heterozygous carriers of ATM mutations. We characterized the prevalence and distribution of ATM missense variants in a case-control study among women participating in a prospective multi-ethnic cohort. Haplotype-based association studies of candidate genes offer a more comprehensive approach than has been previously achievable for association studies of complex disease. Genes encoding isoforms of the estrogen receptors (ESR1 and ESR2) are promising candidate loci in the pathogenesis of breast cancer. A second objective of this thesis was to determine if haplotypic variation at the ESR1 and ESR2 loci significantly modified the risk of breast cancer.
We began by measuring the extent of linkage disequilibrium (LD) and inferred the accompanying haplotypic diversity across the ESR1 and ESR2 loci in a multiethnic panel of unrelated individuals. We assayed a total of 146 variants (approximately 60,000 genotypes) and observed a general pattern of strong blocks of LD; within each block there exist relatively few common haplotypes (>5%), which together accounted for the large majority of observed chromosomes. Mean LD block size spanned a greater distance among the Japanese (28.0 kb), Caucasian (25.2 kb), and Hawaiian (21.9 kb) samples as compared to Latinos (17.0 kb) and African-Americans (11.9 kb). We then identified a reference set of “haplotype-tagging” SNPs (htSNPs) that faithfully recapitulated common haplotypes observed in the multiethnic panel for each of the blocks to be assayed. We outline a study design in which the effect of common variation at these loci on the risk of breast cancer can be comprehensively evaluated in a case-control study among 3936 African-American, Native Hawaiian, Latina, Japanese, and Caucasian women.

CHAPTER 1
MISSENSE MUTATIONS IN THE ATAXIA-TELANGIECTASIA MUTATED (ATM) GENE AND BREAST CANCER SUSCEPTIBILITY: A REVIEW

1.1. Summary
Deficiencies in detecting and repairing DNA damage lead to mutations and chromosomal abnormalities, hallmarks of cancer. The gene mutated in ataxia-telangiectasia (ATM) is a crucial and proximal component of the DNA damage recognition and cell cycle checkpoint control system. Homozygous carriers of mutations in ATM develop a pleiotropic clinical phenotype which includes a marked predisposition to cancer.
Additionally, surveys of A-T families have provided initial support for an increased risk of breast cancer among female A-T heterozygotes. Evidence to the contrary includes results from breast cancer families that have not demonstrated linkage with the ATM locus on 11q23, although ATM may simply not be penetrant enough for linkage to be exhibited in breast cancer families. Further, there is no increased frequency of ATM mutations among patients with breast cancer. It has been proposed that this paradox might be resolved if there are, in fact, two populations of ATM heterozygotes - those with a truncating mutation coupled with a wild-type allele (ATM^trunc/wt), which leads to either no or markedly reduced protein production, versus those with a missense mutation coupled with a wild-type allele (ATM^mis/wt), which produces an altered, and perhaps aberrant, protein. To date, studies examining the frequency of ATM mutations in patients with breast cancer have focused on detecting ATM^trunc mutations, specifically through the use of the protein truncation test (PTT) method. Whereas detection of ATM^mis mutations has been examined in small groups of breast cancer patients, there have been no large case-control association studies of these ATM variants.

1.2. Ataxia-Telangiectasia
Ataxia-telangiectasia (A-T) is a pleiotropic inherited disease characterized by neurodegeneration, cancer, immunodeficiencies, radiation sensitivity and genetic instability. A-T is caused by mutations in the ATM (Ataxia Telangiectasia Mutated) gene, and is inherited as an autosomal recessive disorder (Khanna et al. 1998; Lavin and Shiloh 1997). Individuals that completely lack a functional ATM protein have the disease A-T, originally known as the Louis-Bar syndrome (Louis-Bar 1941). It occurs with an estimated frequency of 1:40,000 live births in outbred populations (Meyn 1997).
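The birth frequency quoted above can be turned into a rough heterozygote frequency. The sketch below is a minimal illustration only: it assumes Hardy-Weinberg proportions at a single autosomal recessive locus with all disease alleles pooled into one class, an assumption introduced here rather than taken from the text, and the function name `carrier_frequency` is hypothetical.

```python
import math


def carrier_frequency(disease_incidence: float) -> float:
    """Heterozygote frequency 2pq for a recessive disease.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch),
    so that the disease incidence equals q**2.
    """
    q = math.sqrt(disease_incidence)  # pooled frequency of disease alleles
    p = 1.0 - q                       # frequency of the wild-type allele
    return 2.0 * p * q


# Incidence of A-T as quoted in the text: 1 in 40,000 live births.
incidence = 1 / 40_000
q = math.sqrt(incidence)
carriers = carrier_frequency(incidence)

print(f"disease allele frequency q = {q:.4f}")
print(f"heterozygote frequency 2pq = {carriers:.2%}")
```

Under these assumptions the disease allele frequency is about 0.005 and roughly 1% of the population would be heterozygous for an ATM mutation, which is why even a modest increase in risk among heterozygotes could matter at the population level.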
A-T patients have a median life expectancy of approximately 30 years (Li et al. 1999).

Among the multisystemic clinical sequelae seen in A-T patients, the most characteristic is a progressive cerebellar degeneration leading to truncal and limb ataxia, dysarthria, nystagmus, and ocular apraxia (Boder 1985). Particularly affected are Purkinje cells in the cerebellum. Functional abnormalities are not typically manifested in infancy, but rather present as gait abnormalities by 12-18 months of age; the differential diagnosis includes cerebral palsy. Cerebellar ataxia progresses gradually and, by their teens, patients are unable to walk and are confined to a wheelchair. Along with functional neurologic abnormalities comes a continuous loss of peripheral and central nervous system neurons (Sedgwick and Boder 1991).

Deficits in humoral and cellular immune responses, evident as recurrent infections of the respiratory system, are also characteristic of A-T homozygosity. Most patients (approximately 70%) have an IgA deficiency, concomitant with decreased T-helper (CD4+) cell function (Boder 1985). The combined effects of immunodeficiency and cerebellar dysfunction make aspiration pneumonia the leading cause of death among A-T patients (Li et al. 1999).

As many as one third of A-T patients develop cancer, making it the most frequent non-neurologic complication of A-T. Most of these cancers are of the lymphoid type and generally manifest within the first 15 years of life (Morrell et al. 1990; Peterson et al. 1992). The array of malignancies seen in A-T patients includes (Figure 1):
• Lymphomas: non-Hodgkin’s (40% of all A-T tumors) and Hodgkin’s (5%)
• Leukemias: acute lymphocytic (20%) and chronic T-cell
• Solid tumors: gastric carcinoma, breast carcinoma, medulloblastoma, basal cell carcinoma, ovarian dysgerminoma, hepatoma, and uterine leiomyoma

Figure 1. The array of malignancies reported among ataxia-telangiectasia patients (Hecht and Hecht 1990; Morrell et al. 1986; Peterson et al. 1992; Sandoval and Swift 1998; Spector et al. 1982; Taylor 1992).

Cutaneous manifestations of A-T include, most notably, oculocutaneous telangiectasias which appear on the sclera and face, as well as the antecubital and popliteal fossae, by 7 years of age (Cabana et al. 1998). Older A-T patients may exhibit cutaneous signs of premature aging, including greying of the hair, senile keratosis, and areas of hypo- and hyper-pigmentation (Sedgwick and Boder 1991). Other sequelae include abnormal X-ray sensitivity (Gatti et al. 1991) and impaired fertility; women often have congenital hypoplasia of the ovaries so that menarche is delayed or absent (Woods and Taylor 1992).

Although these clinical features have been considered to be representative of “classic” A-T, it is important to note that variations in the disease have been reported. For instance, there exist milder forms of the disease, which display a later onset and a slower clinical progression. Further, radiation sensitivity (e.g. severe dermatologic reactions) may be reduced or even absent in some ethnic groups (Fiorilli et al. 1985). Other disorders display a “partial” A-T phenotype, in which the degree of ataxia, level of immunodeficiency and presence or absence of tumors may present in differing combinations and severity (Taylor et al. 1987). The underlying molecular mechanism of the phenotypic heterogeneity of A-T remains unexplained.

1.3. The ATM Gene: Structure and Function

1.3.1. Structure of the ATM Gene
Genetic linkage analysis using a large Amish A-T pedigree permitted Gatti and colleagues to map the presumptive disease locus to chromosome region 11q22-23 (Gatti et al. 1988). It was subsequently narrowed to a 500 kb minimal region on 11q23 (Lange et al. 1995) and identified definitively in 1995 by positional cloning (Savitsky et al. 1995a). Further, this effort established ATM as the sole gene responsible for the heterogeneous A-T disorder. Previous characterization of heterokaryons had defined four complementation groups in A-T, designated A/B, C, D and E (Jaspers et al. 1988; Murnane and Painter 1982), as well as two variant groups including Nijmegen breakage syndrome and A-T Fresno (Jaspers et al. 1988). Whereas previously it had been unclear whether these complementation groups represented different genes or were distinct mutations within the same gene, the cloning effort demonstrated conclusively that mutations detected in the ATM gene were representative of all four complementation groups (Savitsky et al. 1995a).

Occupying a genomic region of ~150 kb, ATM is composed of 66 exons and contains an open reading frame (ORF) of 9168 nucleotides (Savitsky et al. 1995b); the first two exons are alternative leader exons (Figure 2). The first methionine is located in exon 4, and the stop codon is in the 3’, and largest, exon 66. The ~360 kD ATM protein primarily localizes to the nucleus, but can be found in cytoplasmic vesicles (Brown et al. 1997; Watters et al. 1997). While the ATM protein is expressed in a wide variety of adult and embryonic tissues, its expression is particularly high in the developing nervous system, spleen, thymus and testes (Chen and Lee 1996). This wide distribution undoubtedly relates to the pleiotropic nature of the disease.
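As a quick consistency check on the figures above, the sketch below converts the 9168-nucleotide open reading frame into a codon count and a crude protein mass. The average residue mass of 110 Da is a textbook approximation introduced here as an assumption, as are the function names; the text does not state whether the quoted ORF length includes the stop codon, so the residue count should be read as approximate to within one.

```python
ORF_NUCLEOTIDES = 9168        # ORF length quoted in the text
AVG_RESIDUE_MASS_DA = 110.0   # assumed average amino-acid residue mass


def codon_count(orf_length: int) -> int:
    """Number of codons in an ORF; the length must be a multiple of three."""
    if orf_length % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    return orf_length // 3


def approx_mass_kda(residues: int, avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Crude protein mass estimate in kDa from a residue count."""
    return residues * avg_mass_da / 1000.0


codons = codon_count(ORF_NUCLEOTIDES)
mass = approx_mass_kda(codons)

print(f"codons in ORF: {codons}")
print(f"approximate mass: {mass:.0f} kDa")
```

The estimate of roughly 336 kDa from about 3056 codons is of the same order as the reported size of the protein; a single average residue mass cannot be expected to reproduce the reported value exactly.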
Figure 2. Schematic of the ATM gene demonstrating position and size of the 66 exons, phosphatidylinositol 3-kinase (PI-3 kinase) domain at the carboxy terminus noted.

The most distinctive feature of ATM is a phosphatidylinositol 3-kinase (PI-3 kinase) domain near the C-terminus of the protein (Savitsky et al. 1995a) (Figure 2). Through this domain, ATM is related to the protein family essential for cell signal transduction, specifically in the presence of DNA damage, and for activating cell cycle checkpoints (Elledge 1996; Savitsky et al. 1995b; Zakian 1995). This family includes the related ATR (AT- and Rad3-related) protein kinase in vertebrates, MEC1 and Rad3 in yeast, and mei-41 in Drosophila. This kinase domain, however, represents less than 10% of the total ATM protein, and few additional functional domains or interaction sites have been identified to date. However, a proline rich motif that interacts with the protein tyrosine kinase c-Abl has been discovered along with a leucine zipper in the N terminus (Baskaran et al. 1997; Shafman et al. 1997).

1.3.2. ATM Function: The Maintenance of Genomic Integrity
Evidence that the ATM gene plays a central role in responding to DNA damage came initially from case reports of A-T patients who had fatal reactions to radiation therapy (Eyra et al. 1988; Gotoff et al. 1967; Morgan et al. 1968). It has also been shown that cultured cells from A-T patients are more sensitive to the cytotoxic effects of ionizing radiation (IR) as compared to control cells (Arlett et al. 1988; Cole et al. 1988). Similarly, cell lines derived from A-T patients exhibit defects in several IR-inducible cell-cycle checkpoints, the most critical being arrest in the G1 phase of the cell cycle (Kastan et al. 1992; Morgan and Kastan 1997). This collective evidence was interpreted to mean that A-T cells were defective in the ability to coordinate cell cycle transitions in response to DNA damage, an interpretation supported by recent studies in yeast. Cell cycle transitions require exquisite “molecular logic” (Elledge 1996) whereby a transition not only turns off the previous state but simultaneously promotes the future state.
The essential task of the cell cycle is to precisely replicate DNA during S phase and ensure its equal distribution during M phase. Checkpoint controls are crucial in maintaining genomic integrity, for their role in slowing the cell cycle in the event of stress or damage permits repair to take place. Abandoning such controls results in a tendency to remain in cycle, leading to uncontrolled cellular proliferation, the hallmark of cancer. Oncogenic processes exert their greatest effect on G1 phase regulation, for after a restriction point late in G1, cells become refractory to regulatory signals and progress through to division (Pardee 1989; Sherr 1994). The p53 gene is responsible for G1 arrest in response to genotoxic damage; it is the most frequently mutated gene in human cancer (Hollstein et al. 1991; Levine et al. 1991; Nigro et al. 1989).

Recent evidence has identified ATM as an essential and proximal component of restriction point control. Its scope of interaction includes the phosphorylation and activation of p53 (Canman et al. 1998), c-Abl (Baskaran et al. 1997; Shafman et al. 1997) and Chk2 proteins (Chaturvedi et al. 1999; Matsuoka et al. 1998), as well as the inactivation of Cdc25 (Blasina et al. 1999), all of which promote either apoptosis or cell cycle arrest (Figures 3 and 4). Lastly, the manner in which ATM inactivation, occurring via truncating or missense mutations, can lead to carcinogenesis is outlined in Figure 5.

Figure 3. Linkage of ATM to cell cycle arrest regulation by Chk-2 protein kinase. (Diagram labels: DNA damage, ATM, p53, p21, Chk2, Cdc25C, Cdc2/cyclin B, G1, G2, M.)

Figure 4. Phosphorylation sites of activated ATM in response to double-stranded DNA breaks.
ATM acts as an essential and proximal component of restriction point control via its scope of interaction.

Figure 5. Schematic outlining the pathway by which ATM mutations lead to carcinogenesis, highlighting ATM's role in the maintenance of genomic integrity (Khanna 2000). (Diagram labels: with functional ATM, normal responses to DNA damage, namely cell cycle arrest and normal DNA repair, lead to genetic stability; with mutated ATM, abnormal responses to DNA damage, namely lack of cell cycle arrest, repair abnormality and short telomeres, lead to genomic instability in the form of chromosomal aberrations, translocations and accumulation of mutations in oncogenes.)

More germane to the oncogenesis of breast cancer, however, is the recently demonstrated biochemical connection between ATM and the inherited breast cancer susceptibility gene BRCA1 (Cortez et al. 1999). It has long been hypothesized that BRCA1 is involved in the response to DNA damage. Evidence supporting this hypothesis includes its expression profile, which is linked to the cell cycle: elevated levels are seen at the G1/S transition and in G2 phase, and levels decline in early G1 (Rajan et al. 1997). Further, BRCA1 is hyperphosphorylated in response to DNA damage but, interestingly, this phosphorylation is markedly diminished, although not entirely absent, in ATM-deficient cells (Scully et al. 1997). The work of Cortez and colleagues further revealed that BRCA1 and ATM reside in a nuclear complex and that ATM phosphorylates BRCA1 in a cluster of residues at the carboxy terminus in response to γ-radiation (Cortez et al. 1999). It is as yet unclear how this interaction relates to the function of BRCA2. The demonstrated co-localization of BRCA1 and BRCA2 (Rajan et al. 1997), along with the similarities in phenotype conferred by mutations in each (Tavtigian et al. 1996; Wooster et al. 1995), suggests a common function. The demonstrated link between ATM and BRCA1, along with an inferred relationship with BRCA2, defines a molecular pathway that may be disrupted in a fraction of breast cancer patients and represents one biologic mechanism by which ATM heterozygosity is associated with an increased risk of breast cancer.

1.4. Epidemiology of disease in A-T patients and ATM heterozygotes

1.4.1. The Spectrum of Mutations in A-T Patients

To date, over 200 unique ATM mutations have been described from more than 300 A-T patients (Figure 6); a current list is maintained at www.vmresearch.org/ATM.html. Most patients are compound heterozygotes and, in many populations, there exists a strong founder effect (Concannon and Gatti 1997; Telatar et al. 1998a). Null, or truncating, mutations are predominant: currently 70% of ATM mutations identified result in premature protein truncations (Stankovic et al. 1998; Telatar et al. 1998b), with the remaining mutations consisting of missense mutations and small, in-frame insertions and deletions - nearly all of which similarly code for a truncated protein. There are slightly more mutations reported in the 3’ half of the ATM gene.

As a result of initial cloning efforts, as well as the reliance on a protein truncation test for mutation detection, it is important to note that the positioning and type of ATM mutations reported may not be comprehensive. More have been reported in the 3’ end of the gene because the first published sequence of ATM included only exons 28-65 (Savitsky et al. 1995a). Hence, many mutation screens did not query the entire length of the gene.
Further, the proportion of truncating mutations reported may also be biased upwards because, in many studies, a protein truncation test (PTT) has been used to identify mutations. This methodology would necessarily miss small in-frame insertions/deletions and all missense mutations except those so dramatic as to change the electrophoretic properties of the protein itself.

Figure 6. Spectrum of ATM Mutations among A-T Patients. Adapted from: www.vmresearch.org/ATM.html. (Legend: exon skip/nonsense, no initiation/frameshift, and missense mutations plotted along the approximately 150 kb gene, with the leucine zipper, RAD3 homology and PI-3 kinase gene elements marked. Source: A-T Database, http://www.vmresearch.org/atm.html.)

There is evidence that the prevalence of null mutations in A-T has been over-estimated. Since the first report by Gilad and colleagues in 1996 (Gilad et al. 1996), which estimated that 89% of identified mutations were expected to inactivate the ATM protein by truncating it, this proportion has been reduced (Figure 7), with recent estimates suggesting a value closer to 70% (Concannon and Gatti 1997; Gatti et al. 1999; Wright et al. 1996). Stankovic and colleagues (Stankovic et al. 1998), in a comprehensive report of the spectrum of 59 unique ATM mutations from 78 A-T patients, noted that 71% of the mutations were predicted to lead to protein truncation, whereas the remaining 29% were in-frame deletions or missense mutations associated with the expression of some ATM protein. Similarly, Sandoval and colleagues (Sandoval et al. 1999) characterized ATM mutations in 66 unrelated A-T patients, finding a similar distribution of mutation type (Figure 8).

Lastly, there is ample evidence that mutations are routinely missed, likely as a result of the reliance on the PTT for identification. Gatti et al. have suggested that the failure to detect existing mutations is a significant problem. They noted that among 191 A-T patients for whom any mutation was detected, only 63 had both mutations detected, corresponding to a sensitivity of 66% (Gatti et al. 1999). Hence, these data suggest that the true frequency of A-T heterozygotes in the general population may be underestimated by relying solely on the PTT as a detection method. In considering breast cancer, if the frequency of missense mutations differs between cases and controls, then the risk difference would not only be biased towards the null but, given the rarity of A-T heterozygosity in the population as a whole, would likely be missed altogether.

Figure 7. Declining proportion of truncating mutations reported among A-T patients (Concannon and Gatti 1997; Gatti et al. 1999; Gilad et al. 1996; Stankovic et al. 1998; Wright et al. 1996). (Vertical axis: truncation percent; one bar per study, in the order Gilad 1996, Wright 1996, Concannon 1997, Stankovic 1998, Gatti 1999.)
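The 66% sensitivity quoted above from Gatti et al. (1999) can be reproduced with simple allele counting. The sketch below is only an illustration and not the authors' published calculation: it assumes that every A-T patient carries exactly two mutant alleles and that each patient without both mutations detected had exactly one detected. The function name `allele_detection_sensitivity` is hypothetical.

```python
def allele_detection_sensitivity(patients_with_any: int, patients_with_both: int) -> float:
    """Return the fraction of mutant alleles that were detected.

    Assumption (not stated in the source): every patient carries exactly two
    mutant alleles, and patients without both detected had exactly one detected.
    """
    if patients_with_any <= 0 or not 0 <= patients_with_both <= patients_with_any:
        raise ValueError("counts must satisfy 0 <= both <= any and any > 0")
    patients_with_one = patients_with_any - patients_with_both
    alleles_detected = 2 * patients_with_both + patients_with_one
    alleles_total = 2 * patients_with_any
    return alleles_detected / alleles_total


# Counts quoted in the text from Gatti et al. (1999).
sensitivity = allele_detection_sensitivity(patients_with_any=191, patients_with_both=63)
print(f"Allele-level sensitivity: {sensitivity:.1%}")
```

Under these assumptions, 254 of 382 mutant alleles are detected, or roughly 66%. Because patients in whom no mutation at all was found are excluded from the denominator, this figure is best read as an upper bound on the sensitivity of the screening method.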
Figure 8. Proportion of truncating mutations, exon skipping variants, and missense mutations as reported among unrelated A-T patients (Sandoval et al. 1999).
• 57% nonsense - Protein truncation
• 20% splice site substitution - Exon skipping
• 20% missense - Residual ATM protein

1.4.2. Clinical Phenotype of A-T Heterozygotes - Family Studies

An increased incidence of cancer among relatives of A-T patients was first noted in 1966 by Reed and colleagues (Reed et al. 1966), who culled anecdotal reports. Since then, the incidence of and mortality due to a broad array of neoplasias among A-T families have been characterized and compared to that which would be expected from standardized rates (detailed in Table 1).

Table 1. Summary of family studies characterizing the observed versus expected cancer incidence or mortality among presumptive A-T heterozygotes (significantly increased outcome measure in bold).

Cause of Death | Observed | Expected | Risk Estimate (95% CI) (a) | SMR/SIR | Author, Year | Relationship to AT Patient
All Causes | 3 | 4.9 | 0.62 (0.13, 2.92) | SMR | (Pippard et al. 1988) | Parents
All Causes | 24 | 23.5 | 1.02 (0.65, 1.51) | SMR | (Inskip et al. 1999) | Parents
All Causes | 158 | 154.9 | 1.02 (0.86, 1.19) | SMR | (Inskip et al. 1999) | Grandparents
All Neoplasms (b) | 67 | 52.6 | 1.27 (0.98, 1.64) | SMR | (Swift et al. 1976) | Relatives
All Neoplasms (b) | 138 | 103 | 1.40 (0.60, 1.66) | SIR | (Swift et al. 1987) | Relatives >20 years
All Neoplasms (b) | 3 | 1.5 | 2.0 (0.41, 5.84) | SMR | (Pippard et al. 1988) | Parents
All Neoplasms (b) | 50 | 37.6 | 1.33 (0.99, 1.76) | SIR | (Morrell et al. 1990) | “Close” Relatives (c)
All Neoplasms (b) | 11 | 7.7 | 1.42 (0.72, 2.53) | SMR | (Inskip et al. 1999) | Parents
All Neoplasms (b) | 46 | 37.4 | 1.23 (0.90, 1.69) | SMR | (Inskip et al. 1999) | Grandparents
All Neoplasms (b) | 22 | 16.5 | 1.33 (0.84, 2.02) | SIR | (Geoffroy-Perez et al. 2001) | Obligate, typed AT Heterozygotes
Breast | 6 | 5.1 | 1.18 (0.43, 2.57) | SMR | (Swift et al. 1976) | Relatives
Breast | 27 | 20.5 | 1.32 (0.87, 1.93) | SIR | (Swift et al. 1987) | Relatives >20 years
Breast | 2 | 0.17 | 11.76 (1.42, 42.45) | SMR | (Pippard et al. 1988) | Parents
Breast | 9 | 5.6 | 1.61 (0.74, 3.06) | SIR | (Morrell et al. 1990) | “Close” Relatives
Breast | 6 | 3.9 | 1.7 (0.60, 4.3) | SIR | (Borresen et al. 1990a) | Relatives (e)
Breast | 20 | 11.2 (d) | 1.8 (1.10, 2.77) | SMR | (Swift et al. 1991) | Relatives
Breast | 5 | 1.1 | 4.55 (1.89, 10.9) | SIR | (Janin et al. 1999) | Obligate, typed AT Hets <=44 years
Breast | 4 | 4.61 | 0.86 (0.93, 6.61) | SIR | (Janin et al. 1999) | Obligate, typed AT Hets >=45 years
Breast | 3 | 0.9 | 3.37 (0.69, 9.84) | SMR | (Inskip et al. 1999) | Parents
Breast | 3 | 3.4 | 0.89 (0.18, 2.59) | SMR | (Inskip et al. 1999) | Grandparents
GI Tract | 9 | 2.27 | 3.96 (1.81, 7.53) | SIR | (Geoffroy-Perez et al. 2001) | Obligate, typed AT Heterozygotes
Esophagus | 1 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Esophagus | 2 | 1.1 | 1.89 (0.23, 6.84) | SMR | (Inskip et al. 1999) | Grandparents
Stomach | 10 | 5 | 2.0 (0.96, 3.68) | SMR | (Swift et al. 1976) | Relatives
Stomach | 4 | 5.3 | 0.75 (0.20, 1.92) | SIR | (Swift et al. 1987) | Relatives >20 years
Stomach | 6 | 1.2 | 5.00 (1.84, 10.90) | SIR | (Morrell et al. 1990) | “Close” Relatives
Stomach | 1 | 0.5 | 2.02 (0.03, 11.26) | SMR | (Inskip et al. 1999) | Parents
Stomach | 6 | 3.5 | 1.71 (0.63, 3.72) | SMR | (Inskip et al. 1999) | Grandparents
Colon | 6 | 5.9 | 1.01 (0.37, 2.22) | SMR | (Swift et al. 1976) | Relatives
Colon | 3 | 3 | 1.00 (0.21, 2.93) | SMR | (Inskip et al. 1999) | Grandparents
Rectum | 1 | 0.3 | 3.36 (0.09, 18.74) | SMR | (Inskip et al. 1999) | Parents
Rectum | 2 | 1.7 | 1.16 (0.14, 4.28) | SMR | (Inskip et al. 1999) | Grandparents
Colon & Rectum | 2 | 5.5 | 0.36 (0.04, 1.30) | SIR | (Morrell et al. 1990) | “Close” Relatives
Colon & Rectum | 3 | 2.3 | 1.30 (0.27, 2.80) | SMR | (Swift et al. 1976) | Relatives
Colon & Rectum | 20 | 20 | 1.00 (0.61, 1.54) | SIR | (Swift et al. 1987) | Adult Relatives
Table 1 (continued).

Cause of Death | Observed | Expected | Risk Estimate (95% CI) (a) | SMR/SIR | Author, Year | Relationship to AT Patient
Female Reproductive
Cervix Uteri | 2 | 1.0 | 2.00 (0.24, 7.22) | SMR | (Swift et al. 1976) | Relatives
Cervix Uteri | 6 | 4.7 | 1.28 (0.35, 2.79) | SIR | (Swift et al. 1987) | Relatives >20 years
Cervix Uteri | 2 | 0.7 | 2.94 (0.36, 10.63) | SMR | (Inskip et al. 1999) | Grandparents
Ovary | 4 | 1.7 | 2.35 (0.64, 6.02) | SMR | (Swift et al. 1976) | Relatives
Ovary | 4 | 3.6 | 1.11 (0.30, 2.84) | SIR | (Swift et al. 1987) | Adult Blood Relatives
Ovary | 3 | 0.9 | 3.36 (0.67, 9.84) | SMR | (Inskip et al. 1999) | Parents
Ovary | 1 | 1.1 | 0.90 (0.02, 5.00) | SMR | (Inskip et al. 1999) | Grandparents
Solid Organs
Pancreas | 5 | 2.4 | 2.08 (0.91, 4.85) | SMR | (Swift et al. 1976) | Relatives
Pancreas | 7 | 3.3 | 2.12 (0.44, 6.19) | SIR | (Swift et al. 1987) | Relatives >20 years
Pancreas | 2 | 0.9 | 2.22 (0.27, 8.01) | SIR | (Morrell et al. 1990) | “Close” Relatives
Pancreas | 3 | 5.7 | 1.89 (0.39, 5.53) | SMR | (Inskip et al. 1999) | Grandparents
Lung | 8 | 6.3 | 1.27 (0.55, 0.65) | SMR | (Swift et al. 1976) | Relatives
Lung | 12 | 14.5 | 0.83 (0.43, 1.45) | SIR | (Swift et al. 1987) | Relatives >20 years
Lung | 8 | 5.0 | 1.60 (0.85, 4.15) | SIR | (Morrell et al. 1990) | “Close” Relatives
Lung | 5 | 2.1 | 2.34 (0.76, 5.45) | SMR | (Inskip et al. 1999) | Parents
Lung | 10 | 10.2 | 0.98 (0.47, 1.80) | SMR | (Inskip et al. 1999) | Grandparents
Kidney | 1 | 0.2 | 6.64 (0.17, 37.01) | SMR | (Inskip et al. 1999) | Parents
Liver, Gallbladder, Bile Ducts | 6 | 1.2 | 5.00 (1.84, 10.90) | SMR | (Swift et al. 1976) | Relatives
Male Reproductive
Prostate | 2 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Prostate | 9 | 7.0 | 1.29 (0.59, 2.45) | SIR | (Swift et al. 1987) | Relatives >20 years
Prostate | 0 | 0.02 | N/A | SMR | (Pippard et al. 1988) | Parents
Prostate | 7 | 2.3 | 3.04 (1.22, 6.26) | SMR | (Morrell et al. 1990) | “Close” Relatives
Prostate | 2 | 1.3 | 1.56 (0.19, 5.65) | SMR | (Inskip et al. 1999) | Grandparents
Testis | 2 | 0 | N/A | SIR | (Swift et al. 1987) | Relatives >20 years
Penis | 1 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Genitourinary
Bladder | 8 | 5.9 | 1.36 (0.59, 2.68) | SIR | (Swift et al. 1987) | Relatives >20 years
Bladder | 2 | 1.8 | 1.11 (0.13, 4.01) | SMR | (Morrell et al. 1990) | “Close” Relatives
Kidney | 1 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Kidney | 1 | 0.03 | 36.07 (0.91, 200.91) | SMR | (Pippard et al. 1988) | Parents
Hematologic & Lymphoid
All Combined (f) | 13 | 9.3 | 1.40 (0.75, 2.39) | SIR | (Swift et al. 1987) | Relatives >20 years
All Combined (f) | 5 | 3.0 | 1.67 (0.54, 3.89) | SMR | (Morrell et al. 1990) | “Close” Relatives
Some Combined (g) | 6 | 4.2 | 1.43 (0.53, 3.12) | SMR | (Swift et al. 1976) | Relatives
NHL | 1 | 0.01 | 199.26 (5.04, 1109.87) | SMR | (Pippard et al. 1988) | Parents
NHL | 1 | 1.7 | 1.67 (0.04, 9.29) | SMR | (Inskip et al. 1999) | Grandparents
Thyroid | 4 | 1.5 | 2.67 (0.73, 6.83) | SIR | (Swift et al. 1987) | Relatives >20 years
Lymphosarcoma | 2 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Other
Melanoma | 5 | 2.6 | 1.92 (0.62, 4.47) | SIR | (Swift et al. 1987) | Relatives >20 years
Melanoma | 4 | 0.9 | 4.44 (1.21, 11.37) | SMR | (Morrell et al. 1990) | “Close” Relatives
Lip, Oral Cavity, Pharynx | 4 | 4.8 | 0.83 (0.23, 2.13) | SIR | (Swift et al. 1987) | Relatives >20 years
Larynx | 3 | 0 | N/A | SIR | (Swift et al. 1987) | Relatives >20 years
Brain | 1 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Brain | 0 | 0.07 | N/A | SMR | (Pippard et al. 1988) | Parents
Bone | 1 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Pelvic Viscera | 1 | 0 | N/A | SMR | (Swift et al. 1976) | Relatives
Unspecified | 10 | 0 | N/A | SIR | (Swift et al. 1987) | Relatives >20 years

(a) SMRs and SIRs are calculated by dividing the observed number of events by the expected. Confidence intervals are reported directly from the citation, if provided, or are estimated using tabulated values for a Poisson-distributed variable (Breslow and Day 1980, Table 2.10).
(b) All neoplasms excludes non-melanoma skin cancers and in situ cervical cancers.
(c) Relatives in the study by Morrell et al. include all grandparents, parents, aunts, uncles, siblings, nieces and nephews.
(d) Breast cancer incidence data from Swift (Swift et al. 1991); expected values based on SEER data provided by Kuller LH and Modan B.
(e) Borresen’s relatives include the parents, the parents' siblings, grandparents, and grandparents' siblings.
(f) Includes CLL (5), AML (1), Hodgkin’s lymphoma (3), lymphosarcoma (2), multiple myeloma (1), NHL (1).
(g) Includes CLL (0), AML (2), Hodgkin’s lymphoma (2), lymphoma (0), unspecified lymphatic leukemia (2), unspecified leukemia (1).

It is notable that among all of these studies, only for breast cancer is there a suggestion of an increased risk among presumptive A-T heterozygotes (summarized in Table 2). In a 1994 meta-analysis, Easton (Easton 1994a) estimated this risk to be 3.9-fold (95% CI: 2.1, 7.2); a more recent analysis provided by Inskip (Inskip et al. 1999) estimated the relative risk to be 3.5 (95% CI: 2.0, 6.0). Because A-T heterozygotes have been estimated to constitute between 0.5% and 1% of the general population (Easton 1994a), a significant fraction of otherwise “sporadic” breast carcinoma could be attributable to mutated ATM alleles. This fraction has been estimated to be 3.8 - 8% by Swift and colleagues (Swift et al. 1991; Swift et al. 1987).

Table 2. Estimated risks for breast cancer among presumptive A-T heterozygotes - adapted from Easton (Easton 1994a).

Outcome Measure | Obs. Cases | RR | 95% CI | Relationship to AT Patient | Comparison Group | Author, Year
Incidence | 27 | 6.8 | 2.0, 22.6 | Relatives >20 yrs | Spouses | (Swift et al. 1987)
Incidence (a) | 23 | 5.1 | 1.5, 16.9 | Relatives | Spouses | (Swift et al. 1991)
Incidence | 6 | 3.9 | 1.3, 12.1 | Parents (Obligate Heterozygotes) | Relatives | (Borresen et al. 1990b)
Mortality | 6 | 1.7 | 0.6, 4.3 | Parents and Grandparents | SMR (Nat’l Rates) | (Inskip et al. 1999)
TOTAL | 62 | 3.5 | 2.0, 6.0 | | |

(a) A prospective cohort study, whereas Swift (Swift et al. 1987) was a retrospective study of documented cancer incidence.

These studies are neither without controversy nor methodologic shortcomings. Swift’s 1991 publication has been most particularly criticized, largely as a result of the conclusion that diagnostic or occupational exposure to ionizing radiation increases the risk of breast cancer among A-T heterozygous women. In addition, the use of spousal controls as the comparison group in this paper may have been inappropriate, as they were younger than the case patients. For instance, the ratio of person-years for cases as compared to controls was 1.6 in the 20-39 year age group, 2.8 in the 60-69 group, and 4.7 in the 70 and above group. Given that the risk of breast cancer increases with age, some portion of the estimated RR might be due to this inequality. Kuller and Modan (Kuller and Modan 1991) suggest that a “healthy spouse effect” results in overestimating the risk of breast cancer in this case-control comparison. Their assertion is supported by the lower incidence of breast cancer among spousal controls (n, observed: 3) than would be expected by applying national rates determined by the Surveillance Epidemiology and End-Results (SEER) program (n, expected: 5.8).

More generally, all the family studies above are subject to possible bias in family recruitment due to higher participation of families with breast cancer. Such ascertainment bias would inflate the observed association between breast cancer and A-T heterozygosity. Retrospective case finding efforts involving questionnaires are subject to recall bias but, in all studies, only cancers confirmed by hospital or pathology records were included. On the other hand, the possibility remains that the relative risk estimates above are, in fact, biased towards the null, as not all relatives of A-T patients comprising the “at risk” group will necessarily be A-T carriers.

To avoid the potential for ascertainment bias, two studies examined the inheritance patterns of polymorphic genetic markers linked to the ATM locus in order to positively identify heterozygotes among breast cancer patients within A-T families. Athma et al. (Athma et al. 1996) found that among 33 A-T family members with breast cancer, 25 were A-T heterozygotes, while only 14.9 were expected; this association corresponds to an odds ratio of 3.8. Janin et al. (Janin et al. 1999) estimated the breast cancer risk among molecularly genotyped A-T heterozygotes to be 3.32. This risk was even more pronounced among those 44 years and below (RR=4.55) as compared to those above 44 (RR=2.48). Hence, these two studies using genotyping confirmation to identify ATM heterozygotes are in agreement with the increased breast cancer risk detected in family studies.

1.4.3. Frequency of Truncating ATM Mutations in Patients with Breast Cancer

Given that the family studies of A-T index cases, including those with molecular genotyping confirmation, show consistent evidence for an elevated risk of breast cancer in A-T heterozygotes, it has been hypothesized that the ATM gene could be implicated in sporadic or familial breast cancer. However, studies examining the frequency of ATM mutations in a broad spectrum of patients with breast cancer have not supported this supposition (Figure 9). These studies include loss of heterozygosity (LOH) studies, linkage analysis and mutation screening efforts. Taken together, these contradictory results have led some investigators to hypothesize that two distinct A-T carrier populations exist.

Inactivation of the ATM gene may be a frequent event in the development of breast cancer.
Loss of heterozygosity (LOH) has been shown to occur in at least 40% of sporadic breast tumors (Carter et al. 1994; Hampton et al. 1994). Subsequently, the region exhibiting loss was more precisely mapped to two independent areas on 11q23.1, which includes the ATM gene (Laake et al. 1997). Whereas LOH studies provide support for the existence of a tumor suppressor gene in this region, genetic linkage analyses do not support this supposition. Two independent studies do not demonstrate linkage between breast cancer and the ATM gene on 11q22-23 (Cortessis et al. 1993; Wooster et al. 1993). Both of these studies were conducted among cases defined by multiple-case families and, as such, it remains a possibility that the ATM gene is involved in other (e.g. “sporadic”) forms of breast cancer.

Because several lines of evidence implicated loci at region 11q23 in breast cancer development, and because the ATM gene was cloned in 1995, a number of studies began reporting mutation screening results from cohorts of breast cancer patients and controls (summarized in Table 3). Again, a lack of consistency in results emerges, even within the same research groups.

Table 3. Summary of the frequency of heterozygous germ-line ATM truncation mutations in female breast cancer patients.

Case Pathology and Family History Details | Age (years) | Truncating Mutation: Freq. (%) | Author, Year
Primary breast cancer, unselected for family history | Mean: 55.8 | 0/38 (0%) | (Vorechovsky et al. 1996a)
Primary breast cancer with family history of breast cancer or tumors associated with A-T | Mean: 53 | 3/88 (3.4%) | (Vorechovsky et al. 1996b)
Early onset breast cancer | All < 40 | 2/401 (0.5%) | (FitzGerald et al. 1997)
Primary breast cancer with family history of breast cancer. BRCA1/2 mutations excluded. | Not reported | 1/100 (1.0%) | (Chen et al. 1998)
Primary breast cancer patients. Moderate or absent family history of breast cancer. BRCA1/2 mutations excluded. | All < 40 | 0/100 (0%) | (Izatt et al. 1999a)
“Sporadic” breast cancer | Mean: 53.4 | 0/47 (0%) | (Bebb et al. 1999)
Primary breast cancer (25% advanced disease), unselected for family history. | <55 (n=150) | 1/483 (0.2%) | (Laake et al. 2000)
Breast cancer (60% unilateral, 40% bilateral) with long-term survival, 25% with family history | First primary <45 | 7/82 (8.5%) | (Broeks et al. 2000)

Figure 9. Prevalence of ATM truncating mutations in breast cancer (Appleby et al. 1997; Bebb et al. 1999; Broeks et al. 2000; Chen et al. 1998; FitzGerald et al. 1997; Izatt et al. 1999a; Laake et al. 2000; Nichols et al. 1999; Oppitz et al. 1999; Shayeghi et al. 1998; Vorechovsky et al. 1996b; Vorechovsky et al. 1996a). (Horizontal axis: percent of breast cancer cases with ATM truncating mutations; one bar per study.)

For instance, Vorechovsky et al. (Vorechovsky et al. 1996a) were the first to screen for A-T heterozygotes in sporadic breast cancer patients using a PCR single strand conformational polymorphism (PCR-SSCP) detection methodology. No mutations were found, when approximately 1 A-T carrier would be expected among 38 consecutive cases of breast cancer (assuming a 0.5 - 1.0% population carrier frequency and an estimated 3-fold relative risk among cases). The same research group also selected for mutational screening patients who reported a family history of breast cancer or a family history of other cancers associated with A-T homozygosity, including leukemia/lymphoma or gastric cancer (Vorechovsky et al. 1996b).
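As a rough check on the expected carrier count quoted for the 38 consecutive cases, the sketch below computes how many carriers a case series should contain for a given population carrier frequency and relative risk. It is an illustration under stated assumptions rather than the calculation used in the cited study: the function name `expected_carriers_among_cases` is hypothetical, and the carrier proportion among cases is taken to be p*RR / (p*RR + (1 - p)), which is close to the simple product p*RR when carriers are rare.

```python
def expected_carriers_among_cases(n_cases: int, carrier_freq: float, relative_risk: float) -> float:
    """Expected number of carriers in a series of cases.

    Carriers are over-represented among cases in proportion to the relative
    risk, so the carrier fraction among cases is p*RR / (p*RR + (1 - p)).
    """
    if n_cases < 0 or not 0.0 <= carrier_freq <= 1.0 or relative_risk <= 0.0:
        raise ValueError("invalid inputs")
    weighted = carrier_freq * relative_risk
    case_carrier_fraction = weighted / (weighted + (1.0 - carrier_freq))
    return n_cases * case_carrier_fraction


# Assumptions quoted in the text: 38 cases, 0.5-1.0% carrier frequency, 3-fold risk.
for freq in (0.005, 0.01):
    expected = expected_carriers_among_cases(38, freq, 3.0)
    print(f"carrier frequency {freq:.1%}: {expected:.2f} expected carriers")
```

Under these assumptions the expectation falls between roughly 0.6 and 1.1 carriers, so observing none among 38 cases is not, on its own, strong evidence against a 3-fold relative risk.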
Using a PCR-SSCP analysis, 3 truncating mutations were discovered among 88 breast cancer cases; only 0.44 A-T carriers were expected among these 88 individuals given a 0.5% population carrier frequency.

Since the initial observations of Vorechovsky et al. (Vorechovsky et al. 1996b; Vorechovsky et al. 1996a), a broad spectrum of breast cancers has been screened for truncating mutations in the ATM gene using a cDNA-based protein truncation test (PTT) as the mutation detection method (summarized in Table 3). Breast cancer study cohorts include: unselected series (Bebb et al. 1999; Laake et al. 2000; Vorechovsky et al. 1996a), patients with early-onset disease (FitzGerald et al. 1997), patients with (Chen et al. 1998; Vorechovsky et al. 1996b) or without (Izatt et al. 1999a) a family history, and patients with a frequent occurrence of bilateral disease (Broeks et al. 2000). Further, given the relationship between A-T and radiation sensitivity, additional studies have selected breast cancer patients showing adverse reactions to radiation therapy (Appleby et al. 1997; Nichols et al. 1999; Oppitz et al. 1999; Shayeghi et al. 1998) or Hodgkin’s patients who developed breast cancer after radiation therapy (Nichols et al. 1999) (Table 4). Of all these studies, only one additional study found a proportion of truncating mutations above the population estimate (Broeks et al. 2000). Taken together, these studies provide little evidence that heterozygous truncating ATM mutations confer a genetic predisposition either to early onset breast cancer or to familial breast cancer.

Table 4. Summary of the frequency of heterozygous germ-line ATM truncation mutations in female breast cancer patients displaying severe tissue radiation side effects.

Case Pathology and Family History Details | Age (years) | Truncating Mutation: Freq. (%) | Author, Year
Breast cancer patients with severe response to radiotherapy | Not reported | 0/16 (0%) | (Appleby et al. 1997)
Breast cancer patients with radiotherapy complications | Not reported | 1/41 (2.4%) | (Shayeghi et al. 1998)
Breast cancer patients with radiation-associated malignancies after Hodgkin’s disease | Median: 38; Range: 29-66 | 0/24 (0%) | (Nichols et al. 1999)
Breast cancer patients with severe tissue radiation side effects (a) | Mean: 60.8; SD: 12.6 | 0/10 (0%) | (Oppitz et al. 1999)

(a) Male breast cancer (1 case) excluded.

1.5. Are ATM Missense Mutations a Distinct and Clinically Significant Class of ATM Mutations?

One could reasonably conclude from the studies summarized above that truncating mutations are no more common among breast cancer cases than among control subjects. So why should it be that two different study designs, one estimating the risk of breast cancer among A-T heterozygotes identified in families and the other characterizing the frequency of A-T heterozygosity among breast cancer cases, reach such dramatically different conclusions? To resolve this paradox, Gatti, Tward and Concannon (Gatti et al. 1999) propose that there are two populations of A-T carriers, one group with a truncating allele coupled to a wild type allele and the other with a missense allele coupled to a wild type allele. Whereas truncating mutations would block expression of ATM protein, missense mutations could code for stable ATM proteins that are present at normal intracellular concentrations but function abnormally. The linchpin of such a model is that the two classes of ATM mutations have distinct effects on ATM function and cancer risk. These proposed genotype/phenotype relationships are outlined below (Table 5).
Evidence in support of such a model, however, can only be achieved by determining the frequency of ATM missense mutations, either in the homozygous state or coupled to a wild type allele, in the general population.

Table 5. Proposed phenotype/genotype relationships in the two population model of ATM mutations. Adapted from Gatti et al. (Gatti et al. 1999).

Genotype | Phenotype
Wild Type Allele / Wild Type Allele | Normal
Truncating Mutation / Truncating Mutation | Ataxia-telangiectasia & Cancer susceptibility
Missense Mutation / Missense Mutation | Neurological symptoms & Cancer susceptibility
Missense Mutation / Truncating Mutation | Ataxia-telangiectasia & Cancer susceptibility
Truncating Mutation / Wild Type Allele | Cancer susceptibility (?)
Missense Mutation / Wild Type Allele | Cancer susceptibility (?)

Generally, few missense mutations have been described or, more accurately, many may have been overlooked. For instance, Vorechovsky et al. (Vorechovsky et al. 1996b) reported 5 amino acid substitutions in breast cancer cases and controls. Izatt et al. used a fluorescent chemical cleavage assay to detect mismatched DNA and found a missense mutation in the PI-3 kinase domain (G8293A) and five rare variants (Izatt et al. 1999a). It has previously been shown that overexpression of a mutant ATM polypeptide lacking a PI-3 kinase domain increases genetic instability in normal cells grown in culture (Morgan et al. 1997). Most compelling epidemiologically, however, is Stankovic’s (Stankovic et al. 1998) description of a valine to glycine substitution in the 3’ end of the gene (T7271G), which was associated with a milder clinical phenotype, lower radiosensitivity and a predisposition to breast cancer. It was discovered in one family in the homozygous state (Figure 10).
This family contained one A-T patient living into his 7th decade. Equally remarkable was the observation that an affected A-T woman had a son (as noted above, affected women typically have congenital hypoplasia of the ovaries). The second family had two brothers who were compound heterozygotes for this mutation (the other being a 7 nucleotide deletion). Both families sharing this missense mutation had a family history of breast cancer.

Recent evidence from cell and animal systems has provided growing support for this model. The T7271G missense mutation has been shown in expression and activity studies to yield a dominant negative inhibitor of ATM (Chenevix-Trench et al. 2002). In addition, an inducible expression system for ATM has been developed showing that several other missense alleles outside of the kinase domain induce a partial A-T phenotype when introduced into normal cells, also in a dominant-negative fashion, perhaps through a mechanism interfering with ATM-ATM multimerization and interaction (Scott et al. 2002). Lastly, a knock-in mouse model of a known A-T-causing in-frame deletion results in mice with a significant number of solid tumors (Spring et al. 2002). This in-frame deletion results in the production of a functionally disrupted, nearly full-length ATM and, hence, has important implications for missense mutations. Taken together, these observations provide compelling support for cancer predisposition among human A-T missense carriers.

Figure 10. A family with a valine to glycine missense substitution associated with a mild A-T phenotype, radiosensitivity and predisposition to breast cancer. Adapted from Stankovic et al. (Stankovic et al. 1998). The arrows represent reported breast cancer; also noted is the age at which the cancer occurred.

1.6.
Conclusions and Study Rationale

Considerable molecular evidence places ATM as a key and proximal component in DNA-damage response, maintenance of genomic integrity, and in the regulation of cell cycle checkpoints. Further, its recently demonstrated functional interaction with BRCA1 provides a molecular basis that may explain why ATM heterozygosity is associated with an increased risk of breast cancer in family studies. The degree to which the breast cancer risk is increased in A-T heterozygotes remains an open debate, made more complex by the postulate that there exist two types of ATM heterozygotes, resulting in distinct phenotypes. It is clear that a comprehensive molecular epidemiologic analysis of ATM missense variants in breast cancer is required. This study seeks not only to describe the frequency of ATM missense mutations in a multiethnic population but also to provide estimates of the relative risk of breast cancer associated with such mutations using a case-control study design.

CHAPTER 2
THE RELATIONSHIP BETWEEN TWENTY MISSENSE ATM VARIANTS AND BREAST CANCER RISK

2.1. Summary

2.1.1. Objective

Deficiencies in the tasks of detecting and repairing DNA damage lead to mutations and chromosomal abnormalities, a hallmark of cancer. The gene mutated in ataxia-telangiectasia (ATM) is a proximal component in performing such tasks. Studies of ataxia-telangiectasia (A-T) families have suggested an increased risk of breast cancer among obligate female heterozygous carriers of ATM mutations. Paradoxically, studies of sporadic and familial breast cancer have failed to demonstrate an elevated prevalence of mutations among breast cancer cases.

2.1.2.
Methods

We characterized the prevalence and distribution of 20 ATM missense mutations/polymorphisms in a population-based case-control study of 854 African-American, Latina, Japanese, and Caucasian women aged 45 years and above participating in the Multiethnic Cohort Study (MEC). The study population included 428 incident breast cancer cases and 426 controls.

2.1.3. Results

The prevalence of variants ranged from 0% to 13.6% among controls and varied by ethnicity (0%-32.5%). Overall, these data provide little support for an association of ATM missense mutations with breast cancer among older women. We observed only one sequence variation (L546V), common among African-American women, to be over-represented among all high stage breast cancer cases (OR=3.35; 95% CI 1.27, 8.84). After correction for multiple comparisons, this observed risk modification did not attain statistical significance.

2.1.4. Conclusions

The distribution of ATM missense mutations and polymorphisms varied widely across the four ethnic groups studied. Although a single missense variant (L546V) appeared to act as a modest predictor of risk, the remaining variants were no more common in breast cancer cases than in controls.

2.2. Introduction

Ataxia-telangiectasia (A-T) is a pleiotropic inherited disease characterized by neurodegeneration, oculocutaneous telangiectasias, an increased incidence of cancer, immunodeficiencies, radiation sensitivity and genetic instability (Boder 1985; Shiloh 1995). The gene mutated in A-T, ATM (Ataxia Telangiectasia Mutated), is composed of 66 exons, spans more than 150kb (Savitsky et al. 1995a) and is a member of the phosphatidylinositol 3-kinase family (Savitsky et al. 1995b).
Heterozygous carriers of germline ATM variants are estimated to constitute 0.35-1% of the general population; the majority of these variants (>70%) are predicted to lead to either truncation or altered splicing of the protein (Gilad et al. 1996; Shiloh 1995).

Recent evidence has identified ATM as an essential and proximal component of cell cycle restriction point control. Its scope of interaction includes proteins involved in apoptosis or cell cycle arrest. Further, it has been demonstrated that a biochemical connection exists between ATM and the inherited breast cancer susceptibility gene BRCA1, wherein ATM phosphorylates BRCA1 in a cluster of residues at the carboxy terminus in response to γ-radiation (Cortez et al. 1999). Such an interaction may represent a mechanism by which variation in ATM could be key in the pathogenesis of breast cancer.

Studies of A-T families have suggested an increased risk of breast cancer among obligate female heterozygous carriers of A-T variants (Athma et al. 1996; Concannon and Gatti 1997; Easton 1994a; Swift et al. 1991; Swift et al. 1987). Paradoxically, studies of sporadic and familial breast cancer have failed to consistently demonstrate an elevated prevalence of germline ATM gene variants among breast cancer cases (FitzGerald et al. 1997; Vorechovsky et al. 1996a). To resolve these apparently disparate findings, Gatti et al. proposed a model for the role of ATM heterozygosity in breast and other cancers, positing two classes of ATM mutations (Gatti et al. 1999) - null or truncating mutations that lead to A-T and missense mutations that cause cancers. There exists initial support for this model, including the observation that, among cohorts of breast cancer cases, missense mutations are observed whereas truncating mutations are uncommon (Dork et al. 2001; Sommer et al. 2002; Teraoka et al. 2001).
More specifically, Stankovic et al. (Stankovic et al. 1998) identified a missense mutation (T7271G or V2425G) in the PI3-kinase region in two A-T families which was associated with a 13-fold increased risk of breast cancer. It was subsequently demonstrated that this mutation yields a dominant negative inhibitor of ATM (Chenevix-Trench et al. 2002). In addition, in vitro evidence has shown that missense alleles outside of the kinase domain can induce a partial A-T phenotype in a dominant-negative interaction (Scott et al. 2002). Lastly, mouse models of a known A-T-causing in-frame deletion mutation produce mice with a significant number of solid tumors (Spring et al. 2002). Taken together, these observations provide support for cancer predisposition among human A-T missense carriers.

To date, precise estimates of the risk of breast and other cancers associated with ATM missense variants are unavailable. But, given a population prevalence of 0.5-1% of ATM variant heterozygotes, any elevated risk would carry with it significant clinical implications. Whereas the prevalence of ATM missense mutations has not been comprehensively evaluated in a multi-ethnic population, one might expect wide variation given the marked ATM sequence diversity noted when comparing African and non-African populations (Thorstenson et al. 2001). In this study, we evaluated the relationship between 20 missense variants/polymorphisms in the ATM gene, identified in a separate sequencing effort, and breast cancer risk in a case-control study among African-American, Latina, Japanese, and Caucasian women participating in the Multiethnic Cohort Study (MEC).

2.3. Materials and Methods

2.3.1.
Multiethnic Cohort Study Population

This nested case-control study is part of a large, ongoing, multiethnic cohort study in Hawaii and Los Angeles, California with an emphasis on diet and other lifestyle characteristics in the etiology of cancer. Aspects of this large cohort as well as details of its design and implementation are described more fully elsewhere (Kolonel et al. 2000). Briefly, participants were recruited between 1993 and 1996 from driver's license files in Hawaii and California; participants were 45-75 years of age at baseline. The focus was on four main ethnic groups - African-Americans, Japanese-Americans, Latinos/Latinas and Caucasians. The total number of male and female subjects who comprised the cohort was 215,251. Among women, baseline data were collected on 22,251 African-Americans, 29,957 Japanese, 26,502 Caucasians, and 24,620 Latinas. Participants completed a baseline questionnaire designed for self-administration which included five sections: (i) Background, including medical history and family cancer history; (ii) Diet history; (iii) Medication use; (iv) Physical activity; (v) Female reproductive history, including menstruation history, parity, age at first full term pregnancy, oral contraceptive use, age at menopause and the use of hormones.

Eligible cases were women enrolled in the cohort and diagnosed between 1993 and 1998 with a new primary, histologically confirmed breast cancer identified by linkage of the cohort to population-based cancer Surveillance, Epidemiology and End Results (SEER) registries in Hawaii and California. Cases were contacted by letter and phone call and agreed to provide a blood specimen. The participation rate for providing a blood sample on request was 74% for cancer cases.
Women with carcinoma in situ (non-infiltrating pathology) and neoplasms of the skin of the breast were not included as breast cancer cases. Information on stage of disease was ascertained from the SEER Registries and used in subgroup analyses. Stage of disease was characterized as "localized" or "high stage", the latter including regional (by direct extension and/or lymph node involvement) or systemic disease.

For this particular effort, a nested case-control study was designed with the intention of comprehensively analyzing the role of rare ATM missense variants in a multi-ethnic, population-based sample. Given that ATM missense mutation frequencies were hypothesized to be over-represented among breast cancer cases, approximately 100 cases from each ethnic group were initially selected (n=428), and women diagnosed with high stage disease were oversampled (n=222) as compared to cases who initially presented with localized disease (n=206). Breast cancer staging classification by ethnicity is outlined in Table 6 below and the site of disease among all cases combined is detailed in Table 7.

Table 6. Breast cancer staging classification stratified by ethnicity (n=428 cases): number (column percent in parentheses).

Ethnicity          Low Stage Disease   High Stage Disease    TOTAL
                   (Localized)         (Regional/Systemic)
African-American   51 (24.6)           66 (29.9)             117 (27.3)
Japanese           54 (26.1)           46 (20.8)             100 (23.4)
Latina             51 (24.6)           50 (22.6)             101 (23.6)
Caucasian          50 (24.1)           60 (27.1)             110 (25.7)
TOTAL              206                 222                   428

Table 7. Anatomic location of breast cancer by ICD-O code (n=428 cases): number (column percent in parentheses).

ICD-O Code   Anatomic Location of Neoplasia                  Number of Cases (% of Total)
C50          Breast, excluding skin                          127 (29.7)
C50.1        Central portion of breast                       15 (3.5)
C50.2        Upper-inner quadrant of breast                  33 (7.7)
C50.3        Lower-inner quadrant of breast                  20 (4.7)
C50.4        Upper-outer quadrant of breast                  109 (25.5)
C50.5        Lower-outer quadrant of breast                  18 (4.2)
C50.6        Axillary tail of breast or tail of breast, NOS  4 (0.9)
C50.8        Overlapping lesion of breast                    49 (11.4)
C50.9        Breast, NOS                                     53 (12.4)
TOTAL                                                        428

Blood samples had also been collected from an approximately 3% random sample of healthy cohort members at baseline (Feigelson et al. 1999; McKean-Cowdin et al. 2001). In this effort, we selected approximately 100 controls for each of the four ethnic groups (n=426); the participation rate for cohort controls was 66%. Only controls with no previous diagnosis of breast cancer were included. The study was approved by the Institutional Review Board of the Keck School of Medicine of the University of Southern California.

2.3.2. ATM Missense Variant Discovery

A sequencing effort had previously been undertaken to discover missense variants spanning the full-length sequence transcript of the ATM gene (Gilad and Skaliter, personal communication). Full sequence analysis of ATM was performed on cDNA from peripheral lymphocytes of patients and controls. Briefly, a nested reverse transcription-polymerase chain reaction (RT-PCR) approach was used to generate overlapping, internally labeled PCR products. These cover the entire sequence of ATM and were analyzed by sequencing of RT-PCR products. All base changes were reconfirmed. This study included a total of 274 individuals comprised of 94 primary breast cancer patients, 70 bilateral cases and 63 individuals without disease selected within a hospital-based group of breast cancer cases from the United States.
From this effort, 20 missense variants of interest were identified, the locations of which are represented in Figure 11. A number of the variants identified, but not all, had been previously described (Dork et al. 2001; Stankovic et al. 1998; Thorstenson et al. 2001; Vorechovsky et al. 1996b; Vorechovsky et al. 1996a).

Figure 11. Exonic organization of ATM with missense variants of interest annotated.

2.3.3. Genotyping

Genomic DNA was purified from the buffy coats of peripheral blood samples for all cases and controls using the Puregene DNA Isolation protocol and kit (Gentra Systems; Minneapolis, MN). SNP genotyping was performed using the fluorogenic 5'-nuclease assay (TaqMan Assay) (Lee et al. 1993). The TaqMan assay was performed using a TaqMan PCR Core Reagent Kit (Applied Biosystems) according to the manufacturer's instructions in a final volume of 20 µL. Using a fluorescent dye labeled probe specific for each allele, the profile of each well was measured in a Sequence Detection System (Model 7700 or Model 7900HT; Applied Biosystems) and the results analyzed with Sequence Detection Software (Applied Biosystems). Assays were run concurrently with negative controls to detect contamination. Additional quality control measures included the use of unique sample identifiers and a 5% intercalated repeat sample to ensure consistency in genotyping calls.

2.3.4.
Statistical Analysis

Data management, descriptive and univariate analyses were performed using the SAS statistical software Version 8.01 (SAS Institute; Cary, NC). The EpiLog software system (EpiCenter Software; Pasadena, CA) was used to estimate odds ratios and 95% CIs by unconditional logistic regression while adjusting for ethnicity. A Bonferroni correction for multiple comparisons was used to define the α level of significance in order to avoid spurious positive results. The critical α value for these analyses is 0.0025 (0.05 / 20). At this level of significance, the study as designed has 80% power to detect a relative risk of 2.2 for a 10% minor allele, 2.8 for a 5% minor allele and 6.5 for a 1% minor allele.

2.4. Results

We characterized the prevalence and distribution of 20 ATM missense mutations/polymorphisms in a case-control study of 854 African-American, Latina, Japanese, and Caucasian women aged 45 years and above participating in the Multiethnic Cohort Study. Associations between established reproductive breast cancer risk factors and breast cancer risk were generally consistent with expectation in all ethnic groups among cases and controls (Table 8). For instance, cases more often reported a family history of breast cancer (18.0% of cases versus 9.9% of controls; p=0.01) and tended to have a later first full term pregnancy (after age 30: 11.6% versus 6.4%; p = 0.04).

The prevalence of variants ranged from 0% to 13.6% among controls for all ethnicities combined and varied widely by ethnicity (Table 9). Two of the missense variants (D126E and D1853N) are previously described common polymorphisms and were present in equal frequencies among cases and controls (Table 9). Most of the other missense variants were uncommon and did not appear to be over-represented among breast cancer cases (Table 9).
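
As a rough cross-check on odds ratios of the kind reported in this chapter, a crude (unadjusted) odds ratio and an approximate 95% confidence interval can be computed directly from a 2x2 table of carrier counts. The sketch below is an illustration only: the function name `crude_odds_ratio` is hypothetical, the Woolf (log-based) interval is a swapped-in simplification, and the counts are the L546V carrier counts for high stage cases and controls as listed in Table 10. The study itself estimated odds ratios by unconditional logistic regression adjusted for ethnicity, so the crude value (about 3.3) is not expected to reproduce the adjusted estimate exactly.

```python
import math


def crude_odds_ratio(case_carriers, case_noncarriers,
                     control_carriers, control_noncarriers, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    Illustrative helper, not the ethnicity-adjusted logistic
    regression used in the study. All four cells must be non-zero.
    """
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / case_carriers
        + 1 / case_noncarriers
        + 1 / control_carriers
        + 1 / control_noncarriers
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# L546V: high stage cases versus controls (counts as listed in Table 10).
or_l546v, ci_low, ci_high = crude_odds_ratio(10, 211, 6, 416)
print(f"crude OR = {or_l546v:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

Because the crude calculation ignores ethnicity, any gap between it and the adjusted estimate gives a quick, informal indication of how much the adjustment matters for a variant that is concentrated in one ethnic group.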
We did observe an exon 13 missense variant (L546V) to be over-represented among all breast cancer cases as compared to controls (Crude OR: 2.44; 95% CI 0.91, 6.54; Table 9). This association, however, was limited to African-American women, as the L546V variant was relatively common within this group: 7.7% overall, 10.3% among all cases (12.1% among high stage) and 5.1% among controls. The L546V missense mutation was also seen in two Latina cases but was not seen among any of the Japanese or Caucasian study participants.

Table 8. Descriptive statistics of subjects stratified by case or control status: total observations with percent in parentheses.

Variable                                  Cases        Controls     Total        p-value(a)
                                          (n=428)      (n=426)      (n=854)
Age (in years)(b)
  Less than 55                            85 (19.9)    107 (25.1)   192 (22.5)
  55-64                                   146 (34.1)   136 (31.9)   282 (33.0)
  65 and above                            197 (46.0)   183 (48.2)   197 (51.8)   0.1839
Ethnicity
  African-American                        117 (27.3)   117 (27.5)   234 (27.4)
  Japanese                                100 (23.4)   100 (23.5)   200 (23.4)
  Latina                                  101 (23.6)   99 (23.2)    200 (23.4)
  White                                   110 (25.7)   110 (25.8)   220 (25.8)   0.9995
Family History: Breast Cancer
  Any reported                            77 (18.0)    42 (9.9)     119 (13.9)   0.0090
  Any <50 years                           33 (7.7)     17 (4.0)     50 (5.9)     0.0015
Family History: Ovarian Cancer            19 (4.4)     21 (4.9)     40 (4.7)     0.8594
Age at First Menstrual Period
  <13 years                               229 (54.1)   211 (49.9)   440 (52.0)
  13 years or older                       194 (45.9)   212 (50.1)   406 (48.0)   0.2420
Age of First Full Term Pregnancy (of those pregnant)
  <20 years                               108 (29.3)   129 (34.6)   237 (32.1)
  21-25 years                             129 (34.9)   141 (37.8)   270 (36.5)
  26-30 years                             87 (23.6)    78 (20.9)    162 (21.9)
  31 years or older                       45 (12.2)    25 (6.7)     70 (9.5)     0.0356

(a) p-value calculated by the χ² test for heterogeneity (categorical variables) comparing cases to controls.
(b) Age defined as age at blood draw among controls and age at diagnosis among cases.
Table 9. Ethnic-specific distribution of missense variants in the ATM gene among breast cancer cases and controls.

        Am. Acid  All Ethnicities       African-American      Latina                Japanese              Caucasian
Site(a) Change    Cases      Controls   Cases      Controls   Cases      Controls   Cases      Controls   Cases      Controls
                  (n=428)    (n=426)    (n=117)    (n=117)    (n=101)    (n=99)     (n=100)    (n=100)    (n=110)    (n=110)
                  # (%)      # (%)      # (%)      # (%)      # (%)      # (%)      # (%)      # (%)      # (%)      # (%)
146     S49C      3 (0.7)    6 (1.4)    2 (1.7)    1 (0.9)    0          2 (2.0)    0          1 (1.0)    1 (0.9)    2 (1.8)
378     D126E     43 (10.3)  53 (12.7)  34 (29.3)  38 (32.5)  6 (6.1)    8 (8.3)    0          1 (1.1)    3 (2.9)    6 (5.6)
544     V182L     7 (1.7)    4 (1.0)    4 (3.5)    4 (3.5)    3 (3.0)    0          0          0          0          0
1636    L546V     14 (3.3)   6 (1.4)    12 (10.3)  6 (5.1)    2 (2.0)    0          0          0          0          0
2119    S707P     6 (1.5)    5 (1.2)    1 (0.9)    0          3 (3.0)    2 (2.1)    0          0          2 (1.9)    3 (2.9)
2289    F763L     2 (0.5)    0          1 (0.9)    0          1 (1.0)    0          0          0          0          0
2572    F858L     5 (1.3)    4 (1.1)    1 (0.9)    0          1 (1.1)    2 (2.1)    1 (1.1)    0          2 (2.0)    1 (1.0)
2614    P872S     8 (1.8)    8 (1.9)    7 (6.0)    6 (5.2)    1 (1.0)    0          0          0          0          2 (2.0)
2932    S978P     0          0          0          0          0          0          0          0          0          0
3118    M1040V    7 (1.7)    4 (1.0)    6 (5.2)    3 (2.7)    1 (1.0)    0          0          0          0          1 (0.9)
3161    P1054R    8 (1.9)    13 (3.1)   1 (0.9)    1 (0.9)    2 (2.0)    5 (5.1)    0          1 (1.0)    5 (4.6)    6 (5.5)
3383    Q1128R    4 (1.0)    4 (1.0)    4 (3.5)    3 (2.7)    0          1 (1.0)    0          0          0          0
4258    L1420F    4 (1.0)    7 (1.7)    0          1 (1.0)    2 (2.0)    2 (2.1)    0          1 (1.1)    2 (1.9)    3 (2.8)
5557    D1853N    47 (12.3)  47 (12.5)  9 (8.4)    8 (7.8)    13 (13.8)  13 (14.3)  2 (2.3)    5 (5.9)    23 (24.5)  21 (21.2)
5558    D1853V    2 (0.5)    1 (0.2)    0          1 (0.9)    0          0          0          0          2 (1.9)    0
6096    R2032S    0          1 (0.3)    0          1 (0.9)    0          0          0          0          0          0
6176    T2059I    2 (0.5)    2 (0.2)    2 (1.7)    2 (1.7)    0          0          0          0          0          0
6235    V2079I    3 (0.7)    6 (1.4)    1 (0.9)    2 (1.7)    1 (1.0)    3 (3.0)    0          0          1 (0.9)    1 (0.9)
6437    S2146T    2 (0.5)    2 (0.2)    1 (0.9)    2 (1.7)    1 (1.0)    0          0          0          0          0
6995    L2322F    5 (1.2)    4 (1.0)    5 (4.3)    4 (3.5)    0          0          0          0          0          0

(a) Nucleotide position in GenBank (accession number U82828).

Table 10. Association between ATM variant carrier status and breast cancer risk by stage of disease. Odds ratios compare high stage to controls (95% Confidence Intervals in parentheses).

Site          Observed     High Stage   Low Stage    Controls     Odds Ratio (OR):        p-value
(Amino Acid   Alleles      Cases        Cases        # (%)        High Stage to Control
Chg.)                      # (%)        # (%)                     (95% CI)
S49C          G/G          218 (99.5)   204 (99.0)   418 (98.6)   Reference
              C carrier    1 (0.5)      2 (1.0)      6 (1.4)      0.33 (0.04, 2.45)       0.4915
D126E         A/A          194 (90.2)   180 (89.1)   363 (87.3)   Reference
              A/T & T/T    20 (9.8)     22 (10.9)    53 (12.8)    0.76 (0.44, 1.30)       0.3828
V182L         G/G          210 (98.1)   196 (98.5)   400 (99.0)   Reference
              C carrier    4 (1.9)      3 (1.5)      4 (1.0)      1.90 (0.48, 5.52)       0.5852
L546V         C/C          211 (95.5)   202 (98.1)   416 (98.6)   Reference
              G carrier    10 (4.5)     4 (1.9)      6 (1.4)      3.35 (1.27, 8.84)       0.0297
S707P         A/A          212 (98.6)   197 (98.5)   401 (98.8)   Reference
              G carrier    3 (1.4)      3 (1.5)      5 (1.2)      1.16 (0.27, 4.88)       1.0
F763L         T/T          215 (99.5)   201 (99.5)   405 (100)    Reference
              A carrier    1 (0.5)      1 (0.5)      0            Undefined               0.7417
F858L         T/T          199 (97.5)   185 (100)    378 (98.9)   Reference
              C carrier    5 (2.5)      0            4 (1.1)      2.42 (0.67, 8.77)       0.3195
P872S         A/A          211 (96.8)   202 (99.5)   408 (98.1)   Reference
              G carrier    7 (3.2)      1 (0.5)      8 (1.9)      1.72 (0.62, 4.77)       0.4383
S978P         T/T          217 (100)    200 (100)    408 (100)    Undefined
M1040V        T/T          213 (97.7)   203 (99.0)   411 (99.0)   Reference
              C carrier    5 (2.3)      2 (1.0)      4 (1.0)      2.46 (0.68, 8.87)       0.3082
P1054R        G/G          212 (97.3)   202 (99.0)   411 (96.9)   Reference
              C carrier    6 (2.7)      2 (1.0)      13 (3.1)     0.91 (0.34, 2.43)       1.0
Q1128R        T/T          215 (99.5)   193 (98.5)   397 (99.0)   Reference
              C carrier    1 (0.5)      3 (1.5)      4 (1.0)      0.47 (0.05, 4.03)       0.8287
L1420F        G/G          220 (100)    199 (98.0)   406 (98.3)   Reference
              G carrier    0            4 (2.0)      7 (1.7)      0 (0, Undefined)        0.1275
D1853N        G/G          168 (86.1)   168 (89.4)   330 (87.5)   Reference
              A/G & A/A    27 (13.9)    20 (10.6)    47 (12.4)    1.10 (0.66, 1.84)       0.8181
D1853V        A/A          215 (99.5)   200 (99.5)   410 (99.8)   Reference
              T carrier    1 (0.5)      1 (0.5)      1 (0.2)      1.94 (0.13, 29.2)       1.0
R2032S        A/A          209 (100)    190 (100)    391 (100)    Reference
              T carrier    0            0            1 (0.3)      Undefined               1.0
T2059I        C/C          213 (99.1)   199 (100)    403 (99.5)   Reference
              T carrier    2 (0.9)      0            2 (0.5)      1.92 (0.28, 13.27)      0.8945
V2079I        G/G          218 (99.5)   203 (99.0)   410 (98.6)   Reference
              A carrier    1 (0.5)      2 (1.0)      6 (1.4)      0.32 (0.04, 2.39)       0.4778
S2146T        G/G          215 (99.5)   202 (99.5)   403 (99.5)   Reference
              C carrier    1 (0.5)      1 (0.5)      2 (0.5)      0.95 (0.09, 10.59)      1.0
L2322F        T/T          211 (97.6)   200 (100)    405 (99.0)   Reference
              C carrier    5 (2.4)      0            4 (0.9)      2.45 (0.68, 8.83)       0.3121
Any Variant   None         151 (68.0)   149 (72.3)   291 (68.3)   Reference
              Carrier      71 (32.0)    57 (27.7)    135 (31.7)   1.01 (0.72, 1.44)       1.0

2.5. Discussion

Studies of A-T families have documented an increased risk of breast cancer among both presumptive and obligate heterozygous carriers of ATM gene variants (Swift et al. 1991; Swift et al. 1987). While this observation has been corroborated by a Dutch study (Broeks et al. 2000), most other case-control studies to date have failed to support the hypothesis that ATM variant carriers are at an increased risk of breast cancer (Bay et al. 1999; Bebb et al. 1999; Chen et al. 1998; FitzGerald et al. 1997; Janin et al. 1999; Olsen et al. 2001; Vorechovsky et al. 1996a).
Initial surveys, guided by the suggestion that most 'at risk' A-T alleles were truncating or null mutations (Gilad et al. 1996), relied on methods that identify aberrant pre-mRNA splice variants - namely the protein truncation test (PTT) and single-strand conformation polymorphism (SSCP) methods. The PTT method would necessarily overlook rare missense variants and the SSCP method finds only 80% of these mutations. Hence, these studies likely underestimated the prevalence of ATM variants in breast cancer cases and controls. As a result, few missense variants have been described or, alternatively, many may have been overlooked. Nevertheless, one missense mutation (T7271G) located in the PI3-kinase region has been shown to be highly penetrant for breast cancer, being associated with an estimated 13-fold increased risk (Vorechovsky et al. 1996a), and yields a dominant negative inhibitor of ATM (Chenevix-Trench et al. 2002).

There is increasing evidence that missense variants in ATM encode stable, functionally abnormal proteins. Over-expression of a mutant ATM polypeptide has previously been shown to increase genetic instability in normal cells, thus displaying a dominant negative cellular phenotype (Khanna 2000); such dominant interference has been demonstrated using an in vitro mutagenesis approach (Scott et al. 2002). Finally, the ATM protein exists as a component of a multi-protein complex (Wang et al. 2000); thus, expression of a mutated protein from even a single missense allele might interfere with this complex.

Considerable molecular evidence places ATM as a key and proximal component in DNA-damage response, maintenance of genomic integrity, and in the regulation of cell cycle checkpoints (Elledge 1996). Additionally, ATM's demonstrated functional interaction with BRCA1 (Cortez et al.
1999) - along with an inferred relationship with BRCA2 - defines a molecular pathway that may be disrupted in some fraction of breast cancer patients. Based on these observations, gene/gene interactions between ATM missense variants and variants/polymorphisms in BRCA1 and BRCA2 represent a promising avenue of further study.

We characterized the prevalence and distribution of 20 ATM missense mutations/polymorphisms in a multiethnic study population consisting of African-American, Latina, Japanese, and Caucasian women. In the aggregate, the variants characterized were rare, consistent with other ATM studies (Bonnen et al. 2000; Thorstenson et al. 2001). Further, the ethnic distributions of specific variants were comparable to those reported in previous studies which observed striking differences between African and non-African populations (Thorstenson et al. 2001). We also observed the D1853N variant to be frequent among Caucasians (21.1% of controls), whereas the D126E variant was very common among African-Americans (32.5%) but less often observed among the other ethnic groups. Thorstenson et al. observed a similar ethnic-specific distribution of these same two polymorphic markers and suggest that such variation may be either the result of random genetic drift or due to selective pressure (Thorstenson et al. 2001).

With the exception of the L546V missense mutation, we did not note a specific increase in the frequency of ATM missense mutations in breast cancer cases compared to controls. However, having tested 20 variants, we would expect one to attain nominal statistical significance by chance alone as a consequence of multiple hypothesis testing. As such, a Bonferroni correction was employed; no individual variant attained the critical level of significance as determined by this procedure.
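
The Bonferroni procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis code used in the study: the variable names are hypothetical, and the p-values are a subset transcribed from Table 10 for demonstration only.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
ALPHA = 0.05
N_TESTS = 20  # one test per missense variant genotyped
threshold = ALPHA / N_TESTS

# Illustrative subset of per-variant p-values transcribed from Table 10.
p_values = {
    "S49C": 0.4915,
    "D126E": 0.3828,
    "L546V": 0.0297,
    "F858L": 0.3195,
    "M1040V": 0.3082,
}

# A variant is declared significant only if its p-value falls below the threshold.
significant = {name: p for name, p in p_values.items() if p < threshold}

print(f"Bonferroni threshold: {threshold:.4f}")
print(f"Variants below threshold: {sorted(significant)}")
```

With these inputs the L546V p-value (0.0297) falls below the nominal 0.05 level but not below the corrected threshold, which is the situation the paragraph above describes.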
One potential pitfall of this investigation is the issue of population stratification, which, in case-control studies, can lead to spurious associations between a disease phenotype and unlinked candidate loci simply as a consequence of recent population admixture (Pritchard and Rosenberg 1999). Although of concern, this is an unlikely explanation for the demonstrated association in African-Americans, as the observed D126E (Nt. 378) prevalence among cases and controls (~30%) is identical to that reported among Africans in a comprehensive survey of ATM diversity (Thorstenson et al. 2001). An additional study limitation was that this investigation lacked sufficient power to effectively evaluate some ethnic-specific risks - most particularly among Japanese and Caucasians, due to the low prevalence of the variant alleles. However, its multiethnic design will allow us to continue to examine this and other ATM variants in different ethnic groups in the future.

We evaluated the possible association of 20 variant loci in four ethnic groups with breast carcinoma in this study. The L546V variant appears to act as a moderate, although not statistically significant, predictor of risk within a multiethnic population, although its effect is almost exclusively observed among African-American women. Additional evaluation of missense variants is required to better characterize the effective contribution of this and other ATM missense variants.

The degree to which ATM heterozygosity is associated with an increased risk of breast cancer remains an open debate. It is unlikely that all missense mutations will have the same effect, either as a result of their location in functionally significant regions of the protein or the magnitude of the encoded protein change. Hence, more research regarding the molecular structure and function of variant ATM is required.
CHAPTER 3

CO-SEGREGATION OF AN ATM MISSENSE VARIANT IN BREAST CANCER PATIENTS WITH A POSITIVE FAMILY HISTORY AND AN EARLIER AGE AT ONSET: THE MULTIETHNIC COHORT

3.1. Summary

Mutations in BRCA1 and BRCA2, two breast cancer susceptibility genes, confer a high risk of female breast and ovarian cancer (Ford et al. 1998) but account for only a small portion of overall breast cancer susceptibility (Peto et al. 1999). Mutations in the BRCA1 and BRCA2 genes are identified in only about 40% of families with multiple cases of breast cancer. As such, it has been hypothesized that mutations conferring an increased risk of breast cancer exist in at least one, and likely more, other genes. One such candidate is the gene mutated in ataxia-telangiectasia (ATM), a proximal component of the pathways that detect and repair DNA damage. We examined the relationship between two ATM missense mutations - L546V and F858L (two variants which in an earlier investigation appeared to be over-represented among breast cancer cases; Chapter 2, Table 10) - and breast cancer risk among 2,106 women from four ethnic groups in a case-control study within the Multiethnic Cohort Study. Neither the L546V variant nor the F858L variant was associated with a statistically significant elevation in risk of breast cancer among all individuals combined. Similar to the previously described D126E variant (Thorstenson et al. 2001), L546V was common (6.3% among controls) in African-Americans but quite rare among the remaining ethnic groups.
When we considered the L546V variant among African-American women only, the only group among whom the prevalence was sufficient to estimate ethnic-specific risk, carriers with a concomitant first degree family history of breast cancer exhibited a dramatic elevation in risk for high stage breast cancer (OR=15.5; p=0.01). This is in contrast to those women in the same group who did not report a family history (OR=1.9; p=0.2). Additionally, among African-American women reporting a family history, L546V carriers had incident disease nearly 5 years earlier than those without the variant (60.6 versus 65.4 years). These findings suggest that continued investigation of ATM missense variants, particularly among younger women with a positive family history for breast cancer, is warranted.

3.2. Introduction

Ataxia-telangiectasia is a pleiotropic inherited disease characterized by neurodegeneration, cancer, immunodeficiencies, radiation sensitivity and genetic instability (Boder 1985; Shiloh 1995). Heterozygous carriers of ATM (Ataxia Telangiectasia Mutated) germline variants are estimated to constitute 0.35-1% of the general population; the majority of these variants (>70%) are predicted to lead to either truncation or altered splicing of the protein (Concannon and Gatti 1997; Gilad et al. 1996). Accumulating evidence suggests that while truncating mutations are the most common mutations in A-T patients, missense mutations are more often observed in individuals with breast cancer (Dork et al. 2001; Izatt et al. 1999a; Stankovic et al. 1998; Teraoka et al. 2001). For instance, one missense mutation in ATM’s PI3-kinase region (T7271G or V2425G; not considered in these analyses) has been associated with a 13-fold increased risk of breast cancer (Stankovic et al.
1998); the mutated protein itself yields a dominant negative inhibitor of ATM (Chenevix-Trench et al. 2002). An excess of missense mutations has been identified among breast carcinoma cases selected for a first degree family history (Teraoka et al. 2001) as well as an early age at diagnosis (Izatt et al. 1999a; Teraoka et al. 2001). In this study, we evaluated the relationship of two missense variants in the ATM gene with family history, age at onset, and breast cancer risk in a large case-control study among women in the Multiethnic Cohort Study.

3.3. Materials and Methods

3.3.1. Multiethnic Cohort Study Population

This nested case-control study is part of a large, ongoing, multiethnic cohort study in Hawaii and Los Angeles, California with an emphasis on diet and other lifestyle characteristics in the etiology of cancer. Aspects of this large cohort as well as details of its design and implementation are described more fully elsewhere (Kolonel et al. 2000) and are detailed in the Methods section of Chapter 2 (Section 2.3.1). Briefly, eligible cases were women enrolled in the cohort and diagnosed between 1993 and 1998 with a new primary, histologically confirmed breast cancer identified by linkage of the cohort to the population-based Surveillance, Epidemiology and End Results (SEER) cancer registries in Hawaii and California. Cases were contacted by letter and phone call and agreed to provide a blood specimen. The participation rate for providing a blood sample on request was 74% for cancer cases. Women with carcinoma in situ (non-infiltrating pathology) and neoplasms of the skin of the breast (ICD-O 44.5) were not included as breast cancer cases. Information on stage of disease was ascertained from the SEER Registries and used in subgroup analyses.
Stage of disease was characterized as “localized” or “high stage” - which included regional (by direct extension and/or lymph node involvement) or systemic.

For this particular effort, a nested case-control study was designed with the intention of comprehensively analyzing the role of rare ATM missense variants in a multi-ethnic, population-based sample. Given that ATM missense mutations were hypothesized to be over-represented among breast cancer cases, all eligible women with high stage disease were included (n=297) along with 791 women initially presenting with localized disease. Blood samples had also been collected from an approximately 3% random sample of healthy cohort members at baseline; the participation rate for cohort controls was 66%. Only controls with no previous diagnosis of breast cancer were eligible, and a total of n=1018 were included in this effort. The study was approved by the Institutional Review Board of the Keck School of Medicine of the University of Southern California.

3.3.2. Genotyping

Genomic DNA was purified from the buffy coats of peripheral blood samples for all cases and controls using the Puregene DNA Isolation protocol and kit (Gentra Systems; Minneapolis, MN). SNP genotyping was done using the fluorogenic 5'-nuclease assay (TaqMan Assay) (Lee et al. 1993). The TaqMan assay was performed using a TaqMan PCR Core Reagent Kit (Applied Biosystems) according to the manufacturer’s instructions in a final volume of 20 µL. Using a fluorescent dye labelled probe specific for each allele, the profile of each well was measured in a Sequence Detection System (model 7700 or model 7900HT; Applied Biosystems) and the results analyzed with Sequence Detection Software (Applied Biosystems).

3.3.3.
Statistical Analysis

Data management, descriptive and univariate analyses were performed using the SAS statistical software Version 8.01 (SAS Institute; Cary, NC). The EpiLog software system (EpiCenter Software; Pasadena, CA) was used to estimate odds ratios and 95% CIs by unconditional logistic regression.

3.4. Results

We initially evaluated the association of the L546V and F858L missense mutations, two variants which in an earlier pilot investigation appeared to be over-represented among breast cancer cases (see Chapter 2, Table 10), with breast cancer in a case-control study of 2106 individuals within the Multiethnic Cohort Study (n, affected=1088; n, unaffected=1018). Four ethnic groups were considered: African-American (n=492, 23.4% of study population), Latina (n=456, 21.7%), Japanese (n=587, 27.9%) and Caucasian (n=571, 27.1%). Demographic characteristics and associations between established reproductive breast cancer risk factors and breast cancer risk were generally consistent with expectation in all ethnic groups. Overall, cases more often reported a family history of disease (19.6% of cases versus 10.9% of controls), tended to have a later first full term pregnancy (after age 31 years: 8.8% versus 7.6%), and had an earlier age at menarche (before age 12 years: 27.9% versus 23.1%).

The prevalence of F858L was identical in cases and controls overall (1.0%) but was most often seen among Caucasian women (2.5% of controls; Table 11). The L546V variant was modestly more prevalent among breast cancer cases than in the comparison group (2.3% versus 1.9%; Table 11). Whereas this missense mutation was relatively common among African-American women (6.3% of controls), it was quite rare among Caucasian (0.7%), Japanese (0.3%), and Latina (0.5%) women.
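The carrier prevalences quoted for L546V can be reproduced directly from the carrier counts and group sizes listed in Table 11. The short sketch below does only that arithmetic; the function and dictionary names are hypothetical, and the counts are those shown in the table.

```python
def carrier_prevalence_pct(carriers: int, total: int) -> float:
    """Carrier prevalence as a percentage of the group size."""
    return 100.0 * carriers / total


# (number of L546V carriers, group size), taken from Table 11
l546v_counts = {
    "all cases": (25, 1088),
    "all controls": (19, 1018),
    "African-American cases": (21, 254),
    "African-American controls": (15, 238),
}

prevalence = {
    group: round(carrier_prevalence_pct(carriers, total), 1)
    for group, (carriers, total) in l546v_counts.items()
}

for group, pct in prevalence.items():
    print(f"{group}: {pct}%")
```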
Neither variant was significantly associated with an increased risk of breast cancer overall.

Because the L546V variant was polymorphic among African-American women, we pursued its association with breast cancer in conjunction with family history and age of disease onset. This subset of the overall case-control sample comprised a total of 492 women (n, affected=254; n, unaffected=238). Within this group, the distribution of the L546V variant was similar when low stage breast cancer cases and controls were compared (OR=1.1; 95% CI 0.5, 2.45; Table 12). The variant was associated with a non-significantly increased risk of high stage breast cancer (OR=1.9; 95% CI 0.8, 4.4; Table 12). However, the risk of high stage breast cancer associated with the L546V variant was stronger among women who reported a first-degree family history of breast cancer (OR=15.5; 95% CI: 2.4, 99.3; Table 12) than among all carriers combined.

Lastly, given the apparent inter-relationship between L546V carrier status, breast cancer risk and a positive family history, we hypothesized that age at diagnosis would be earlier among variant carriers. Mean age at diagnosis for all African-American women combined was 63.9 years (std. dev. 8.8). Among the five L546V variant carriers reporting a positive family history, this value fell to 60.6 years (std. dev. 9.5); two of these women (40%) had incident disease at less than 55 years of age. Women with a similarly positive family history but carrying two wild-type alleles (n=10) had a mean age at diagnosis of 65.4 years (std. dev. 7.2); only one woman (10%) had incident disease earlier than 55 years. The difference in the means, however, did not attain statistical significance (p=0.2).

3.5.
Discussion

These findings are consistent with studies that have noted an enrichment of ATM missense mutations in breast carcinoma cases selected for a first degree family history (Teraoka et al. 2001) and an early age at diagnosis (Stankovic et al. 1998; Teraoka et al. 2001). The earlier age of diagnosis seen among African-American carriers of the L546V missense variant is supported by the observation in A-T families that the relative risk of breast cancer appears to be highest (6 to 7-fold increased) in obligate A-T carriers between 50 and 69 years of age (Athma et al. 1996).

Table 11. Ethnic-specific distribution of the L546V and F858L missense variants in the ATM gene among breast cancer cases and controls.

Site  Amino Acid  All Ethnicities       African-American      Latina                Japanese              Caucasian
      Change      Cases     Controls    Cases     Controls    Cases     Controls    Cases     Controls    Cases     Controls
                  (n=1088)  (n=1018)    (n=254)   (n=238)     (n=240)   (n=216)     (n=303)   (n=284)     (n=291)   (n=280)
                  # (%)     # (%)       # (%)     # (%)       # (%)     # (%)       # (%)     # (%)       # (%)     # (%)
1636  L546V       25 (2.3)  19 (1.9)    21 (8.3)  15 (6.3)    3 (1.3)   1 (0.5)     0         1 (0.4)     1 (0.3)   2 (0.7)
2572  F858L       5 (1.3)   4 (1.1)     1 (0.9)   0           1 (1.1)   2 (2.1)     1 (1.1)   0           2 (2.0)   1 (1.0)

Table 12.
Association between L546V variant carrier status and breast cancer risk by stage of disease and family history of breast cancer among African-American women only.^a

                    All Women (with and without Family History)       Women Reporting a Family History^b
Stage of Disease    Variant         Wild Type     Crude OR            Variant         Wild Type     Crude OR
                    Carriers # (%)  Homozygotes   (95% CI)            Carriers # (%)  Homozygotes   (95% CI)
Controls            15 (6.4)        220           Reference           1 (3.1)         31            Reference
All Stages          21 (8.3)        233           1.4 (0.7, 2.7)      9 (16.4)        46            6.5 (0.8, 53.5)
Localized           12 (6.9)        162           1.1 (0.5, 2.4)      4 (10.0)        36            3.7 (0.4, 34.5)
High Stage          9 (11.3)        71            1.9 (0.8, 4.6)      5 (33.3)        10            15.5 (2.4, 99.3)

a 3 women had non-informative genotypes.
b Reported breast cancer in mother and/or at least one sister.

Considerable molecular evidence places ATM as a key and proximal component in DNA-damage response, maintenance of genomic integrity, and the regulation of cell cycle checkpoints (Elledge 1996). Additionally, ATM’s demonstrated functional interaction with BRCA1 (Cortez et al. 1999) - along with an inferred relationship with BRCA2 - defines a molecular pathway that may be disrupted in some fraction of breast cancer patients. In particular, there is increasing evidence that missense variants in ATM encode stable, functionally abnormal proteins. Mutations in the kinase domain confer a dominant negative activity on ATM (Lim et al. 2000). Yet such dominant interference has also been demonstrated among missense mutations outside the kinase domain, hypothesized to occur via an ATM-ATM interaction mechanism (Scott et al. 2002). Given that the ATM protein exists as a component of a multi-protein complex (Wang et al. 2000), expression of a mutated protein from even a single missense allele might interfere with this complex; the L546V variant may be but one example.

This study lacked sufficient power to effectively evaluate all ethnic-specific risks - particularly among Japanese and Caucasians, due to the low prevalence of the variant allele.
However, its multiethnic design will allow us to continue to examine the L546V and other ATM variants in different ethnic groups in the future. Further, it is important to note that population stratification is of concern in this study (Pritchard and Rosenberg 1999) and cannot be ruled out as an explanation for the demonstrated association in African-Americans. In brief, when a case-control sample is ethnically mixed or derived from a population which experienced mixture over recent generations, a non-random association may be observed at markers unlinked to a disease locus (Chakraborty and Weiss 1988; Lander and Schork 1994). However, evidence against this phenomenon includes the observation that the F858L variant was not observed in any African-American controls and was observed in 2.7% of Caucasian controls. These prevalences are in line with those reported among Africans (0%) and Europeans (6.3%) in a comprehensive survey of ATM diversity (Thorstenson et al. 2001).

The observed positive association of the L546V variant with high stage breast cancer among women with a family history may indeed be due to chance. However, this high risk combined with an earlier age of disease onset lends credence to the suggestion that this particular variant is biologically relevant and influential in breast carcinogenesis. Confirmation of these results, particularly among younger women who report a family history of breast cancer, is required to better characterize the effective contribution of this and other ATM missense variants. The degree to which ATM heterozygosity is associated with an increased risk of breast cancer remains an open debate, one made more complex by the need to discriminate between ATM mutations that are pathogenic and variants having little or no effect on protein function.
CHAPTER 4

WOMEN CARRYING MULTIPLE MISSENSE MUTATIONS IN THE ATM GENE: IMPLICATIONS FOR RISK OF BREAST CANCER

4.1. Summary

The relationship between ATM variant heterozygosity and breast cancer risk remains controversial. Studies of A-T families have suggested an increased risk of breast cancer among obligate female heterozygous carriers of A-T mutations, with an estimated relative risk of 3.9. Paradoxically, studies of sporadic and familial breast cancer have failed to demonstrate an elevated prevalence of germline ATM gene mutations among breast cancer cases. To date, most case-control studies of ATM variants and the risk of breast cancer have relied on methods that identify aberrant pre-mRNA splice variants. Such a screening approach would necessarily overlook many rare missense variants, thus underestimating the prevalence of ATM variants in breast cancer cases and controls. As a result, few missense variants have been described, and others may have been overlooked.

An additional complexity in these investigations has been the difficulty in distinguishing variants that are mutations affecting protein structure and function from those that are polymorphisms. Although the current approach similarly cannot directly resolve this distinction, it utilizes a multiethnic population, permitting a specific focus on women carrying multiple ATM missense mutations. In this way, it may be possible to identify, on a preliminary basis, single variants or variants in linkage disequilibrium that together segregate with disease.

4.2. Introduction

Ataxia-telangiectasia (A-T) is an autosomal recessive disorder involving neurodegeneration, immunodeficiency, chromosomal instability, radiosensitivity and cancer predisposition (Boder 1985; Lavin and Shiloh 1997).
The gene mutated in ataxia-telangiectasia (ATM) is a proximal component of the pathways involved in detecting and repairing DNA damage. Specifically, ATM mediates the arrest of the cell cycle at G1/S, S, and G2/M, preventing the processing of damaged DNA; activates DNA-repair pathways; and induces apoptosis if normal cell function cannot be restored (Kastan and Lim 2000). Many of these effects are mediated via the phosphatidylinositol-3 kinase domain in the C-terminus of the ATM protein (residues 2656-3056). Additionally, ATM contains a number of specific amino-acid motifs, namely a p53-binding region (residues 1-246), a β-adaptin-binding region (residues 812-948), a leucine zipper (residues 1217-1238), a c-Abl-binding region (residues 1373-1382), and a Rad3-homology region (residues 1443-2428) (Khanna and Jackson 2001).

Disruption of cell-cycle control is a hallmark of cancer (Hanahan and Weinberg 2000). Studies of ataxia-telangiectasia (A-T) families have suggested an increased risk of breast cancer among obligate female heterozygous carriers of ATM mutations. However, studies of sporadic and familial breast cancer have failed to demonstrate an elevated prevalence of such germline gene mutations among breast cancer cases. One possible resolution of this paradox is that there exist two populations of A-T carriers, one group with a truncating allele coupled to a wild type allele and the other with a missense allele coupled to a wild type allele (Gatti et al. 1999). Whereas truncating mutations would block expression of ATM protein, missense mutations could code for stable ATM proteins that are present at normal intracellular concentrations but function quite abnormally. To date, there exists initial support for this model, including the fact that among cohorts of breast cancer cases missense mutations are observed whereas truncating mutations are not common (Dork et al.
2001; Sommer et al. 2002; Teraoka et al. 2001). One such missense mutation (T7271G) - which yields a dominant negative inhibitor of ATM (Chenevix-Trench et al. 2002) - has been shown to be highly penetrant for breast cancer, associated with an estimated 13-fold increased risk (Stankovic et al. 1998).

Yet an additional complexity in the analysis of ATM variants and their association with breast cancer has been the difficulty in determining which missense variants are mutations versus polymorphisms. Formal tests of the association between A-T heterozygosity and breast cancer in a population-based sample are doubly hampered - by the large sample size required to evaluate mutations of very low prevalence and by the large number and variety of missense mutations. Further, homozygotes for particular substitutions (e.g. F858L, P1054R, and L1420F) have been identified only among breast cancer patients and, thus, moderate risks cannot be excluded (Dork et al. 2001). As such, a formal and validated definition that would categorize nucleotide substitutions either as mutations altering protein function or as rare variants without functional significance would be invaluable in identifying those missense mutations in ATM that predispose women to breast cancer.

Previous studies have utilized, singly or in combination, the following criteria in distinguishing polymorphisms from variants with a higher a priori probability of functional significance: 1) whether the substitution had been previously reported as a mutation occurring in an A-T patient (Sommer et al. 2002; Teraoka et al. 2001); 2) whether the altered residues could be directly implicated in ATM function (e.g. specific residues in the kinase domain) (Teraoka et al. 2001); and 3) whether the affected amino acid is conserved in species widely separated evolutionarily (Bottema et al. 1991; Greenwell et al.
1995; Pecker et al. 1996; Teraoka et al. 2001). There are, however, significant limitations to each of these criteria, making each useful but ultimately arbitrary.

In considering the first criterion, it has been noted that missense mutations usually do not cause A-T, but may predispose carriers to cancer (Gatti et al. 1999). Thus, missense mutations may be present in A-T patients in addition to the two truncating mutations causing disease. When considering the second criterion, it is important to note that there is no single assay that assesses the variety of functions, some of which may still be unknown, possessed by the complex ATM protein. For the third criterion, which compares conservation across species, the paucity of published sequences greatly limits the approach.

Previous studies have compared the frequency of ATM mutations in sporadic or unselected case patients to that in control subjects. In one study, after changes common to cases and controls were excluded, missense or truncating changes were found in 10 cases compared to two controls (p = 0.013) (Sommer et al. 2002). In the current investigation, we utilized data from a total of 20 missense mutations and polymorphisms (described fully in Chapter 2, Section 2.4) and focused on those individuals carrying multiple variants. In this manner, we hope to gain additional insight into which specific changes are simply background and which, alone or in combination, may confer high relative risks for breast cancer.

4.3. Materials and Methods

4.3.1.
Multiethnic Cohort Study Population

This nested case-control study is part of a large, ongoing, multiethnic cohort study in Hawaii and Los Angeles, California with an emphasis on diet and other lifestyle characteristics in the etiology of cancer. Aspects of this large cohort as well as details of its design and implementation are described more fully elsewhere (Kolonel et al. 2000) and are detailed in the Methods section of Chapter 2 (Section 2.3). In brief, eligible cases were women enrolled in the cohort and diagnosed between 1993 and 1998 with a new primary, histologically confirmed breast cancer identified by linkage of the cohort to the population-based Surveillance, Epidemiology and End Results (SEER) cancer registries in Hawaii and California. Cases were contacted by letter and phone call and agreed to provide a blood specimen. The participation rate for providing a blood sample on request was 74% for cancer cases. Women with carcinoma in situ (non-infiltrating pathology) and neoplasms of the skin of the breast were not included as breast cancer cases. Information on stage of disease was ascertained from the SEER Registries and used in subgroup analyses. Stage of disease was characterized as “localized” or “high stage” - which included regional (by direct extension and/or lymph node involvement) or systemic.

For this particular effort, a nested case-control study was designed with the intention of comprehensively analyzing the role of rare ATM missense variants in a multi-ethnic, population-based sample. Given that ATM missense mutations were hypothesized to be over-represented among breast cancer cases, approximately 100 cases from each ethnic group were initially selected (n=428), and women diagnosed with high stage disease were oversampled (n=222) as compared to cases who initially presented with localized disease (n=206).
Blood samples had also been collected from an approximately 3% random sample of healthy cohort members at baseline (Feigelson et al. 1999; McKean-Cowdin et al. 2001). We selected approximately 100 such controls for each of the four ethnic groups (n=426); the participation rate for cohort controls was 66%. Only controls with no previous diagnosis of breast cancer were included. The study was approved by the Institutional Review Board of the Keck School of Medicine of the University of Southern California.

4.3.2. Genotyping

Genomic DNA was purified from the buffy coats of peripheral blood samples for all cases and controls using the Puregene DNA Isolation protocol and kit (Gentra Systems; Minneapolis, MN). SNP genotyping was done using the fluorogenic 5'-nuclease assay (TaqMan Assay) (Lee et al. 1993). The TaqMan assay was performed using a TaqMan PCR Core Reagent Kit (Applied Biosystems) according to the manufacturer’s instructions in a final volume of 20 µL. Using a fluorescent dye labelled probe specific for each allele, the profile of each well was measured in a Sequence Detection System (model 7700 or model 7900HT; Applied Biosystems) and the results analyzed with Sequence Detection Software (Applied Biosystems).

4.3.3. Statistical Analysis

Data management, descriptive and univariate analyses were performed using the SAS statistical software Version 8.01 (SAS Institute; Cary, NC). The SAS system was also used to calculate Pearson correlation coefficients (r) and associated p-values testing the null hypothesis of no correlation. The EpiLog software system (EpiCenter Software; Pasadena, CA) was used to estimate odds ratios and 95% CIs by unconditional logistic regression. In each, the two-sided level of significance was defined as α < 0.05.

4.4.
Results

We characterized the prevalence and distribution of 20 ATM missense mutations/polymorphisms in a case-control study of 854 African-American, Latina, Japanese, and Caucasian women aged 45 years and above participating in the Multiethnic Cohort Study. Characteristics of these women are fully described in Table 8 of Chapter 2 (Section 2.4) and the prevalence of the 20 variants is detailed in Table 9 of the same section. Given that we had previously described the risk associated with each individual variant along with that of carrying any variant (Chapter 2, Table 10), it was the intention of this effort to describe the effect of carrying multiple such missense mutations.

There were 53 women (6.2%) who concurrently carried missense variants at two or more sites. Demographic and disease status characteristics of these women as compared to women with zero or only one missense variant are presented in Table 13 below.

Table 13. Descriptive characteristics of subjects carrying multiple ATM missense variants as compared to all participants combined: total observations with percent in parentheses.
Variable                          Multiple Missense Variant   Women with Zero or One   p-value^a
                                  Carriers (n=53)             Variant (n=801)
Case/Control Status                                                                    0.3901
  High Stage Cases                18 (34.0)                   204 (25.5)
  Low Stage Cases                 11 (20.7)                   195 (24.3)
  Controls                        24 (45.3)                   402 (50.2)
Age at entry (in years)                                                                0.3798
  Less than 50                    8 (15.1)                    136 (17.0)
  50-54                           10 (18.9)                   127 (15.9)
  55-59                           5 (9.4)                     125 (15.6)
  60-64                           10 (18.9)                   127 (15.9)
  65-69                           8 (15.1)                    165 (20.6)
  70-74                           12 (22.6)                   108 (13.5)
  75 and above                    0 (0)                       12 (1.5)
Ethnicity                                                                              <0.0001
  African-American                32 (60.4)                   202 (25.2)
  Japanese                        2 (3.8)                     198 (24.7)
  Latina                          12 (22.6)                   188 (23.5)
  White                           7 (13.2)                    213 (26.6)
Family History: Breast Cancer
  Reported                        15 (28.3)                   104 (13.0)               0.0036
  Any <50 years                   9 (17.7)                    41 (5.1)                 0.0007

a p-value calculated by the χ2 test for heterogeneity (categorical variables) comparing multiple missense variant carriers to all cases and controls combined.

Although a statistically significant difference in case versus control status was not observed when comparing multiple mutation carriers with those carrying none or one variant, women with high stage disease were somewhat more frequent among multiple mutation carriers than among women with zero or only a single variant (34.0% versus 25.5%). There was no difference in the age at entry into the cohort between the two groups.

Statistically significant differences were noted when considering ethnicity and family history. Women carrying mutations at multiple sites in ATM were most often African-American (60.4%) and least often Japanese (3.8%). This is in sharp contrast to the remainder of the genotyped cohort, in which the number of women in each ethnic group was approximately equal. Additionally, women with multiple ATM variants were nearly three times more likely to report a first degree family history of breast cancer (OR: 2.65; 95% CI 1.44, 4.88).
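The family-history odds ratio and the case/control p-value above can be approximately reproduced from the counts in Table 13. The sketch below is an illustration only, not the author's SAS or EpiLog procedure: it uses the cross-product odds ratio with a Woolf (log-based) interval, and a Pearson chi-square statistic whose p-value for 2 degrees of freedom has the closed form exp(-x/2). Function names are hypothetical, and the interval obtained this way may differ slightly from the one reported.

```python
import math


def crude_odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the odds ratio on the log scale (Woolf method)."""
    log_or = math.log(crude_odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Family history reported: 15 of 53 multiple-variant carriers, 104 of 801 other women.
fh_or = crude_odds_ratio(15, 53 - 15, 104, 801 - 104)
fh_ci = woolf_ci(15, 53 - 15, 104, 801 - 104)

# Case/control status (high stage, low stage, controls) by carrier group.
status_table = [[18, 11, 24], [204, 195, 402]]
chi2 = chi_square_statistic(status_table)
p_value = math.exp(-chi2 / 2)  # exact chi-square tail probability for 2 degrees of freedom

print(f"family history OR = {fh_or:.2f}, approximate 95% CI {fh_ci[0]:.2f} to {fh_ci[1]:.2f}")
print(f"case/control status: chi-square = {chi2:.2f}, p = {p_value:.2f}")
```

With these counts the odds ratio rounds to the reported 2.65 and the p-value is close to the tabulated 0.39; the Woolf interval is slightly wider than the reported one, which is expected if a different interval method was used.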
This effect was even more pronounced when considering a family history of breast cancer presenting before the age of 50; women with multiple variants were four times more likely to report such a history (OR: 3.97; 95% CI 1.91, 8.26).

Table 14 below details patient characteristics for each individual carrying multiple ATM missense mutations. These data include stage of disease, age (of

Table 14. Characteristics of breast cancer cases (high stage and low stage) and controls carrying missense mutations at multiple ATM sites.

[High stage cases: the individual rows are illegible in this reproduction. Surviving subtotal percentages, in the column order 126 D-E, 546 L-V, 872 P-S, 1040 M-V, 1054 P-R, 1853 D-N, 2322 L-F: (27.8) (50.0) (38.9) (27.8) (16.7) (16.7) (27.8)]

Table 14 (continued).
Stage | Patient Age, years^a | Ethnicity | Family Hx^b | Tumor Site^c | Tumor Size (cm) | Missense Mutation Site and Type of Substitution (126 D-E, 546 L-V, 872 P-S, 1040 M-V, 1054 P-R, 1583 D-N, 2322 L-F, All Others)
Low | 55 | White | | C50.4 | 0.9 | A/T A/G L1420F
Low | 62 | AA | | C50.8 | 1.5 | C/G A/G
Low | 68 | AA | | C50.3 | 0.8 | A/T V2079I, S2146T
Low | 71 | AA | 1st, <50 | C50.8 | 1.0 | C/G C/T
Low | 74 | AA | 1st | C50.3 | 5.0 | A/T A/G Q1128R
Low | 77 | AA | | C50.3 | 1.2 | A/G Q1128R
Low | 64 | AA | | C50.4 | 3.0 | C/G [illegible]
Low | 63 | Latina | | C50.4 | 1.8 | C/G
Low | 73 | AA | 1st, <50 | C50.9 | 4.5 | A/T V182L
Low | 79 | Latina | | C50.4 | Unk. | A/T A/G
Low | 53 | White | | C50 | 0.1 | A/G S707P
Subtotal | 5 | 3 | 1 | 2 | 1 | 5 | 0
(%) | (45.5) | (27.3) | (0.9) | (18.2) | (0.9) | (45.5) | (0)
Control | 57 | Japanese | | | | A/T C/G
Control | 45 | AA | | | | C/G A/G C/T T/C
Control | 73 | AA | | | | A/T S49C
Control | 48 | AA | | | | A/T A/G
Control | 59 | White | | | | C/G A/G F858L
Control | 59 | Latina | | | | A/T A/G V2079I
Control | 49 | Japanese | | | | A/G L1420F
Control | 61 | AA | 1st, <50 | | | A/G T/C
Control | 65 | AA | | | | A/T T2059I
Control | 50 | AA | | | | A/T V2079I, S2146T
Control | 47 | AA | | | | A/T C/G A/G C/T C/G
Control | 66 | Latina | | | | C/G S49C
Control | 46 | Latina | | | | F858L
Control | 53 | White | | | | A/T C/T V2079I
Control | 54 | AA | 1st, <50 | | | A/T A/G
Control | 72 | Latina | | | | A/T V2079I
Control | 70 | AA | 1st | | | A/T V182L, V2079I, S2146T
Control | 71 | AA | | | | C/G C/T
Control | 59 | Latina | | | | A/T C/G A/G F858L, V2079I
Control | 61 | Latina | 1st, <50 | | | A/T C/G
Control | 73 | AA | | | | A/T R2032S
Control | 52 | AA | | | | A/T C/G A/G A/G T/C T2059I
Control | 52 | White | | | | A/T L1420F
Control | 60 | AA | | | | C/G A/G T/C
Subtotal | 16 | 5 | 5 | 4 | 6 | 7 | 4
(%) | (66.7) | (20.8) | (20.8) | (16.7) | (25.0) | (29.2) | (16.7)
Total | 26 | 17 | 13 | 11 | 10 | 15 | 9
(%) | (49.1) | (32.1) | (24.5) | (20.7) | (18.9) | (28.3)

^a Age at diagnosis for cases, age at entry into the cohort for controls.
^b 1st = Reported in a mother or sister (1st degree family relative); <50 = breast cancer presenting before age of 50.
^c C50: breast, excluding skin; C50.1: central portion of breast; C50.2: upper-inner quadrant of breast; C50.3: lower-inner quadrant of breast; C50.4: upper-outer quadrant of breast; C50.5: lower-outer quadrant of breast; C50.6: axillary tail of breast or tail of breast, NOS; C50.8: overlapping lesion of breast; C50.9: breast, NOS.

disease for cases and entry into the cohort for controls), and family history of disease. Additionally, the site and size (in cm) of the presenting tumor are listed along with all mutated ATM sites.

Each of the three disease status groups (high stage cases, low stage cases, and controls) had a similar mean number of mutated sites per observation. Among the 18 high stage cases the mean number of mutations was 2.67 (SD: 0.69). There was an average of 2.27 mutations (SD: 0.47) among the 11 low stage cases and 2.75 mutations (SD: 1.11) among the 24 controls. The most frequently mutated site among women carrying two or more mutations was D126E (n=26, 49.1% of women carrying at least two mutations), followed by L546V (n=17, 32.1%) and D1853N (n=15, 28.3%).

Among women with at least one rare allele at the D126E locus, it was noted that all observed mutations at three other sites occurred only among these women. These loci were T2059I, V2079I and S2146T. Further, a high degree of positive correlation was observed with several other sites among women carrying at least one rare allele at the L546V locus. Among these were P872S (r: 0.79, p<0.0001), M1040V (r: 0.64, p<0.0001), and L2322F (r: 0.61, p<0.0001). Conversely, other sites displayed a negative correlation with the L546V locus that either was borderline or attained statistical significance, including F858L (r: -0.26, p<0.07), P1054R (r: -0.34, p=0.01), D1853N (r: -0.30, p=0.04), and V2079I (r: -0.30, p=0.03).
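The text does not say how the pairwise correlations (r) between loci were computed. One common choice for two carrier/non-carrier indicators is the phi coefficient, which is the Pearson correlation applied to 0/1 data; the sketch below illustrates that calculation under this assumption. The two indicator lists are invented toy data, not values from the study, and the function name is hypothetical.

```python
import math


def phi_coefficient(site_a, site_b):
    """Phi coefficient (Pearson r for two 0/1 indicator lists of equal length)."""
    if len(site_a) != len(site_b):
        raise ValueError("indicator lists must have the same length")
    pairs = list(zip(site_a, site_b))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)  # rare allele at both sites
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)  # rare allele at site A only
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)  # rare allele at site B only
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)  # rare allele at neither site
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when a site has no variation")
    return (n11 * n00 - n10 * n01) / denominator


# Hypothetical carrier indicators for eight women (1 = carries a rare allele).
toy_site_a = [1, 1, 1, 0, 0, 0, 0, 0]
toy_site_b = [1, 1, 0, 0, 0, 0, 0, 1]

phi = phi_coefficient(toy_site_a, toy_site_b)
print(f"phi = {phi:.3f}")
```

A positive value indicates that rare alleles at the two sites tend to be carried together, and a negative value that they tend to occur in different women, which is the sense in which the correlations above are read.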
All mutated alleles at S707P were observed exclusively among women carrying at least one mutated allele at D1853N. Similarly, each of the 6 mutations observed at F858L was found in a woman concomitantly carrying at least one mutated allele at P1054R. Lastly, 8 of 9 (88.9%) rare alleles at the P872S locus occurred in women who were also carrying a mutation at L2322F.

4.5. Discussion

To date, several independent studies have demonstrated extensive linkage disequilibrium across the ATM locus (Bonnen et al. 2000; Gatti 1998; Li et al. 1999; Thorstenson et al. 2001). Most recently, it has been shown that a single ‘block’ of LD spans a 140 kb region encompassing ATM. Further, the amount and pattern of LD across ATM show very little difference across four different ethnic populations (Bonnen et al. 2002). As a consequence, shared haplotypes are commonly observed when comparing the populations considered, and these shared haplotypes account for a remarkably high percentage of all chromosomes observed: 100% of European American chromosomes, 94% of Hispanic, 90% of Asian American, and 85% of African-American (Bonnen et al. 2002).

It is not the intention of the current study to characterize LD across ATM, for it has been shown that SNPs with low frequency have little power to detect LD (Goddard et al. 2000; Lewontin 1995). However, this investigation did seek to provide a means by which missense mutations affecting disease susceptibility could potentially be distinguished from polymorphisms having no such effect. To date, no study has examined the effect of multiple missense mutations carried by individuals, likely because previous efforts have 1) genotyped only a limited number of sites and 2) tended to study women of European ancestry.
This study provides a hint that an accumulation of ATM missense mutations may function as a marker of breast cancer susceptibility. This is suggested by the observation that women carrying two or more mutated sites reported a family history of disease markedly more often than women carrying zero or only one such variant. Further, women with multiple variants also showed evidence of familial disease occurring before the age of 50. These findings are consistent with studies that have noted an enrichment of ATM missense mutations in breast cancer cases selected for a first degree family history and an early age at diagnosis (Stankovic et al. 1998; Teraoka et al. 2001). It is important to note that within A-T families the relative risk of breast cancer appears to be highest (6- to 7-fold increased) in obligate A-T carriers between 50 and 69 years of age (Athma et al. 1996). Hence, this study, by virtue of having enrolled women aged 45 years and older, is well positioned to evaluate the effect of such missense variants on the risk of breast cancer.

Three sites were commonly mutated among women carrying two or more ATM missense variants, namely D126E, L546V, and D1853N. The D126E variant has previously been identified as a common polymorphism limited to Africans and not found elsewhere in the world (Thorstenson et al. 2001). In contrast to D126E, this same investigation noted that the D1853N missense polymorphism distinguished a non-African haplotype (H3), found on 8% of chromosomes worldwide except in Africa (Thorstenson et al. 2001). We noted that three loci were positively and significantly correlated with variation at the L546V locus: P872S, M1040V, and L2322F.
Given the previous observation that a mutated allele at L546V was associated with an elevated risk of high stage breast cancer and a family history of the disease (Chapter 2, Table 10; Chapter 3, Table 12), one might hypothesize that these loci are either in LD with the L546V locus or perhaps represent additional sequence alterations that truly confer cancer risk, either singly or in concert with an L546V variant. Additional follow-up of low stage cancer cases or healthy controls carrying multiple mutations at these sites may be particularly useful in identifying a constellation of mutations leaving carriers at elevated risk of breast cancer.

In conclusion, we postulate that particular rare variants may confer an elevated risk of breast cancer. Yet the task of resolving such mutations from the background of polymorphisms or rare variants having little or no functional consequence remains. Only functional assays, such as the in vitro cellular data provided by Spring et al. (Spring et al. 2002), can reconcile such epidemiologic data as they relate ATM to breast cancer susceptibility.

CHAPTER 5

THE SUITABILITY OF THE TWO ESTROGEN RECEPTOR GENES (ESR1 AND ESR2) AS CANDIDATE LOCI IN SPORADIC BREAST CANCER SUSCEPTIBILITY AND OTHER PHENOTYPES

5.1. Introduction

Breast cancer is the most frequently diagnosed neoplasm in women. Currently, there are in excess of 180,000 incident cases of breast cancer annually in the United States with an associated 46,000 deaths (Ries et al. 1997). For women surviving to 75 or 80 years of age, the lifetime risk of developing breast cancer is approximately 1 in 8 (Bunker et al. 1998).
Although mortality rates from breast cancer have remained relatively steady since the 1940s, the incidence has dramatically increased, rising 32% between 1980 and 1987 (Garfinkel et al. 1994; Harris et al. 1992). The majority of breast neoplasms present in later years and have not been associated with known genetic mutations and, hence, are termed “sporadic.”

Estrogens are key regulators of growth and differentiation in the normal mammary gland and regulate gene expression via the estrogen receptor. To date, two estrogen receptors (ERα and ERβ) encoded by different genes (ESR1 and ESR2) have been described (Green et al. 1986; Kuiper et al. 1996; Mosselman et al. 1996). Further, estrogen is considered to be a key factor in the etiology of breast cancer. In a meta-analysis of prospective cohort studies of postmenopausal breast cancer, cases had significantly higher concentrations of serum estradiol (E2) than non-cases (Thomas et al. 1997a). Similarly, in premenopausal breast cancer, cases had a 12% higher mean estradiol concentration over the entire menstrual cycle (Thomas et al. 1997b). Pike related the hormonal changes during a woman’s life to the aging of breast tissue and, by extension, to the risk of breast cancer (Pike 1983). Lifetime estrogen exposure is influenced by the reproductive history events and risk modifiers detailed in Figure 12 below.

Risk Factors
• Early Menarche
• Late Menopause
• Obesity (Postmenopausal)
• Hormone Replacement Therapy (HRT)
• Late Pregnancy

Protective Factors
• Lactation
• Obesity (Premenopausal)
• Early Pregnancy

Figure 12. Established hormonal risk modifiers for breast cancer (Pike 1983).
Genetic factors in breast cancer are suggested by the strong familial tendency noted as early as 1866 by Paul Broca, commenting on his own family’s experience (Broca 1866). Most notable for their high penetrance are BRCA1 and BRCA2, yet together these are estimated to account for only approximately 5% of all breast cancer cases, although for a higher proportion of early onset cases and of cases in families with a high incidence of breast cancer (Claus et al. 1998; Ford et al. 1998). As a consequence, there is a strong suggestion that additional genes contribute to the heritable susceptibility to breast cancer. A proposed distribution highlighting the genetic underpinning of sporadic breast cancer is presented in Figure 13 below.

Figure 13. Genetic component of sporadic breast cancer which may account for the observed excess familial risk.

Among these as yet unidentified potential susceptibility genes, the estrogen receptor genes (ESR1 and ESR2) represent particularly plausible candidates as inherited breast cancer susceptibility genes due to their biological role in normal breast development and function. It has been shown that estrogen receptor is expressed at low levels in normal breast, but at higher levels in adjacent pre-malignant or malignant tissue (Allegra et al. 1979; van Agthoven et al. 1994). Further, the presence of detectable ER protein in tumors is associated with responsiveness to adjuvant chemotherapy and a favorable prognosis (McGuire and Clark 1989; Rubens and Hayward 1980); 30% of primary breast cancers in postmenopausal women fail to express ER as compared to 70% in premenopausal women (Fuqua et al. 1993). Nearly all (90-95%) ER-negative tumors fail to respond to endocrine therapy as compared to less than half (40%) of ER-positive tumors (Jensen 1981; Maass et al. 1980; Manni et al.
1980; Osborne et al. 1980). Lastly, the chemotherapeutic agent tamoxifen (an estrogen receptor antagonist) not only has demonstrated success in the treatment of ER-positive metastatic breast cancers and as adjuvant chemotherapy (Jordan 1992) but also decreases the incidence of invasive and noninvasive breast cancer (Fisher et al. 1998) (Figure 14). As a result, one might hypothesize that variation in the estrogen receptor genes leading to either inappropriate expression of wild-type ER or expression of certain ER variants provides an environment in which genetic changes permissive for breast carcinogenesis can occur.

Figure 14. Rates of invasive breast cancer: Tamoxifen versus Placebo, by yearly interval of follow-up. Tamoxifen for Prevention of Breast Cancer, National Surgical Adjuvant Breast and Bowel Project P-1 Study. Total n=13 388 (Placebo n=6 707; Tamoxifen n=6 681).

5.2. Structure and Function of the Estrogen Receptor Genes

5.2.1. Comparative Structures and Tissue Distribution of ERα and ERβ

There are two subtypes of estrogen receptor and several isoforms and splice variants of each. ERα was the first receptor to be discovered, cloned and sequenced in 1986 (Green et al. 1986; Greene et al. 1986). Consequently, most studies have focused on the biological role of this receptor. The second ER, ERβ, was not identified and characterized until a decade later (Kuiper et al. 1996; Mosselman et al. 1996). The two subtypes are on different chromosomes, with ERα (gene: ESR1) found on the long arm of chromosome 6 and ERβ (gene: ESR2) on chromosome 14q22-24. The two demonstrate focal homology as well as an overlapping, although not identical, tissue distribution.
As with nuclear receptors in general, both ERα and ERβ display a modular structure containing six distinct regions, A-F (Figure 15). The important regions are the ‘C’ region that is required for DNA binding and the ‘E’ region that binds estrogens and selective estrogen receptor modulators (SERMs); both are highly conserved evolutionarily (Evans 1988). As one might expect, the amino acid sequences of the two ER DNA binding regions and the ligand-binding domains are highly related phylogenetically (96% and 58% respectively) (Mosselman et al. 1996). However, the overall degree of homology of the receptors is quite low, as the remaining regions (‘A’, ‘B’, hinge, and ‘F’ domain) do not share this high degree of similarity. This lack of conservation is manifested most dramatically by the reciprocal presence of the AF-1 in ERα and AF-2 in ERβ.

Figure 15. Functional domains of estrogen receptor (ER). [Diagram labels: Activating Function-1 (Absent in ESR2); DNA Binding Region; Hinge; Estrogen Binding Region; Activating Function-2 (Present in ESR2); 295.7 kb; 61.2 kb.]

As noted above, the tissue distribution of the two estrogen receptors differs although there is some degree of overlap. ERβ mRNA was initially identified in thymus, spleen, ovary and testis (Mosselman et al. 1996). More specifically, granulosa cells and developing spermatids express mostly ERβ (Enmark et al. 1997). A non-identical distribution is observed in osteoblasts, where ERα plays the dominant role. In fact, a case report of a male patient lacking functional ERα receptors described a phenotype of infertility and severe osteoporosis (Smith et al. 1994). Lastly, endometrium, normal and
malignant breast tissue, and ovarian stroma mostly contain ERα although ERβ may be co-expressed (Spiers et al. 1999).

5.2.2. ER Function: Signal Transduction and Transcriptional Response

The estrogen receptor is a member of the nuclear receptor superfamily and, classically, functions as a ligand-activated transcription factor (Mangelsdorf et al. 1995). In this manner, it is able to transduce extracellular signals, generally small lipophilic molecules, into transcriptional responses. In brief, estrogen (E2) or a selective estrogen receptor modulator (SERM) binds to estrogen receptor (ER), which dimerizes upon ligand binding. The homodimer and the subsequent binding of coactivator (CoA) or co-repressor (CoR) molecules are required to form a transcription complex at an estrogen response element (ERE) contained in the transcriptional control regions of specific target genes (Danielian et al. 1992; Fawell et al. 1990). It is this interaction that results in the modulation of gene expression and the biological effect of estrogen (Figure 16). More recently, additional pathways have been shown to modify ER function, including alternative conformations induced by different ligands, phosphorylation of the receptor itself, and association with other transcriptional factors, i.e. in the absence of direct DNA binding (Kahlert et al. 2000; Kato et al. 1995; Razandi et al. 2000; Stein and Yang 1995).

The nearly identical DNA binding domains of ERα and ERβ led to the hypothesis that the two might share the same DNA response element. In fact, estrogen-estrogen receptor complexes have been shown to bind to the ERE as either homodimers or heterodimers (Cowley et al. 1997; Pettersson et al. 1997).
Cross-signaling and the combined influence on one another’s transcriptional activity add a significant degree of complexity in dissecting out the distinct biological role of each ER. On the other hand, they also imply that the interaction between the two pathways may have clinical significance.

Figure 16. The signal transduction pathways available to estrogen in the initiation of gene transcription. [Diagram labels: Cytoplasm; Nucleus; Estrogen; ERα; ERβ; ERE; fos; jun; AP-1; DNA; Gene Transcription.]

5.3. Genetic Variation in the Estrogen Receptor and Breast Cancer Susceptibility

5.3.1. ER Expression and Significance in Breast Cancer

As noted above, the estrogen receptor has a wide tissue distribution, including bones, neural tissue, and arteries; the receptors have been shown to be involved in the physiological and metabolic processes of each (Komm et al. 1988; McEwan and Alves 1999; Mendelsohn and Karas 1991). As such, ESR1 and ESR2 variants are reasonable candidates in the pathogenesis of endometrial cancer (Kuiper et al. 1996), osteoporosis (Horowitz 1993), atherosclerotic disease (Sullivan et al. 1988), neuropsychiatric disease (Cao et al. 1997; Levinson et al. 2000), and Alzheimer’s disease (Henderson 1997). However, the most plausible and intensely studied of disease phenotypes is breast cancer risk, given the receptor’s biological role in normal breast cell development and function (Henderson et al. 1988; Picard et al. 1997). ESR1 polymorphisms in particular have been associated not only with overall breast cancer risk (Anderson et al. 1994; Schubert et al. 1999) but also with age at onset (Pari et al. 1989).

Human breast cancer presents as a typical hormone dependent tumor and the presence of ER is a reliable clinical indicator for its hormone dependency.
An estimated 70% of primary breast cancers among postmenopausal women are ER positive, as compared to 30% of those among premenopausal women (Fuqua et al. 1993). Expression of estrogen receptor has important implications for tumor biology and clinical therapy (Osborne 1998). Notably, the presence of ERs in tumor tissue functions as an indicator of prognosis and hormone responsiveness. In general, expression correlates with higher overall survival and lower risk of relapse (Ferno 1998; McGuire and Clark 1989). It has been hypothesized that the loss of hormone dependence in breast tumors may be due to the presence of variant steroid receptors activating transcription even in the absence of ligand (Sluyser 1992).

A number of studies have suggested that the ESR1 locus may be instrumental in breast tumorigenesis. For instance, the ESR1 locus (chromosome 6q25) has been shown to be lost in a substantial proportion, sometimes the majority, of breast tumors (Chappell et al. 1997; Devilee et al. 1991; Fujii et al. 1996; Sheng 1996; Orphanos et al. 1995; Theile et al. 1996). Additionally, somatic alterations of ESR1 have been identified in breast tumor biopsy tissue (Dotzlaw et al. 1997; McGuire et al. 1991) and, specifically, truncated forms of DNA-binding ERα have been found in human breast cancers (Fuqua et al. 1991). Lastly, this same research group has demonstrated in cell culture increased activity of an ERα variant isolated from breast tumor tissue as compared to wild type ER (Fuqua et al. 2000).

Results from several studies suggest that ESR2 (ERβ) may also exert some influence on the susceptibility to and development of breast cancer. Initial studies revealed that a variety of ERβ mRNA transcripts could be isolated from both human breast cancer cell lines (Hu et al.
1998) as well as tumors (Skliris et al. 2001; Spiers et al. 1999). Typically, ERβ is expressed together with ERα in both normal and malignant breast tissue (Cullen et al. 2000; Dotzlaw et al. 1997; Fuqua et al. 1999; Iwao et al. 2000; Jarvinen et al. 2000; Mann et al. 2001; Omoto et al. 2001; Spiers and Kerin 2000; Vladusic et al. 1998). The proposed significance of ERβ and its potential as a target in the treatment of breast cancer are strengthened when its distinct signaling pattern, described earlier, is considered.

5.3.2. Specific Variant Studies and Breast Cancer Susceptibility

In order to evaluate the role of inherited variation in the estrogen receptor, several groups have identified and genotyped SNPs or restriction fragment length polymorphisms in either high risk family studies or population-based case/control studies. In an analogous manner, the distribution of variants has also been evaluated for other phenotypes including Alzheimer’s disease and cognitive impairment, multiple sclerosis, atherosclerosis and coronary reactivity, and prostate cancer. Among all disease outcomes considered to date, the influence of ESR1 variation on bone mineral density (BMD) and fracture risk has been most extensively studied, and a comprehensive meta-analysis of published reports has recently appeared (Ioannidis et al. 2002). As a consequence, the discussion below does not recapitulate the BMD results but instead focuses on other phenotypes.

Below we summarize those studies of ESR1 in which the outcome of interest was breast cancer (Table 15) and detail efforts focusing on other phenotypes (Table 16), including a description of the population considered and findings. Given the relatively recent discovery of ESR2, considerably fewer studies have considered the role of this gene.
Published studies that consider variation in ESR2 as a candidate disease locus, none to date in breast cancer, are outlined in Table 17. As a general theme, studies of frequency differences in ER variants between breast cancer patients and controls provide some suggestion that variants influence disease susceptibility, but these results have not been consistently replicated.

Two RFLPs in intron 1 of ESR1 (PvuII and pXbaI) have been most extensively studied, both singly and as haplotypes, and these efforts encompass a rich variety of phenotypes. With respect to breast cancer, it was suggested as early as 1989 that allele frequencies of the ESR1 PvuII RFLP, while the same in peripheral leukocytes and primary breast tumors, differed in breast cancer cell lines failing to express estrogen receptor (Hill et al. 1989). This polymorphism is in linkage disequilibrium with the second RFLP, pXbaI. A subsequent family study found possible linkage (LOD score = 1.85) to the ER gene using three RFLPs including the pXbaI marker (Zuppan et al. 1991). Finally, the suggestion that ESR1 may be involved in the development of at least a subset of breast carcinomas was supported by the observation that the pXbaI restriction site was more common in patients than in controls. Further, the PvuII site was seen more often among patients with progesterone receptor (PgR) negative tumors than among those with PgR-positive tumors (Anderson et al. 1994).

Subsequently, other groups amplified and screened stretches of ESR1, identifying novel SNPs in tumor tissue. Roodi et al. (Roodi et al. 1995) noted after screening the ESR1 exons that only 1% of primary breast cancers had point mutations and that, despite extensive searching, only two missense mutations were uncovered. Both occurred in the same ER-negative tumor.
They did discover five novel synonymous polymorphisms in the coding region, and one of these (Codon 325) demonstrated a strong association with family history of breast cancer (p = 0.0005). In addition, they obtained evidence of deviation from Hardy-Weinberg equilibrium (p=0.003), as no case subjects were observed to be homozygous for the polymorphism, as compared to the 3.8 that would be expected. Similarly, Iwase et al. (Iwase et al. 1996) screened ER-negative and PgR-positive tumors by SSCP and compared the distribution of two silent mutations among non-cancer controls and breast cancer patients. Interestingly, the Codon 325 polymorphism described by Roodi et al. (Roodi et al. 1995) was over-represented among breast cancer patients as compared to controls (p=0.057). The Iwase study also considered an Exon 1 (Codon 10) variant, which was associated with neither breast cancer nor any other clinicopathologic factors (Iwase et al. 1996).

Whereas previous studies were limited by the lack of published genomic sequence information, Schubert et al. (Schubert et al. 1999) sequenced substantial regions flanking each exon, permitting amplification and screening of splice junctions as well as the promoter region. Both previously described and novel variants were genotyped among probands and relatives from high-risk breast cancer families as well as patients and controls from a population-based breast cancer study. Among the variants considered was the frequently observed silent Codon 325 polymorphism. In contrast to Roodi et al. (Roodi et al. 1995), this variant was not co-inherited with breast cancer in families. No other identified inherited variant of ESR1 was associated with breast cancer in high-risk families or in population-based cases. Hence, the authors conclude that the previously observed variation in ESR1 protein (Karnik et al. 1994; McGuire et al.
1991) might be due to somatic mutations during tumorigenesis rather than germline variation.

In addition to the detection and evaluation of novel ESR1 variants in tumor tissue, several replication studies have evaluated the distribution of previously described variants in genomic DNA in case/control studies. Southey et al. (Southey et al. 1998) sought to confirm the finding of Roodi et al. (Roodi et al. 1995) in a population-based sample of incident breast cancer cases diagnosed before the age of 40. They concluded that there was no evidence of an association between the Codon 325 polymorphism and either breast cancer or a family history of breast cancer among case subjects. Curren et al. (Curren et al. 2001) selected three polymorphisms spanning the ESR1 gene and analyzed each in an unrelated breast cancer-affected and age-matched control population to determine if any were associated with sporadic breast cancer susceptibility. Of these, one SNP (Exon 8, Codon 594) was associated with a nearly 3-fold risk of breast cancer. However, again in contrast to the suggestion of Roodi et al. (Roodi et al. 1995), the frequency of the Exon 4 (Codon 325) polymorphism was not significantly different between populations (p=0.181), and neither was that of an Exon 1 (Codon 10) polymorphism (p = 0.152).

There is strong epidemiologic support for estrogen’s role in breast cancer susceptibility and the modulation of ER expression during progression. It has been hypothesized that either inappropriate expression of wild-type ER or the expression of ER variants provides the impetus for abnormal proliferation in epithelial cells which is associated with breast carcinogenesis (Lemieux and Fuqua 1996).
In summary, despite some suggestion that ESR1 variants influence susceptibility to or development of breast cancer, these results have not been consistently replicated, as described above and enumerated in Table 15.

Table 15. Summary of studies evaluating the presence and frequency of germ-line ESR1 variants in female breast cancer patients compared to controls. Risk estimates compare variant carriers (homozygous or heterozygous) to wild-type homozygotes.

ESR1 Variant Site: Location, Codon and Nucleotide^a (rs #)
  Risk Estimate (95% CI) | Allele Frequency (Population) | Study Population and Ethnicity (if described) | Breast Tumor Pathology | Author, Year

Exon 1 Codon 10, Nt. 262: Ser to Ser; MspI RFLP (rs2077647)
  Case only study. | 0.45 (Cases) | A total of 188 women with primary invasive breast cancer including familial and non-familial cases. | DNA amplified from 118 ER+ and 70 ER- tumor samples. | (Roodi et al. 1995)
  0.81 (0.34, 1.91) | 0.48 (Controls), 0.41 (Cases) | 70 breast cancer cases and 30 non-cancer controls. | 13 ER-/PgR+, 57 other ER/PgR phenotypes. | (Iwase et al. 1996)
  1.24 (0.73, 2.11) | 0.44 (Controls), 0.51 (Cases) | Age-matched population of 125 cases and 125 controls. | 70% infiltrating ductal carcinoma, 70% ER+ | (Curren et al. 2001)
Exon 1 Codon 87, Nt. 493: Ala to Ala (rs746432)
  Case only study. | 0.06 (Cases) | Described above. | | (Roodi et al. 1995)
Intron 1 pXbaI RFLP: xx (major), Xx (heterozygote), XX (minor)
  2.01 (0.98, 4.17) | 0.68 | Hospital-based series | | (Anderson et al.
1994) lntron 1 PvuII RFLP pp (major), Pp (heterozygote), PP (minor) 1.05 (0.57, 1.96) 0.45(Controls) 0.46 (Tumors) 0.42 (ER+) 0.55 (ER-) DNA from peripheral leukocytes of 53 controls and tumor tissue from 188 individuals with primary breast cancer. Among the latter, there were 123 ER+ tumors and 65 ER- tumors (Hill et al. 1989) Case only study. 0.48 (Cases) Described above. (Roodi et al. 1995) o R e p r o d u c e d with p e r m i s s i o n of the c o p y r i g h t o w n e r . F u r t h e r r e p r o d u c t i o n p r o h i b i t e d w i t h o u t p e r m i s s i o n . Table 15 (continued). ESR I V ariant Site: ... R isk Estim ate (95% C l) Allele Frequency Study Population and Ethnicity (if described) (Population) B reast T um or Pathology A uthor, Y ear lntron 1 PvuII RFLP (cont.) 188 primary human breast tumor biopsies. Correlation between the absence o f one allele and the failure to express ER (Hill et al. 1989) Exon 2 1.15 (0.30,4.35) 0.013 (Cases) Cases were 143 patients with familial clustering o f breast and/or (Anderson et al. Codon 160, Nt. 710: Gly to Cys; Mval RFLP 0.010 (Controls) ovarian cancer (peripheral leukocyte DNA screened) and 96 primary breast carcinomas (tumor tissue DNA). 729 controls (366 female, 363 males). Norwegian population. 1997) 0.92 (0.13,6.61) 0.02 (Probands) 0.02 (Controls)1 5 139 breast cancer family probands, 105 sporadic breast cancer cases and 151 population-based controls Families primarily Caucasian, cases and controls contained roughly equal numbers of African-Americans and Caucasians. Incident cancer cases (aged 20 to 74) with age and race matched controls. (Schubert et al. 1999) Exon 3 Codon 243, Nt. 961: R to R Case only study. 0.02 (Cases) Described above. (Roodi et al. 1995) Did not segregate with breast cancer 0.03 (Probands) Described above. (Schubert et al. 1999) Exon 3 Codon 296, Nt. 889: Leu to Pro Case only study. Screened clinical breast cancer tissues. (McGuire et al. 1991) Exon 4 Codon 303, Nt. 
1140: Lys to Arg Case only study. Monomorphic (all participants) Described above. Described above. (McGuire et al. 1991) (Schubert et al. 1999) o K ) R e p r o d u c e d with p e r m i s s i o n of the c o p y r i g h t o w n e r . F u r t h e r r e p r o d u c t i o n p r o h i b i t e d w i t h o u t p e r m i s s i o n . Table 15 (continued). ESR I V ariant Site Risk Estim ate (95% C l) Allele Frequency Study Population and Ethnicity (if described) (Population) B reast T um or Pathology A uthor, Y ear Exon 4 Codon 309, Nt. 1158: Ser to Phe 0.34 (0.03, 3.45) Monomorphic (Probands) 0.02 (Cases) 0.01 (Controls)0 Described above. (Schubert et al. 1999) Exon 4 Codon 31 l,N t. 1165: T to T In complete LD with Codon 309, Nt. 926 variant above. Described above. (Schubert et al. 1999) Exon 4 Codon 325, Nt. 1207: Pro to Pro (rs1801132) p = 0.0005 0.23(Family Hx) 0.56 (No Hx) Described above. (Roodi et al. 1995) 2.91 (1.16, 7.28)* 0.28 (Cases) 0.13 (Controls) Described above. (Iwase etal. 1996) 1.13 (0.83, 1.54) 0.21 (Controls) 0.23 (Cases) 0.20 (Family Hx) 0.23 (No Hx) Population-based study of 294 controls and 388 cases. (Southey et al. 1998) Early onset breast cancer. Women under 40 years at diagnosis of first primary breast cancer. 1.22 (0.69,2.15) 0.21 (Probands) 0.28 (Affected) 0.24 (Unaffected) Described above. (Schubert et al. 1999) 0.65 (0.39, 1.10) 0.23 (Controls) 0.17 (Cases) Described above. (Curren et al. 2001) o R e p r o d u c e d with p e r m i s s i o n of the c o p y r i g h t o w n e r . F u r t h e r r e p r o d u c t i o n p r o h i b i t e d w i t h o u t p e r m i s s i o n . Table 15 (continued). ESR1 V ariant Site Risk Estim ate (95% C l) Allele Frequency Study Population and Ethnicity (if described) (Population) B reast T um or Pathology A uthor, Y ear Exon 4 Codon 352, Nt. 1057: H to H Case only study. cDNA o f 20 tamoxifen-resistant and 20 tamoxifen-sensitive tumors screened. (Kamik et al. 1994) Exon 5 Codon 377, Nt. 
1363: H to H (Anderson et al. 1997) Exon 6 Codon 425, Nt. 1507: F to F Did not segregate with breast cancer 0.02 (Probands) Described above. (Schubert et al. 1999) Exon 8 Codon 594, Nt. 2014: T hrtoT hr; Btgl RFLP (,rs2228480) Case only study. 2.79(1.21.6.42)* 0.19 (Cases) Described above. Described above. (Roodi et al. 1995) (Curren et al. 2001) LEGEND: *: p < 0 .0 5 ; f: p < 0.01; J: p < 0.001 NS: Not significant (as reported by authors) a Nucleotide position measured from transcription start site. N.B. ESR1 cDNA is flanked by a 5’ untranslated region (UTR) of 232 nucleotides. b Observed only in Caucasian controls (0.02). Monomorphic among African-American cases and controls as well as Caucasian cases. c Observed among African-American cases (0.01) and controls (0.003). Monomorphic among Caucasians. 105 A similar pattern of demonstrated but nevertheless controversial differences in ESR1 polymorphism frequencies occurs when considering the risk of AD, Parkinson’s Disease (PD) and cognitive impairment. Although several genes associated with both familial and sporadic AD have been identified - namely the presenilins for the former and the widely accepted apololipoprotein E4 allele (ApoE) as well as the ai-antichymotrypsin for the latter - it has been estimated that 50% of genetic factors remain unidentified (Brandi et al. 1999). Estrogen receptor has been suggested as a plausible candidate gene in susceptibility to sporadic AD and cognitive decrement more generally given that estrogen therapy has been shown to improve cognitive function or prevent AD in elderly women (Kawas et al. 1997; Resnick et al. 1997; Tang et al. 1996). Further, women have a slightly higher age-adjusted risk of developing AD as compared to men (Farrer et al. 1997). 
Lastly, support for a mechanistic interaction between ApoE, a demonstrated AD susceptibility gene, and ESR1 polymorphisms comes from animal studies demonstrating that estradiol induces synaptic sprouting and upregulates ApoE mRNA expression in rodents (Srivastava et al. 1997; Stone et al. 1998). Several case-control studies (Brandi et al. 1999; Isoe-Wada et al. 1999; Mattila et al. 2000), but not all (Maruyama et al. 2000), have demonstrated an increased frequency of PvuII and pXbaI polymorphisms in patients with AD or Parkinson's Disease dementia (PDD) compared to control subjects. Further, in a prospective cohort study of community-dwelling older women, these RFLPs were associated with a greater likelihood of developing cognitive impairment, as measured by a modified Mini-Mental Status Exam (mMMSE) (Yaffe et al. 2002). Obscuring what appears to be, at first blush, a consistent effect is the observation that in two of the above studies (Brandi et al. 1999; Isoe-Wada et al. 1999), the increased risk of AD was observed among X and P allele carriers. However, in the other two studies (Mattila et al. 2000; Yaffe et al. 2002), it was the more common x and p alleles that were associated with an increased risk of AD and cognitive decrement. Hence, while these findings suggest that ESR1 gene variants may either contribute to AD development or influence the trajectory of cognitive diminution in normal aging, no precise role has yet been consistently replicated across studies (Table 16). In contrast to the volume of work considering ESR1 as a candidate locus in a variety of phenotypes, significantly fewer published studies exist which examine variation in ESR2. This may be in part due to its recent cloning but may also be a consequence of the fact that it lacks restriction fragment sites, which have traditionally been more amenable to assay.
However, a systematic mutation screen of the coding region of the ERβ gene (Rosenkranz et al. 1998) has reported novel germline variants.

Table 16. Summary of studies evaluating the role of ESR1 variants in susceptibility to phenotypes other than female breast cancer. Risk estimates compare minor allele variant carriers (homozygous or heterozygous) to wild-type homozygotes.

Columns: Risk Estimate or Mean (95% CI or ±SD) | Allele or Haplotype Frequency (Affected Population or Ethnicity) | Phenotype Investigated; Study Population and Ethnicity (if described) | Author, Year
Rows are grouped under each ESR1 variant site (Location, Codon, Nucleotide [a], rs #).

Exon 1, Codon 6, Nt. 248: H to Y
  Case only | 0.04 (Bipolar) | Psychiatric Disease. A total of 240 individuals: 113 schizophrenic patients, 28 with bipolar illness, 24 with puerperal psychosis, 5 with autism, 30 with ADHD, 25 with alcoholism and 15 first degree relatives of patients with autism or ADHD. Multiple ethnic groups. | (Feng et al. 2001)

Exon 1, Codon 10, Nt. 262: Ser to Ser; MspI RFLP (rs2077647)
  Case only | 0.52 (West European); 0.47 (Finnish); 0.39 (African-American); 0.21 (American Indian); 0.40 (All Others); 0.48 (Aggregate) [b] | Psychiatric Disease. Described above. | (Feng et al. 2001)
  NS | 0.48 (Aggregate) | Response of HDL Cholesterol to HRT. 309 unrelated, postmenopausal women with coronary artery disease enrolled in the Estrogen Replacement and Atherosclerosis trial. Majority Caucasian with 11% African American, 4% other. | (Herrington et al. 2002)

Exon 1, Codon 87, Nt. 493: Ala to Ala (rs746432)
  Case only | 0.10 (West European); 0.06 (Finnish); 0.03 (African-American); 0.08 (Aggregate) [c] | Psychiatric Disease. Described above. | (Feng et al. 2001)
  NS | 0.07 (Aggregate) | Response of HDL Cholesterol to HRT. Described above. | (Herrington et al. 2002)

Exon 1, Codon 146, Nt. 669: P to Q
  Case only | 0.04 (Alcohol Dependence) | Psychiatric Disease. Described above. | (Feng et al. 2001)

Intron 1, pXbaI RFLP: xx (major), Xx (heterozygote), XX (minor)
  | 0.29 (AD Cases); 0.16 (Controls) | Dementia, Parkinson's and Alzheimer's Disease. 13 PD patients with dementia, 71 PD patients without dementia, 86 AD patients and 51 control subjects (CTL). Japanese cohort. | (Isoe-Wada et al. 1999)
  2.47 (1.42, 4.30)† | 0.22 (Cases); 0.10 (Controls) | Alzheimer's Disease. 193 cases with late-onset AD and 202 healthy controls. Italian study cohort. | (Brandi et al. 1999)
  White: 1.33 (0.81, 2.18); Japanese: 0.92 (0.57, 1.51) | 0.40 (White Cases); 0.37 (White Controls); 0.15 (Japanese Cases); 0.18 (Japanese Controls) | Alzheimer's Disease. Japanese and White (Cambridge, England) patients with sporadic AD. Controls were healthy Japanese volunteers; white controls were autopsy-examined non-AD cases. | (Maruyama et al. 2000)
  Apoε4+ and xx+, FAD: 7.1 (2.5, 19.9)‡; Apoε4- and xx+, FAD: 6.1 (2.3, 16.2)‡ | | Familial and Sporadic Alzheimer's Disease (FAD and SAD). Study subjects included 111 FAD and 103 SAD patients and 290 controls. Community-based and clinic-based AD registries in Sweden and Finland. | (Mattila et al. 2000)
  0.64 (0.19, 2.09); Age of Onset [d]: XX: 29.60 ± 11.10; Xx: 22.60 ± 8.04; xx: 27.49 ± 9.14* | 0.29 (MS Patients); 0.31 (Controls) | Multiple Sclerosis. 79 unrelated relapsing and remitting type MS patients and 73 controls. All were Japanese and residents of the northernmost island of Hokkaido. | (Niino et al. 2000)
  Late Onset AD: 2.63 (1.05, 6.59)* | 0.23 (Familial AD); 0.27 (Early-onset AD); 0.30 (Late-onset AD); 0.22 (Vascular Dementia); 0.24 (Alcohol-Associated); 0.20 (Controls) | Alzheimer's Disease, Vascular Dementia, Alcohol-associated Dementia. Cases were 223 patients with AD (including familial, early-onset and late-onset), 66 with vascular dementia, and 17 with alcohol-associated dementia. Controls were 134 healthy elderly. | (Ji et al. 2000)
  1.07 (0.62, 1.86) | 0.35 (Cases); 0.35 (Controls) | Alzheimer's Disease. Caucasian AD cases ascertained from Scotland and Manchester. Controls from same geographical area without dementia. | (Lambert et al. 2001)
  1.35 (0.82, 2.25) | 0.35 (Cases); 0.31 (Controls) | Prostate Cancer. Cases were 88 Caucasian patients (mean age 68.9 years) with histologically confirmed prostate cancer. Controls were 241 cancer-free, community-based Caucasian males older than 50 years (mean 73.6). | (Modugno et al. 2001)
  2.75 (1.44, 5.25)† | 0.21 (AN Cases); 0.09 (Controls) | Anorexia nervosa (AN). Cohort of 170 Caucasian female AN sufferers and 152 female controls. | (Eastwood et al. 2002)
  XX: 13.36 ± 1.24; Xx/xx: 12.75 ± 1.35* | | Age at Menarche. 145 adolescent females, north-western Greece. | (Stavrou et al. 2002)
  0.83 (0.49, 1.39); MMSE: XX: 0.7 ± 0.1; Xx: 0.7 ± 0.1; xx: 0.9 ± 0.1* | 0.30 (Cases); 0.36 (Controls) | Cognitive Impairment. Prospective study of 2625 women with DNA collected at baseline. Cognitive decliners (n=166) were determined by MMSE decrement over follow-up. All self-reported European ancestry. | (Yaffe et al. 2002)
  Fracture: 0.66 (0.47-0.93) | | Bone Mineral Density and Fracture. A meta-analysis from 30 study groups, published and unpublished data. 5834 women. | (Ioannidis et al. 2002)

Intron 1, PvuII RFLP: pp (major), Pp (heterozygote), PP (minor)
  | 0.49 (Cases); 0.36 (Controls) | Dementia, Parkinson's and Alzheimer's Disease. Described above. | (Isoe-Wada et al. 1999)
  2.03 (1.23, 3.36)† | 0.25 (AD Cases); 0.14 (Controls) | Alzheimer's Disease. Described above. | (Brandi et al. 1999)
  White: 1.15 (0.68, 1.96); Japanese: 0.95 (0.60, 1.49) | 0.48 (White Cases); 0.46 (White Controls); 0.36 (Japanese Cases); 0.41 (Japanese Controls) | Alzheimer's Disease. Described above. | (Maruyama et al. 2000)
  Risk estimates not calculated: significant LD with pXbaI | Pp (Women Only): 0.66 (SAD patients); 0.35 (Controls) | Familial and Sporadic Alzheimer's Disease (FAD and SAD). Described above. | (Mattila et al. 2000)
  Late Onset AD: 1.73 (0.97, 3.07) | 0.41 (Familial AD); 0.48 (Early-onset AD); 0.51 (Late-onset AD); 0.44 (Vascular Dementia); 0.44 (Alcohol-Associated); 0.38 (Controls) | Alzheimer's Disease, Vascular Dementia, Alcohol-associated Dementia. Described above. | (Ji et al. 2000)
  1.02 (0.66, 1.55) | 0.45 (Cases); 0.44 (Controls) | Alzheimer's Disease. Described above. | (Lambert et al. 2001)
  2.62 (0.98, 6.98) | 0.50 (MS Patients); 0.30 (Controls) | Multiple Sclerosis, age at onset. Described above. | (Niino et al. 2000)
  1.18 (0.62, 2.02) | 0.47 (Cases); 0.41 (Controls) | Prostate Cancer. Described above. | (Modugno et al. 2001)
  0.62 (0.37, 1.03) | 0.21 (AN Cases); 0.30 (Controls) | Anorexia nervosa (AN). Described above. | (Eastwood et al. 2002)
  PP: 13.09 ± 1.20; Pp/pp: 12.80 ± 1.19 | | Age at Menarche. Adolescents, north-western Greece. | (Stavrou et al. 2002)
  0.69 (0.45, 1.05); MMSE: PP: 0.6 ± 0.1; Pp: 0.8 ± 0.1; pp: 0.9 ± 0.1* | 0.38 (Cases); 0.44 (Controls) | Cognitive Impairment. Described above. | (Yaffe et al. 2002)
  Fracture: 0.93 (0.72-1.18) | | Bone Mineral Density and Fracture. | (Ioannidis et al. 2002)

Intron 1, pXbaI RFLP/PvuII RFLP Haplotypes
  PPXX Haplotype: 2.84 (1.60, 5.06)‡ | PPXX: 0.22 (AD), 0.09 (CTL); PPXx: 0.03 (AD), 0.05 (CTL); PPxx: 0.01 (AD), <0.01 (CTL); PpXX: <0.01 (AD), 0.01 (CTL); PpXx: 0.39 (AD), 0.41 (CTL); Ppxx: 0.07 (AD), 0.06 (CTL); ppXX: 0 (AD), 0 (CTL); ppXx: <0.01 (AD), 0.02 (CTL); ppxx: 0.27 (AD), 0.35 (CTL) | Described above. | (Brandi et al. 1999)
  White: NS | PPXX: 0.11 (AD), 0.13 (CTL); PPXx: 0.10 (AD), 0.08 (CTL); PPxx: 0.01 (AD), 0 (CTL); PpXX: 0.01 (AD), 0.02 (CTL); PpXx: 0.41 (AD), 0.35 (CTL); Ppxx: 0.08 (AD), 0.13 (CTL); ppXX: 0 (AD), 0 (CTL); ppXx: 0.03 (AD), 0.02 (CTL); ppxx: 0.24 (AD), 0.27 (CTL) | Alzheimer's Disease. | (Maruyama et al. 2000)
  Japanese: NS | PPXX: 0.02 (AD), 0.06 (CTL); PPXx: 0.06 (AD), 0.05 (CTL); PPxx: 0.05 (AD), 0.09 (CTL); PpXX: 0 (AD), 0 (CTL); PpXx: 0.21 (AD), 0.19 (CTL); Ppxx: 0.26 (AD), 0.22 (CTL); ppXX: 0 (AD), 0 (CTL); ppXx: 0 (AD), 0 (CTL); ppxx: 0.40 (AD), 0.39 (CTL) | Alzheimer's Disease. | (Maruyama et al. 2000)
  NS | | Multiple Sclerosis, age at onset. Described above. | (Niino et al. 2000)
  NS | PPXX: 0.12 (AD), 0.11 (CTL); PPXx: 0.07 (AD), 0.08 (CTL); PPxx: 0.01 (AD), <0.01 (CTL); PpXX: 0 (AD), <0.01 (CTL); PpXx: 0.40 (AD), 0.41 (CTL); Ppxx: 0.11 (AD), 0.09 (CTL); ppXX: 0 (AD), 0 (CTL); ppXx: 0 (AD), <0.01 (CTL); ppxx: 0.29 (AD), 0.31 (CTL) | Alzheimer's Disease. Caucasian, Scotland and Manchester. | (Lambert et al. 2001)
  NS | PPXX: 0.12 (AD), 0.08 (CTL); PPXx: 0.14 (AD), 0.08 (CTL); PPxx: 0.03 (AD), 0.02 (CTL); PpXX: 0.01 (AD), 0.01 (CTL); PpXx: 0.32 (AD), 0.42 (CTL); Ppxx: 0.10 (AD), 0.13 (CTL); ppXX: 0 (AD), 0 (CTL); ppXx: 0 (AD), 0 (CTL); ppxx: 0.25 (AD), 0.24 (CTL) | Described above. | (Eastwood et al. 2002)
  PPXX Haplotype +: 13.43 ± 1.18; 12.76 ± 1.25† | | Age at Menarche. Adolescents, north-western Greece. | (Stavrou et al. 2002)

Exon 2, Codon 160, Nt. 710: G to C
  Case only | 0.01 (Schizophrenia) | Psychiatric Disease. Described above. | (Feng et al. 2001)

Exon 3, Codon 243, Nt. 961: R to R
  Case only | 0.02 (West European); 0.015 (Aggregate) [e] | Psychiatric Disease. Described above. | (Feng et al. 2001)

Exon 4, Codon 299, Nt. 1128: K to R
  Case only | 0.04 (Puerperal psychosis) | Psychiatric Disease. Described above. | (Feng et al. 2001)

Exon 4, Codon 325, Nt. 1207: Pro to Pro (rs1801132)
  Case only | 0.28 (West European); 0.19 (Finnish); 0.18 (African-American); 0.36 (American Indian); 0.50 (All Others); 0.27 (Aggregate) | Psychiatric Disease. Described above. | (Feng et al. 2001)
  NS | 0.26 (Aggregate) | Response of HDL Cholesterol to Hormone Replacement Therapy. Described above. | (Herrington et al. 2002)

Exon 5, Codon 377, Nt. 1363: H to H
  Case only | 0.04 (Bipolar) | Psychiatric Disease. Described above. | (Feng et al. 2001)

Exon 7, Codon 537, Nt. 1840: L to L
  Case only | 0.04 (Puerperal psychosis) | Psychiatric Disease. Described above. | (Feng et al. 2001)

Exon 8, Codon 594, Nt. 2014: Thr to Thr; BtgI RFLP (rs2228480)
  Case only | 0.28 (West European); 0.19 (Finnish); 0.18 (African-American); 0.36 (American Indian); 0.50 (All Others); 0.27 (Aggregate) | Psychiatric Disease. Described above. | (Feng et al. 2001)

LEGEND: *: p < 0.05; †: p < 0.01; ‡: p < 0.001; NS: Not significant (as reported by authors)
[a] Nucleotide position measured from transcription start site. N.B. ESR1 cDNA is flanked by a 5' untranslated region (UTR) of 232 nucleotides.
[b] Aggregate excludes 30 alleles of first degree relatives of patients with autism or ADHD. Total is 450 chromosomes.
[c] Not observed among American Indians (0 of 28) and all others (0 of 10).
[d] Compares Xx to xx genotype. The XX to Xx comparison did not attain statistical significance.
[e] Not observed among Finnish (0 of 36), African American (0 of 38), American Indians (0 of 28) and all others (0 of 10).

In one study, the distribution of a single 3' untranslated region (UTR) variant was considered in susceptibility to Alzheimer's disease (Lambert et al. 2001). In the other, the coding region of ESR2 was screened among probands of different weight extremes (Rosenkranz et al. 1998) and provided suggestive evidence of an association with susceptibility to anorexia nervosa (AN). However, a confirmatory study did not replicate these findings but conversely reported an association with a heterozygous genotype of G1082A (Eastwood et al. 2002). These results are summarized in Table 17. No analogous data have been published to date which consider the role of ESR2 variation in breast cancer.

5.4. Conclusions

In the years since the discovery and cloning of BRCA1 and BRCA2, it has become clear that only a fraction of inherited susceptibility to breast cancer is accounted for by these genes. As a consequence, Pharoah et al. have estimated that there exist between 30 and 40 susceptibility genes, positing a relative risk of 1.5 and a population prevalence of 10% (Pharoah et al. 2002). Given that estrogens are breast tissue mitogens, that they support the growth of approximately 50% of primary breast cancers, and that their effects are mediated via the estrogen receptors, ESR1 and ESR2 represent strong candidates in breast cancer susceptibility.
Table 17. Summary of studies evaluating the role of ESR2 variants in susceptibility to various disease phenotypes other than breast cancer. Risk estimates compare variant carriers (homozygous or heterozygous) to wild-type homozygotes.

Columns: Risk Estimate (95% CI) | Allele Frequency (Study Population) | Phenotype Investigated; Study Subjects and Ethnicity (if described) | Author, Year
Rows are grouped under each ESR2 variant site (Location, Codon and Nucleotide, rs #).

Exon 4, Codon 250, Nt. 846: G to S
  Case only. | <0.01 (Obesity) | Weight Extremes. Study subjects were 153 extremely obese German children and adolescents (mean age 14.2), 143 healthy underweight students (mean age 25.4), 116 patients with AN (mean age 25.4, 91% female) and 90 patients with BN (mean age 24.4, 92% female). | (Rosenkranz et al. 1998)

Exon 5, Nt. 1082 (G -> A), RsaI RFLP
  Case only. | 0.05 (Obesity); 0.04 (Underweight); 0.01 (AN); 0.05 (BN) | Weight Extremes. Described above. | (Rosenkranz et al. 1998)
  1.75 (1.05, 2.92)*; Variant homozygotes: 6.58 (1.65, 26.18)* | 0.19 (Cases); 0.30 (Controls); Variant homozygotes: 0.08 (Cases); 0.01 (Controls) | Ovulation Defects and Menstrual Disorders. Cases were 98 Chinese women, aged 14-39 years. Includes idiopathic ovulation disturbances, defects due to reproductive causes, non-reproductive causes and polycystic ovary syndrome (PCOS). Controls were 150 healthy Chinese women aged 18-44 years. | (Sundarrajan et al. 2001)
  3.15 (1.50, 6.60)†; Variant homozygotes: 5.75 (0.85, 39.04) | 0.39 (AN Cases); 0.13 (Controls); Variant homozygotes: 0.03 (Cases); <0.01 (Controls) | Anorexia nervosa (AN). Study details described in Table 2. | (Eastwood et al. 2002)

Exon 8, Nt. 1421 (T -> C)
  Case only. | <0.01 (Obesity) | Weight Extremes. Described above. | (Rosenkranz et al. 1998)

3' UTR, Nt. 1730 (A -> G), AluI RFLP
  0.72 (0.50, 1.02) | 0.36 (Cases); 0.40 (Controls) [a] | Alzheimer's Disease. Study details described in Table 2. | (Lambert et al. 2001)
  Case only. | 0.38 (Obesity); 0.37 (Underweight); 0.38 (AN); 0.6040 (BN) | Weight Extremes. Described above. | (Rosenkranz et al. 1998)
  1.20 (0.62, 2.32), P = 0.7051; Variant homozygotes: 3.03 (1.13, 8.15)* | 0.10 (Cases); 0.15 (Controls); Variant homozygotes: 0.11 (Cases); 0.04 (Controls) | Ovulation Defects and Menstrual Disorders. Described above. | (Sundarrajan et al. 2001)
  1.17 (0.74, 1.84); Variant homozygotes: 1.49 (0.83, 2.67) | 0.42 (AN Cases); 0.38 (Controls); Variant homozygotes: 0.20 (Cases); 0.14 (Controls) | Anorexia nervosa (AN). Study details described in Table 2. | (Eastwood et al. 2002)

LEGEND: *: p < 0.05; †: p < 0.01; ‡: p < 0.001; NS: Not significant (as reported by authors)
[a] Lambert et al. (Lambert et al. 2001) also noted a significant interaction between this ESR2 variant and the ESR1 pXbaI polymorphism (p=0.025), the ESR1 PvuII polymorphism (p=0.035) and the ESR1 pXbaI/PvuII haplotype (p=0.01).

However, genetic association studies assessing the correlation between estrogen receptor variants and breast cancer are, in the aggregate, a microcosm reflecting the larger state of affairs in genotype-disease studies. Namely, the literature to date is characterized by reports of associations that can neither be replicated nor corroborated by linkage (Gambaro et al. 2000; Weiss and Terwilliger 2000).
Studies summarized above suffer from the errors commonly seen in association studies of complex disease, ranging from small sample size, subgroup analysis and multiple testing, failure to attempt study replication, and failure to detect linkage disequilibrium with adjacent loci, and extending perhaps to positive publication bias (Cardon and Bell 2001). This phenomenon, at the macro level, is well demonstrated in a recent review by Hirschhorn et al. (Hirschhorn et al. 2002), which finds that of 603 reported positive associations, 166, approximately one quarter of the total, have been studied three or more times. Of these replicated efforts, only 6 associations were consistently replicated. Despite these known, and aptly demonstrated, limitations, the case-control study has much greater statistical power to detect genetic contributions to complex disease than do linkage studies (Risch 2000). More specifically, haplotype-based association studies offer a more comprehensive approach than has been previously achievable for association studies of complex disease. By identifying the underlying haplotype structure and extent of linkage disequilibrium (LD) across ESR1 and ESR2, it may be possible to identify which biallelic markers will be redundant and which will be essential in association studies. Such essential SNPs have been referred to as 'haplotype tag SNPs (htSNPs)' and are markers that capture the haplotypes of a gene or a region of LD (Johnson et al. 2001). Hence, it is unnecessary to discover and test every SNP across the chromosomal region of interest for a given phenotype (Kruglyak 1999; Lander and Schork 1994; Risch 2000). Instead, diversity within the region can potentially be captured by a much smaller subset of markers.
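The idea that a small subset of markers can capture the common haplotypes of a region can be illustrated with a short sketch. It assumes phased haplotypes coded as strings of 0/1 alleles and uses a brute-force search for the smallest set of SNP positions that still tells every haplotype apart. This is an illustrative simplification, not the selection procedure of Johnson et al. (2001); the function name and the example haplotypes are hypothetical.

```python
from itertools import combinations


def select_tag_snps(haplotypes):
    """Return the smallest set of SNP positions distinguishing all haplotypes.

    haplotypes: sequence of equal-length strings, one character per SNP.
    Brute force over subsets of increasing size, so only suitable for the
    handful of SNPs found within a single block.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n_snps = len(haplotypes[0])
    if any(len(h) != n_snps for h in haplotypes):
        raise ValueError("all haplotypes must cover the same SNPs")
    distinct = set(haplotypes)
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            # Project each haplotype onto the candidate tag positions.
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):
                return list(subset)
    return list(range(n_snps))  # not reached: the full set always works


# Hypothetical block of five SNPs: SNPs 0-1 and SNPs 2-4 are fully correlated.
common_haplotypes = ["00000", "00111", "11000", "11111"]
print(select_tag_snps(common_haplotypes))  # [0, 2]
```

In this toy block two of the five SNPs suffice, because the remaining three carry no information that the tags do not already provide.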
This approach has clear application in clarifying the role of the estrogen receptors in the breast cancer phenotype; studies to date have focused on discovering and testing new variants. Future efforts could instead exploit LD across a locus by designing strategies which represent genetic variation in the region by genotyping a small fraction of polymorphic sites. Such an approach may further resolve controversial associations in other quantitative-trait loci such as cognitive decline or dementia. Finally, haplotype-association studies can function as replication studies of unbiased genomewide linkage analysis; this is particularly applicable for ESR1, as its chromosomal region (6q24-25) was identified as a prominent linkage peak in a study of stature (Hirschhorn et al. 2001). As a result, considered LD mapping across ESR1 and ESR2, combined with careful attention to the phenotypic sample collected and the selection of appropriate control subjects, has great potential for evaluating the relative contribution of these two promising candidates across a rich variety of phenotypes.

CHAPTER 6

DENSE LINKAGE DISEQUILIBRIUM MAPS OF ESTROGEN RECEPTORS ALPHA (ESR1) AND BETA (ESR2) AND IMPLICATIONS FOR DISEASE ASSOCIATION STUDIES: THE MULTIETHNIC COHORT

6.1. Summary

Here we report measurement of linkage disequilibrium (LD) using a dense set of polymorphic markers across the two Estrogen Receptor genes, ESR1 and ESR2, in a diverse panel consisting of unrelated individuals from five ethnic groups: African-American, Caucasian, Hawaiian, Japanese, and Latino. The effects of estrogens are mediated by their intracellular receptors and their influence is well established for a variety of phenotypes.
We have scanned 394 kb of DNA from these two genes, genotyped 146 Single-Nucleotide Polymorphisms (SNPs; approximately 71,000 genotypes) and determined the common haplotypes for each of the five ethnic groups. The results appear to show a pattern of discrete haplotypic blocks of significant LD punctuated by discrete sites of recombination. Mean LD block size spanned a greater distance among the Japanese (28.0 kb), Caucasian (25.2 kb), and Hawaiian (21.9 kb) samples as compared to Latinos (17.0 kb) and African-Americans (11.9 kb). Similarly, the percentage of sequence identified in blocks ranged from 78.5% among Caucasians to 58.9% among African-Americans. Within each block we discovered limited haplotype diversity with, on average, four to six common haplotypes (>5%) comprising a significant percentage of all chromosomes observed. We then employed a formal approach to capturing haplotype diversity within each regional block using a smaller subset of the markers, termed "haplotype tagging SNPs" (htSNPs). This catalogue of well chosen htSNPs provides a reference set for subsequent genotyping studies across the spectrum of phenotypes influenced by estrogens and estrogen receptor action. Lastly, we discuss some potential limitations of this approach, namely the high density of SNPs required to differentiate causal from associated SNPs, the impracticality of the approach in finding causal SNPs in rare haplotypes, and the degree of complexity introduced by population-specific blocks and haplotypes.

6.2. Introduction

Considerable interest and enthusiasm have been generated concerning haplotype-based studies as a manner in which to approach association studies of complex disease (Collins et al. 1999; Jorde 2000; Kruglyak 1999; Ott 2000; Reich et al. 2001a).
Evaluation of associations between DNA sequence variants and phenotype has been widely employed in genetic epidemiologic studies to identify regions of the genome contributing to disease. In the past, there have been relatively few DNA variants available for study. However, the discovery and deposition in public databases of more than two million Single Nucleotide Polymorphisms (SNPs) has considerably changed the dimensions of this inquiry (Altshuler et al. 2000; Sachidanandam et al. 2001). These variants can either be investigated directly or through their non-random association with neighboring markers, termed linkage disequilibrium (LD). Several studies have independently demonstrated that on typing a large number of SNPs within a small genomic region, a crisp haplotype pattern emerges: regional blocks of variable length over which there is relatively little haplotype diversity (Daly et al. 2001; Jeffreys et al. 2001; Johnson et al. 2001). Once these blocks are defined, a subset of SNPs (haplotype 'tagging' SNPs or 'htSNPs') can be chosen that distinguishes the common haplotypes in each block (Johnson et al. 2001), permitting association tests with the disease of interest.

Although estrogens are primarily recognized as female sex hormones, they are found in a variety of cells and tissues including bones, neural tissue, and arteries and are involved in the physiological and metabolic processes of each (Komm et al. 1988; McEwan and Alves 1999; Mendelsohn and Karas 1991). Their effects are mediated by their intracellular receptors, of which there are two subtypes: estrogen receptor α (ESR1) (Green et al. 1986) and the more recently discovered estrogen receptor β (ESR2) (Kuiper et al. 1996).
Estrogen receptor sequence variants have been implicated in a variety of pathological processes (reviewed in Chapter 5), the most significant and intensely studied of which is breast cancer risk (Henderson et al. 1988). ESR1 polymorphisms in particular have been associated not only with overall breast cancer risk (Anderson et al. 1994; Schubert et al. 1999) but also with age at onset (Parl et al. 1989). Estrogen receptors more generally are reasonable candidates in the pathogenesis of endometrial cancer (Kuiper et al. 1996), osteoporosis (Horowitz 1993), atherosclerotic disease (Sullivan et al. 1988), and Alzheimer's disease (Henderson 1997).

Case-control association mapping is a powerful approach to detecting complex trait loci which exhibit multiple genetic and environmental components contributing to susceptibility (Risch and Merikangas 1996). The basic approach identifies markers that are more frequently observed among affected individuals (cases) than among unaffected individuals (controls). As illustrated in Chapter 5, studies to date of ER variants across a variety of phenotypes have yielded conflicting results, with some authors finding no association whatsoever, others reporting association, and still others unable to replicate a positive finding in an independent sample. As outlined by Lander and Schork (Lander and Schork 1994), a positive association can arise if a given allele does not cause the phenotype but is in linkage disequilibrium with the actual cause. True associations due to LD can yield what appear, on first inspection, to be contradictory results as a consequence of the particular genetic features of the population under study. Hence, characterization of the structure of LD
throughout the genome is a necessary companion to the successful pursuit of disease-susceptibility alleles. In this manner, rather than conducting a series of single-allele surveys, which raise the specter of multiple hypothesis testing, methods based on haplotypes composed of multiple SNPs in combination have increased power of detection.

Although a detailed knowledge of LD patterns in human populations is a necessary prerequisite to the haplotype-based disease gene mapping approach described above, it is important to note that several limitations of this approach require resolution. Firstly, any "haplotype mapping" project requires SNPs at high density (proportional to the rate of association decay over genetic distance) simply to differentiate a limited number of common haplotypes in a candidate region. Distinguishing causal SNPs from those merely associated with phenotype would require an even greater density (Maniatis et al. 2002) and still would not be well suited to discovering causal SNPs in rare haplotypes (Pritchard and Przeworski 2001). Further, the proposal that disease association studies can restrict the focus of investigation to a subset of htSNPs differentiating common haplotypes is simply impractical in regions of low LD.

An additional complexity of a haplotype-mapping approach is that different populations can exhibit distinct LD patterns. For example, it has been shown that LD is greatly reduced in African populations as compared to a US population of northern European descent (Reich et al. 2001a). Similarly, haplotype block size has been estimated to have a mean of approximately 26 kb among European and Asian populations as compared to 17 kb in an African and an African-American sample (Gabriel et al. 2002).
In the extreme, sufficient recombination could extinguish LD between markers such that a well-defined block in Europeans could conceivably be absent altogether in a population with a greater number of generations.

Accumulating evidence suggests that, although the above limitations warrant consideration, haplotype-based association studies remain a powerful approach to disease gene mapping. Firstly, male meiosis in the HLA region exhibits a tight clustering of recombinants falling into clusters 60-90 kb apart, with 1-7 kb of DNA separating each 'hot spot' within a cluster (Jeffreys et al. 2001). And, although this particular distribution of crossovers may be influenced by selection, there is evidence that the extended domains of LD observed are consistent with other estimates (Daly et al. 2001; Reich et al. 2001a). Further, there is compelling evidence that block boundaries and specific haplotypes are typically shared between populations (Gabriel et al. 2002); the authors suggest that this similarity is not surprising given that human population history is largely shared. This confluence of evidence suggests that, despite previous concerns (Clark et al. 1998; Hedrick 1988), LD is largely atomistic. Further, the reduced LD observed in African populations reflects the consequence of non-African samples manifesting a subset of historical recombination events, thus generalizing previous reports (Reich et al. 2001a; Tishkoff et al. 1996; Tishkoff et al. 2000). Hence, detection of shared block boundaries and the coupling strength between them (i.e. long-range or multiallelic D') represents the first step for typing strategies and data analysis in association studies. Subsequent efforts, as outlined by Reich et al. (Reich et al.
2001a), could include initial localization of variants associated with disease in the large blocks of LD characteristic of European and Asian populations. Populations with smaller LD blocks could then permit fine-structure mapping to identify the specific SNP responsible for the phenotype (Jorde 2000).

This investigation intends first to provide a comprehensive description of the extent of LD, haplotype block structure and common haplotypes across ESR1 and ESR2 using a dense set of polymorphic markers. While the framework was initially constructed using polymorphic SNPs drawn from the public SNP map, we additionally included polymorphisms and mutations (e.g. missense mutations and restriction fragment length polymorphisms) commonly studied in genotype/phenotype association studies. Lastly, within haplotype blocks, we have captured and efficiently represented haplotype variation using a reduced number of "haplotype tag" SNPs (htSNPs). We expect that a catalogue of htSNPs will provide a useful reference set for subsequent genotyping studies across the spectrum of phenotypes influenced by estrogens and estrogen receptor action.

6.3. Methods

6.3.1. Human Subjects

For SNP genotyping, a total of 349 unrelated individuals from 5 ethnic groups were sampled: African-American (n=70), Native Hawaiian (n=70), Japanese (n=70), Latino (n=69), and Caucasian (n=70). This analysis is part of a large, ongoing, multiethnic cohort study in Hawaii and Los Angeles, California with an emphasis on diet and other lifestyle characteristics in the etiology of cancer. Aspects of this large cohort as well as details of its design and implementation are described more fully elsewhere (Kolonel et al. 2000) and are detailed in the Methods section of Chapter 2 (Section 2.3).
Briefly, participants were recruited between 1993 and 1996 from driver's license files in Hawaii and California; the age range at baseline was 45-75 years. The total number of male and female subjects who comprised the cohort was 215,251. The study was approved by the Institutional Review Board of the Keck School of Medicine of the University of Southern California.

6.3.2. Marker Selection and Mapping

We initially selected regions of Chromosomes 6 and 14 that encompassed ESR1 and ESR2, respectively, along with at least 10 kb flanking the gene in both the 5' and 3' directions. We sought to space candidate SNPs at a density of roughly one SNP every 5 kb. We selected candidate SNPs discovered by the SNP Consortium (TSC) project (Sachidanandam et al. 2001). These have been discovered using a uniform protocol in a multiethnic sample of known composition and, as such, would likely be most applicable to the current study. In instances in which this selection procedure did not provide adequate coverage, and/or as a consequence of genotyping failures, coverage was supplemented using publicly available SNPs with multiple submitters (http://www.ncbi.nlm.nih.gov/SNP/index.html) as well as the Celera discovery effort (http://www.celera.com). We deliberately over-sampled variants in mouse homology regions (http://genome.ucsc.edu) (Kent et al. 2002) as well as known regulatory (e.g. promoter) regions described in the literature in order to ensure that we adequately covered these regions. Further, all missense coding region variants as well as variants explored in the literature across a variety of phenotypes were selected for inclusion. Regions with a high degree of repeat content adjacent to or encompassing a SNP of interest were eliminated from these analyses.
Given that the accuracy of genome assembly is critical for the study of multimarker haplotypes, we reconfirmed map locations and regions with questionable assemblies by comparing the relative map positions of each SNP across multiple genome builds and independent assemblies (NCBI, UCSC, and Celera). With the exception of some minor perturbations in relative distance between SNPs, on the order of 100 bases, no significant differences were noted in the most recent assemblies of ESR1 and ESR2.

6.3.3. Genotyping

Genomic DNA was purified from the buffy coats of peripheral blood samples for all cases and controls using the Qiagen DNA Blood Kit protocol (Qiagen Inc.; Hilden, Germany). SNP genotyping was performed by primer extension of multiplex products and detection by MALDI-TOF mass spectroscopy (Tang et al. 1999). Multiplex PCR was performed using 0.1 units of HotStarTaq (Qiagen Inc.), 2.5 ng of genomic DNA, 2.5 pmol of each PCR primer, and 2.5 pmol of dNTP. Thermocycling was at 95°C for 15 minutes, followed by 45 cycles of 95°C for 20 seconds, 56°C for 30 seconds, and 72°C for 30 seconds. Unincorporated dNTPs were inactivated using 0.3 U of Shrimp Alkaline Phosphatase (Roche Diagnostics; Pleasanton, CA), followed by primer extension using 5.4 pmol of each primer extension probe, 50 pmol of the associated dNTP/ddNTP combination, and 0.5 units of Thermosequenase (Sequenom Corporation; San Diego, CA). Reactions were cycled at 94°C for 2 minutes followed by 40 cycles at 94°C for 5 seconds, 50°C for 5 seconds, and 72°C for 30 seconds. Subsequently, a cation exchange resin was added to remove residual salt from the reaction, and 7 nanoliters of the purified primer extension reaction was loaded onto a matrix pad (3-hydroxypicolinic acid) of a SpectroCHIP (Sequenom Corporation).
SpectroCHIPs were analyzed using a Bruker Biflex III MALDI-TOF mass spectrometer (Bruker Daltonics; Bremen, Germany) and SpectroREADER (Sequenom Corporation). Spectra were processed using SpectroTYPER (Sequenom Corporation). Included in the experimental design was a 10% sample of intercalated repeated samples. Genotype calls were compared between repeats and, if not 100% concordant, the SNP was removed from further analysis. This phenomenon was observed only in the context of violations of Hardy-Weinberg equilibrium (p<0.01) or when a high proportion of calls demonstrated uniformly heterozygous genotypes, indicative of a poorly performing genotype assay.

6.3.4. Statistical Analysis for Recombination and Linkage Disequilibrium

LD patterns were defined by retained biallelic markers at each locus and were measured by the D' statistic, which measures the non-random association between alleles at different loci (Lewontin 1964). Such a measure may be used to infer properties of population history, recombination or the location of mutations. Consider two SNPs at a time before recombination, mutation, or any other event has occurred to create a fourth haplotype (Table 18):

Table 18. Tabular representation of two SNPs in full disequilibrium.

| SNP 1 \ SNP 2 | Allele 1 | Allele 2 |
|---|---|---|
| Allele 1 | a | b |
| Allele 2 | c = 0 | d |

In this configuration, the measure of disequilibrium D' is equivalent to 1:

$$D' = \frac{ad - bc}{(a + c)(c + d)} \qquad \text{(Equation 1)}$$

More generally, after many generations and the opportunity for these two markers to recombine freely, the table of haplotype frequencies will have changed to:

Table 19. Tabular representation of two SNPs after free recombination.

| SNP 1 \ SNP 2 | Allele 1 | Allele 2 | TOTAL |
|---|---|---|---|
| Allele 1 | $p_{11}$ | $p_{12}$ | $p_{1\cdot}$ |
| Allele 2 | $p_{21}$ | $p_{22}$ | $p_{2\cdot}$ |
Here, the LD statistic D is a pairwise comparison of gametic frequencies such that:

$$D = p_{11}\,p_{22} - p_{12}\,p_{21} \qquad \text{(Equation 2)}$$

D' is the relative disequilibrium:

$$D' = \frac{D}{|D|_{\max}}, \qquad |D|_{\max} = \begin{cases} \min(p_1 p_2,\; q_1 q_2) & \text{if } D < 0 \\ \min(q_1 p_2,\; p_1 q_2) & \text{if } D > 0 \end{cases} \qquad \text{(Equation 3)}$$

where $p_1 = p_{11} + p_{12}$ and $p_2 = p_{11} + p_{21}$ are the frequencies of allele 1 at SNP 1 and SNP 2, respectively, and $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$ are the corresponding frequencies of allele 2.

It is important to note that |D'| can only be less than one if all four possible haplotypes are present for a pair of segregating sites. This is quite different behavior as compared to other measures of LD, such as the $r^2$ value, which we also evaluated as a measure of correlation between allele pairs (Hill and Robertson 1968). Nevertheless, the D' statistic was retained as it has the same range of values regardless of the frequencies of the SNPs compared (Lewontin 1964). Further, a |D'| of 1 indicates complete LD, i.e. pairs of SNP alleles that have been inherited without recombination since their shared ancestor, whereas low values of |D'| (<0.5) indicate a high degree of recombination.

As a consequence of significant sampling variation in observed values of D', we calculated 95% confidence intervals of D' for each pair of markers. Limits were determined by calculating a probability distribution for D' given the observed two-marker gamete counts, with the upper and lower limits representing the tails of that distribution. Both D' and the 95% confidence intervals were calculated using a Visual Basic program (Appendix A) run on a Microsoft Excel platform (Microsoft Corporation; Redmond, WA). We defined strong LD as that for which D' is 0.98 or greater and the lower confidence limit exceeds 0.7. On the other hand, strong evidence of historical recombination was defined as those comparisons for which the upper confidence limit was below 0.9. It is the calculation of D' and its associated 95% confidence limits that provided the framework for applying the haplotype block definition outlined below.
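As an illustration of Equations 2 and 3, the following Python sketch computes D, D' and r² from the four two-SNP haplotype frequencies. It is not the Visual Basic program of Appendix A; the function name `pairwise_ld` and the choice to return 0 when a marker is monomorphic are assumptions made for this example, and the confidence-interval calculation is not attempted here.

```python
def pairwise_ld(p11, p12, p21, p22):
    """Return (D, D', r^2) for two biallelic SNPs.

    p11..p22 are haplotype frequencies (or counts) laid out as in Table 19:
    first index = allele at SNP 1, second index = allele at SNP 2.
    Hypothetical helper written for illustration only.
    """
    total = p11 + p12 + p21 + p22
    p11, p12, p21, p22 = (x / total for x in (p11, p12, p21, p22))

    p1 = p11 + p12          # frequency of allele 1 at SNP 1
    q1 = 1.0 - p1           # frequency of allele 2 at SNP 1
    p2 = p11 + p21          # frequency of allele 1 at SNP 2
    q2 = 1.0 - p2           # frequency of allele 2 at SNP 2

    d = p11 * p22 - p12 * p21                     # Equation 2
    if d < 0:
        d_max = min(p1 * p2, q1 * q2)             # Equation 3, D < 0
    else:
        d_max = min(q1 * p2, p1 * q2)             # Equation 3, D > 0
    d_prime = d / d_max if d_max > 0 else 0.0     # assumption: 0 if undefined

    denom = p1 * q1 * p2 * q2
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Only three haplotypes present (p21 = 0), as in Table 18: |D'| is 1
    # even though r^2 is well below 1.
    print(pairwise_ld(0.5, 0.2, 0.0, 0.3))
```

The example call shows the behavior noted in the text: with one of the four haplotypes absent, |D'| equals 1 regardless of allele frequencies, whereas r² does not.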
6.3.5. Fine Structure of Haplotype Blocks

To characterize the fine structure of haplotypes contributing to the overall distribution of D' values, we sought to objectively parse the loci considered into haplotype 'blocks.' We employed the approach outlined by Gabriel et al. (Gabriel et al. 2002), in which blocks were defined as regions that have undergone as little historical recombination as is typical for a 1 kb region. Visual inspection of an LD plot alone suggested a simple delineation of blocks consistent with this definition. Further, these plots were consistent with reports showing that LD is not continuously degraded over intermarker distance but rather extends over long distances, punctuated with short regions of recombination (Johnson et al. 2001; Reich et al. 2001a). Data were first directly examined for motifs of at least 5 consecutive markers showing "strong LD" as defined above. For regions with more sparse SNP coverage, a final block definition was selected to obtain a maximum of 5% of internal pairs displaying strong evidence of recombination. The Gabriel et al. (Gabriel et al. 2002) block definition was formally implemented in a Visual Basic application (Appendix B).

Initially, we outlined the framework of these analyses by considering all five populations together and, secondly, evaluated each ethnic group separately. This was performed with the intention of comparing the boundaries and dimension of each block for each ethnic group. As a companion analysis, we also pooled all non-African-American populations into a single dataset (i.e. Native Hawaiian, Japanese, Latino, and Caucasian) and noted these boundaries. Markers with appreciable frequency (>2% in any of the ethnic groups) were retained for the analyses outlined below.
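The pair classification and the 5% rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the Visual Basic application of Appendix B: the function names are hypothetical, each pair is assumed to arrive as a precomputed tuple (D', lower 95% limit, upper 95% limit), and the 5% fraction is taken over informative pairs only, which is an assumption following the published Gabriel et al. definition rather than something stated in this section.

```python
def classify_pair(d_prime, lower, upper):
    """Label one SNP pair using the thresholds given in Section 6.3.4."""
    if abs(d_prime) >= 0.98 and lower > 0.7:
        return "strong_ld"
    if upper < 0.9:
        return "recombination"
    return "uninformative"


def is_block(pair_stats, max_recomb_fraction=0.05):
    """Check whether a candidate region qualifies as a block.

    pair_stats: iterable of (D', lower CL, upper CL) for every internal
    marker pair in the candidate region (hypothetical input format).
    Assumption: the fraction is computed over informative pairs only.
    """
    labels = [classify_pair(*stats) for stats in pair_stats]
    informative = [lab for lab in labels if lab != "uninformative"]
    if not informative:
        return False
    recomb = informative.count("recombination")
    return recomb / len(informative) <= max_recomb_fraction


if __name__ == "__main__":
    strong = (1.0, 0.82, 1.0)
    broken = (0.31, 0.10, 0.62)
    print(is_block([strong] * 20))                  # no recombinant pairs
    print(is_block([strong] * 18 + [broken] * 2))   # 10% recombinant pairs
```

In practice the candidate regions would be grown or shrunk marker by marker and re-tested; that search is omitted here to keep the sketch short.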
We similarly evaluated LD and block profiles obtained using more stringent minor allele frequency cutpoints (0.05, 0.10, 0.15 and 0.25).

6.3.6. Estimation of Haplotypes and Haplotype Tagging SNP Selection

To describe the haplotype phylogeny within each block of the ESR1 and ESR2 loci, we used an extension of the E-M algorithm of Excoffier and Slatkin (Excoffier and Slatkin 1995) described more fully in Stram et al. (Stram 2003). A number of previous reports have demonstrated the validity and appropriateness of the E-M algorithm for inferring haplotypes from SNP data (Excoffier and Slatkin 1995; Niu et al. 2002; Tishkoff et al. 2000). Briefly, this method uses an expectation-maximization algorithm that determines the maximum-likelihood frequencies of multilocus haplotypes. Each expectation depends upon an imperfectly known true population haplotype frequency ($p_h$), which is updated at each iteration of the M-step. The E-step computes, for each subject $i$ and conditional on the genotype data $G_i$, the expected "haplotype dosage" $\delta_h(H_i)$, which is the count of the number of copies of haplotype $h$ contained in the true pair of haplotypes $H_i$ carried by the individual (so that $\delta_h(H_i)$ takes the values 0, 1, or 2). This haplotype dose estimate is based on the assumption of Hardy-Weinberg equilibrium and is equivalent to:

$$E\{\delta_h(H_i) \mid G_i\} = \frac{\sum_{H \sim G_i} \delta_h(H)\, p_{h_1} p_{h_2}}{\sum_{H \sim G_i} p_{h_1} p_{h_2}} \qquad \text{(Equation 4)}$$

where $\sum_{H \sim G_i}$ is the summation over the haplotype pairs, $H = (h_1, h_2)$, compatible with the observed genotype data, and $p_h$ is the frequency of haplotype $h$.

The SNPs used to construct haplotypes had minor allele frequencies of at least 2% in any of the five ethnic populations. It has been shown that extremely low frequency SNPs have little power for the detection of LD (Goddard et al. 2000; Lewontin 1995).
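To make the E-step of Equation 4 concrete, the sketch below implements a basic E-M estimate of haplotype frequencies from unphased genotypes. It is a minimal illustration under stated assumptions, not the extended algorithm of Stram et al.: genotypes are coded as tuples of 0/1/2 counts of one allele per SNP, all haplotypes start at equal frequency, a fixed number of iterations is run, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Ordered haplotype pairs (h1, h2) consistent with an unphased genotype.

    genotype: tuple with one entry per SNP, each 0, 1 or 2 (allele count).
    """
    per_snp = []
    for g in genotype:
        if g == 0:
            per_snp.append([(0, 0)])
        elif g == 2:
            per_snp.append([(1, 1)])
        else:  # heterozygote: phase is unknown
            per_snp.append([(0, 1), (1, 0)])
    pairs = []
    for combo in product(*per_snp):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotypes(genotypes, n_iter=100):
    """Estimate haplotype frequencies under Hardy-Weinberg equilibrium."""
    n_snps = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_snps))
    freq = {h: 1.0 / len(haps) for h in haps}      # uniform starting values
    for _ in range(n_iter):
        dosage = defaultdict(float)
        # E-step: expected haplotype dosage per subject (Equation 4)
        for g in genotypes:
            pairs = compatible_pairs(g)
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                dosage[h1] += w / total
                dosage[h2] += w / total
        # M-step: update frequencies from the expected dosages
        n_chrom = 2 * len(genotypes)
        freq = {h: dosage[h] / n_chrom for h in haps}
    return freq


if __name__ == "__main__":
    data = [(0, 0), (0, 0), (0, 0), (2, 2), (1, 1)]
    for hap, p in sorted(em_haplotypes(data).items()):
        print(hap, round(p, 4))
```

In the toy data, the single double heterozygote is resolved in favor of the two haplotypes already seen in the homozygous subjects, which is the behavior the E-M weighting is meant to produce. Enumerating all 2^n haplotypes is only practical for the small numbers of SNPs found within a block.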
We did, however, include those variants with a minor allele frequency of 2% or greater in one ethnicity but effectively absent in other groups, although such variants increase the number of low-frequency haplotypes observed and add population-specific haplotypes. We posit that the inclusion of such SNPs and their associated haplotypes is key to any investigation of phenotypes which disproportionately affect one population group, as is well documented for prostate cancer (Hsing et al. 2000) and has been suggested for breast cancer (Pike et al. 2002). We estimated haplotype frequencies by analyzing genotype data as five separate datasets stratified by ethnicity and also as a single aggregate dataset.

It has been noted, and held true for these data, that within defined genomic regions the observed haplotypes commonly fall into rather few major groups. Further, LD and haplotype diversity within the region can be effectively captured by a smaller subset of markers termed "haplotype tagging SNPs" (htSNPs) (Johnson et al. 2001). For any given set of true haplotype frequencies, $p_h$, a formal calculation under the assumption of Hardy-Weinberg equilibrium can be made of the squared correlation, $R_h^2$, between the estimate, $E\{\delta_h(H_i) \mid G_i\}$, and the true value, $\delta_h(H_i)$, of the number of copies of $h$ carried by a given subject. In particular:

$$R_h^2 = \frac{\mathrm{Var}\left[E\{\delta_h(H_i) \mid G_i\}\right]}{2\,p_h\,(1 - p_h)} \qquad \text{(Equation 5)}$$

The variance of the expectation is computed by averaging $E\{\delta_h(H_i) \mid G_i\}$ and $E\{\delta_h(H_i) \mid G_i\}^2$ over all possible genotypes $G_i$, weighted by the relative probability of each genotype. A program written in Fortran and explained more fully in Stram et al. (Stram 2003) searches all subsets with no upper bound to size, and chooses htSNPs by maximizing the minimum value of the coefficient
of determination ($R_h^2$) over the common SNPs, thus defining the proportion of diversity explained by a set of htSNPs.

Given that previous efforts have noted that merged analysis combining different populations demonstrates that both the block structure and specific alleles are shared and can be readily identified in a pooled sample (Gabriel et al. 2002), we began the htSNP selection process by choosing a 'superset' of htSNP candidates that would effectively mark common haplotypes (>5%) in all ethnic groups. The 5% value was selected as a frequency threshold given that attainable sample sizes in disease association studies provide reasonable statistical power to detect variants with risk ratios of 1.5 or higher (Johnson et al. 2001). Further, we "forced" in particular SNPs which we posited had either particular historical interest or a higher probability of influencing protein function. Namely, we retained SNPs that have traditionally been studied in ER phenotype association studies (e.g. the pPvuII and pXbaI RFLPs) as well as coding region SNPs. Haplotype tagging SNPs were chosen in concert with these "forced" SNPs so that a minimum $R_h^2$ of 0.7 was obtained.

However, to effectively define specific characteristics of haplotypes in each ethnic group, it is necessary to examine each population individually (Gabriel et al. 2002). At first consideration, such an approach would appear counter-intuitive to an investigation based on the common disease common variant (CDCV) hypothesis, as the addition of either low-frequency or population-specific markers would partition shared haplotypes into subgroups, leading to a loss of power or information. Taken to the extreme, each individual could potentially carry a unique haplotype.
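The $R_h^2$ criterion of Equation 5 can be evaluated directly for a candidate set of tag SNPs when the haplotype frequencies are known. The Python sketch below does this by enumerating all haplotype pairs under Hardy-Weinberg equilibrium; it is an illustrative reconstruction from the definitions in the text, not the Fortran program of Stram et al., and the function name `rh_squared`, the dictionary input format and the toy frequencies are all assumptions.

```python
from collections import defaultdict
from itertools import product


def rh_squared(hap_freqs, tag_idx):
    """R_h^2 (Equation 5) for each haplotype given a set of tag SNPs.

    hap_freqs: dict mapping haplotype tuples of 0/1 alleles to frequencies.
    tag_idx:   positions of the SNPs that would actually be genotyped.
    Assumes at least two haplotypes have non-zero frequency.
    """
    haps = [h for h, p in hap_freqs.items() if p > 0]
    prob_g = defaultdict(float)                   # P(G) at the tag SNPs
    dose_g = {h: defaultdict(float) for h in haps}  # sum of delta_h * P(pair)

    for h1, h2 in product(haps, repeat=2):        # ordered pairs under HWE
        w = hap_freqs[h1] * hap_freqs[h2]
        g = tuple(h1[i] + h2[i] for i in tag_idx)  # unphased tag genotype
        prob_g[g] += w
        for h in haps:
            delta = (h1 == h) + (h2 == h)         # copies of h in this pair
            if delta:
                dose_g[h][g] += delta * w

    result = {}
    for h in haps:
        p = hap_freqs[h]
        # E[ E{delta|G}^2 ] = sum over G of P(G) * (dose/P(G))^2
        second_moment = sum(d * d / prob_g[g] for g, d in dose_g[h].items())
        variance = second_moment - (2 * p) ** 2   # E{delta} = 2 p_h
        result[h] = variance / (2 * p * (1 - p))
    return result


if __name__ == "__main__":
    freqs = {(0, 0): 0.5, (1, 1): 0.3, (1, 0): 0.2}   # toy frequencies
    print(rh_squared(freqs, tag_idx=[0, 1]))   # both SNPs typed
    print(rh_squared(freqs, tag_idx=[0]))      # only the first SNP typed
```

A tag-SNP search in the spirit of the text would call such a function for each candidate subset and keep the subset whose smallest $R_h^2$ among common haplotypes is largest, accepting it once that minimum reaches 0.7.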
Rather, it would be expected a priori that in the search for a functional variant, the most commonly occurring haplotypes, particularly those shared across similarly susceptible populations, are most pertinent. For, if a positive association is observed with one marker which truly represents the root cause of a phenotype, then the same positive result would be expected in all populations.

In this context, however, it is important to consider situations in which a causal variant might not display such consistency. One could postulate that in a candidate gene, the disease-causing variant could appear on different haplotypes, perhaps as a function of population-specific LD patterns. Alternatively, there may be many disease-susceptibility alleles or, perhaps, the underlying causal variant could vary by population. In these instances, association tests would not display this consistency (Lander and Schork 1994). Lastly, case-control association study samples that are ethnically mixed, or derived from a population experiencing recent admixture, may exhibit non-random associations even at markers unlinked to phenotype (Chakraborty and Weiss 1988; Lander and Schork 1994; Pritchard and Rosenberg 1999). Yet, in this latter instance, well-described approaches which entail genotyping a
moderate number of unlinked markers can distinguish true associations in the context of population stratification (Reich and Goldstein 2001b). Yet, by focusing on only the most commonly occurring haplotypes in a given locus and not cataloging its genetic diversity, it is possible to overlook some haplotypes in this region. Hence, a haplotype-based study could perhaps suffer from a lack of discrimination or, potentially, miss a functional region entirely.

As a consequence, in this study, a 'superset' of htSNPs was sequentially evaluated for each ethnic group. In most instances, fewer htSNPs could be used to effectively capture the LD and haplotype diversity within the region of interest for a particular ethnic group. At some intervals, additional htSNPs were added for an individual ethnic group, particularly if there were many haplotypes clustered between 5% and 10%, so that the full haplotype diversity could be captured at an effective minimum $R_h^2$ of 0.7. This most commonly occurred among African-Americans, less often among Latinos, and was observed only in quite rare instances among the other ethnic groups. In general, there was a wide choice of htSNPs that displayed operating characteristics of the required minimum. We sought to arrive at a 'common minimum' set of htSNPs that could be widely applicable to all five ethnic groups. For particular blocks it was occasionally required that ethnic-specific htSNPs supplement this minimum set to obtain an $R_h^2$ of 0.7 within that individual ethnic group. It was not difficult to achieve high levels of this index, often exceeding our minimum threshold, for any given block.

6.4. Results

6.4.1. ESR1

6.4.1a. Genotyping Results and Allele Frequencies

A total of 173 variants spanning ESR1 were selected and assayed in our multiethnic panel. Of these, 23 (13.3%) were monomorphic, displaying only a single allele across the nearly 700 chromosomes assayed. Further, the genotyping results of 6 (3.5%) violated the expectation of Hardy-Weinberg equilibrium at a significance level of p<0.01. An additional 20 (11.6%) failed in >30% of assays and were eliminated from the analyses. Lastly, 10 (5.8%) SNPs were rare (<2%) in all five ethnic groups considered (Figure 17).
Therefore, a total of 114 polymorphic SNPs were retained for the subsequent LD and haplotype analysis; these were spaced with an average density of 1 SNP per 2.8 kb (Figure 18).

Figure 17. ESR1 SNP assay results: 114 SNPs were retained for LD and haplotype analysis.

Figure 18. Intronic and exonic organization of ESR1 with SNP assay results denoted.

The analysis of LD patterns, haplotype diversity and the selection of htSNPs covers a 322.1 kb span of Chromosome 6 and encompasses all 8 protein-coding exons of ESR1, its 3' UTR, an additional 9.9 kb upstream as well as 16.8 kb downstream from the transcription stop site. Details on the location and genotyping percentage for each of the 114 SNPs retained are presented in Table 20. Allele frequencies for each retained polymorphic SNP varied across each of the five ethnic groups screened (Table 21). The majority of SNPs retained were located in introns (n=95; 83.3%) or outside of the gene itself (n=14; 12.3%). A total of 5 SNPs (4.4%) were in coding or UTR sequences.

Table 20. ESR1 SNP location, distance between sites and genotyping percentage.

| SNP | SNP Reference Number (rs) | Distance from Start (bp) | Distance between SNPs (bp) | Location in gene | Golden Path Contig (a) | Genotyping Failure % |
|---|---|---|---|---|---|---|
| 1 | rs2881776 | -9928 | | 5' | 164228335 | 7.16% |
| 2 | rs488133 | -3603 | 6325 | Promoter | 164234660 | 4.58% |
| 3 | rs2071454 | -2223 | 1380 | Promoter | 164236040 | 13.75% |
| 4 | rs867239 | -416 | 1807 | Promoter | 164237847 | 6.59% |
| 5 | rs2077647 | 30 | 446 | Exon 1 | 164238293 | 7.74% |
| 6 | rs746432 | 261 | 231 | Exon 1 | 164238524 | 13.18% |
| 7 | rs532010 | 1871 | 1610 | Intron 1 | 164240134 | 14.61% |
| 8 | hCV11414976 | 2273 | 402 | Intron 1 | 164240536 | 1.15% |
| 9 | rs576330 | 3128 | 855 | Intron 1 | 164241391 | 2.58% |
| 10 | hCV3163576 | 8990 | 5862 | Intron 1 | 164247253 | 15.47% |
| 11 | hCV11414947 | 18159 | 9169 | Intron 1 | 164256422 | 0.57% |
| 12 | hCV1987600 | 19050 | 891 | Intron 1 | 164257313 | 1.15% |
| 13 | hCV1987601 | 20152 | 1102 | Intron 1 | 164258415 | 1.72% |
| 14 | hCV1987602 | 20387 | 235 | Intron 1 | 164258650 | 7.74% |
| 15 | hCV1987605 | 22136 | 1749 | Intron 1 | 164260399 | 20.06% |
| 16 | hCV3163583 | 23940 | 1804 | Intron 1 | 164262203 | 0.29% |
| 17 | hCV3163587 | 28833 | 4893 | Intron 1 | 164267096 | 0.86% |
| 18 | hCV1987608 | 30578 | 1745 | Intron 1 | 164268841 | 7.16% |
| 19 | hCV1987609 | 33269 | 2691 | Intron 1 | 164271532 | 0.29% |
| 20 | pPvuII | 34288 | 1019 | Intron 1 | 164272551 | 1.15% |
| 21 | pXbaI | 34334 | 46 | Intron 1 | 164272597 | 2.87% |
| 22 | hCV1987611 | 37753 | 3419 | Intron 2 | 164276016 | 1.43% |
| 23 | hCV1987612 | 38263 | 510 | Intron 2 | 164276526 | 2.29% |
| 24 | hCV3163593 | 39408 | 1145 | Intron 2 | 164277671 | 0.86% |
| 25 | hCV3163594 | 42850 | 3442 | Intron 2 | 164281113 | 4.58% |
| 26 | hCV3163595 | 50284 | 7434 | Intron 2 | 164288547 | 2.01% |
| 27 | hCV3163601 | 55082 | 4798 | Intron 2 | 164293345 | 4.01% |
| 28 | rs1033181 | 65774 | 10692 | Intron 2 | 164304037 | 1.43% |
| 29 | rs1033182 | 65987 | 213 | Intron 2 | 164304250 | 15.19% |
| 30 | rs2175898 | 67905 | 1918 | Intron 2 | 164306168 | 1.72% |
| 31 | hCV1576295 | 70311 | 2406 | Intron 2 | 164308574 | 18.05% |
| 32 | hCV1576303 | 83460 | 13149 | Intron 3 | 164321723 | 4.01% |
| 33 | hCV1576304 | 85896 | 2436 | Intron 3 | 164324159 | 4.87% |
| 34 | hCV11414864 | 89031 | 3135 | Intron 3 | 164327294 | 3.72% |
| 35 | hCV1576306 | 96185 | 7154 | Intron 3 | 164334448 | 9.17% |
| 36 | hCV1576307 | 96335 | 150 | Intron 3 | 164334598 | 8.02% |
| 37 | rs2347923 | 98374 | 2039 | Intron 3 | 164336637 | 3.15% |
| 38 | rs1514347 | 100398 | 2024 | Intron 3 | 164338661 | 2.29% |
| 39 | rs2347867 | 100803 | 405 | Intron 3 | 164339066 | 1.72% |
| 40 | hCV1576311 | 105852 | 5049 | Intron 3 | 164344115 | 0.57% |
| 41 | hCV1576312 | 106291 | 439 | Intron 3 | 164344554 | 1.15% |
| 42 | hCV1576315 | 108570 | 2279 | Intron 3 | 164346833 | 2.58% |
| 43 | hCV1576316 | 111400 | 2830 | Intron 3 | 164349663 | 18.62% |
| 44 | rs988328 | 112103 | 703 | Intron 3 | 164350366 | 6.59% |
| 45 | rs2347868 | 122521 | 10418 | Intron 3 | 164360784 | 2.58% |
| 46 | hCV50675 | 133065 | 10544 | Intron 3 | 164371328 | 5.16% |
| 47 | hCV50676 | 133405 | 340 | Intron 3 | 164371668 | 8.60% |
| 48 | hCV50678 | 134332 | 927 | Intron 3 | 164372595 | 1.15% |
| 49 | rs1801132 | 136475 | 2143 | Exon 4 | 164374738 | 0.57% |
| 50 | rs3020390 | 147411 | 10936 | Intron 4 | 164385674 | 0.29% |
| 51 | rs3003921 | 150467 | 3056 | Intron 4 | 164388730 | 21.20% |
| 52 | rs1884051 | 154232 | 3765 | Intron 4 | 164392495 | 0.86% |
| 53 | rs985191 | 154411 | 179 | Intron 4 | 164392674 | 9.74% |
| 54 | rs985192 | 154431 | 20 | Intron 4 | 164392694 | 0.29% |
| 55 | rs985694 | 157578 | 3147 | Intron 4 | 164395841 | 10.32% |
| 56 | rs985695 | 157658 | 80 | Intron 4 | 164395921 | 8.31% |
| 57 | rs1884049 | 158320 | 662 | Intron 4 | 164396583 | 0.29% |
| 58 | rs932479 | 158990 | 670 | Intron 4 | 164397253 | 0.00% |
| 59 | rs1884052 | 162319 | 3329 | Intron 4 | 164400582 | 0.29% |
| 60 | rs1884054 | 162519 | 200 | Intron 4 | 164400782 | 0.57% |
| 61 | rs3020325 | 164858 | 2339 | Intron 4 | 164403121 | 0.29% |
| 62 | rs3020404 | 167939 | 3081 | Intron 4 | 164406202 | 0.86% |
| 63 | rs2179922 | 168053 | 114 | Intron 4 | 164406316 | 0.86% |
| 64 | rs3020327 | 172598 | 4545 | Intron 4 | 164410861 | 2.58% |
| 65 | rs726281 | 173531 | 933 | Intron 4 | 164411794 | 12.61% |
| 66 | rs726282 | 173607 | 76 | Intron 4 | 164411870 | 2.01% |
| 67 | rs726283 | 173962 | 355 | Intron 4 | 164412225 | 7.74% |
| 68 | rs728523 | 174144 | 182 | Intron 4 | 164412407 | 0.00% |
| 69 | rs932477 | 175549 | 1405 | Intron 4 | 164413812 | 1.15% |
| 70 | rs926777 | 176000 | 451 | Intron 4 | 164414263 | 2.87% |
| 71 | rs2144025 | 178659 | 2659 | Intron 4 | 164416922 | 2.58% |
| 72 | rs722208 | 193838 | 15179 | Intron 4 | 164432101 | 2.29% |
| 73 | rs722209 | 194149 | 311 | Intron 4 | 164432412 | 2.58% |
| 74 | rs2207230 | 195052 | 903 | Intron 4 | 164433315 | 1.15% |
| 75 | rs3020409 | 195841 | 789 | Intron 4 | 164434104 | 0.57% |
| 76 | rs1569788 | 199569 | 3728 | Intron 4 | 164437832 | 1.15% |
| 77 | rs2207231 | 200837 | 1268 | Intron 4 | 164439100 | 2.29% |
| 78 | rs2207232 | 211241 | 10404 | Intron 5 | 164449504 | 0.57% |
| 79 | rs3020411 | 214716 | 3475 | Intron 5 | 164452979 | 2.87% |
| 80 | rs926778 | 226735 | 7660 | Intron 5 | 164464998 | 8.88% |
| 81 | rs926779 | 226873 | 138 | Intron 5 | 164465136 | 2.58% |
| 82 | rs2348078 | 227140 | 267 | Intron 5 | 164465403 | 1.15% |
| 83 | rs3020432 | 228880 | 1740 | Intron 5 | 164467143 | 0.00% |
| 84 | rs3020435 | 230008 | 1128 | Intron 5 | 164468271 | 7.74% |
| 85 | rs3020364 | 238071 | 8063 | Intron 5 | 164476334 | 0.00% |
| 86 | rs3020368 | 242143 | 4072 | Intron 5 | 164480406 | 0.29% |
| 87 | rs3020369 | 245485 | 3342 | Intron 5 | 164483748 | 20.63% |
| 88 | rs1884152 | 248133 | 2648 | Intron 5 | 164486396 | 0.00% |
| 89 | rs2207396 | 253335 | 5202 | Intron 6 | 164491598 | 0.00% |
| 90 | rs974276 | 253373 | 38 | Intron 6 | 164491636 | 1.43% |
| 91 | rs974277 | 253774 | 401 | Intron 6 | 164492037 | 1.43% |
| 92 | rs3020372 | 255881 | 2107 | Intron 6 | 164494144 | 1.43% |
| 93 | rs2982894 | 262995 | 7114 | Intron 6 | 164501258 | 4.01% |
| 94 | rs2982896 | 270446 | 7451 | Intron 6 | 164508709 | 2.01% |
| 95 | rs2459107 | 270578 | 132 | Intron 6 | 164508841 | 4.30% |
| 96 | rs2982899 | 276565 | 5987 | Intron 6 | 164514828 | 1.15% |
| 97 | rs750686 | 279079 | 2514 | Intron 6 | 164517342 | 0.57% |
| 98 | rs749491 | 279839 | 760 | Intron 6 | 164518102 | 7.74% |
| 99 | rs2982900 | 285945 | 6106 | Intron 6 | 164524208 | 0.00% |
| 100 | rs3020383 | 287732 | 1787 | Intron 7 | 164525995 | 0.00% |
| 101 | rs2747647 | 289825 | 2093 | Intron 7 | 164528088 | 4.58% |
| 102 | rs3020384 | 290143 | 318 | Intron 7 | 164528406 | 2.87% |
| 103 | G499ul | 291048 | 905 | Exon 8 | 164529311 | 0.57% |
| 104 | rs1062577 | 294858 | 3810 | 3' UTR | 164533121 | 2.58% |
| 105 | rs2813543 | 295431 | 573 | 3' | 164533694 | 0.86% |
| 106 | rs2813544 | 296535 | 1104 | 3' | 164534798 | 0.57% |
| 107 | rs926848 | 297153 | 618 | 3' | 164535416 | 0.57% |
| 108 | rs1543403 | 299657 | 2504 | 3' | 164537920 | 4.30% |
| 109 | rs1543404 | 299791 | 134 | 3' | 164538054 | 1.72% |
| 110 | rs2813545 | 303270 | 3479 | 3' | 164541533 | 0.86% |
| 111 | rs910416 | 303855 | 585 | 3' | 164542118 | 3.15% |
| 112 | rs2813546 | 306719 | 2864 | 3' | 164544982 | 19.20% |
| 113 | rs2747652 | 307969 | 1250 | 3' | 164546232 | 0.86% |
| 114 | rs2813549 | 312192 | 4223 | 3' | 164550455 | 1.43% |

(a) Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. The Human Genome Browser at UCSC. Genome Research, 12: 996-1006, 2002.

Table 21. ESR1 minor allele frequencies for the 114 SNPs in each of five ethnic groups.
Allele Frequencies by Ethnicity
SNP  Site  African-American  Hawaiian  Japanese  Latino  Caucasian
1  rs2881776  0.46  0.25  0.43  0.35  0.24
2  rs488133  0.13  0.32  0.10  0.28  0.23
3  rs2071454  0.32  0.20  0.29  0.14  0.17
4  rs867239  0.13  0.08  0.18  0.12  0.01
5  rs2077647  0.39  0.38  0.39  0.34  0.40
6  rs746432  0.04  0.01  0.00  0.01  0.11
7  rs532010  0.35  0.33  0.38  0.31  0.46
8  hCV11414976  0.12  0.07  0.18  0.13  0.01
9  rs576330  0.00  0.01  0.00  0.01  0.04
10  hCV3163576  0.08  0.00  0.01  0.02  0.01
11  hCV11414947  0.25  0.09  0.25  0.19  0.07
12  hCV1987600  0.33  0.21  0.28  0.20  0.20
13  hCV1987601  0.06  0.00  0.01  0.02  0.02
14  hCV1987602  0.24  0.21  0.29  0.13  0.18
15  hCV1987605  0.32  0.16  0.22  0.18  0.18
16  hCV3163583  0.49  0.21  0.29  0.12  0.22
17  hCV3163587  0.44  0.37  0.38  0.37  0.43
18  hCV1987608  0.41  0.32  0.31  0.30  0.50
19  hCV1987609  0.08  0.16  0.01  0.29  0.36
20  pPvuII  0.49  0.34  0.40  0.36  0.45
21  pXbaI  0.23  0.26  0.16  0.32  0.39
22  hCV1987611  0.38  0.36  0.38  0.33  0.46
23  hCV1987612  0.07  0.15  0.01  0.28  0.37
24  hCV3163593  0.49  0.35  0.37  0.36  0.46
25  hCV3163594  0.16  0.16  0.01  0.24  0.35
26  hCV3163595  0.14  0.09  0.28  0.02  0.00
27  hCV3163601  0.13  0.11  0.11  0.04  0.02
28  rs1033181  0.01  0.01  0.00  0.01  0.09
29  rs1033182  0.07  0.15  0.02  0.25  0.36
30  rs2175898  0.12  0.50  0.38  0.41  0.22
31  hCV1576295  0.12  0.20  0.14  0.19  0.31
32  hCV1576303  0.30  0.06  0.15  0.18  0.11
33  hCV1576304  0.02  0.01  0.00  0.02  0.07
34  hCV11414864  0.35  0.33  0.24  0.45  0.42
35  hCV1576306  0.25  0.06  0.13  0.14  0.05
36  hCV1576307  0.35  0.35  0.23  0.48  0.38
37  rs2347923  0.42  0.45  0.37  0.47  0.36
38  rs1514347  0.12  0.39  0.49  0.31  0.26
39  rs2347867  0.36  0.36  0.24  0.49  0.37
40  hCV1576311  0.07  0.15  0.03  0.26  0.29
41  hCV1576312  0.35  0.36  0.24  0.46  0.37
42  hCV1576315  0.48  0.46  0.36  0.46  0.37
43  hCV1576316  0.36  0.38  0.16  0.45  0.33
44  rs988328  0.06  0.38  0.48  0.24  0.17
45  rs2347868  0.06  0.39  0.50  0.27  0.26
46  hCV50675  0.40  0.38  0.28  0.48  0.40
47  hCV50676  0.34  0.06  0.15  0.18  0.10
48  hCV50678  0.31  0.05  0.14  0.17  0.11
49  rs1801132  0.09  0.40  0.49  0.26  0.26
50  rs3020390  0.34  0.43  0.34  0.41  0.33
51  rs3003921  0.27  0.44  0.41  0.12  0.22
52  rs1884051  0.47  0.46  0.42  0.36  0.34
53  rs985191  0.22  0.08  0.15  0.15  0.09
54  rs985192  0.32  0.48  0.46  0.19  0.26
55  rs985694  0.19  0.42  0.47  0.13  0.25
56  rs985695  0.40  0.14  0.18  0.09  0.14
57  rs1884049  0.37  0.45  0.49  0.18  0.24
58  rs932479  0.19  0.43  0.46  0.12  0.24
59  rs1884052  0.11  0.29  0.36  0.09  0.21
60  rs1884054  0.24  0.43  0.33  0.26  0.40
61  rs3020325  0.36  0.43  0.41  0.21  0.40
62  rs3020404  0.06  0.21  0.15  0.34  0.27
63  rs2179922  0.09  0.27  0.36  0.05  0.14
64  rs3020327  0.09  0.26  0.37  0.05  0.13
65  rs726281  0.26  0.41  0.39  0.22  0.24
66  rs726282  0.09  0.35  0.35  0.05  0.12
67  rs726283  0.24  0.41  0.36  0.20  0.26
68  rs728523  0.21  0.01  0.06  0.06  0.00
69  rs932477  0.20  0.35  0.35  0.04  0.14
70  rs926777  0.45  0.43  0.50  0.13  0.25
71  rs2144025  0.45  0.38  0.46  0.12  0.17
72  rs722208  0.46  0.44  0.45  0.20  0.24
73  rs722209  0.12  0.22  0.24  0.09  0.10
74  rs2207230  0.04  0.00  0.00  0.02  0.00
75  rs3020409  0.00  0.01  0.00  0.06  0.05
76  rs1569788  0.44  0.43  0.46  0.19  0.26
77  rs2207231  0.14  0.20  0.26  0.09  0.11
78  rs2207232  0.18  0.23  0.28  0.09  0.11
79  rs3020411  0.40  0.49  0.48  0.25  0.30
80  rs926778  0.41  0.32  0.40  0.22  0.23
81  rs926779  0.43  0.31  0.38  0.21  0.23
82  rs2348078  0.40  0.49  0.50  0.25  0.30
83  rs3020432  0.40  0.49  0.50  0.26  0.30
84  rs3020435  0.40  0.48  0.50  0.23  0.28
85  rs3020364  0.46  0.49  0.50  0.26  0.30
86  rs3020368  0.14  0.08  0.11  0.05  0.05
87  rs3020369  0.36  0.48  0.49  0.23  0.26
88  rs1884152  0.34  0.22  0.28  0.16  0.11
89  rs2207396  0.26  0.26  0.19  0.10  0.17
90  rs974276  0.33  0.22  0.28  0.16  0.12
91  rs974277  0.34  0.22  0.28  0.16  0.12
92  rs3020372  0.12  0.08  0.10  0.05  0.10
93  rs2982894  0.05  0.07  0.09  0.04  0.04
94  rs2982896  0.14  0.26  0.19  0.10  0.18
95  rs2459107  0.41  0.48  0.47  0.23  0.28
96  rs2982899  0.07  0.09  0.11  0.05  0.04
97  rs750686  0.43  0.48  0.47  0.23  0.31
98  rs749491  0.41  0.47  0.46  0.25  0.31
99  rs2982900  0.03  0.08  0.11  0.04  0.02
100  rs3020383  0.04  0.09  0.11  0.04  0.04
101  rs2747647  0.02  0.01  0.00  0.05  0.08
102  rs3020384  0.48  0.37  0.24  0.24  0.20
103  G499ul  0.20  0.31  0.16  0.24  0.23
104  rs1062577  0.07  0.09  0.24  0.14  0.07
105  rs2813543  0.07  0.09  0.04  0.11  0.21
106  rs2813544  0.18  0.12  0.15  0.28  0.24
107  rs926848  0.00  0.07  0.25  0.11  0.05
108  rs1543403  0.47  0.49  0.40  0.48  0.50
109  rs1543404  0.45  0.50  0.41  0.49  0.46
110  rs2813545  0.24  0.12  0.16  0.29  0.25
111  rs910416  0.48  0.49  0.41  0.44  0.49
112  rs2813546  0.02  0.03  0.02  0.04  0.05
113  rs2747652  0.45  0.49  0.41  0.42  0.48
114  rs2813549  0.07  0.11  0.04  0.12  0.22

6.4.1b. Linkage Disequilibrium and Haplotype Block Structure of ESR1

Patterns of linkage disequilibrium (marker versus marker), as measured by the D' statistic across ESR1, revealed that the region spanned by these 114 markers could be decomposed into multiple, discrete haplotype blocks. The amount and pattern of LD are summarized for each of the 5 ethnic groups in a D' statistic plot (Figures 19a-e). These analyses used a minor allele frequency cutpoint of 0.10; however, similar LD profiles are obtained using higher minor allele frequency cutpoints (0.15 and 0.25; data not shown). In addition, the features and divisions of this block-like pattern of LD are largely retained across ethnic groups. Nevertheless, some groups (e.g.
Japanese and Hawaiians) appeared, simply on inspection, to have longer stretches of significant LD than African-Americans and Latinos.

We then formally identified those regions of ESR1 that are inherited without significant historical recombination, i.e. those stretches contained within discrete haplotype blocks. By analyzing all ethnic groups combined, we systematically identified 13 blocks with a mean block size of 20.2 kb; a total of 81.6% of the sequence examined was found in these blocks. The smallest of these blocks spanned 4.3 kb (markers 65-70) whereas the largest stretched 46.0 kb (markers 77-87). Further, three zones of recombination were noted: one in Intron 2 (marker 26, 6.1 kb span), the second in Intron 5 (marker 88, 3.9 kb span), and the third beginning in Intron 7 and extending through Exon 8 into the 3' UTR (markers 101-104, 6.3 kb span). The sizes of the individual blocks were 12.2 kb, 20.8 kb, 17.7 kb, 16.4 kb, 35.2 kb, 13.9 kb, 7.0 kb, 14.9 kb, 4.3 kb, 20.9 kb, 46.0 kb, 38.0 kb, and 17.0 kb from Block 1 to Block 13, respectively.

A similar, although not identical, block structure was evident in each of the five ethnic groups. Fewer blocks were evident when ESR1 was considered in the Caucasian sample (10 blocks, mean block size 25.2 kb, 78.5% of total sequence encompassed). The smallest block was 4.3 kb and the largest 82.7 kb. Even fewer blocks were evident among Japanese participants (9 blocks, 28.0 kb, 78.2%). A total of 11 blocks were identified among Hawaiians (21.9 kb, 74.9%) as compared to 13 in Latinos (17.0 kb, 68.5%) and 16 in African-Americans (11.9 kb, 58.9%).

Figure 19a. African-American [D' plot]
Figure 19b. Native Hawaiian [D' plot]
Figure 19c. Japanese [D' plot]
Figure 19d. [D' plot]
Figure 19e. [D' plot]
Figures 19a-e. [...] 114 SNPs in ESR1. D' ranges from 0 to 1, with darker squares representing a higher degree of LD.

6.4.1c. Haplotype Diversity and htSNP Selection

We next enumerated and compared the specific haplotypes observed from genotype data when ESR1 and its flanking regions were parsed into the 13 blocks identified in the combined analysis (Table 22). Within each population, blocks typically display a limited number of common haplotypes (4-8) of frequency greater than 5%, which together account for a significant percentage of all chromosomes in the sample (>80%). Ethnic groups differ not only in the frequency of haplotypes; there are also ethnic-specific haplotypes not seen in other populations. Nevertheless, each block is typically dominated by ‘shared’ haplotypes, i.e. those which are present in all five populations. These also tend to be the haplotypes with the highest frequency.

By determining the extended haplotypes within the 13 ESR1 blocks, we then identified those SNPs which were redundant, in contrast to those variants which will be essential in association studies. These are outlined for each ethnic group in Table 23. In order to comprehensively catalogue haplotypes observed among Hawaiians, Japanese, Latinos, and Caucasians a total of 62 htSNPs were
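The "% of Observed Chromosomes (5% threshold)" rows of Table 22 amount to a simple tally: within a block, keep the haplotypes at or above the threshold and sum their frequencies. The following is a minimal sketch of that arithmetic only, not the software used in this work. The function name `common_haplotype_coverage` and the dictionary layout are hypothetical, and treating the threshold as inclusive (>= 5%) is an assumption inferred from haplotypes listed at exactly 5.0% in Table 22. The input frequencies are the Block 7 values given in Table 22.

```python
def common_haplotype_coverage(freqs, threshold=5.0):
    """Return (common haplotypes, summed frequency in %) for one block.

    freqs maps a haplotype string to its frequency in percent.
    A haplotype counts as "common" when its frequency is >= threshold
    (inclusive cutoff is an assumption, see the text above).
    """
    common = {hap: f for hap, f in freqs.items() if f >= threshold}
    return common, round(sum(common.values()), 1)


# Block 7 (markers 50-54): haplotype frequencies (%) from Table 22.
block7 = {
    "African-American": {"32112": 34.3, "14311": 31.4, "12322": 20.7, "12112": 12.9},
    "Hawaiian": {"32112": 41.1, "14311": 44.9, "12322": 7.3, "12112": 3.4},
}

for group, freqs in block7.items():
    common, coverage = common_haplotype_coverage(freqs)
    print(group, sorted(common), coverage)
```

For these two groups the sums come to 99.3 and 93.3, matching the corresponding Block 7 entries in Table 22; the Hawaiian haplotype 12112 (3.4%) falls below the threshold and is excluded.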
Table 22. Common haplotype patterns in each block of ESR1 by ethnicity.
Haplotype Frequencies (%) by Ethnicity
Block # / Haplotype  Size (kb)  All Combined  African-American  Hawaiian  Japanese  Latino  Caucasian

Block 1 (Markers 1-8)  12.6 kb
42421213  23.3  22.4  25.5  38.5  19.9  12.2
44421213  21.5  13.1  32.4  9.6  28.5  24.3
42423233  17.8  7.7  16.4  8.4  17.6  39.5
32421213  12.7  22.9  (3.3)  13.3  15.5  5.3
32333231  10.3  12.9  7.3  12.0  12.9  (0.7)
32323233  8.5  11.9  8.8  8.4  (4.6)  5.7
32323313  (3.7)  (4.0)  (1.6)  (0)  (0)  11.4
% of Observed Chromosomes (5% threshold)  94.1  90.9  90.4  90.2  94.4  98.4

Block 2 (Markers 9-16)  20.3 kb
33233411  52.7  25.9  71.3  44.6  57.3  67.4
33213233  20.0  26.4  19.3  27.9  7.5  17.1
33433411  16.7  23.4  6.9  25.0  19.3  6.1
33233413  (4.1)  14.9  (0)  (0)  (2.5)  (2.9)
31211433  (2.4)  7.1  (0)  (0)  (2.1)  (2.1)
33213231  (2.2)  (0)  (0)  (0)  9.8  (0.7)
% of Observed Chromosomes (5% threshold)  89.4  97.7  97.5  97.5  93.9  90.6

Block 3 (Markers 17-25)  14.7 kb
212142311  53.5  42.2  62.2  59.3  62.1  41.7
444321123  14.7  5.4  13.7  (<0.1)  21.5  34.1
442341321  11.7  19.1  7.3  20.8  (1.5)  12.2
442321321  7.7  10.5  10.7  10.2  (1.5)  (2.3)
212142313  (1.9)  7.1  (0.7)  (0)  (0.7)  (1.1)
% of Observed Chromosomes (5% threshold)  87.6  84.3  90.6  90.3  83.6  88.0

Block 4 (Markers 26-30)  28.0 kb
33331  32.5  60.3  19.0  15.4  27.6  38.8
33333  30.4  8.0  45.4  44.7  41.6  14.8
33311  16.4  5.8  15.3  (1.7)  23.6  35.6
31331  7.9  11.5  10.8  10.7  (3.7)  (2.2)
23333  5.6  (2.5)  (4.8)  17.8  (0)  (0)
23331  (4.7)  9.1  (4.0)  9.6  (2.1)  (0)
33133  (1.7)  (0.7)  (0)  (0)  (0)  7.7
% of Observed Chromosomes (5% threshold)  92.8  94.7  90.5  98.2  92.8  96.9

Block 5 (Markers 32-44)  34.9 kb
4212424312312  25.0  5.2  34.0  47.1  22.1  15.9
4242312114434  23.9  25.7  18.6  21.4  19.9  32.3
4242312134434  14.6  6.4  13.7  (2.9)  25.0  25.9
2214422312314  12.7  25.7  5.1  15.0  13.6  (4.3)
4212412312414  10.5  17.0  17.6  11.4  5.6  (0)
4212424312314  (4.8)  6.5  (2.1)  (1.4)  7.9  6.3
4212412312314  (1.2)  5.9  (0)  (0)  (0)  (0)
2412422312314  (2.1)  (1.4)  (0)  (0)  (2.1)  6.4
% of Observed Chromosomes (5% threshold)  86.7  92.4  89.0  94.9  94.1  86.8

Block 6 (Markers 45-49)  24.6 kb
23122  43.9  40.0  38.0  27.7  53.2  61.4
41123  29.1  5.7  39.9  50.0  23.5  26.4
21442  15.0  28.6  5.1  14.3  15.6  11.4
21122  8.3  14.8  16.3  6.6  (2.4)  (0.8)
21422  (1.5)  5.4  (0.7)  (0)  (0.8)  (0)
21423  (0.3)  (0)  (0)  (1.4)  (0)  (0)
% of Observed Chromosomes (5% threshold)  96.3  94.5  99.3  98.6  92.3  99.2

Block 7 (Markers 50-54)  12.5 kb
32112  46.2  34.3  41.1  30.7  59.3  65.7
14311  32.5  31.4  44.9  42.9  19.3  24.3
12322  13.5  20.7  7.3  15.0  15.6  8.6
12112  5.9  12.9  (3.4)  7.9  5.0  (0)
% of Observed Chromosomes (5% threshold)  98.1  99.3  93.3  96.5  99.2  98.6

Block 8 (Markers 55-64)  18.6 kb
2224211133  25.9  17.9  20.7  16.4  40.7  32.9
2224211333  20.5  5.7  20.7  15.7  33.6  27.1
2424223133  18.6  39.3  13.8  16.4  7.9  15.7
4242323111  17.8  8.6  27.0  35.7  5.0  13.6
4242223133  6.5  6.4  13.5  6.4  (2.9)  (3.6)
2244221133  (4.7)  12.1  (1.5)  5.0  5.0  (0)
4242323133  (3.3)  (2.1)  (2.3)  (0.7)  (4.3)  7.1
2244223133  (1.4)  6.5  (0)  (0)  (0.7)  (0)
% of Observed Chromosomes (5% threshold)  89.3  96.5  95.7  95.6  92.2  96.4

Block 9 (Markers 65-70)  4.3 kb
121432  50.6  24.2  41.3  37.0  78.6  71.4
313411  18.7  8.6  34.6  35.0  (4.3)  12.6
323431  15.8  34.6  8.1  15.5  8.6  11.6
323132  6.7  20.7  (1.5)  5.7  5.7  (0)
323432  5.2  (0.7)  14.6  6.8  (2.1)  (2.7)
323411  (2.3)  11.1  (0)  (0)  (0.7)  (0)
% of Observed Chromosomes (5% threshold)  97.0  99.2  98.6  100.0  92.9  95.6

Block 10 (Markers 71-76)  22.9 kb
212314  53.1  32.5  52.5  43.5  70.2  66.8
434312  14.7  11.8  22.1  23.8  5.1  11.5
232312  11.2  20.2  8.5  10.4  6.8  11.0
432312  10.2  18.0  12.8  11.5  (3.5)  (3.3)
412314  5.9  11.7  (3.4)  10.8  (2.3)  (1.7)
212334  (2.3)  (0)  (0.7)  (0)  5.1  5.0
% of Observed Chromosomes (5% threshold)  95.1  94.2  95.9  100.0  87.2  94.3

Block 11 (Markers 77-87)  46.7 kb
14123111124  55.1  32.5  49.9  48.3  74.3  68.4
32311333322  14.7  11.0  19.5  21.7  9.2  11.4
14311333342  8.0  11.3  8.1  10.7  5.0  (4.2)
14323333322  7.7  (0)  16.2  9.1  (4.3)  7.1
14311333322  7.5  25.6  (2.2)  (0)  6.4  7.2
12311333322  (2.0)  (0)  (1.8)  5.4  (0.7)  (0)
14311333122  (1.1)  5.5  (0)  (0)  (0)  (0)
% of Observed Chromosomes (5% threshold)  93.0  85.9  93.7  95.2  94.9  94.1

Block 12 (Markers 89-100)  38.0 kb
312442323123  56.9  36.3  52.2  52.9  73.6  69.3
334442121323  20.0  27.1  21.7  27.9  12.0  11.4
112444121323  7.2  (0.8)  15.8  8.6  (3.6)  7.1
112224141342  5.0  (2.1)  7.1  10.0  (3.6)  (2.1)
112442121323  (2.9)  14.9  (0)  (0)  (0.7)  (0)
112244121323  (3.0)  6.4  (1.6)  (0)  (0.8)  5.7
% of Observed Chromosomes (5% threshold)  89.1  84.7  96.8  99.4  85.6  95.6

Block 13 (Markers 105-114)  17.1 kb
3122232244  34.2  45.0  38.5  36.4  29.7  24.3
3123434224  19.2  17.6  29.5  18.6  11.4  17.8
3323424224  12.6  12.5  6.4  11.3  18.1  16.2
3143434224  9.7  (0)  7.3  25.0  11.4  5.0
1122232242  9.4  (4.2)  7.2  (3.6)  10.0  19.2
3323424424  6.3  (2.8)  5.1  (3.7)  9.0  8.1
% of Observed Chromosomes (5% threshold)  91.4  75.1  94.0  91.3  89.6  90.6

Table 23.
Haplotype tagging SNPs (htSNPs) capturing the haplotypes o f ESRJ by ethnicity __________(effective m inim um R h - 0.7).___________________________________________________________ htSN Ps bv Ethnicitv Block it Location SNP Reference A frican- H aw aiian Jaoanese Latino C aucasian SNP N um ber (rs) A m erican Block #1 (1-8) 1 5' rs2881776 1 1 1 1 1 2 Promoter rs488133 2 2 2 2 2 3 Promoter rs2071454 3 3 3 3 3 4 Promoter rs867239 4 4 4 4 5 Exon l a rs2077647 5 5 5 5 5 6 Exon l a rs746432 6 6 6 6 6 8 Intron 1 hCVl 1414976 8 8 8 8 Block #2 (9-16) 11 Intron 1 hCVl 1414947 11 11 11 11 11 12 Intron 1 hCV 1987600 12 14 Intron 1 hCV 1987602 14 14 16 Intron 1 hCV3163583 16 16 16 16 16 Block #3 (17-25) 17 Intron 1 hCV3163587 17 17 17 19 Intron 1 hCV 1987609 19 19 19 20 Intron l a pPvuII 20 20 20 20 20 21 Intron l a pXbal 21 21 21 21 21 25 Intron 2 hCV3163594 25 25 Z R b (26) Intron 2 hCV3163595 Block it4 (27-30) 27 Intron 2 hCV3163601 27 27 27 27 29 Intron 2 rsl0 3 3 182 29 29 29 29 30 Intron 2 rs2175898 30 30 30 30 30 Os R e p r o d u c e d with p e r m i s s i o n of the c o p y r i g h t o w n e r . F u r t h e r r e p r o d u c t i o n p r o h i b i t e d w i t h o u t p e r m i s s i o n . Table 23 (continued). 
SNP  Location    SNP Reference Number (rs)   htSNPs by Ethnicity (African-American, Hawaiian, Japanese, Latino, Caucasian)

Block #5 (31-44)
  31  Intron 2    hCV1576295     31  31  31  31  31
  32  Intron 3    hCV1576303     32  32  32  32  32
  34  Intron 3    hCV11414864    34  34
  35  Intron 3    hCV1576306     35  35
  37  Intron 3    rs2347923      37  37
  40  Intron 3    hCV1576311     40  40  40  40  40
  41  Intron 3    hCV1576312     41  41  41  41  41
  42  Intron 3    hCV1576315     42
  44  Intron 3    rs988328       44  44  44  44  44
Block #6 (45-49)
  46  Intron 3    hCV50675       46  46  46
  47  Intron 3    hCV50676       47
  48  Intron 3    hCV50678       48  48  48  48  48
  49  Exon 4^a    rs1801132      49  49  49  49  49
Block #7 (50-54)
  50  Intron 4    rs3020390      50  50  50  50  50
  52  Intron 4    rs1884051      52  52  52  52
  54  Intron 4    rs985192       54  54  54  54  54
Block #8 (55-64)
  57  Intron 4    rs1884049      57
  58  Intron 4    rs932479       58  58  58  58  58
  59  Intron 4    rs1884052      59  59  59  59
  61  Intron 4    rs3020325      61  61  61  61  61
  62  Intron 4    rs3020404      62  62  62  62  62
  63  Intron 4    rs2179922      63  63  63

Table 23 (continued).
SNP  Location    SNP Reference Number (rs)   htSNPs by Ethnicity (African-American, Hawaiian, Japanese, Latino, Caucasian)

Block #9 (65-70)
  66  Intron 4    rs726282       66
  67  Intron 4    rs726283       67  67  67  67  67
  68  Intron 4    rs728523       68
  69  Intron 4    rs932477       69  69  69
  70  Intron 4    rs926777       70  70  70  70
Block #10 (71-76)
  71  Intron 4    rs2144025      71  71  71  71  71
  72  Intron 4    rs722208       72  72  72  72  72
  73  Intron 4    rs722209       73  73  73  73  73
Block #11 (77-87)
  78  Intron 5    rs2207232      78  78  78  78  78
  80  Intron 5    rs926778       80  80  80
  83  Intron 5    rs3020432      83  83  83  83  83
  85  Intron 5    rs3020364      85
  86  Intron 5    rs3020368      86  86  86  86  86
ZR^b (88)
  88  Intron 5    rs1884152
Block #12 (89-100)
  90  Intron 6    rs974276       90
  92  Intron 6    rs3020372      92
  93  Intron 6    rs2982894      93
  94  Intron 6    rs2982896      94  94  94  94  94
  96  Intron 6    rs2982899      96
  97  Intron 6    rs750686       97  97  97  97  97
  99  Intron 6    rs2982900      99  99
ZR^b (101-104)
  103 Exon 8^a    G499ul         103  103  103  103  103
  104 3' UTR^a    rs1062577      104  104  104  104  104
Block #13 (105-114)
  105 3'          rs2813543      105  105  105  105
  106 3'          rs2813544      106  106  106  106  106
  107 3'          rs926848       107  107  107
  108 3'          rs1543403      108
  110 3'          rs2813545      110  110  110  110  110
  113 3'          rs2747652      113  113  113  113  113
  114 3'          rs2813549      114

TOTAL (African-American, Hawaiian, Japanese, Latino, Caucasian):  61  47  49  52  49

a SNPs "forced" into htSNP selection algorithm.
b Zone of recombination as defined in analysis of all populations combined.

Table 24. Frequency of haplotype tagging SNPs (htSNPs) capturing the haplotypes of ESR1 by ethnicity (htSNPs in BOLD).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

African-American
Block 1 (Markers 1-8), Effective Rh² = 0.8040
  42421213   22.4   4242123   24.1
  44421213   13.1   4442123   13.0
  42423233   7.7    4242323   8.1
  32421213   22.9   3242123   22.8
  32333231   12.9   3233321   12.9
  32323233   11.9   3232323   14.4
Block 2 (Markers 9-16), Effective Rh² = 0.9608
  33233411   25.9   2341   26.0
  33213233   26.4   2123   26.5
  33433411   23.4   4341   23.2
  33233413   14.9   2343   14.7
  31211433   7.1    2143   7.1
Block 3 (Markers 17-25), Effective Rh² = 0.7551
  212142311   42.2   22141   42.7
  444321123   5.4    44323   7.1
  442341321   19.1   42341   21.9
  442321321   10.5   42321   12.9
  212142313   7.1    22143   7.3
Block 4 (Markers 27-30), Effective Rh² = 0.9345
  3331   69.4   331   68.4
  3333   8.0    333   12.3
  3311   5.8    311   6.7
  1331   11.5   131   12.5
Block 5 (Markers 32-44), Effective Rh² = 0.8063
  4212424312312   5.2    4122122   5.8
  4242312114434   25.7   4421144   25.6
  4242312134434   6.4    4421344   6.4
  2214422312314   25.7   2142124   26.2
  4212412312414   17.0   4412112   23.4
  4212424312314   6.5    4122124   6.4
  4212412312314   5.9    -         -
Block 6 (Markers 45-49), Effective Rh² = 0.6970
  23122   40.0   3122   40.0
  41123   5.7    1123   8.8
  21442   28.6   1442   28.6
  21122   14.8   1122   14.7
  21422   5.4    1422   5.4

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

African-American (con't)
Block 7 (Markers 50-54), Effective Rh² = 0.9961
  32112   34.3   3212   34.3
  14311   31.4   1431   31.1
  12322   20.7   1232   21.2
  12112   12.9   1212   12.4
Block 8 (Markers 55-64), Effective Rh² = 0.8388
  2224211133   17.9   242113   19.2
  2224211333   5.7    242133   3.8
  2424223133   39.3   242313   40.0
  4242323111   8.6    423311   7.9
  4242223133   6.4    422313   6.7
  2244221133   12.1   442113   11.5
  2244223133   6.5    442313   6.8
Block 9 (Markers 65-70), Effective Rh² = 0.9546
  121432   24.2   2132   23.7
  313411   8.6    1311   8.6
  323431   34.6   2331   35.6
  323132   20.7   2332   20.9
  323411   11.1   2311   11.3
Block 10 (Markers 71-76), Effective Rh² = 0.8214
  212314   32.5   2124   32.4
  434312   11.8   4342   11.7
  232312   20.2   2322   22.0
  432312   18.0   4322   20.6
  412314   11.7   4124   11.9
Block 11 (Markers 77-87), Effective Rh² = 0.7405
  14123111124   32.5   421124   34.0
  32311333322   11.0   213322   13.7
  14311333342   11.3   413342   11.6
  14311333322   25.6   413322   24.0
  14311333122   5.5    413122   5.6
Block 12 (Markers 89-100), Effective Rh² = 0.8896
  312442323123   36.3   144223   38.5
  334442121323   27.1   344221   28.4
  112442121323   14.9   144221   14.5
  112244121323   6.4    124421   6.4

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

African-American (con't)
Block 13 (Markers 105-114), Effective Rh² = 0.7596
  3122232244   45.0   31234   44.0
  3123434224   17.6   31332   20.7
  3323424224   12.5   33322   16.0

Hawaiian
Block 1 (Markers 1-8), Effective Rh² = 0.7795
  42421213   25.5   4242123   25.1
  44421213   32.4   4442123   32.4
  42423233   16.4   4242323   16.7
  32333231   7.3    3233321   7.3
  32323233   8.8    3232323   11.6
Block 2 (Markers 9-16), Effective Rh² = 0.8128
  33233411   71.3   21   71.0
  33213233   19.3   23   19.6
  33433411   6.9    41   8.0
Block 3 (Markers 17-25), Effective Rh² = 0.8068
  212142311   62.2   141   64.6
  444321123   13.7   323   15.2
  442341321   7.3    341   8.5
  442321321   10.7   321   10.8
Block 4 (Markers 27-30), Effective Rh² = 0.9287
  3331   19.0   331   18.9
  3333   45.4   333   45.4
  3311   15.3   311   15.3
  1331   10.8   131   11.6
Block 5 (Markers 32-44), Effective Rh² = 0.7507
  4212424312312   34.0   42122   36.8
  4242312114434   18.6   42144   20.3
  4242312134434   13.7   42344   15.2
  2214422312314   5.1    24124   5.8
  4212412312414   17.6   42124   21.9
Block 6 (Markers 45-49), Effective Rh² = 0.9447
  23122   38.0   322   38.0
  41123   39.9   123   39.9
  21122   16.3   122   17.1

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

Hawaiian (con't)
Block 7 (Markers 50-54), Effective Rh² = 0.9958
  32112   41.1   3212   40.9
  14311   44.9   1431   44.8
  12322   7.3    1232   7.5
Block 8 (Markers 55-64), Effective Rh² = 0.9149
  2224211133   20.7   4113   22.1
  2224211333   20.7   4133   20.6
  2424223133   13.8   4313   13.8
  4242323111   27.0   2311   27.5
  4242223133   13.5   2313   16.0
Block 9 (Markers 65-70), Effective Rh² = 0.8941
  121432   41.3   212   42.1
  313411   34.6   131   34.6
  323431   8.1    231   8.2
  323432   14.6   232   15.2
Block 10 (Markers 71-76), Effective Rh² = 0.9601
  212314   52.5   2124   53.2
  434312   22.1   4342   22.1
  232312   8.5    2322   8.5
  432312   12.8   4322   12.8
Block 11 (Markers 77-87), Effective Rh² = 0.8935
  14123111124   49.9   412   50.5
  32311333322   19.5   232   21.9
  14311333342   8.1    434   8.1
  14323333322   16.2   432   18.6
Block 12 (Markers 89-100), Effective Rh² = 0.8935
  334442121323   21.7   44212   21.9
  112444121323   15.8   44412   16.3
  112224141342   7.1    22414   7.1
Block 13 (Markers 105-114), Effective Rh² = 0.8652
  3122232244   38.5   312344   38.4
  3123434224   29.5   312324   31.7
  3323424224   6.4    332224   11.6
  3143434224   7.3    314324   7.4
  1122232242   7.2    112342   8.7
  3323424424   5.1    312342   2.2

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

Japanese
Block 1 (Markers 1-8), Effective Rh² = 0.9966
  42421213   38.5   4242123   38.9
  44421213   9.6    4442123   9.7
  42423233   8.4    4242323   7.9
  32421213   13.3   3242123   12.9
  32333231   12.0   3233321   18.1
  32323233   8.4    3232323   12.4
Block 2 (Markers 9-16), Effective Rh² = 0.9292
  33233411   44.6   21   46.4
  33213233   27.9   23   28.6
  33433411   25.0   41   25.0
Block 3 (Markers 17-25), Effective Rh² = 0.8427
  212142311   59.3   2141   60.0
  442341321   20.8   2341   24.3
  442321321   10.2   2321   15.0
Block 4 (Markers 27-30), Effective Rh² = 0.9154
  3331   15.4   31   16.8
  3333   44.7   33   45.0
  1331   10.7   11   10.7
  3333   17.8   33   17.5
  3331   9.6    31   10.0
Block 5 (Markers 32-44), Effective Rh² = 0.9441
  4212424312312   47.1   412122   47.0
  4242312114434   21.4   442144   21.4
  2214422312314   15.0   214124   15.0
  4212412312414   11.4   412124   13.7
Block 6 (Markers 45-49), Effective Rh² = 0.9444
  23122   27.7   322   27.7
  41123   50.0   123   51.4
  21442   14.3   142   14.3
  21122   6.6    122   6.6
Block 7 (Markers 50-54), Effective Rh² = 0.9596
  32112   30.7   3212   30.7
  14311   42.9   1431   42.8
  12322   15.0   1232   15.0
  12112   7.9    1212   7.9

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

Japanese (con't)
Block 8 (Markers 55-64), Effective Rh² = 0.8149
  2224211133   16.4   42113   22.1
  2224211333   15.7   42133   15.6
  2424223133   16.4   42313   16.6
  4242323111   35.7   23311   35.7
  4242223133   6.4    22313   6.2
  2244221133   5.0    -       -
Block 9 (Markers 65-70), Effective Rh² = 0.9989
  121432   37.0   2142   36.3
  313411   35.0   1341   34.9
  323431   15.5   2341   15.4
  323132   5.7    2312   5.7
  323432   6.8    2342   7.7
Block 10 (Markers 71-76), Effective Rh² = 0.9017
  212314   43.5   2124   43.5
  434312   23.8   2124   23.8
  232312   10.4   2322   10.4
  432312   11.5   4322   11.5
  412314   10.8   4124   10.8
Block 11 (Markers 77-87), Effective Rh² = 0.9017
  14123111124   48.3   412   50.0
  32311333322   21.7   232   27.9
  14311333342   10.7   434   10.7
  14323333322   9.1    432   11.4
  12311333322   5.4    -     -
Block 12 (Markers 89-100), Effective Rh² = 0.9259
  312442323123   52.9   44232   52.9
  334442121323   27.9   44212   27.9
  112444121323   8.6    44412   8.6
  112224141342   10.0   22414   9.9
Block 13 (Markers 105-114), Effective Rh² = 0.7816
  3122232244   36.4   1234   40.7
  3123434224   18.6   1232   18.6
  3323424224   11.3   3222   15.0
  3143434224   25.0   1234   25.0

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

Latino
Block 1 (Markers 1-8), Effective Rh² = 0.9999
  42421213   19.9   4242123   20.5
  44421213   28.5   4442123   28.5
  42423233   17.6   4242323   16.1
  32421213   15.5   3242123   16.7
  32333231   12.9   3233321   12.9
Block 2 (Markers 9-16), Effective Rh² = 0.8646
  33233411   57.3   21   68.6
  33213233   7.5    41   12.1
  33433411   19.3   41   19.3
  33213231   9.8    -    -
Block 3 (Markers 17-25), Effective Rh² = 0.8983
  212142311   62.1   2141   62.1
  444321123   21.5   4323   24.0
Block 4 (Markers 27-30), Effective Rh² = 0.8442
  3331   27.6   31   31.1
  3333   41.6   33   41.3
  3311   23.6   11   25.2
Block 5 (Markers 32-44), Effective Rh² = 0.8602
  4212424312312   22.1   412122   22.1
  4242312114434   19.9   442144   20.1
  4242312134434   25.0   442344   25.0
  2214422312314   13.6   214124   13.0
  4212412312414   5.6    412124   15.2
  4212424312314   7.9    -        -
Block 6 (Markers 45-49), Effective Rh² = 0.8567
  23122   53.2   322   53.7
  41123   23.5   123   26.4
  21442   15.6   142   16.7
Block 7 (Markers 50-54), Effective Rh² = 0.9377
  32112   59.3   3212   59.3
  14311   19.3   1431   19.3
  12322   15.6   1232   16.4
Block 8 (Markers 55-64), Effective Rh² = 0.6918
  2224211133   40.7   42113   45.7
  2224211333   33.6   42133   33.6
  2424223133   7.9    42313   8.6
  4242323111   5.0    23311   5.0
  2244221133   5.0    -       -

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

Latino (con't)
Block 9 (Markers 65-70), Effective Rh² = 0.9907
  121432   78.6   21432   79.2
  323431   8.6    23431   8.6
  323132   5.7    23132   5.7
Block 10 (Markers 71-76), Effective Rh² = 0.7706
  212314   70.2   2124   75.3
  434312   5.1    4342   5.3
  232312   6.8    2322   9.1
  212334   5.1    -      -
Block 11 (Markers 77-87), Effective Rh² = 0.9594
  14123111124   74.3   412   74.3
  32311333322   9.2    232   9.6
  14311333342   5.0    434   5.0
  14311333322   6.4    432   11.1
Block 12 (Markers 89-100), Effective Rh² = 0.8746
  312442323123   73.6   4423   76.1
  334442121323   12.0   4421   14.5
Block 13 (Markers 105-114), Effective Rh² = 0.8914
  3122232244   29.7   31234   30.8
  3123434224   11.4   31232   17.1
  3323424224   18.1   33222   28.6
  3143434224   11.4   31432   11.4
  1122232242   10.0   11234   10.4
  3323424424   9.0    -       -

Caucasian
Block 1 (Markers 1-8), Effective Rh² = 0.8611
  42421213   12.2   42412   11.6
  44421213   24.3   44412   23.2
  42423233   39.5   42432   41.6
  32421213   5.3    32412   6.2
  32323233   5.7    32332   7.0
  32323313   11.4   32333   10.9
Block 2 (Markers 9-16), Effective Rh² = 0.8904
  33233411   67.4   21   71.2
  33213233   17.1   23   22.3
  33433411   6.1    41   6.5

Table 24 (continued).
Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

Caucasian (con't)
Block 3 (Markers 17-25), Effective Rh² = 0.8824
  212142311   41.7   2141   41.6
  444321123   34.1   4323   33.7
  442341321   12.2   4341   14.6
Block 4 (Markers 27-30), Effective Rh² = 0.9824
  3331   38.8   3331   38.6
  3333   14.8   3333   22.7
  3311   35.6   3311   36.5
  3133   7.7    -      -
Block 5 (Markers 32-44), Effective Rh² = 0.9367
  4212424312312   15.9   412122   17.1
  4242312114434   32.3   442144   32.3
  4242312134434   25.9   442344   26.7
  4212424312314   6.3    412124   8.6
  2412422312314   6.4    212124   6.4
Block 6 (Markers 45-49), Effective Rh² = 0.8332
  23122   61.4   22   62.7
  41123   26.4   23   26.4
  21442   11.4   42   10.9
Block 7 (Markers 50-54), Effective Rh² = 0.9653
  32112   65.7   212   65.7
  14311   24.3   431   24.8
  12322   8.6    232   8.6
Block 8 (Markers 55-64), Effective Rh² = 0.9398
  2224211133   32.9   42113   32.9
  2224211333   27.1   42133   27.1
  2424223133   15.7   42313   15.7
  4242323111   13.6   23311   13.6
  4242323133   7.1    23313   7.1
Block 9 (Markers 65-70), Effective Rh² = 0.8625
  121432   71.4   22   74.0
  313411   12.6   21   13.2
  323431   11.6   11   11.9
Block 10 (Markers 71-76), Effective Rh² = 0.7824
  212314   66.8   2124   71.8
  434312   11.5   4342   11.5
  232312   11.0   2322   10.9
  212334   5.0    -      -
Block 11 (Markers 77-87), Effective Rh² = 0.7921
  14123111124   68.4   412   69.2
  32311333322   11.4   232   11.4
  14323333322   7.1    -     -
  14311333322   7.2    432   14.4
Block 12 (Markers 89-100), Effective Rh² = 0.8068
  312442323123   69.3   4423   69.3
  334442121323   11.4   4421   12.9
  112444121323   7.1    4441   7.9
Block 13 (Markers 105-114), Effective Rh² = 0.7041
  3122232244   24.3   3134   28.1
  3123434224   17.8   3132   27.7
  3323424224   16.2   3322   23.5
  3143434224   5.0    -      -
  1122232242   19.2   1134   19.8
  3323424424   8.1    -      -

required.
To additionally capture rarer haplotypes, observed at appreciable frequency only among African-Americans, an additional 7 htSNPs were required. Table 24 compares the extended haplotypes within each block of ESR1 and their estimates as determined by ethnic-specific htSNPs.

6.4.2. ESR2

6.4.2a. Genotyping Results and Allele Frequencies

We selected and assayed a total of 66 variants spanning ESR2 and its flanking regions. Of these, 12 loci (18.2%) were monomorphic, 6 assays (9.1%) were not in Hardy-Weinberg equilibrium (p<0.01), and an additional 15 (22.7%) failed genotyping outright (>30% failures). Lastly, a single variant (1.5%) was rare (<2%) in all five ethnic groups considered (Figure 20). As a result, a total of 32 SNPs (48.5% of the total considered) were retained; these were spaced at an average density of 1 SNP per 2.5 kb. Gene architecture and placement of the 66 SNPs considered are presented in Figure 21.

The analysis of patterns of LD and haplotype diversity covers a 77.5 kb span of chromosome 14 and encompasses all 9 exons of ESR2, an additional 7.2 kb upstream from the translation start site, and 9.1 kb downstream from the 3' UTR termination. Details on location and genotyping percentage for each of the 32 retained SNPs are presented in Table 25. Allele frequencies for each retained polymorphic SNP varied across the five ethnic groups (Table 26). The majority of SNPs retained were located in introns (n=22 of 32; 68.7%) or outside of the gene itself (n=8; 25.0%). There was a single variant in an exon (3.1%) and another in the UTR (3.1%).

Figure 20. ESR2 SNP assay results: 32 SNPs were retained for LD and haplotype analysis.
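The exclusion criteria described in this section (genotyping failure >30%, monomorphic, Hardy-Weinberg p<0.01, minor allele frequency <2% in all groups) can be sketched as a small filter. This is a minimal illustration only: the function names, the order in which the criteria are applied, the use of a 1-df chi-square test for Hardy-Weinberg proportions, and testing each ethnic group separately are all assumptions, not details given in the text.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg proportions.

    Assumption: a simple goodness-of-fit test; the text does not say
    which test was actually used.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0                     # monomorphic: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))

def classify_snp(counts_by_group, failure_rate):
    """Return 'failed', 'monomorphic', 'hwe', 'rare', or 'retained'.

    counts_by_group maps a (hypothetical) group label to genotype
    counts (n_AA, n_AB, n_BB); failure_rate is the fraction of
    failed genotype calls.
    """
    if failure_rate > 0.30:
        return "failed"
    mafs = []
    for n_aa, n_ab, n_bb in counts_by_group.values():
        p = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
        mafs.append(min(p, 1.0 - p))
    if max(mafs) == 0.0:
        return "monomorphic"
    if any(hwe_p_value(*c) < 0.01 for c in counts_by_group.values()):
        return "hwe"
    if max(mafs) < 0.02:               # rare in every group
        return "rare"
    return "retained"

# Tally reported in the text: 12 + 6 + 15 + 1 excluded, 32 retained.
assert 66 - (12 + 6 + 15 + 1) == 32
```

For example, `classify_snp({"group1": (25, 50, 25)}, 0.0)` returns `"retained"`, while a SNP with no heterozygotes at equal allele frequencies, such as `(50, 0, 50)`, is flagged `"hwe"`.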
Figure 21. Intronic and exonic organization of ESR2 with SNP assay results denoted (legend: monomorphic, rare, retained; N.B.: Exon 2, alternative start site).

Table 25. ESR2 SNP location, distance between sites and genotyping percentage.

SNP  SNP Reference   Distance from      Distance between  Location     Golden Path  Genotyping
     Number (rs)     Translation Start  adjacent SNPs     within gene  Contig^a     Failure %
                     Site
1    rs3020450       -7224              -                 5'           62529900     0%
2    rs1271572       -839               6385              5'           62523515     0%
3    rs1256030       13908              14747             Intron 2     62508768     0.5%
4    rs1256031       14899              991               Intron 3     62507777     0.9%
5    rs1256033       15679              780               Intron 3     62506997     5.2%
6    rs960069        16076              397               Intron 3     62506600     0%
7    rs1269056       17188              1112              Intron 3     62505488     0%
8    rs1256036       17746              558               Intron 3     62504930     19.5%
9    rs1256037       17777              31                Intron 3     62504899     2.0%
10   rs1256038       20339              2562              Intron 3     62502337     8.3%
11   rs1256040       22684              2345              Intron 3     62499992     0%
12   rs1256041       23049              365               Intron 3     62499627     2.6%
13   rs1256043       26796              3747              Intron 4     62495880     0.3%
14   rs1256049       37027              10231             Exon 6       62485649     7.2%
15   rs1256053       44123              7096              Intron 6     62478553     0.3%
16   rs1256056       48452              4329              Intron 7     62474224     0.9%
17   rs953592        50329              1877              Intron 7     62472347     0.3%
18   rs1256059       50661              332               Intron 7     62472015     0%
19   rs944461        55938              5277              Intron 7     62466738     1.7%
20   rs1256062       57760              1822              Intron 7     62464916     0%
21   rs944045        59632              1872              Intron 8     62463044     0.9%
22   rs867443        60036              404               Intron 8     62462640     6.0%
23   rs1256064       60339              303               Intron 8     62462337     1.4%
24   rs944046        60779              440               Intron 8     62461897     4.3%
25   rs944047        60786              7                 Intron 8     62461890     0%
26   G1730A          61262              476               3' UTR       62461414     0.6%
27   rs944459        61720              458               3'           62460956     0.9%
28   rs1256065       62146              426               3'           62460530     0.3%
29   rs1152579       65991              3845              3'           62456685     2.0%
30   rs7229          68253              2262              3'           62454423     2.0%
31   rs1152582       68448              195               3'           62454228     1.7%
32   rs1152586       70294              1846              3'           62452382     4.0%

a Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. The Human Genome Browser at UCSC. Genome Research, 12: 996-1006, 2002.

Table 26. ESR2 minor allele frequencies for the 32 SNPs in each of five ethnic groups.

Allele Frequencies by Ethnicity
SNP  Site         African-American  Hawaiian  Japanese  Latino  Caucasian
1    rs3020450    0.39              0.19      0.24      0.26    0.28
2    rs1271572    0.12              0.46      0.44      0.41    0.47
3    rs1256030    0.30              0.44      0.42      0.42    0.48
4    rs1256031    0.37              0.46      0.50      0.43    0.49
5    rs1256033    0.36              0.47      0.49      0.43    0.49
6    rs960069     0.14              0.43      0.42      0.41    0.46
7    rs1269056    0.15              0.43      0.42      0.41    0.46
8    rs1256036    0.15              0.45      0.41      0.38    0.46
9    rs1256037    0.15              0.43      0.41      0.41    0.46
10   rs1256038    0.14              0.42      0.42      0.40    0.45
11   rs1256040    0.37              0.46      0.50      0.44    0.48
12   rs1256041    0.21              0.48      0.46      0.47    0.49
13   rs1256043    0.14              0.44      0.42      0.41    0.46
14   rs1256049    0.07              0.24      0.18      0.03    0.07
15   rs1256053    0.05              0.24      0.19      0.04    0.07
16   rs1256056    0.14              0.44      0.42      0.41    0.46
17   rs953592     0.04              0.24      0.19      0.04    0.06
18   rs1256059    0.14              0.43      0.41      0.41    0.46
19   rs944461     0.04              0.24      0.19      0.04    0.06
20   rs1256062    0.38              0.33      0.29      0.10    0.14
21   rs944045     0.37              0.08      0.09      0.05    0.06
22   rs867443     0.05              0.12      0.15      0.15    0.20
23   rs1256064    0.46              0.34      0.36      0.22    0.14
24   rs944046     0.14              0.44      0.41      0.45    0.45
25   rs944047     0.14              0.44      0.41      0.44    0.46
26   G1730A       0.28              0.16      0.16      0.28    0.36
27   rs944459     0.01              0.18      0.15      0.01    0.02
28   rs1256065    0.14              0.43      0.40      0.43    0.46
29   rs1152579    0.14              0.43      0.41      0.44    0.43
30   rs7229       0.14              0.45      0.39      0.43    0.46
31   rs1152582    0.13              0.45      0.39      0.42    0.44
32   rs1152586    0.14              0.40      0.39      0.42    0.43

6.4.2b. Linkage Disequilibrium and Haplotype Block Structure of ESR2

LD patterns were defined by the 32 biallelic markers and measured by the D′ statistic. When only markers with a high minor allele frequency (>10%; Figure 22a-e) are considered, which are more likely to be ancient and thus to inform on haplotypes maximally disrupted by recombination, inspection reveals what appears to be an extended domain of high LD across the entire ESR2 locus. This pattern is largely retained for each of the ethnic groups and, further, similar LD profiles are obtained using higher minor allele frequency cutpoints (0.15 and 0.25; data not shown).

We sought to identify those regions, and the proportion of ESR2, inherited without significant historical recombination and thus contained within discrete haplotype blocks. When data from all ethnic groups were combined, two blocks were identified: the first spanning markers 1-19 (63.1 kb) and the second extending from markers 20-32 (12.5 kb). Among the Hawaiian, Japanese and Latino individuals studied, a single block of 71.1 kb was observed, comprising 91.8% of the total sequence considered. Two blocks with distinct boundaries were noted among Caucasians (75.7 kb of sequence in blocks, 97.7% of total sequence) and African-Americans (56.0 kb, 72.2%). The block structure identified among Caucasians was identical to that observed in the aggregate dataset of all individuals combined.

Figure 22a. African-American
Figure 22b. Native Hawaiian
Figure 22c. Japanese
Figure 22d. Latino
Figure 22e. Caucasian

Figure 22. D′ statistic plots for each ethnicity (Figures 22a-e). D′ is measured as D′ = D/|D|max in a pairwise fashion for LD across 32 SNPs in ESR2. D′ ranges from 0 to 1, with darker squares representing a higher degree of LD.

6.4.2c. Haplotype Diversity and htSNP Selection

To determine the frequencies of haplotypes, the ESR2 locus was decomposed into the two blocks suggested by the combined analysis of all ethnicities described above; haplotypes are enumerated in Table 27. Within each population, the blocks have seven or fewer common haplotypes (>5%), which together account for a large percentage of all chromosomes in the sample (>80%). Haplotype frequencies differ distinctly by ethnic group, with some populations displaying "private" haplotypes that are rarely, if ever, seen in other ethnic groups. This was most commonly and dramatically observed among African-Americans. For instance, one haplotype in Block #1 (3313142413233212134) was seen among 9.9% of African-American chromosomes and not again in any other ethnic group, with the exception of a single chromosome among Caucasians.

By determining the extended haplotypes within the two ESR2 blocks, we then identified those SNPs which were redundant, in contrast to those variants which will be essential in association studies. These are outlined for each ethnic group in Table 28; of the 18-htSNP 'superset', 2 are exclusive to African-Americans and 2 are exclusive to Latinos. A total of 10 htSNPs are shared among all ethnicities combined. Table 29 compares the extended haplotypes within each block of ESR2 and their estimates as determined by ethnic-specific htSNPs.
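The pairwise D′ statistic named in the Figure 22 caption can be illustrated with a short sketch. It assumes phased haplotype frequencies are already available for the two markers, and it uses the standard normalisation of D by its maximum attainable magnitude given the allele frequencies; the function and argument names are hypothetical and are not taken from the software used in the thesis.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalised LD (|D| / |D|max) between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1
          and allele B at marker 2
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0          # a monomorphic marker carries no LD information
    return abs(d) / d_max

# Two markers whose alleles always travel together: complete LD.
print(d_prime(p_ab=0.5, p_a=0.5, p_b=0.5))
# Haplotype frequency equal to the product of allele frequencies: no LD.
print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Taking the absolute value keeps the result in the 0 to 1 range described in the caption, so that the shading of each square depends only on the strength of LD and not on which allele was arbitrarily labelled A or B.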
Table 27. Common haplotype patterns in each block of ESR2 by ethnicity.

Haplotype Frequencies (%) by Ethnicity
Block # / Haplotype (Size, kb)   Combined  African-American  Hawaiian  Japanese  Latino  Caucasian

Block 1 (Markers 1-19), 63.1 kb
  3413124231211214114            35.7      10.7      42.7      40.7      38.6      45.7
  1331342413433212134            26.1      35.7      18.1      23.6      24.3      27.9
  3331342413433422332            11.3      (4.3)     23.9      18.5      (3.6)     6.4
  3331342413433212134            10.7      20.7      5.8       (3.3)     12.8      12.9
  3333142413233212134            (3.4)     5.4       (1.5)     5.5       (3.9)     (0)
  3333142413213212134            (2.8)     (0.9)     (0.7)     (2.4)     9.7       (0.7)
  3313142413233212134            (2.1)     9.9       (0)       (0)       (0)       (0.7)
  % of Observed Chromosomes (5% threshold)
                                 82.4      82.4      90.6      88.3      85.4      92.9

Block 2 (Markers 20-32), 12.5 kb
  4131333221331                  35.2      12.9      41.3      36.4      41.4      44.3
  4111111213123                  13.1      5.3       12.1      15.7      13.1      19.6
  2333113213123                  12.0      32.4      7.7       9.3       (4.3)     6.4
  4131111213123                  10.7      21.9      (2.7)     (0)       12.4      14.7
  2133113413123                  7.3       (0)       18.1      14.3      (1.4)     (1.4)
  4131113213123                  6.4       9.4       5.2       6.4       6.5       5.0
  4133113213123                  5.9       6.8       (1.8)     7.1       13.0      (0.7)
  4333113213123                  (1.1)     (4.6)     (0)       (0)       (0.7)     (0)
  2133113213123                  (3.9)     (2.6)     5.2       5.0       (2.6)     (4.3)
  % of Observed Chromosomes (5% threshold)
                                 90.6      88.7      89.6      94.2      86.4      90.0

Table 28. Haplotype tagging SNPs (htSNPs) capturing the haplotypes of ESR2 by ethnicity.
SNP   Location   SNP Reference Number (rs)   htSNPs by Ethnicity (African-American, Hawaiian, Japanese, Latino, Caucasian)

Block #1 (1-19)
  1     5'         ESR2_rs3020450    1  1  1  1  1
  2     5'         ESR2_rs1271572    2  2  2  2  2
  3     Intron 2   ESR2_rs1256030    3
  4     Intron 3   ESR2_rs1256031    4  4  4
  12    Intron 3   ESR2_rs1256041    12  12  12  12  12
  13    Intron 4   ESR2_rs1256043    13
  14^a  Exon 6     ESR2_rs1256049    14  14  14  14  14
  15    Intron 6   ESR2_rs1256053    15  15  15  15  15
Block #2 (20-32)
  20    Intron 7   ESR2_rs1256062    20  20  20  20  20
  21    Intron 8   ESR2_rs944045     21  21  21  21  21
  22    Intron 8   ESR2_rs867443     22  22
  23    Intron 8   ESR2_rs1256064    23  23  23  23  23
  24    Intron 8   ESR2_rs944046     24
  26^a  3' UTR     ESR2_G1730A       26  26  26  26  26
  27    3'         ESR2_rs944459     27  27  27  27  27
  28    3'         ESR2_rs1256065    28  28  28  28  28
  31    3'         ESR2_rs1152582    31
  32    3'         ESR2_rs1152586    32  32  32  32

TOTAL (African-American, Hawaiian, Japanese, Latino, Caucasian):  16  12  13  16  11

a SNPs "forced" into htSNP selection algorithm

Table 29. Haplotype tagging SNPs (htSNPs) capturing the haplotypes of ESR2 by ethnicity (htSNPs in BOLD).

Haplotype Frequencies (%) by Ethnicity
Ethnicity / Block     Haplotype   Frequency (%)   htSNP Haplotype   Frequency (%)

African-American
Block #1 (Markers 1-19), Effective Rh² = 0.7780
  3413124231211214114   10.7   3413121   11.4
  1331342413433212134   35.7   1331321   35.7
  3331342413433212134   20.7   3331321   20.7
  3333142413233212134   5.4    3333321   5.4
  3313142413233212134   9.9    3313321   9.9
Block #2 (Markers 20-32), Effective Rh² = 0.8852
  4131333221331   12.9   413132231   12.9
  4111111213123   5.3    411112123   5.3
  2333113213123   32.4   233332123   32.4
  4131111213123   21.9   413112123   22.6
  4131113213123   9.4    413132123   8.6
  4133113213123   6.8    413332123   6.8

Hawaiian
Block #1 (Markers 1-19), Effective Rh² = 0.9709
  3413124231211214114   42.7   34121   43.4
  1331342413433212134   18.1   13321   18.8
  3331342413433422332   23.9   33342   23.9
  3331342413433212134   5.8    33321   7.2
Block #2 (Markers 20-32), Effective Rh² = 0.8473
  4131333221331   41.3   4113221   42.0
  4111111213123   12.1   4111213   16.5
  2333113213123   7.7    2333213   7.7
  2133113413123   18.1   2133413   18.1
  2133113213123   5.2    2133213   6.1

Japanese
Block #1 (Markers 1-19), Effective Rh² = 0.9427
  3413124231211214114   40.7   343121   42.1
  1331342413433212134   23.6   131321   23.6
  3331342413433422332   18.5   331342   18.5
  3333142413233212134   5.5    333321   5.5
Block #2 (Markers 20-32), Effective Rh² = 0.9982
  4131333221331   36.4   4113221   37.1
  4111111213123   15.7   4111213   16.2
  2333113213123   9.3    2333213   9.3
  2133113413123   14.3   2133413   14.3
  4131113213123   6.4    4113213   6.6
  4133113213123   7.1    4133213   7.1
  2133113213123   5.0    2133213   5.0

Table 29 (continued).

Latino
Block #1 (Markers 1-19), Effective Rh² = 0.9120
  3413124231211214114   38.6   3431121   39.3
  1331342413433212134   24.3   1313321   24.3
  3331342413433212134   12.8   3313321   12.8
  3333142413213212134   9.7    3331321   10.3
Block #2 (Markers 20-32), Effective Rh² = 0.9709
  4131333221331   41.4   413133221   42.1
  4111111213123   13.1   411111213   13.0
  4131111213123   12.4   413111213   12.4
  4131113213123   6.5    413113213   5.1
  4133113213123   13.0   413313213   15.9

Caucasian
Block #1 (Markers 1-19), Effective Rh² = 0.9976
  3413124231211214114   45.7   34121   45.6
  1331342413433212134   27.9   13321   27.9
  3331342413433422332   6.4    33342   6.4
  3331342413433212134   12.9   33321   13.5
Block #2 (Markers 20-32), Effective Rh² = 0.9997
  4131333221331   44.3   4131322   45.7
  4111111213123   19.6   4111121   19.6
  2333113213123   6.4    2333321   6.4
  4131111213123   14.7   4131121   14.7
  4131113213123   5.0    4131321   5.1

6.5. Discussion

These analyses considered two chromosomal regions, the first encompassing the ESR1 locus (chromosome 6q25.1) and the second including ESR2 (long arm of chromosome 14). The pattern and extent of linkage disequilibrium varied somewhat by ethnicity, but each locus considered could
be parsed into a pattern of distinct blocks ranging in size from 5 to 75 kb, in a manner consistent with previous studies (Daly et al. 2001; Jeffreys et al. 2001). There was a prevailing tendency for African-Americans, and to a lesser degree Latinos, to exhibit the least LD of all populations, a phenomenon noted in other studies (Gabriel et al. 2002; Reich et al. 2001a). This observation is likely a consequence of the age of the population, which has allowed the forces that erode LD, namely recombination and to a lesser extent mutation, to accumulate. Said another way, the greater extent of long-range LD among the other groups (Caucasians, Japanese, and Native Hawaiians) reflects the genealogic history of each population, perhaps through a recent, strong population bottleneck (Ingman et al. 2000; Tishkoff et al. 1996; Tishkoff and Williams 2002). This variation provides evidence that haplotypes and LD should be characterized for each ethnic population. Harnessing these differences, and in particular the smaller haplotype blocks seen among populations of African descent, will allow fine mapping of disease susceptibility alleles, with significant potential for identifying risk alleles that influence common disease.

What is notable is not only that a limited number of haplotypes are shared among populations, but that these shared haplotypes themselves account for a very high percentage of the total chromosomes observed. This pattern has been previously described across regions of high LD, with few haplotypes accounting for 80-95% of all observed chromosomes (Bonnen et al. 2000; Daly et al. 2001). However, the current investigation provides evidence that this description of haplotype diversity holds true generally in a multiethnic population as well.
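As a worked illustration of how few haplotypes account for most chromosomes, the sketch below recomputes the "% of Observed Chromosomes (5% threshold)" entry for African-Americans in Block 1 of ESR2 from the frequencies listed in Table 27. The function name is hypothetical, and treating the threshold as "frequency of at least 5%" is an assumption about how the table row was derived.

```python
def common_haplotype_coverage(freqs_percent, threshold=5.0):
    """Count haplotypes at or above the threshold and sum their frequencies."""
    common = [f for f in freqs_percent if f >= threshold]
    return len(common), sum(common)

# African-American frequencies (%) for the seven Block 1 haplotypes in
# Table 27; values shown in parentheses there (below 5%) are included
# here as plain numbers and are filtered out by the threshold.
aa_block1 = [10.7, 35.7, 4.3, 20.7, 5.4, 0.9, 9.9]

n_common, coverage = common_haplotype_coverage(aa_block1)
print(n_common, round(coverage, 1))  # 5 82.4
```

Five haplotypes account for 82.4% of African-American chromosomes in this block, matching the value reported in Table 27.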
When one considers that 2^n haplotypes are theoretically possible for n biallelic markers under a free recombination model, the small number of observed common haplotypes becomes all the more remarkable. Having tested a sufficiently large collection of SNPs in and around these two loci, we were able to define all common haplotypes underlying the LD block structure using an extension of the E-M algorithm (Excoffier and Slatkin 1995; Stram 2003). With this structure in hand, we then formalized an approach to capturing haplotype diversity in a reduced reference set of SNPs to be carried forward to subsequent genotyping studies. These haplotype tag SNPs (htSNPs) faithfully recapitulate the haplotypes of a region of LD. By eliminating those variants which are redundant, we then identified a set essential to association studies. It is important to note that a number of subsets of markers capture the full haplotype information and, as such, which set comprises the ‘best’ choice remains open to debate. However, we have defined one such set as a function of the residual haplotype diversity, and record the coefficient of determination (R_h^2), which defines the proportion of diversity explained by the set of htSNPs.

This proposed reference set permits the common variation in ESR1 and ESR2 to be tested for association with disease. A rich geography of phenotypes has already been studied in relation to ER variants (reviewed in Chapter 5). This specific htSNP map, as well as a larger haplotype map of the human genome currently under construction, will allow effective evaluation of the common disease/common variant (CDCV) hypothesis. Although these investigations may not directly discover the causal variant, they do focus subsequent functional studies on a region of interest.
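The idea of a reduced tagging set can be illustrated with a small exhaustive search. The sketch below is not the R_h^2-based Fortran program of Stram (2003); it assumes a simpler stand-in criterion (the smallest marker subset that tells every common haplotype apart) and uses the five common Caucasian Block #2 haplotypes transcribed from Table 29. Function names are hypothetical, and because the criterion differs, the subset found need not match the htSNPs reported in the table.

```python
from itertools import combinations

# Common haplotypes for Caucasian Block #2 (Markers 20-32), transcribed from
# Table 29. Alleles are coded 1-4 as in the table.
HAPLOTYPES = [
    "4131333221331",
    "4111111213123",
    "2333113213123",
    "4131111213123",
    "4131113213123",
]


def project(haplotype, positions):
    """Keep only the alleles at the chosen marker positions (0-based)."""
    return "".join(haplotype[i] for i in positions)


def smallest_tagging_subset(haplotypes):
    """Exhaustively search for the smallest set of markers that tells every
    listed haplotype apart. This is a simplified stand-in criterion; it is
    not the R_h^2 calculation used in the text."""
    n_markers = len(haplotypes[0])
    for size in range(1, n_markers + 1):
        for positions in combinations(range(n_markers), size):
            tags = {project(h, positions) for h in haplotypes}
            if len(tags) == len(haplotypes):
                return positions
    # Fallback: no subset separates the haplotypes (e.g. duplicates present).
    return tuple(range(n_markers))


if __name__ == "__main__":
    n = len(HAPLOTYPES[0])
    print(f"Possible haplotypes under free recombination: 2**{n} = {2 ** n}")
    subset = smallest_tagging_subset(HAPLOTYPES)
    print("Tagging marker positions (0-based):", subset)
    print("Tag haplotypes:", [project(h, subset) for h in HAPLOTYPES])
```

The exhaustive search is only practical for small blocks, since the number of candidate subsets grows as 2^n; it is shown here because it makes the redundancy among markers in a high-LD block easy to see.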
And, finally, characterizing haplotypes in multiethnic populations carries a significant advantage that cannot be attained in traditional single variant / disease association studies. Namely, ‘trans-racial mapping’ (Todd et al. 1989), which utilizes the different LD patterns seen across ethnic groups, may sharpen the focus on which individual variants carry the greatest prior probability of being an etiologic determinant of common diseases.

CHAPTER 7

ASSOCIATION BETWEEN HAPLOTYPES OF ESTROGEN RECEPTORS ALPHA (ESR1) AND BETA (ESR2) AND BREAST CANCER RISK IN A PILOT CASE-CONTROL STUDY: THE MULTIETHNIC COHORT

7.1. Summary

Haplotype-based association studies of candidate genes offer a more comprehensive approach than has been previously achievable for association studies of complex disease. Genes encoding isoforms of the estrogen receptor (ESR1 and ESR2) are promising candidate loci in the pathogenesis of breast cancer because the estrogen receptor not only has a biologic role in normal breast cell development and function but is also a drug target and key prognostic indicator for hormonal therapy in breast cancer. Having identified the underlying haplotype structure, extent of linkage disequilibrium (LD) and a reference set of haplotype tagging SNPs (htSNPs) across ESR1 and ESR2 in a companion project (Chapter 6), we then tested whether common variation in ESR1 and ESR2 modifies the risk of breast cancer. The relationship between haplotypic variation in these genes and breast cancer risk was evaluated in a pilot case-control study among 854 African-American, Latina, Japanese, and Caucasian women participating in the Multiethnic Cohort Study.

7.2. Introduction

Haplotype-based studies offer a more comprehensive approach than has been previously achievable for association studies of complex disease (Collins et al. 1999; Jorde 2000; Kruglyak 1999; Ott 2000; Reich et al. 2001a). By distinguishing individual haplotypes and their relative frequencies from genotypes observed among unrelated individuals, the relationships between all alleles in a region can be more clearly defined. This permits the common variation in a gene to be tested exhaustively for association with disease or other quantitative trait locus (QTL). For common haplotypes (defined herein as greater than 5% frequency), attainable sample sizes would be sufficiently powered to detect susceptibility variants with even modest relative risks, on the order of 1.5-fold (Johnson et al. 2001). By extension, should such studies fail to detect an association, other mechanisms of disease (e.g. multiple rare variants coalescing so as to result in the same phenotype) are implicated and, thus, would warrant further consideration.

Genes encoding isoforms of the estrogen receptor (ESR1 and ESR2) are promising candidate loci in the pathogenesis of breast cancer because the estrogen receptor not only has a biologic role in normal breast cell development and function but is also a drug target and key prognostic indicator for hormonal therapy in breast cancer. Further, lifetime exposure to estrogen has been shown to modify a woman’s risk of breast cancer. Established reproductive history milestones such as early menarche, late menopause and later pregnancy act to increase overall risk, whereas early pregnancy, lactation and premenopausal obesity function as protective factors (Pike 1983). The effects of estrogens are mediated by their intracellular receptors, of which there are two subtypes: estrogen receptor α (ESR1) (Green et al.
1986) and the more recently discovered estrogen receptor β (ESR2) (Kuiper et al. 1996). Estrogen receptor sequence variants have been directly implicated in breast cancer risk (Henderson et al. 1988). ESR1 polymorphisms in particular have been associated not only with overall breast cancer risk (Anderson et al. 1994; Schubert et al. 1999) but also with age at onset (Pari et al. 1989).

Given that multiple SNPs had been genotyped for each of these candidate genes, we sought to perform a sensitivity analysis as to how to most effectively use these data to test association with breast cancer. We began with a traditional analysis in which the role of each SNP was considered individually. However, it has been shown that the haplotype can contain more information than each SNP taken separately (Verpillat et al. 2001) and as such we also tested haplotypes for association with disease. Several pitfalls are inherent in a SNP by SNP approach. Of primary concern is that true associations may be missed because of incomplete information provided by individual SNPs. Further, negative results do not rule out an association and positive results do not indicate the discovery of a causal SNP, for the associated SNP may, in fact, be in LD with the true causal SNP potentially some distance away (Daly et al. 2001).

7.3. Methods

7.3.1. Multiethnic Cohort Study Population

This nested case-control study is part of a large, ongoing, multiethnic cohort study in Hawaii and Los Angeles, California with an emphasis on diet and other lifestyle characteristics in the etiology of cancer. Aspects of this large cohort as well as details of its design and implementation are described more fully elsewhere (Kolonel et al. 2000). Briefly, participants were recruited between 1993 and 1996 from driver’s license files in Hawaii and California; the age range at baseline was 45 to 75 years.
The focus was on four main ethnic groups: African-Americans, Japanese-Americans, Latinos/Latinas and Caucasians. The total number of male and female subjects who comprised the cohort was 215,251. Among women only, baseline data were collected on 22,251 African-Americans, 29,957 Japanese, 26,502 Caucasians, and 24,620 Latinas. Participants completed a baseline questionnaire designed for self-administration which included five sections: (i) Background, including medical history and family cancer history; (ii) Diet history; (iii) Medication use; (iv) Physical activity; (v) Female reproductive history, including menstruation history, parity, age at first full term pregnancy, oral contraceptive use, age at menopause and the use of hormones.

Eligible cases were women enrolled in the cohort and diagnosed between 1993 and 1998 with a new primary, histologically-confirmed breast cancer identified by linkage of the cohort to population-based cancer Surveillance, Epidemiology and End Results (SEER) registries in Hawaii and California. Cases were contacted by letter and phone call and agreed to provide a blood specimen. The participation rate for providing a blood sample on request was 74% for cancer cases. Women with carcinoma in situ (non-infiltrating pathology) and neoplasms of the skin of the breast were not included as breast cancer cases. Information on stage of disease was ascertained from SEER Registries and used in subgroup analyses. Stage of disease was characterized as “localized” or “high stage”, the latter including regional (by direct extension and/or lymph node involvement) or systemic disease.

For this particular effort, a nested case-control study was designed with the intention of comprehensively analyzing the role of ER variants in a multiethnic, population-based sample.
Given that the prevalence of ER variants was hypothesized to be elevated among breast cancer cases, approximately 100 cases from each ethnic group were initially selected (n=428) and women diagnosed with high stage disease were over-sampled (n=222) as compared to cases who initially presented with localized disease (n=206). We likewise selected approximately 100 controls for each of the four ethnic groups (n=426); the participation rate for cohort controls was 66%. Only controls with no previous diagnosis of breast cancer were included. The study was approved by the Institutional Review Board of the Keck School of Medicine of the University of Southern California.

7.3.2. ESR1 and ESR2 htSNP Evaluation

The underlying haplotype structure and the extent of linkage disequilibrium (LD) across ESR1 and ESR2 are described in a companion project (Chapter 6). In brief, we genotyped markers in a multiethnic sample of unrelated individuals and estimated haplotype frequencies using an expectation-maximization (E-M) algorithm. We quantified LD between all pairs of biallelic loci using the absolute value of Lewontin’s D’ (Lewontin 1964). In summary, analysis revealed that within both regions there was evidence of discrete haplotype ‘blocks’ of varying size containing only a few common (>5%) haplotypes. We drafted a set of “haplotype tagging SNPs” (htSNPs) from this larger set of biallelic loci, taking advantage of the fact that LD and haplotype diversity within the region can be effectively captured by such a smaller subset of markers (Johnson et al. 2001). A program written in Fortran, and explained more fully in Stram (2003), searches all subsets with no upper
bound to size, and records the coefficient of determination (R_h^2) defining the proportion of diversity explained by a set of htSNPs.

We had previously evaluated a number of these candidate htSNPs used to construct estrogen receptor LD in a pilot case/control study. Some of these variants had been described previously (Castagnoli et al. 1987; Feng et al. 2001; Roodi et al. 1995; Rosenkranz et al. 1998; Schubert et al. 1999; Yaich et al. 1992; Zuppan et al. 1989) and others were chosen from variants discovered by the SNP Consortium (TSC) project (Sachidanandam et al. 2001) as well as publicly available SNPs with multiple submitters (http://www.ncbi.nlm.nih.gov/SNP/index.html). It is important to note that at this stage we had not employed a formal approach to identify that subset of markers best capturing the full haplotype information. Nevertheless, using these assayed markers, we were able to retrospectively calculate the proportion of diversity ‘explained’ by these candidate htSNPs. Effective R_h^2 within each of the haplotype blocks is presented as a measure of how completely diversity in a given block was captured.

7.3.3. Genotyping

Genomic DNA was purified from the buffy coats of peripheral blood samples for all cases and controls using the Qiagen DNA Blood Kit protocol (Qiagen Inc.; Hilden, Germany). SNP genotyping was performed by primer extension of multiplex products and detection by MALDI-TOF mass spectrometry (Tang et al. 1999). Multiplex PCR was performed using 0.1 units of HotStar Taq (Qiagen Inc.), 2.5 ng of genomic DNA, 2.5 pmol of each PCR primer, and 2.5 pmol of dNTP. Thermocycling was at 95°C for 15 minutes, followed by 45 cycles of 95°C for 20 seconds, 56°C for 30 seconds, and 72°C for 30 seconds.
Unincorporated dNTPs were inactivated using 0.3 U of Shrimp Alkaline Phosphatase (Roche Diagnostics; Pleasanton, CA) followed by primer extension using 5.4 pmol of each primer extension probe, 50 pmol of the associated dNTP/ddNTP combination, and 0.5 units of Thermosequenase (Sequenom Corporation; San Diego, CA). Reactions were cycled at 94°C for 2 minutes followed by 40 cycles at 94°C for 5 seconds, 50°C for 5 seconds, and 72°C for 30 seconds. Subsequently, a cation exchange resin was added to remove residual salt from the reaction and 7 nanoliters of the purified primer extension reaction was loaded onto a matrix pad (3-hydroxypicolinic acid) of a SpectroCHIP (Sequenom Corporation). SpectroCHIPs were analyzed using a Bruker Biflex III MALDI-TOF mass spectrometer (Bruker Daltonics; Bremen, Germany) and SpectroREADER (Sequenom Corporation). Spectra were processed using SpectroTYPER (Sequenom Corporation).

Included in the experimental design was a 10% sample of intercalated repeated samples. Genotype calls were compared between repeats and, if not 100% concordant, the SNP was removed from further analysis. This phenomenon was observed only in the context of violations of Hardy-Weinberg equilibrium (p<0.01) or when a high proportion of calls demonstrated uniformly heterozygous genotypes, indicative of a poorly performing genotype assay.

7.3.4. Statistical Analysis: Tests of Haplotype Association with Phenotype

Given that multiple SNPs had been genotyped on each of these candidate genes, we sought to perform a sensitivity analysis as to how to most effectively use these data to test association. We began with a traditional analysis in which the role of each SNP was considered individually. Given that each association test is nearly independent, a correction for multiple comparisons was used to define the α level of significance.
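The multiple-comparison correction used for the SNP by SNP analysis can be sketched as a Bonferroni-style division of the overall false positive rate by the number of approximately independent tests. The snippet below is a minimal illustration, assuming biallelic markers (k = 2) and the test counts reported in this section (114 for ESR1 and 32 for ESR2); the function names are hypothetical.

```python
# Minimal sketch of a Bonferroni-style per-test significance threshold.
# Assumes biallelic markers (k = 2); function names are illustrative only.


def independent_tests(n_loci, n_alleles=2):
    """Approximate number of independent tests for n loci with k alleles: n(k-1)."""
    return n_loci * (n_alleles - 1)


def per_test_alpha(overall_alpha, n_tests):
    """Per-test threshold that keeps the overall false positive rate near overall_alpha."""
    return overall_alpha / n_tests


if __name__ == "__main__":
    for gene, n_loci in (("ESR1", 114), ("ESR2", 32)):
        n_tests = independent_tests(n_loci)
        alpha = per_test_alpha(0.05, n_tests)
        print(f"{gene}: {n_tests} tests, per-test alpha = {alpha:.4f}")
```

Dividing 0.05 by 114 and by 32 gives approximately 0.00044 and 0.00156, which round to the thresholds of 0.0004 and 0.0016 used for ESR1 and ESR2.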
Testing n loci with k alleles is equivalent to performing approximately n(k-1) independent tests. For ESR1 this amounts to 114 such tests; for ESR2, 32. Hence, the per-test significance levels needed to achieve an overall false positive rate of 5% are 0.0004 and 0.0016, respectively.

We next considered the association of each haplotype with breast cancer, for all cases combined and for high stage cases only, using a variety of methods. In the first, we ‘phased’ the haplotype frequency estimates from the E-M procedure. In this approach, frequencies were scanned and marked so that each haplotype within a given record was assigned a value of 0, 1 or 2 based on its relative frequency. This procedure was automated using a Visual Basic program [Appendix 1.3]. It is important to note that such an approach fixes each observation to a single haplotype, and thus it is a method which would permit the evaluation of gene*gene and gene*environment interactions. Next, we constructed haplotype-specific risk estimates using direct, probabilistic estimates from the E-M algorithm as the independent variable. And, lastly, variables were constructed to predict the risk associated with one or more haplotype copies, Ind(δ_h(H) ≥ 1) (dominant effect), and two haplotype copies, Ind(δ_h(H) = 2) (recessive effect). These alternative analyses included producing a haplotype prediction value containing E{δ_h(H_i) | G_i} for all subjects having genotype data G_i, for use as the independent variable in a logistic analysis of disease status upon “haplotype dosage”.

Data management, descriptive and univariate analyses were performed using the SAS statistical software Version 8.01 (SAS Institute; Cary, NC).
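The “haplotype dosage” variable described above, the expected number of copies of a haplotype given a subject's genotype, can be sketched as follows. This is a minimal illustration and not the author's Visual Basic or E-M code: it assumes Hardy-Weinberg proportions when weighting the haplotype pairs compatible with a genotype, and the haplotype labels and frequencies are invented for the example.

```python
# Minimal sketch of "haplotype dosage": the expected number of copies of a
# haplotype h given genotype data, E{delta_h(H_i) | G_i}.
# Assumptions: Hardy-Weinberg proportions; hypothetical haplotype frequencies.


def pair_posteriors(compatible_pairs, freqs):
    """Posterior probability of each unordered haplotype pair given the genotype."""
    weights = {}
    for h1, h2 in compatible_pairs:
        multiplicity = 1 if h1 == h2 else 2  # heterozygous pairs arise two ways
        weights[(h1, h2)] = multiplicity * freqs[h1] * freqs[h2]
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def expected_dosage(target, posteriors):
    """Expected copies (between 0 and 2) of the target haplotype."""
    return sum(p * ((h1 == target) + (h2 == target))
               for (h1, h2), p in posteriors.items())


def most_likely_copies(target, posteriors):
    """Copies of the target haplotype in the single most probable pair
    (one plausible reading of the 'phased' 0/1/2 assignment)."""
    best = max(posteriors, key=posteriors.get)
    return (best[0] == target) + (best[1] == target)


if __name__ == "__main__":
    # Hypothetical two-SNP haplotype frequencies, alleles coded 1 and 2.
    freqs = {"11": 0.5, "22": 0.3, "12": 0.1, "21": 0.1}
    # A double heterozygote is compatible with two haplotype pairs.
    posteriors = pair_posteriors([("11", "22"), ("12", "21")], freqs)
    print("Expected copies of haplotype 11:", expected_dosage("11", posteriors))
    print("Most likely copies of haplotype 11:", most_likely_copies("11", posteriors))
```

The contrast between the two functions mirrors the contrast drawn in the text: the ‘phased’ assignment commits each subject to a single haplotype configuration, whereas the dosage retains the uncertainty as a fractional count that can enter a logistic model directly.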
The EpiLog software system (EpiCenter Software; Pasadena, CA) was used to estimate odds ratios and 95% CIs by unconditional logistic regression while adjusting for age (at disease onset for cases and at entry into the cohort for controls) and ethnicity.

7.4. Results

7.4.1. Overview

We characterized the prevalence and distribution of ESR1 and ESR2 variants in a case-control study of 854 (428 cases and 426 controls) African-American, Latina, Japanese, and Caucasian women aged 45 years and above participating in the Multiethnic Cohort Study. Associations between established reproductive breast cancer risk factors and breast cancer risk were generally consistent with expectation in all ethnic groups among cases and controls (Table 30). For instance, cases more often reported a family history of breast cancer (18.0% of cases versus 9.9% of controls; p=0.01) and tended to have a later first full term pregnancy (after age 30: 11.6% versus 6.4%; p for heterogeneity = 0.04).

Table 30. Descriptive statistics of subjects stratified by case or control status: Total observations with percent in parentheses.

Variable                            Cases        Controls     TOTAL        p-value (a)
                                    (n=428)      (n=426)      (n=854)
Age (in years) (b)
    Less than 55                    85 (19.9)    107 (25.1)   192 (22.5)
    55-64                           146 (34.1)   136 (31.9)   282 (33.0)
    65 and above                    197 (46.0)   183 (48.2)   197 (51.8)   0.1839
Ethnicity
    African-American                117 (27.3)   117 (27.5)   234 (27.4)
    Japanese                        100 (23.4)   100 (23.5)   200 (23.4)
    Latina                          101 (23.6)   99 (23.2)    200 (23.4)
    White                           110 (25.7)   110 (25.8)   220 (25.8)   0.9995
Family History: Breast Cancer
    Reported                        77 (18.0)    42 (9.9)     119 (13.9)   0.0090
    Any <50 years                   33 (7.7)     17 (4.0)     50 (5.9)     0.0015
Family History: Ovarian Cancer      19 (4.4)     21 (4.9)     40 (4.7)     0.8594
Age at First Menstrual Period
    <13 years                       229 (54.1)   211 (49.9)   440 (52.0)
    13 years or older               194 (45.9)   212 (50.1)   406 (48.0)   0.2420
Age of First Full Term Pregnancy (of those pregnant)
    <20 years                       108 (29.3)   129 (34.6)   237 (32.1)
    21-25 years                     129 (34.9)   141 (37.8)   270 (36.5)
    26-30 years                     87 (23.6)    78 (20.9)    162 (21.9)
    31 years or older               45 (12.2)    25 (6.7)     70 (9.5)     0.0356

(a) p-value calculated by the χ² test for heterogeneity (categorical variables) comparing cases to controls.
(b) Age defined as age at blood draw among controls and age at diagnosis among cases.

7.4.2. ESR1

7.4.2a. SNP by SNP Genotyping Results and Association with Breast Cancer

Previously, we had formally identified 13 ‘blocks’, those regions of ESR1 that are inherited without significant historical recombination, by analyzing LD patterns among unrelated individuals in a multiethnic panel (Chapter 6). This current investigation considered a total of 37 markers, of which 35 were contained within these 13 blocks and two fell within zones of recombination. We compared the predicted haplotype frequency for each block obtained from the multiethnic panel to that observed when cases and controls are combined in this pilot sample (Table 31). Further, the effective R_h^2 attained within each of the haplotype blocks is presented as a measure of how well overall haplotype diversity is represented by the reduced set of SNPs under investigation. This value varied as a function of the density of coverage: for Block #6, no markers were assayed, whereas for Block #9 the minimum of the R_h^2 value over the common haplotypes was 0.8625, strong evidence of adequate coverage of all observed haplotypes.
Table 31. Predicted and observed haplotype frequencies in each block of ESR1 for all ethnicities combined (markers assayed in bold). Minimum of the R_h^2 over the common haplotypes listed for each block.

Haplotype Frequencies (%)

    Predicted       Predicted   Assayed/Observed   Observed
    Haplotype       Frequency   Haplotype          Frequency

Block 1 (Markers 1-8), 12.6 kb
    42421213        23.3        2412               34.4
    44421213        21.5        4412               17.4
    42423233        17.8        2432               18.2
    32421213        12.7        -                  -
    32333231        10.3        2332               23.1
    32323233        8.5         -                  -
    32323313        (3.7)                          (3.6)
    Effective Block R_h^2: 0.2513

Block 2 (Markers 9-16), 20.3 kb
    33233411        52.7        3                  98.0
    33213233        20.0
    33433411        16.7
    33233413        (4.1)
    31211433        (2.4)
    33213231        (2.2)       -                  -
                                1                  (1.9)
    Effective Block R_h^2: 0.0024

Block 3 (Markers 17-25), 14.7 kb
    212142311       53.5        14                 54.0
    444321123       14.7        32                 27.1
    442341321       11.7        34                 17.9
    442321321       7.7         -                  -
    212142313       (1.9)
    Effective Block R_h^2: 0.2153

Block 4 (Markers 26-30), 28.0 kb
    33331           32.5        31                 68.6
    33333           30.4        33                 28.7
    33311           16.4        -                  -
    31331           7.9         -                  -
    23333           5.6         -                  -
    23331           (4.7)
    33133           (1.7)       13                 (2.3)
    Effective Block R_h^2: 0.0531

Block 5 (Markers 32-44), 34.9 kb
    4212424312312   25.0        42                 18.2
    4242312114434   23.9        24                 75.0
    4242312134434   14.6        -                  -
    2214422312314   12.7        -                  -
    4212412312414   10.5        -                  -
    4212424312314   (4.8)       44                 6.6
    4212412312314   (1.2)
    2412422312314   (2.1)
    Effective Block R_h^2: 0.0537

Block 6 (Markers 45-49), 24.6 kb
    23122           43.9        No SNPs
    41123           29.1
    21442           15.0
    21122           8.3
    21422           (1.5)
    21423           (0.3)
    Effective Block R_h^2: -

Block 7 (Markers 50-54), 12.5 kb
    32112           46.2        2                  76.2
    14311           32.5        1                  23.8
    12322           13.5        -                  -
    12112           5.9         -                  -
    Effective Block R_h^2: 0.0325

Block 8 (Markers 55-64), 18.6 kb
    2224211133      25.9        2224213            44.7
    2224211333      20.5        -                  -
    2424223133      18.6        2424223            19.5
    4242323111      17.8        4242321            12.3
    4242223133      6.5         4242223            7.0
    2244221133      (4.7)       2244223            8.8
    4242323133      (3.3)
    2244223133      (1.4)
    Effective Block R_h^2: 0.2983

Block 9 (Markers 65-70), 4.3 kb
    121432          50.6        1432               47.3
    313411          18.7        3411               16.2
    323431          15.8        3431               19.7
    323132          6.7         3132               9.6
    323432          5.2         3432               5.8
    323411          (2.3)       -                  -
    Effective Block R_h^2: 0.8625†

Block 10 (Markers 71-76), 22.9 kb
    212314          53.1        124                61.5
    434312          14.7        342                11.8
    232312          11.2        322                24.4
    432312          10.2        -                  -
    412314          5.9         -                  -
    212334          (2.3)       -                  -
    Effective Block R_h^2: 0.0393

Block 11 (Markers 77-87), 46.7 kb
    14123111124     55.1        13                 59.0
    32311333322     14.7        31                 14.9
    14311333342     8.0         11                 20.7
    14323333322     7.7         -                  -
    14311333322     7.5         -                  -
    12311333322     (2.0)       -                  -
    14311333122     (1.1)       -                  -
    Effective Block R_h^2: 0.0456

Block 12 (Markers 89-100), 38.0 kb
    312442323123    56.9        31231              55.2
    334442121323    20.0        33413              19.3
    112444121323    7.2         11213              17.4
    112224141342    5.0         -                  -
    112442121323    (2.9)       -                  -
    112244121323    (3.0)       -                  -
    Effective Block R_h^2: 0.3209

Block 13 (Markers 105-114), 17.1 kb
    3122232244      34.2        22                 54.9
    3123434224      19.2        34                 40.2
    3323424224      12.6        -                  -
    3143434224      9.7         -                  -
    1122232242      9.4         -                  -
    3323424424      6.3         -                  -
    Effective Block R_h^2: 0.0654

† Minimum R_h^2 exceeding 0.7, the threshold outlined in Chapter 6 as having effectively captured the extent of common haplotype diversity in a given block.

The prevalence of ESR1 variants varied widely by ethnicity across each of the 13 blocks and the one zone of recombination (Table 32).
Two of the variants (pPvuII and pXbaI) are previously described restriction fragment length polymorphisms (RFLPs) and are present in nearly equal frequencies among high stage cases, low stage cases and controls (Table 33). Most of the other variants likewise did not appear to be over-represented among breast cancer cases (Table 33). Two SNPs in Block #1 (rs488133 and rs2071454), both upstream of the transcription start site, were positively associated with high stage breast cancer in the homozygous variant state (OR: 2.20; 95% CI 1.03, 4.68 and OR: 1.76; 95% CI 1.07, 2.89). Additionally, variants in two SNPs (rs988328 and rs985694) displayed evidence of a protective effect on high stage breast cancer. For one of these (rs988328), the p-value for the test for trend with each additional allele was 0.0405.

7.4.2b. ‘Phased’ Haplotype Results and Association with Breast Cancer

Having defined common haplotypes (greater than 5% population frequency for all ethnicities combined) within each block, we constructed raw, unadjusted risk estimates for each haplotype using ‘phased’ data. In this approach, a direct counting mechanism tallied the number of haplotype observations falling within each risk category (high stage cases, low stage cases and controls). These raw counts, along with the odds ratio relating each haplotype count (i.e. 0, 1 or 2 copies of the haplotype) to high stage breast cancer, are presented in Table 34.

In Block #1, individuals carrying two copies of the h2332 haplotype were over-represented among high stage breast cancer cases as compared to controls (OR 1.96; 95% CI 1.13, 3.39). This haplotype is notable insofar as it is the only common haplotype in which the variant ‘G’ allele of rs2071454 is present; as noted below (Table 33), this SNP alone in the homozygous state was significantly positively associated with high stage breast cancer.
We also noted in the SNP-specific analysis that the T/T carriers of rs488133 were

Table 32. Ethnic-specific distribution of 37 variants in the ESR1 gene among breast cancer cases and controls (n=854). Entries are # (%); SNPs are listed by block as Rs# (position), with one row per genotype.

        All Ethnicities         African-American        Latina                  Japanese                Caucasian
        Cases       Controls    Cases       Controls    Cases       Controls    Cases       Controls    Cases       Controls
        (n=428)     (n=426)     (n=117)     (n=117)     (n=101)     (n=99)      (n=100)     (n=100)     (n=110)     (n=110)

BLOCK #1
rs488133 (-3603)
    CC  263 (68.3)  261 (70.2)  81 (73.6)   81 (75.0)   55 (60.4)   62 (66.7)   70 (77.8)   67 (81.7)   57 (60.6)   51 (57.3)
    CT  97 (25.2)   98 (26.3)   22 (20.0)   25 (23.1)   33 (36.3)   27 (29.0)   14 (15.6)   14 (17.1)   28 (29.8)   32 (36.0)
    TT  25 (6.5)    13 (3.5)    7 (6.4)     2 (1.9)     3 (3.3)     4 (4.3)     6 (6.7)     1 (1.2)     9 (9.6)     6 (6.7)
rs2071454 (-2223)
    TT  216 (55.2)  201 (53.7)  51 (44.3)   47 (44.3)   58 (61.1)   61 (64.2)   38 (43.7)   35 (43.2)   69 (73.4)   58 (63.0)
    GT  111 (28.4)  132 (35.3)  43 (37.4)   40 (37.7)   27 (28.4)   28 (29.5)   26 (29.9)   35 (43.2)   15 (16.0)   29 (31.5)
    GG  64 (16.4)   41 (11.0)   21 (18.3)   19 (17.9)   10 (10.5)   6 (6.3)     23 (26.4)   11 (13.6)   10 (10.6)   5 (5.4)
rs2077647 (30), Exon 1b
    AA  141 (35.3)  118 (29.6)  39 (33.9)   30 (27.8)   34 (35.4)   31 (32.3)   39 (42.4)   28 (30.8)   29 (30.2)   29 (27.9)
    AG  157 (39.3)  198 (49.6)  46 (40.0)   55 (50.9)   41 (42.7)   42 (43.7)   33 (35.9)   47 (51.7)   37 (38.5)   54 (51.9)
    GG  101 (25.3)  83 (20.8)   30 (26.1)   23 (21.3)   21 (21.9)   23 (24.0)   20 (21.7)   16 (17.6)   30 (31.3)   21 (20.2)
rs746432 (261)
    CC  321 (93.0)  307 (91.6)  87 (91.6)   99 (95.2)   86 (96.6)   82 (87.8)   71 (97.3)   74 (100)    63 (80.8)   76 (89.4)
    CG  22 (6.4)    28 (8.4)    8 (8.4)     5 (4.8)     3 (3.4)     10 (12.2)   2 (2.7)     0 (0)       15 (19.2)   7 (8.2)
    GG  2 (0.6)     0 (0)       0 (0)       0 (0)       0 (0)       0 (0)       0 (0)       0 (0)       0 (0)       2 (2.3)

BLOCK #2
rs576330 (3128)
    GG  370 (96.3)  364 (97.3)  109 (97.3)  98 (93.3)   85 (97.7)   87 (98.9)   85 (94.4)   83 (97.7)   91 (95.8)   96 (100)
    AG  10 (2.6)    8 (2.1)     2 (1.8)     5 (4.8)     1 (1.1)     1 (1.1)     4 (4.4)     2 (2.3)     3 (3.2)     0 (0)
    AA  4 (1.0)     2 (0.5)     1 (0.9)     2 (1.9)     1 (1.1)     0 (0)       1 (1.1)     0 (0)       1 (1.1)     0 (0)

BLOCK #3
pPvuII (34288)
    AA  133 (31.8)  127 (31.1)  30 (26.3)   25 (21.7)   35 (35.0)   35 (36.5)   30 (30.9)   32 (34.0)   38 (35.5)   35 (34.0)
    AG  192 (45.9)  200 (49.0)  52 (45.6)   60 (52.2)   54 (54.0)   49 (51.0)   41 (42.3)   47 (50.0)   45 (42.1)   44 (42.7)
    GG  93 (22.3)   81 (19.9)   32 (28.1)   30 (26.1)   11 (11.0)   12 (12.5)   26 (26.8)   15 (16.0)   24 (22.4)   24 (23.3)
pXbaI (34334)
    TT  220 (53.5)  202 (50.4)  58 (51.3)   50 (44.3)   53 (54.1)   47 (49.0)   62 (64.6)   61 (66.3)   47 (45.2)   44 (44.0)
    CT  158 (38.4)  165 (41.1)  48 (42.5)   54 (47.8)   36 (36.7)   42 (43.7)   30 (31.3)   26 (28.3)   44 (42.3)   43 (43.0)
    CC  33 (8.0)    34 (8.5)    7 (6.2)     9 (8.0)     9 (9.2)     7 (7.3)     4 (4.2)     5 (5.4)     13 (12.5)   13 (13.0)

BLOCK #4
rs1033181 (65774)
    GG  309 (94.2)  306 (95.6)  93 (97.9)   93 (95.9)   76 (97.4)   81 (95.3)   71 (100)    67 (97.1)   69 (82.1)   65 (94.2)
    AG  19 (5.8)    13 (4.1)    2 (2.1)     4 (4.1)     2 (2.6)     4 (4.7)     0 (0)       2 (2.9)     15 (17.9)   3 (4.3)
    AA  0 (0)       1 (0.3)     0 (0)       0 (0)       0 (0)       0 (0)       0 (0)       0 (0)       0 (0)       1 (1.5)
rs2175898 (67905)
    AA  202 (54.6)  191 (51.2)  75 (68.2)   73 (69.5)   40 (47.6)   35 (38.9)   37 (44.1)   30 (34.5)   50 (54.3)   53 (58.2)
    AG  125 (33.8)  116 (31.1)  26 (23.6)   19 (18.1)   35 (41.7)   37 (41.1)   30 (35.7)   36 (41.4)   34 (37.0)   24 (26.4)
    GG  43 (11.6)   66 (17.7)   9 (8.2)     13 (12.4)   9 (10.7)    18 (20.0)   17 (20.2)   21 (24.1)   8 (8.8)     14 (15.4)

BLOCK #5
rs1514347 (100398)
    CC  218 (61.2)  190 (55.6)  85 (78.7)   82 (79.6)   52 (59.1)   42 (47.7)   34 (44.7)   27 (36.5)   47 (55.9)   39 (50.7)
    CT  109 (30.6)  126 (36.8)  17 (15.7)   18 (17.5)   26 (29.5)   35 (39.8)   35 (36.1)   39 (52.7)   31 (36.9)   34 (44.2)
    TT  29 (8.1)    26 (7.6)    6 (5.6)     3 (2.9)     10 (11.4)   11 (12.5)   7 (9.2)     8 (10.8)    6 (7.1)     4 (5.2)
rs988328 (112103)
    TT  261 (70.7)  232 (64.8)  82 (80.4)   77 (74.0)   63 (68.5)   58 (64.4)   55 (67.1)   44 (54.3)   61 (65.6)   53 (63.9)
    CT  94 (25.5)   112 (31.3)  18 (17.7)   22 (21.1)   23 (25.0)   29 (32.2)   24 (29.3)   32 (39.5)   29 (31.2)   29 (34.9)
    CC  14 (3.8)    14 (3.9)    2 (2.0)     5 (4.8)     6 (6.5)     3 (3.3)     3 (3.7)     5 (6.2)     3 (3.2)     1 (1.2)

BLOCK #6
No SNPs

BLOCK #7
rs985192 (154413)
    CC  239 (63.5)  244 (61.5)  66 (61.7)   69 (63.9)   61 (68.5)   61 (67.0)   67 (69.8)   50 (53.8)   55 (55.0)   64 (61.0)
    AC  105 (26.8)  112 (28.2)  30 (28.0)   28 (25.9)   19 (21.3)   22 (24.2)   20 (20.8)   31 (33.3)   36 (36.0)   31 (29.5)
    AA  38 (9.7)    41 (10.3)   11 (10.3)   11 (10.2)   9 (10.1)    8 (8.8)     9 (9.4)     12 (12.9)   9 (9.0)     10 (9.5)

BLOCK #8
rs985694 (157578)
    CC  237 (61.4)  218 (57.4)  74 (65.5)   65 (59.6)   59 (68.6)   50 (58.1)   49 (56.3)   46 (52.3)   55 (55.0)   57 (58.8)
    CT  120 (31.1)  128 (33.7)  32 (28.3)   40 (36.7)   24 (27.9)   27 (31.4)   26 (29.9)   29 (32.9)   38 (38.0)   32 (33.0)
    TT  29 (7.5)    34 (8.9)    7 (6.2)     4 (3.7)     3 (3.5)     9 (10.5)    12 (13.8)   13 (14.8)   7 (7.0)     8 (8.3)
rs985695 (157658)
    CC  236 (64.8)  242 (67.8)  46 (40.7)   43 (41.3)   64 (77.1)   75 (80.7)   58 (73.4)   62 (80.5)   68 (76.4)   62 (74.7)
    CT  104 (28.6)  94 (26.3)   52 (46.0)   45 (43.3)   18 (21.7)   17 (18.3)   16 (20.3)   12 (15.6)   18 (20.2)   20 (24.1)
    TT  24 (6.6)    21 (5.9)    15 (13.3)   16 (15.4)   1 (1.2)     1 (1.1)     5 (6.3)     3 (3.9)     3 (3.4)     1 (1.2)
rs1884049 (158320)
    CC  170 (47.1)  174 (48.7)  36 (33.3)   41 (39.8)   59 (64.8)   66 (72.5)   25 (34.3)   18 (22.5)   50 (56.2)   49 (59.0)
    CT  146 (40.4)  142 (39.8)  57 (52.8)   52 (50.5)   26 (28.6)   23 (25.3)   28 (38.4)   35 (43.7)   35 (39.3)   32 (38.5)
    TT  45 (12.5)   41 (11.5)   15 (13.9)   10 (9.7)    6 (6.6)     2 (2.2)     20 (27.4)   27 (33.7)   4 (4.5)     2 (2.4)
rs932479 (158990)
    TT  220 (60.9)  233 (64.5)  71 (64.5)   76 (71.7)   63 (73.3)   74 (79.6)   31 (41.3)   26 (33.3)   55 (61.1)   57 (67.9)
    CT  116 (32.1)  109 (30.2)  36 (32.7)   30 (28.3)   19 (22.1)   18 (19.3)   28 (37.3)   35 (44.9)   33 (36.7)   26 (30.9)
    CC  25 (6.9)    19 (5.3)    3 (2.7)     0 (0)       4 (4.7)     1 (1.1)     16 (21.3)   17 (21.8)   2 (2.2)     1 (1.2)
rs1884052 (162319)
    CC  268 (68.7)  275 (72.7)  92 (80.0)   92 (83.6)   66 (71.7)   77 (82.8)   49 (53.9)   30 (48.8)   61 (66.3)   66 (71.0)
    CG  103 (26.4)  93 (24.6)   21 (18.3)   18 (16.4)   20 (21.7)   15 (16.1)   33 (36.3)   33 (40.2)   29 (31.5)   27 (29.0)
    GG  19 (4.9)    10 (2.7)    2 (1.7)     0           6 (6.5)     1 (1.1)     9 (9.9)     9 (11.0)    2 (2.2)     0
rs1884054 (162519)
    CC  130 (33.8)  110 (28.9)  69 (62.7)   66 (60.0)   9 (9.7)     6 (6.4)     37 (43.0)   44 (51.2)   15 (15.6)   12 (13.3)
    AC  160 (41.6)  142 (37.4)  34 (30.9)   34 (30.9)   44 (47.3)   34 (36.2)   37 (43.0)   32 (37.2)   45 (46.9)   42 (46.7)
    AA  95 (24.7)   128 (33.7)  7 (6.4)     10 (9.1)    40 (43.0)   54 (57.5)   12 (13.9)   10 (11.6)   36 (37.5)   36 (40.0)
rs2179922 (168053)
    GG  279 (76.4)  277 (78.0)  94 (84.7)   90 (89.1)   70 (79.5)   81 (86.2)   42 (53.2)   36 (48.0)   73 (83.9)   70 (82.3)
    AG  69 (18.9)   65 (18.3)   15 (13.5)   11 (10.9)   15 (17.1)   12 (12.8)   25 (31.7)   28 (37.3)   14 (16.1)   14 (16.5)
    AA  17 (4.7)    13 (3.7)    2 (1.8)     0           3 (3.4)     1 (1.1)     12 (15.2)   11 (14.7)   0           1 (1.2)

BLOCK #9
rs726283 (173962)
    GG  136 (34.2)  146 (36.8)  64 (57.7)   68 (63.0)   15 (15.3)   7 (7.5)     40 (46.0)   49 (53.9)   17 (16.7)   22 (21.2)
    AG  149 (37.4)  118 (29.7)  40 (36.0)   27 (25.0)   41 (41.8)   30 (31.9)   36 (41.4)   30 (33.0)   32 (31.4)   31 (29.8)
    AA  113 (28.4)  133 (33.5)  7 (6.3)     13 (12.0)   42 (42.9)   57 (60.6)   11 (12.6)   12 (13.2)   53 (52.0)   51 (49.0)
rs728523 (174144)
    TT  334 (84.1)  323 (84.1)  67 (59.8)   66 (60.6)   89 (90.8)   89 (94.7)   84 (91.3)   74 (86.0)   94 (98.9)   94 (98.9)
    AT  53 (13.4)   54 (14.1)   39 (34.8)   37 (33.9)   7 (7.1)     5 (5.3)     6 (6.5)     11 (12.8)   1 (1.1)     1 (1.1)
    AA  10 (2.5)    7 (1.8)     6 (5.4)     6 (5.5)     2 (2.0)     0           2 (2.2)     1 (1.2)     0           0
rs932477
    GG  279 (69.4)  290 (73.2)  74 (63.8)   82 (73.9)   77 (80.2)   83 (85.6)   47 (51.7)   36 (41.4)   81 (81.8)   89 (88.1)
(175549) AG 106 (26.4) 94 (23.7) 37 (31.9) 27 (24.3) 16(16.7) 13(13.4) 36 (39.5) 42 (48.3) 17(17.2) 12(11.9) AA 17 (4.2) 12 (3.0) 5 (4.3) 2(1.8) 3(3.1) 1 (1.0) 8 (8.8) 9(10.3) 1 (1.0) 0 rs926777 AA 172 (44.0) 178 (48.0) 34 (29.6) 30 (29.1) 55 (58.5) 65 (69.9) 23 (26.7) 26 (32.1) 60 (62.5) 57 (60.6) (176000) AC 155 (39.6) 141 (38.0) 46 (40.0) 48 (46.6) 36(37.3) 26 (28.0) 44 (51.2) 37 (45.7) 29 (30.2) 30 (31.9) CC 64 (16.4) 52(14.0) 35 (30.4) 25 (24.3) 3 (3.2) 2(2.1) 19(22.1) 18(22.2) 7 (7.3) 7 (7.4) BLOCK #10 rs722208 GG 163 (42.0) 166 (43.0) 25 (21.7) 21 (18.9) 56 (60.9) 61 (64.2) 31 (36.1) 29 (34.5) 51 (53.7) 55 (57.3) (193838) AG 159(41.0) 158 (40.9) 60 (52.2) 55 (49.6) 32 (34.8) 31 (32.6) 32 (37.2) 39 (46.4) 35 (36.8) 33 (34.4) AA 66(17.0) 62(16.1) 30(26.1) 35(31.5) 4 (4.3) 3 (3.2) 23 (26.7) 16(19.1) 9(9.5) 8 (8.3) rs722209 CC 314(78.9) 283 (72.9) 93 (80.2) 86 (76.1) 82 (87.2) 73 (74.5) 62 (68.1) 47 (57.3) 77 (79.4) 77 (81.1) (194149) CT 75(18.8) 99 (25.5) 21 (18.1) 25 (22.1) 12(12.8) 23 (23.5) 23 (25.3) 33 (40.2) 19(19.6) 18(18.9) TT 9 (2.3) 6(1.6) 2(1.7) 2(1.8) 0 2 (2.0) 6 (6.6) 2 (2.4) 1 (1.0) 0 rsl569788 CC 156 (43.6) 146 (42.2) 23 (21.3) 20(19.1) 53 (61.6) 58 (63.7) 28 (38.9) 24 (32.9) 52 (56.5) 44 (57.1) (199569) CT 142 (39.7) 145 (41.9) 58 (53.7) 53 (50.5) 30 (34.9) 31 (34.1) 24 (33.3) 35 (48.0) 30 (32.6) 26 (33.8) TT 60(16.8) 55 (15.9) 27 (25.0) 32 (30.5) 3 (3.5) 2 (2.2) 20 (27.8) 14(192) 10(10.9) 7(9.1) to R e p r o d u c e d with p e r m i s s i o n of the c o p y r i g h t o w n e r . F u r t h e r r e p r o d u c t i o n p r o h i b i t e d w i t h o u t p e r m i s s i o n . Table 32 (continued). 
BLOCK #11
rs2207231  AA  255 (72.0)  245 (67.7)  72 (72.7)  68 (69.4)  65 (83.3)  69 (87.3)  54 (61.4)  44 (50.0)  64 (71.9)  64 (66.0)
(200837)  AG  63 (17.8)  80 (22.1)  20 (20.2)  22 (22.4)  7 (9.0)  9 (11.4)  19 (21.6)  33 (37.5)  17 (19.1)  16 (16.5)
GG  36 (10.2)  37 (10.2)  7 (7.1)  8 (8.2)  6 (7.7)  1 (1.3)  15 (17.0)  11 (12.5)  8 (9.0)  17 (17.5)
rs926779  AA  165 (44.5)  173 (46.5)  17 (15.6)  17 (16.2)  62 (66.7)  59 (61.5)  37 (46.3)  40 (49.4)  49 (54.4)  57 (63.3)
(226873)  AG  158 (42.6)  148 (39.8)  64 (59.3)  52 (49.5)  30 (32.3)  35 (36.5)  33 (41.2)  32 (39.5)  31 (34.4)  29 (32.2)
GG  48 (12.9)  51 (13.7)  27 (25.0)  36 (34.3)  1 (1.1)  2 (2.1)  10 (12.5)  9 (11.1)  10 (11.1)  4 (4.4)
BLOCK #12
rs2207396  GG  258 (68.1)  263 (70.7)  73 (65.2)  69 (63.9)  68 (76.4)  71 (75.5)  54 (64.3)  60 (75.9)  63 (67.0)  63 (69.2)
(253335)  AG  102 (26.9)  94 (25.3)  31 (27.7)  30 (27.8)  20 (22.5)  22 (23.4)  27 (32.1)  18 (22.8)  24 (25.5)  24 (26.4)
AA  19 (5.0)  15 (4.0)  8 (7.1)  9 (8.3)  1 (1.1)  1 (1.1)  3 (3.6)  1 (1.3)  7 (7.5)  4 (4.4)
rs974276  AA  233 (61.8)  236 (63.3)  40 (36.0)  48 (42.9)  77 (84.6)  77 (81.9)  44 (52.4)  39 (49.4)  72 (79.1)  72 (81.8)
(253373)  AG  111 (29.4)  105 (28.2)  49 (44.1)  41 (36.6)  14 (15.4)  16 (17.0)  31 (36.9)  33 (41.8)  17 (18.7)  15 (17.1)
GG  33 (8.8)  32 (8.6)  22 (19.8)  23 (20.5)  0  1 (1.1)  9 (10.7)  7 (8.9)  2 (2.2)  1 (1.1)
rs974277  CC  233 (62.3)  229 (61.9)  36 (33.0)  42 (40.0)  78 (84.8)  77 (82.8)  48 (59.3)  37 (47.3)  71 (77.2)  73 (79.3)
(253774)  CT  111 (29.7)  109 (29.5)  52 (47.7)  39 (37.1)  14 (15.2)  16 (17.2)  27 (33.3)  35 (43.7)  18 (19.6)  19 (20.7)
TT  30 (8.0)  32 (8.6)  21 (19.3)  24 (22.9)  0  0  6 (7.4)  8 (10.0)  3 (3.3)  0
rs750686  AA  149 (40.3)  139 (39.6)  23 (21.1)  20 (19.6)  59 (65.6)  52 (58.4)  24 (29.3)  24 (30.8)  43 (48.3)  43 (52.4)
(279079)  AG  149 (40.3)  145 (41.3)  57 (52.3)  46 (45.1)  28 (31.1)  34 (38.2)  33 (40.2)  35 (44.9)  31 (34.8)  30 (36.6)
GG  72 (19.5)  67 (19.1)  29 (26.6)  36 (35.3)  3 (3.3)  3 (3.4)  25 (30.5)  19 (24.4)  15 (16.9)  9 (11.0)
rs749491  AA  88 (22.7)  83 (21.8)  38 (34.5)  45 (42.1)  6 (6.4)  4 (4.2)  28 (31.5)  24 (28.6)  6 (6.4)  4 (4.2)
(279839)  AG  147 (38.0)  146 (38.3)  18 (16.4)  17 (15.9)  58 (62.4)  54 (56.2)  26 (29.2)  28 (33.3)  58 (62.4)  54 (56.2)
GG  152 (39.3)  152 (39.9)  54 (49.1)  45 (42.1)  29 (31.2)  38 (39.6)  35 (39.3)  32 (38.1)  29 (31.2)  39 (39.6)
ZR
rs2228480  GG  227 (55.4)  228 (56.7)  71 (62.3)  72 (63.2)  48 (50.0)  54 (57.4)  48 (50.5)  53 (58.9)  60 (57.1)  49 (47.1)
(291048)  AG  126 (30.7)  116 (28.9)  34 (29.8)  32 (28.1)  37 (38.5)  34 (36.2)  23 (24.2)  19 (21.1)  32 (30.5)  31 (29.8)
Exon 8  AA  57 (13.9)  58 (14.4)  9 (7.9)  10 (8.8)  11 (11.5)  6 (6.4)  24 (25.3)  18 (20.0)  13 (12.4)  24 (23.1)
rs1062577  TT  269 (77.3)  247 (72.2)  91 (84.3)  81 (83.5)  59 (72.8)  66 (72.5)  44 (58.7)  34 (44.7)  75 (89.3)  66 (84.6)
(294858)  AT  66 (19.0)  77 (22.5)  16 (14.8)  15 (15.5)  19 (23.5)  19 (20.9)  24 (32.0)  31 (40.8)  7 (8.3)  12 (15.4)
3'UTR  AA  13 (3.7)  18 (5.3)  1 (0.9)  1 (1.0)  3 (3.7)  6 (6.6)  7 (9.3)  11 (14.5)  2 (2.4)  0
BLOCK #13
rs1543403  CC  126 (32.1)  135 (35.4)  37 (33.3)  37 (33.9)  20 (21.3)  27 (28.7)  42 (45.7)  46 (54.8)  27 (28.1)  25 (26.6)
(299657)  CG  178 (45.3)  165 (43.3)  49 (44.1)  53 (48.6)  48 (51.1)  47 (50.0)  36 (39.1)  30 (35.7)  45 (46.9)  35 (37.2)
GG  89 (22.6)  81 (21.3)  25 (22.5)  19 (17.4)  26 (27.7)  20 (21.3)  14 (15.2)  8 (9.5)  24 (25.0)  34 (36.2)
rs1543404  CC  112 (35.1)  129 (39.3)  27 (31.0)  34 (37.4)  20 (27.4)  19 (24.4)  41 (51.3)  49 (66.2)  24 (30.4)  27 (31.8)
(299791)  CT  140 (43.9)  127 (38.7)  39 (44.8)  37 (40.7)  38 (52.1)  39 (50.0)  27 (33.7)  20 (27.0)  36 (45.6)  31 (36.5)
TT  67 (21.0)  72 (22.0)  21 (24.1)  20 (22.0)  15 (20.5)  20 (25.6)  12 (15.0)  5 (6.8)  19 (24.0)  27 (31.8)

Table 33. SNP by SNP association between ESR1 variants and breast cancer risk by stage of disease. Odds ratio compares high stage to controls (95% Confidence Intervals in parentheses).
SNP Rs  Genotype  High Stage Cases # (%)  Low Stage Cases # (%)  Controls # (%)  Relative Risk: High Stage to Control (95% CI)  p-value (trend)
BLOCK #1
rs488133  CC  137 (68.2)  126 (70.2)  261 (70.2)  Reference
(-3603)  CT  49 (24.4)  48 (26.1)  98 (26.3)  0.95 (0.64, 1.42)
TT  15 (7.5)  10 (5.4)  13 (3.5)  2.20 (1.03, 4.68)  0.2644
rs2071454  TT  106 (51.5)  110 (59.5)  201 (53.7)  Reference
(-2223)  GT  62 (30.1)  49 (26.5)  132 (35.3)  0.89 (0.61, 1.31)
GG  38 (18.5)  26 (14.1)  41 (11.0)  1.76 (1.07, 2.89)  0.1302
rs2077647  AA  66 (32.2)  75 (38.7)  118 (29.6)  Reference
(30)  AG  87 (42.4)  70 (36.1)  198 (49.6)  0.79 (0.53, 1.16)
Exon 1b  GG  52 (25.4)  49 (25.3)  83 (20.8)  1.12 (0.71, 1.77)  0.8000
rs746432  CC  168 (92.8)  153 (93.3)  307 (91.6)  Reference
(261)  CG / GG  13 (7.2)  11 (6.7)  28 (8.4)  0.85 (0.43, 1.68)  0.7636
BLOCK #2
rs576330  GG  192 (95.5)  178 (97.3)  364 (97.3)  Reference
(3128)  AG / AA  9 (4.5)  5 (2.7)  10 (2.7)  1.71 (0.69, 4.23)  0.3632
BLOCK #3
pPvuII  AA  65 (29.8)  68 (34.0)  127 (31.1)  Reference
(34288)  AG  104 (47.7)  88 (44.0)  200 (49.0)  1.02 (0.69, 1.49)
GG  49 (22.5)  44 (22.0)  81 (19.9)  1.18 (0.74, 1.88)  0.5480
pXbaI  TT  116 (54.7)  104 (52.3)  202 (50.4)  Reference
(34334)  CT  81 (38.2)  77 (38.7)  165 (41.1)  0.85 (0.60, 1.21)
RFLP  CC  15 (7.1)  18 (9.1)  34 (8.5)  0.77 (0.40, 1.47)  0.3200
BLOCK #4
rs1033181  GG  163 (94.2)  146 (94.2)  306 (95.6)  Reference
(65774)  AG / AA  10 (5.8)  9 (5.8)  14 (4.4)  1.34 (0.58, 3.08)  0.6364
rs2175898  AA  112 (57.7)  90 (51.1)  191 (51.2)  Reference
(67905)  AG  60 (30.9)  65 (36.9)  116 (31.1)  0.88 (0.60, 1.30)
GG  22 (11.3)  21 (11.9)  66 (17.7)  0.57 (0.33, 0.97)  0.0564
BLOCK #5
rs1514347  CC  118 (64.5)  100 (57.8)  190 (55.6)  Reference
(100398)  CT  53 (29.0)  56 (32.4)  126 (36.8)  0.68 (0.46, 1.00)
TT  12 (6.6)  17 (9.8)  26 (7.6)  0.74 (0.36, 1.53)  0.0972
rs988328  TT  139 (72.8)  122 (68.5)  232 (64.8)  Reference
(112103)  CT  49 (25.7)  45 (25.3)  112 (31.3)  0.73 (0.49, 1.08)
CC  3 (1.6)  11 (6.2)  14 (3.9)  0.36 (0.11, 1.21)  0.0405
BLOCK #6  No SNPs
BLOCK #7
rs985192  CC  137 (67.8)  112 (59.0)  244 (61.5)  Reference
(154413)  AC  50 (24.7)  55 (29.0)  112 (28.2)  0.80 (0.54, 1.18)
AA  15 (7.4)  23 (12.1)  41 (10.3)  0.65 (0.35, 1.22)  0.1193
BLOCK #8
rs985694  CC  132 (66.3)  105 (56.1)  218 (57.4)  Reference
(157578)  CT  52 (26.1)  68 (36.4)  128 (33.7)  0.67 (0.46, 0.99)
TT  15 (7.5)  14 (7.5)  34 (8.9)  0.73 (0.38, 1.39)  0.0780
rs985695  CC  128 (66.7)  108 (62.8)  242 (67.8)  Reference
(157658)  CT  52 (27.1)  52 (30.2)  94 (26.3)  1.05 (0.70, 1.56)
TT  12 (6.3)  12 (7.0)  21 (5.9)  1.00 (0.52, 2.27)  0.8390
rs1884049  CC  79 (40.9)  91 (54.2)  174 (48.7)  Reference
(158320)  CT  92 (47.7)  54 (32.1)  142 (39.8)  1.43 (0.98, 2.07)
TT  22 (11.4)  23 (13.7)  41 (11.5)  1.18 (0.66, 2.11)  0.2249
rs932479  TT  109 (58.0)  111 (64.2)  233 (64.5)  Reference
(158990)  CT  65 (34.6)  51 (29.5)  109 (30.2)  1.27 (0.87, 1.87)
CC  14 (7.5)  11 (6.4)  19 (5.3)  1.58 (0.77, 3.24)  0.1257
rs1884052  CC  136 (66.7)  132 (71.0)  275 (72.8)  Reference
(162319)  CG  59 (28.9)  44 (23.7)  93 (24.6)  1.28 (0.87, 1.89)
GG  9 (4.4)  10 (5.4)  10 (2.7)  1.82 (0.73, 4.53)  0.1075
rs1884054  CC  74 (36.3)  56 (30.9)  128 (33.7)  Reference
(162519)  AC  83 (40.7)  77 (42.5)  142 (37.4)  1.01 (0.68, 1.50)
AA  47 (23.0)  48 (26.5)  110 (29.0)  0.74 (0.47, 1.15)  0.2308
rs2179922  GG  145 (75.5)  134 (77.6)  277 (78.0)  Reference
(168053)  AG  40 (20.8)  29 (16.8)  65 (18.3)  1.18 (0.76, 1.83)
AA  7 (3.7)  10 (5.8)  13 (3.7)  1.03 (0.40, 2.63)  0.6525
BLOCK #9
rs726283  GG  75 (35.9)  61 (32.3)  146 (36.8)  Reference
(173962)  AG  76 (36.4)  73 (38.6)  118 (29.7)  1.25 (0.84, 1.87)
AA  58 (27.7)  55 (29.1)  133 (33.5)  0.85 (0.56, 1.29)  0.5232
rs728523  TT  167 (81.9)  167 (86.5)  323 (84.1)  Reference
(174144)  AT  31 (15.2)  22 (11.4)  54 (14.1)  1.11 (0.69, 1.79)
AA  6 (2.9)  4 (2.1)  7 (1.8)  1.66 (0.55, 4.96)  0.4369
rs932477  GG  139 (67.5)  140 (71.4)  290 (73.2)  Reference
(175549)  AG  56 (27.2)  50 (25.5)  94 (23.7)  1.24 (0.84, 1.83)
AA  11 (5.3)  6 (3.1)  12 (3.0)  1.91 (0.83, 4.39)  0.0999
rs926777  AA  91 (45.1)  81 (42.9)  178 (48.0)  Reference
(176000)  AC  77 (38.1)  78 (41.3)  141 (38.0)  1.07 (0.73, 1.55)
CC  34 (16.8)  30 (15.9)  52 (14.0)  1.28 (0.78, 2.11)  0.3944
BLOCK #10
rs722208  GG  77 (37.9)  86 (46.5)  166 (43.0)  Reference
(193838)  AG  92 (45.3)  67 (36.2)  158 (40.9)  1.26 (0.86, 1.82)
AA  34 (16.7)  32 (17.3)  62 (16.1)  1.18 (0.72, 1.94)  0.3864
rs722209  CC  162 (77.9)  152 (80.0)  283 (72.9)  Reference
(194149)  CT  40 (19.2)  35 (18.4)  99 (25.5)  0.71 (0.47, 1.07)
TT  6 (2.9)  3 (1.6)  6 (1.6)  1.75 (0.56, 5.63)  0.4411
rs1569788  CC  77 (40.7)  79 (46.7)  146 (42.2)  Reference
(199569)  CT  84 (44.4)  58 (34.3)  145 (41.9)  1.10 (0.75, 1.61)
TT  28 (14.8)  32 (18.9)  55 (15.9)  0.97 (0.57, 1.64)  0.9957
BLOCK #11
rs2207231  AA  130 (71.4)  125 (72.7)  245 (67.7)  Reference
(200837)  AG  33 (18.1)  30 (17.4)  80 (22.1)  0.78 (0.49, 1.23)
GG  19 (10.4)  17 (9.9)  37 (10.2)  0.97 (0.54, 1.75)  0.6087
rs926779  AA  79 (41.1)  86 (48.0)  173 (46.5)  Reference
(226873)  AG  89 (46.3)  69 (38.5)  148 (39.8)  1.32 (0.91, 1.91)
GG  24 (12.5)  24 (13.4)  51 (13.7)  1.03 (0.59, 1.79)  0.5428
BLOCK #12
rs2207396  GG  133 (67.9)  125 (68.3)  263 (70.7)  Reference
(253335)  AG  54 (27.5)  48 (26.2)  94 (25.3)  1.14 (0.77, 1.68)
AA  9 (4.6)  10 (5.5)  15 (4.0)  1.19 (0.51, 2.78)  0.5406
rs974276  AA  113 (57.4)  120 (66.7)  236 (63.3)  Reference
(253373)  AG  68 (34.5)  43 (23.9)  105 (28.1)  1.35 (0.93, 1.97)
GG  16 (8.1)  17 (9.4)  32 (8.6)  1.04 (0.55, 1.98)  0.3739
rs974277  CC  110 (57.9)  123 (66.9)  229 (61.9)  Reference
(253774)  CT  67 (35.3)  44 (23.9)  109 (29.5)  1.28 (0.88, 1.87)
TT  13 (6.8)  17 (9.2)  32 (8.7)  0.85 (0.43, 1.67)  0.7540
rs750686  AA  76 (40.0)  73 (40.6)  139 (39.6)  Reference
(279079)  AG  75 (39.5)  74 (41.1)  145 (41.3)  0.95 (0.64, 1.40)
ZR  GG  39 (20.5)  33 (18.3)  67 (19.1)  1.06 (0.66, 1.73)  0.9247
rs749491  GG  71 (34.8)  76 (41.5)  146 (38.3)  Reference
(279839)  AG  85 (41.7)  67 (36.6)  152 (39.9)  1.15 (0.78, 1.70)
Exon 8  AA  48 (23.5)  40 (21.9)  83 (21.8)  1.19 (0.75, 1.87)  0.4575
rs2228480  GG  116 (55.5)  111 (55.2)  228 (56.7)  Reference
(291048)  AG  66 (31.6)  60 (29.8)  116 (28.9)  1.12 (0.77, 1.63)
3' UTR  AA  27 (12.9)  30 (14.9)  58 (14.4)  0.91 (0.55, 1.52)  0.9911
BLOCK #13
rs1062577  TT  136 (76.8)  133 (77.8)  247 (72.2)  Reference
(294757)  AT  33 (18.6)  33 (19.3)  77 (22.5)  0.78 (0.49, 1.25)
AA  8 (4.5)  5 (2.9)  18 (5.3)  0.81 (0.34, 1.90)  0.3433
rs1543403  CC  62 (30.7)  64 (33.5)  135 (35.4)  Reference
(299657)  CG  93 (46.0)  85 (44.5)  165 (43.3)  1.23 (0.83, 1.82)
GG  47 (23.3)  42 (22.0)  81 (21.3)  1.26 (0.79, 2.02)  0.3213
rs1543404  CC  61 (37.0)  51 (33.1)  129 (39.3)  Reference
(299791)  CT  75 (45.4)  65 (42.2)  127 (38.7)  1.25 (0.82, 1.90)
TT  29 (17.6)  38 (24.7)  72 (22.0)  0.85 (0.50, 1.44)  0.8269

over-represented among high stage breast cancer cases (Table 33).
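As a rough check on how the crude estimates in Table 33 can be reproduced, the sketch below computes an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2x2 table. The function name `odds_ratio_ci` is hypothetical, and the Woolf interval is an assumption: the text does not state which interval method was used, so the bounds may differ slightly from the published ones. The counts are the rs488133 TT and CC rows (high stage cases versus controls).

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    Hypothetical helper for illustration; the interval method is an
    assumption, not the method documented in the thesis.
    """
    estimate = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# rs488133, TT versus CC (reference): high stage cases versus controls.
estimate, lower, upper = odds_ratio_ci(15, 137, 13, 261)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

The point estimate rounds to the 2.20 reported for the TT genotype in Table 33; the Woolf bounds come out close to, but not identical with, the published interval (1.03, 4.68).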
Although carriers of two copies of the haplotype in which the ‘T’ allele of rs488133 is observed (h4412) are not at significantly increased risk of high stage breast cancer (OR 1.65; 95% CI 0.69, 3.93; Table 34), there is an elevated risk of all breast cancers combined (OR 2.17; 95% CI 1.06, 4.45; Table 34). Other observations of note include evidence of a trend in risk among the h34 haplotype in Block #3 (p for trend=0.0396; Table 34). Further, in Blocks #8 and #9, individuals homozygous for the two least common haplotypes (h4242223 in Block #8 and h3432 in Block #9) were at significantly increased risk of high stage breast cancer as compared to non-carriers.

Table 34. Association between ESR1 ‘phased’ haplotype dose and breast cancer risk by stage of disease. Odds ratio compares high stage breast cancer to controls (95% Confidence Intervals in parentheses).
Block Haplotype (Frequency)  Dose  High Stage Cases # (%)  Low Stage Cases # (%)  Controls # (%)  Relative Risk - High Stage to Control (95% CI)  p-value (trend)
BLOCK #1
h2412  0  97 (43.7)  91 (44.2)  158 (37.1)  Reference
(34.4%)  1  93 (41.9)  91 (44.2)  215 (50.5)  0.70 (0.50, 1.00)
2  32 (14.4)  24 (11.7)  53 (12.4)  0.98 (0.59, 1.63)  0.4432
h2332  0  122 (55.0)  130 (63.1)  247 (58.0)  Reference
(23.1%)  1  71 (32.0)  57 (27.7)  149 (35.0)  0.96 (0.68, 1.38)
2  29 (13.1)  19 (9.2)  30 (7.0)  1.96 (1.13, 3.39)  0.1097
h2432  0  164 (73.9)  143 (69.4)  300 (70.4)  Reference
(18.2%)  1  51 (23.0)  52 (25.2)  105 (24.6)  0.89 (0.60, 1.31)
2  7 (3.2)  11 (5.3)  21 (4.9)  0.61 (0.26, 1.45)  0.2868
h4412  0  167 (75.2)  137 (66.5)  303 (71.1)  Reference
(17.4%)  1  45 (20.3)  55 (26.7)  112 (26.3)  0.73 (0.49, 1.08)
2  10 (4.5)  14 (6.8)  11 (2.6)  1.65 (0.69, 3.93)  0.6744
BLOCK #2  Single SNP
BLOCK #3
h14  0  50 (22.5)  47 (22.8)  85 (20.0)  Reference
(53.9%)  1  111 (50.0)  91 (44.2)  214 (50.2)  0.88 (0.58, 1.34)
2  61 (27.5)  68 (33.0)  127 (29.8)  0.82 (0.81, 1.30)  0.4323
h32  0  123 (55.4)  109 (52.9)  213 (50.0)  Reference
(27.1%)  1  85 (38.3)  81 (39.3)  180 (42.3)  0.82 (0.58, 1.15)
2  14 (6.3)  16 (7.8)  33 (7.7)  0.73 (0.38, 1.42)  0.2100
h34  0  142 (64.0)  146 (70.9)  304 (71.4)  Reference
(17.9%)  1  68 (30.6)  53 (25.7)  109 (25.6)  1.34 (0.93, 1.92)
2  12 (5.4)  7 (3.4)  13 (3.1)  1.98 (0.89, 4.38)  0.0396
BLOCK #4
h31  0  23 (10.4)  27 (13.1)  60 (14.1)  Reference
(68.6%)  1  93 (41.9)  90 (43.7)  171 (40.1)  1.42 (0.83, 2.49)
2  106 (47.7)  89 (43.2)  195 (45.8)  1.42 (0.83, 2.42)  0.3505
h33  0  115 (51.8)  97 (47.1)  206 (48.4)  Reference
(28.7%)  1  87 (39.2)  83 (40.3)  162 (38.0)  0.96 (0.68, 1.36)
2  20 (9.0)  26 (12.6)  58 (13.6)  0.62 (0.36, 1.07)  0.02267
BLOCK #5
h24  0  11 (5.0)  16 (7.8)  28 (6.6)  Reference
(75.0%)  1  65 (29.3)  63 (30.6)  132 (31.0)  1.25 (0.59, 2.67)
2  146 (65.8)  127 (61.7)  266 (62.4)  1.40 (0.68, 2.88)  0.3578
h42  0  160 (72.1)  145 (70.4)  301 (70.7)  Reference
(18.2%)  1  58 (26.1)  55 (26.7)  107 (25.1)  1.02 (0.70, 1.48)
2  4 (1.8)  6 (2.9)  18 (4.2)  0.42 (0.14, 1.22)  0.4304
h44  0  203 (91.4)  179 (86.9)  383 (89.9)  Reference
(6.6%)  1  18 (8.1)  26 (12.6)  42 (9.9)  0.81 (0.45, 1.44)
2  1 (0.5)  1 (0.5)  1 (0.2)  1.89 (0.12, 28.97)  0.7023
BLOCK #6  No SNPs
BLOCK #7  Single SNP
BLOCK #8
h2224213  0  81 (36.5)  69 (33.5)  140 (32.9)  Reference
(44.7%)  1  95 (42.8)  88 (42.7)  175 (41.1)  0.94 (0.65, 1.36)
2  46 (20.7)  49 (23.8)  111 (26.1)  0.72 (0.46, 1.11)  0.1697
h2424223  0  143 (64.4)  124 (60.2)  274 (64.3)  Reference
(19.5%)  1  67 (30.2)  73 (35.4)  131 (30.8)  0.98 (0.69, 1.40)
2  12 (5.4)  9 (4.4)  21 (4.9)  1.09 (0.52, 2.29)  0.9937
h4242321  0  162 (73.0)  162 (78.6)  329 (77.2)  Reference
(13.3%)  1  55 (24.8)  39 (18.9)  86 (20.2)  1.30 (0.88, 1.91)
2  5 (2.3)  5 (2.4)  11 (2.6)  0.92 (0.32, 2.70)  0.3821
h2244223  0  187 (84.2)  184 (89.3)  357 (83.8)  Reference
(8.8%)  1  31 (14.0)  19 (9.2)  64 (15.0)  0.92 (0.58, 1.47)
2  4 (1.8)  3 (1.5)  5 (1.2)  1.53 (0.41, 5.70)  0.9661
h4242223  0  192 (86.5)  184 (89.3)  373 (87.6)  Reference
(7.0%)  1  24 (10.8)  20 (9.7)  51 (12.0)  0.91 (0.55, 1.53)
2  6 (2.7)  2 (1.0)  2 (0.5)  5.83 (1.40, 24.26)  0.3483
BLOCK #9
h1432  0  82 (36.9)  66 (32.0)  152 (35.7)  Reference
(47.3%)  1  82 (36.9)  77 (37.4)  134 (31.5)  1.13 (0.77, 1.67)
2  58 (26.1)  63 (30.6)  140 (32.9)  0.77 (0.51, 1.51)  0.2569
h3431  0  145 (65.3)  129 (62.6)  269 (63.1)  Reference
(19.7%)  1  68 (30.6)  65 (31.6)  137 (32.2)  0.92 (0.65, 1.31)
2  9 (4.1)  12 (5.8)  20 (4.7)  0.83 (0.37, 1.88)  0.6044
h3411  0  149 (67.1)  145 (70.4)  301 (70.7)  Reference
(16.2%)  1  64 (28.8)  55 (26.7)  114 (26.8)  1.13 (0.79, 1.63)
2  9 (4.1)  6 (2.9)  11 (2.6)  1.65 (0.68, 4.04)  0.2910
h3132  0  180 (81.1)  177 (85.9)  348 (81.7)  Reference
(9.6%)  1  38 (17.1)  26 (12.6)  71 (16.7)  1.03 (0.67, 1.60)
2  4 (1.8)  3 (1.5)  7 (1.6)  1.10 (0.32, 3.82)  0.9076
h3432  0  205 (92.3)  192 (93.2)  396 (93.0)  Reference
(5.8%)  1  10 (4.5)  10 (4.9)  27 (6.3)  0.72 (0.34, 1.50)
2  7 (3.2)  4 (1.9)  3 (0.7)  4.51 (1.29, 15.70)  0.3243
BLOCK #10
h124  0  35 (15.8)  34 (16.5)  68 (16.0)  Reference
(61.5%)  1  108 (48.6)  84 (40.8)  201 (47.2)  1.04 (0.65, 1.67)
2  79 (35.6)  88 (42.7)  157 (36.9)  0.98 (0.60, 1.59)  0.8989
h322  0  114 (51.4)  120 (58.3)  243 (57.0)  Reference
(24.4%)  1  93 (41.9)  66 (32.0)  153 (35.9)  1.30 (0.92, 1.82)
2  15 (6.8)  20 (9.7)  30 (7.0)  1.07 (0.55, 2.06)  0.3269
h342  0  179 (80.6)  169 (82.0)  324 (76.1)  Reference
(11.8%)  1  37 (16.7)  34 (16.5)  97 (22.8)  0.69 (0.45, 1.05)
2  6 (2.7)  3 (1.5)  5 (1.2)  2.17 (0.67, 7.02)  0.4834
BLOCK #11
h13  0  37 (16.7)  36 (17.5)  82 (19.2)  Reference
(59.0%)  1  110 (49.5)  89 (43.2)  186 (43.7)  1.31 (0.83, 2.06)
2  75 (33.8)  81 (39.3)  158 (37.1)  1.05 (0.65, 1.69)  0.9488
h11  0  123 (55.4)  121 (58.7)  257 (60.3)  Reference
(20.7%)  1  87 (39.2)  78 (37.9)  149 (35.0)  1.22 (0.87, 1.72)
2  12 (5.4)  7 (3.4)  20 (4.7)  1.25 (0.59, 2.64)  0.2785
h31  0  172 (77.5)  159 (77.2)  317 (74.4)  Reference
(14.9%)  1  44 (19.8)  42 (20.4)  103 (24.2)  0.79 (0.53, 1.17)
2  6 (2.7)  5 (2.4)  6 (1.4)  1.84 (0.60, 5.71)  0.7201
BLOCK #12
h31231  0  55 (24.8)  50 (24.3)  97 (22.8)  Reference
(55.2%)  1  100 (45.0)  81 (39.3)  185 (43.4)  0.95 (0.63, 1.44)
2  67 (30.2)  75 (36.4)  144 (33.8)  0.82 (0.55, 1.27)  0.3905
h33413  0  135 (60.8)  135 (65.5)  267 (62.7)  Reference
(19.3%)  1  77 (34.7)  61 (29.6)  137 (32.2)  1.11 (0.79, 1.57)
2  10 (4.5)  10 (4.9)  22 (5.2)  0.90 (0.41, 1.95)  0.8590
h11213  0  153 (68.9)  143 (69.4)  298 (70.0)  Reference
(17.4%)  1  60 (27.0)  54 (26.2)  112 (26.3)  1.04 (0.72, 1.51)
2  9 (4.1)  9 (4.4)  16 (3.8)  1.10 (0.47, 2.54)  0.8280
BLOCK #13
h22  0  48 (21.6)  48 (23.3)  91 (21.4)  Reference
(54.9%)  1  109 (49.1)  95 (46.1)  188 (44.1)  1.10 (0.72, 1.68)
2  65 (29.3)  63 (30.6)  147 (34.5)  0.84 (0.33, 1.32)  0.3931
h34  0  73 (32.9)  71 (34.5)  71 (34.5)  Reference
(40.2%)  1  107 (48.2)  93 (45.1)  93 (45.1)  1.24 (0.86, 1.78)
2  42 (18.9)  42 (20.4)  42 (20.4)  1.25 (0.78, 2.00)  0.2998

Similarly, as a consequence of the direct counting ‘phasing’ method, individuals could be described exactly, not probabilistically, by the haplotypes carried. Risk estimates compared all possible haplotype combinations to a reference group of individuals carrying two copies of the most common haplotype in a given block. Risk estimates adjusted for age of disease onset and ethnicity were calculated for all cases combined as well as high stage cases only (Table 35). We again noted significant associations with high stage breast cancer, adjusted for age of onset and ethnicity, for h4242223 homozygotes in Block #8 (OR 6.27; 95% CI 1.20, 32.79) as well as h3432 homozygotes in Block #9 (OR 6.38; 95% CI 1.57, 25.95). The latter was also significantly associated with all breast cancers combined (OR 4.79; 95% CI 1.27, 17.98).

Table 35. Risk estimates using ‘phased’ data for each common haplotype (>5%) in the 13 blocks of ESR1, for all cases combined and high stage cases only, adjusted for age and ethnicity.
Adjusted Risk Estimate (Age and Ethnicity)
BLOCK / Haplotype  All Cases: OR (95% CI)  High Stage: OR (95% CI)  Minimum Block Rh²
BLOCK #1
h2412 homozygote  Reference  Reference
h2332 homozygote  1.47 (0.85, 2.57)  1.60 (0.86, 2.99)
h2432 homozygote  0.85 (0.42, 1.73)  0.60 (0.24, 1.52)
h4412 homozygote  1.91 (0.87, 4.22)  1.46 (0.57, 3.72)
h2412 / h2332  0.72 (0.47, 1.09)  0.75 (0.46, 1.21)
h2412 / h2432  0.86 (0.51, 1.46)  0.89 (0.48, 1.65)
h2412 / h4412  0.75 (0.46, 1.23)  0.52 (0.27, 0.97)
h2332 / h2432  1.17 (0.59, 2.35)  1.03 (0.45, 2.36)
h2332 / h4412  0.71 (0.31, 1.62)  0.73 (0.28, 1.92)
h2432 / h4412  0.92 (0.51, 1.67)  0.99 (0.39, 1.61)
Effective Block Rh²  0.2513
BLOCK #2  Single SNP
Effective Block Rh²  0.0024
BLOCK #3
h14 homozygote  Reference  Reference
h32 homozygote  0.90 (0.51, 1.57)  0.87 (0.43, 1.75)
h34 homozygote  1.17 (0.54, 2.53)  1.65 (0.70, 3.89)
h14 / h32  0.81 (0.57, 1.15)  0.89 (0.58, 1.36)
h14 / h34  1.02 (0.67, 1.55)  1.26 (0.77, 2.05)
h32 / h34  1.02 (0.61, 1.71)  1.12 (0.61, 2.05)
Effective Block Rh²  0.2153
BLOCK #4
h31 homozygote  Reference  Reference
h33 homozygote  0.76 (0.49, 1.19)  0.62 (0.35, 1.09)
h31 / h33  0.95 (0.70, 1.27)  0.90 (0.63, 1.28)
Effective Block Rh²  0.0531
BLOCK #5
h24 homozygote  Reference  Reference
h42 homozygote  0.38 (0.21, 1.09)  0.39 (0.13, 1.17)
h44 homozygote  2.58 (0.23, 28.85)  2.22 (0.14, 36.09)
h24 / h42  0.93 (0.67, 1.31)  0.92 (0.61, 1.37)
h24 / h44  0.88 (0.51, 1.52)  0.66 (0.33, 1.34)
h42 / h44  1.62 (0.69, 3.83)  1.21 (0.42, 3.50)
Effective Block Rh²  0.0537
BLOCK #6  No SNPs
BLOCK #7  Single SNP
Effective Block Rh²  0.0325
BLOCK #8
h2224213 homozygote  Reference  Reference
h2424223 homozygote  1.02 (0.51, 2.05)  1.41 (0.62, 3.22)
h4242321 homozygote  0.81 (0.33, 2.03)  0.93 (0.30, 2.83)
h2244223 homozygote  1.16 (0.34, 3.99)  1.84 (0.45, 7.45)
h4242223 homozygote  3.47 (0.70, 17.22)  6.27 (1.20, 32.79)
h2224213 / h2424223  1.07 (0.71, 1.60)  1.00 (0.60, 1.68)
h2224213 / h4242321  1.14 (0.71, 1.85)  1.51 (0.87, 2.63)
h2224213 / h2244223  1.11 (0.56, 2.19)  1.74 (0.81, 3.75)
h2224213 / h4242223  0.77 (0.37, 1.57)  1.00 (0.43, 2.35)
h2424223 / h4242321  0.83 (0.41, 1.66)  1.21 (0.54, 2.67)
h2424223 / h2244223  0.59 (0.29, 1.20)  1.00 (0.45, 2.24)
h2424223 / h4242223  1.29 (0.52, 3.20)  1.83 (0.66, 5.08)
h4242321 / h2244223  0.48 (0.15, 1.46)  0.46 (0.10, 2.25)
h4242321 / h4242223  0.56 (0.22, 1.80)  0.84 (0.28, 2.54)
h2244223 / h4242223  0.45 (0.11, 1.07)  0.36 (0.04, 3.10)
Effective Block Rh²  0.2983
BLOCK #9
h1432 homozygote  Reference  Reference
h3431 homozygote  1.07 (0.53, 2.15)  0.92 (0.38, 2.24)
h3411 homozygote  1.51 (0.65, 3.52)  1.88 (0.72, 4.94)
h3132 homozygote  1.09 (0.35, 3.40)  1.47 (0.40, 5.46)
h3432 homozygote  4.79 (1.27, 17.98)  6.38 (1.57, 25.95)
h1432 / h3431  1.31 (0.85, 2.04)  1.18 (0.68, 2.05)
h1432 / h3411  1.43 (0.89, 2.28)  1.51 (0.87, 2.26)
h1432 / h3132  1.76 (0.86, 3.61)  2.61 (1.18, 5.78)
h1432 / h3432  0.57 (0.21, 1.51)  0.69 (0.21, 2.21)
h3431 / h3411  1.05 (0.61, 1.78)  1.29 (0.69, 2.40)
h3431 / h3132  0.71 (0.38, 1.32)  0.90 (0.43, 1.87)
h3431 / h3432  2.49 (0.43, 14.48)  2.78 (0.37, 20.88)
h3411 / h3132  0.78 (0.34, 1.79)  0.69 (0.23, 2.06)
h3411 / h3432  0.90 (0.28, 2.85)  0.93 (0.23, 3.78)
h3132 / h3432  0.76 (1.60, 3.63)  0.60 (0.07, 5.56)
Effective Block Rh²  0.8625
BLOCK #10
h124 homozygote  Reference  Reference
h322 homozygote  1.11 (0.63, 1.95)  1.01 (0.50, 2.05)
h342 homozygote  1.50 (0.47, 1.56)  2.22 (0.63, 7.75)
h124 / h322  1.11 (0.79, 1.09)  1.39 (0.93, 2.07)
h124 / h342  0.70 (0.45, 1.45)  0.74 (0.43, 1.28)
h322 / h342  0.78 (0.42, 1.07)  0.98 (0.47, 2.05)
Effective Block Rh²  0.0393
BLOCK #11
h13 homozygote  Reference  Reference
h11 homozygote  1.19 (0.59, 2.40)  1.59 (0.71, 3.56)
h31 homozygote  1.56 (0.54, 4.47)  2.05 (0.62, 6.74)
h13 / h11  1.35 (0.96, 1.87)  1.52 (1.02, 2.27)
h13 / h31  1.00 (0.64, 1.57)  1.20 (0.71, 2.04)
h11 / h31  0.79 (0.40, 1.54)  0.55 (0.21, 1.44)
Effective Block Rh²  0.0456
BLOCK #12
h31231 homozygote  Reference  Reference
h33413 homozygote  0.89 (0.45, 1.74)  0.91 (0.40, 2.06)
h11213 homozygote  1.06 (0.51, 2.17)  1.07 (0.45, 2.56)
h31231 / h33413  0.95 (0.66, 1.37)  1.12 (0.73, 1.71)
h31231 / h11213  1.05 (0.71, 1.55)  1.09 (0.68, 1.74)
h33413 / h11213  1.04 (0.60, 1.82)  1.19 (0.63, 2.29)
Effective Block Rh²  0.3209
Table 35 (continued).

Adjusted Risk Estimate (Age and Ethnicity)

BLOCK / Haplotype          All Cases: OR (95% CI)    High Stage: OR (95% CI)    Minimum Block R_h^2

BLOCK #13
h22 homozygote             Reference                 Reference
h34 homozygote             1.29 (0.87, 1.91)         1.27 (0.79, 2.03)
h22 / h34                  1.24 (0.91, 1.69)         1.35 (0.94, 1.96)
Effective Block R_h^2                                                           0.0654

In contrast to the SNP-specific analysis and the raw haplotype count analysis, no excess risk was demonstrated in Block #1 for carriers of the h2332 or h4412 haplotype, either in the homozygous state or in combination with another haplotype, when compared to h2412/h2412 carriers. However, when risks were constructed for homozygous carriers of h2332 and h4412 combined, these haplotype combinations were overrepresented among all breast cancers combined (p=0.0023) as well as among high stage breast cancers only (p=0.0041; Table 36). Although not always statistically significant, there was evidence of this phenomenon in each of the four ethnic groups considered in these analyses (Table 36).

Table 36. Association between ESR1 Block #1 ‘risk’ haplotypes (h2412 and h4412 carriers) and breast cancer, all cases and high stage cases only.

                                               Haplotype Counts                                                   Odds Ratio (95% CI), p-value
Ethnicity          ‘Risk’ Haplotype Category   # Controls (%)   # Low Stage Cases (%)   # High Stage Cases (%)   Low Stage                        High Stage
All Combined       h2414 + h4412               41 (10.6)        33 (17.6)               39 (19.6)                1.81 (1.11, 2.96), p=0.0023      2.04 (1.25, 3.32), p=0.0040
                   All Others                  346 (89.4)       154 (82.4)              160 (80.4)
African-American   h2414 + h4412               14 (13.2)        6 (13.0)                16 (25.0)                0.99 (0.35, 2.75), p=1.0         2.19 (1.00, 4.81), p=0.0807
                   All Others                  92 (86.8)        40 (87.0)               48 (75.0)
Japanese           h2414 + h4412               9 (9.4)          13 (25.5)               10 (24.4)                3.31 (1.35, 8.13), p=0.0181      3.12 (1.20, 8.12), p=0.0395
                   All Others                  87 (90.6)        38 (74.5)               31 (75.6)
Latino             h2414 + h4412               8 (8.4)          5 (10.6)                5 (12.5)                 1.29 (0.40, 4.19), p=0.9030      1.55 (0.48, 5.04), p=0.6788
                   All Others                  87 (91.6)        42 (89.4)               35 (87.5)
Caucasian          h2414 + h4412               10 (11.1)        9 (20.9)                8 (14.8)                 2.12 (0.80, 5.59), p=0.2118      1.39 (0.51, 3.76), p=0.6963
                   All Others                  80 (88.9)        34 (79.1)               46 (85.2)

7.4.2c. Haplotype Results with an ‘Indicator’ Variable and Association with Breast Cancer

Alternative analyses produced a haplotype prediction value for each subject, which served as the independent variable in a logistic regression of disease status on “haplotype dosage”, with values of i = 0, 1, or 2. Risks associated with either a dominant or a recessive effect of all common haplotypes (>5%) in a given block are presented in Table 37. Crude and adjusted estimates were calculated both for all cases combined and for high stage cases only. In the aggregate, risk estimates were less biased towards the null than those constructed using ‘phased’ data, and the haplotypes associated with increased risk in the analyses above were similarly identified. Specifically, excess risk was noted with the h2332 and h4412 haplotypes in Block #1 as well as the h4242223 haplotype in Block #8 and the h3432 haplotype in Block #9. Further, associations that were previously of borderline significance were strengthened. For instance, a risk elevation was revealed among h322 haplotype carriers in Block #10, in contrast to a significant protective effect of the h342 haplotype in the same block.
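The dominant and recessive codings described above can be sketched as follows. This is a minimal illustration, not the author's code: it assumes each subject comes with estimated probabilities of carrying 0, 1, or 2 copies of a given haplotype, and it follows the footnotes to Table 37 (recessive: the two-copy indicator; dominant: the sum of the one-copy and two-copy indicators). The function name `dosage_covariates` and the example probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def dosage_covariates(probs: Sequence[float]) -> Tuple[float, float]:
    """Return (recessive, dominant) covariates for one subject and one haplotype.

    probs = (P(0 copies), P(1 copy), P(2 copies)); the three values must sum to 1.
    """
    p0, p1, p2 = probs
    if min(p0, p1, p2) < 0 or abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("dosage probabilities must be non-negative and sum to 1")
    recessive = p2       # expected value of the indicator "dose == 2"
    dominant = p1 + p2   # expected value of the indicator "dose == 1 or dose == 2"
    return recessive, dominant


# Hypothetical subject whose haplotype phase is uncertain.
recessive_x, dominant_x = dosage_covariates((0.10, 0.60, 0.30))
```

Each covariate would then enter a logistic regression of case status, alone for a crude estimate or together with age and ethnicity for an adjusted one. When phase is known with certainty the probabilities collapse to 0 or 1 and the covariates reduce to ordinary indicator variables.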
Table 37. Association between ESR1 haplotype dose fitting a recessive or dominant model and breast cancer. Crude and adjusted (age and ethnicity) odds ratios are presented for all cases combined and high stage breast cancer only (95% CIs in parentheses).

                                  All Cases Combined                             High Stage Cases
Block / Haplotype   Effect        Crude OR (95% CI)      Adjusted OR (95% CI)    Crude OR (95% CI)      Adjusted OR (95% CI)
(Frequency)

BLOCK #1
h2412 (34.4%)       Recessive^a   0.97 (0.65, 1.46)      0.95 (0.64, 1.44)       1.07 (0.66, 1.73)      1.04 (0.64, 1.69)
                    Dominant^b    0.77 (0.57, 1.03)      0.70 (0.52, 0.96)       0.77 (0.57, 1.03)      0.70 (0.52, 0.96)
h2332 (23.1%)       Recessive     1.79 (1.09, 2.95)      1.87 (1.10, 3.15)       2.08 (1.19, 3.64)      2.16 (1.20, 3.88)
                    Dominant      1.02 (0.75, 1.37)      1.05 (0.76, 1.45)       1.23 (0.86, 1.75)      1.27 (0.87, 1.86)
h2432 (18.2%)       Recessive     0.87 (0.46, 1.67)      0.95 (0.49, 1.85)       0.63 (0.26, 1.49)      0.69 (0.29, 1.67)
                    Dominant      0.85 (0.62, 1.17)      0.86 (0.61, 1.20)       0.79 (0.53, 1.16)      0.81 (0.54, 1.22)
h4412 (17.4%)       Recessive     2.10 (1.02, 4.32)      1.97 (0.93, 4.16)       1.60 (0.66, 3.87)      1.53 (0.62, 3.79)
                    Dominant      1.04 (0.76, 1.42)      1.04 (0.74, 1.45)       0.83 (0.56, 1.22)      0.82 (0.54, 1.23)

BLOCK #2

BLOCK #3
h14 (53.9%)         Recessive     0.44 (0.13, 1.43)      0.44 (0.13, 1.49)       0.42 (0.09, 1.97)      0.46 (0.10, 2.16)
                    Dominant      0.97 (0.94, 1.00)      0.97 (0.94, 1.01)       0.97 (0.93, 1.01)      0.98 (0.94, 1.02)
h32 (27.1%)         Recessive     0.90 (0.54, 1.50)      0.97 (0.57, 1.65)       0.83 (0.44, 1.57)      0.89 (0.47, 1.70)
                    Dominant      0.86 (0.65, 1.13)      0.87 (0.65, 1.15)       0.81 (0.58, 1.13)      0.83 (0.59, 1.17)
h34 (17.9%)         Recessive     1.48 (0.82, 3.04)      1.26 (0.60, 2.65)       1.84 (0.83, 4.10)      1.65 (0.73, 3.74)
                    Dominant      1.20 (0.89, 1.62)      1.15 (0.94, 1.57)       1.38 (0.97, 1.97)      1.33 (0.92, 1.91)

BLOCK #4
h11 (68.6%)         Recessive     1.05 (0.95, 1.17)      1.03 (0.93, 1.14)       1.05 (0.93, 1.19)      1.03 (0.91, 1.17)
                    Dominant      1.03 (0.99, 1.07)      1.02 (0.97, 1.07)       1.02 (0.97, 1.08)      1.01 (0.96, 1.07)
h33 (28.7%)         Recessive     0.76 (0.50, 1.17)      0.78 (0.50, 1.22)       0.63 (0.36, 1.10)      0.66 (0.37, 1.16)
                    Dominant      0.92 (0.69, 1.23)      0.92 (0.68, 1.24)       0.81 (0.56, 1.15)      0.81 (0.56, 1.18)

BLOCK #5
h24 (75.0%)         Recessive     0.97 (0.93, 1.02)      0.98 (0.93, 1.03)       0.98 (0.92, 1.04)      0.98 (0.92, 1.04)
                    Dominant      0.99 (0.96, 1.01)      0.99 (0.96, 1.01)       0.98 (0.95, 1.02)      0.99 (0.95, 1.02)
h42 (18.2%)         Recessive     0.58 (0.27, 1.24)      0.53 (0.24, 1.14)       0.46 (0.16, 1.29)      0.45 (0.16, 1.28)
                    Dominant      0.89 (0.65, 1.21)      0.85 (0.62, 1.17)       0.87 (0.60, 1.26)      0.83 (0.57, 1.21)
h44 (6.6%)          Recessive     2.12 (0.19, 23.76)     2.86 (0.25, 32.65)      2.05 (0.13, 32.60)     2.68 (0.17, 43.38)
                    Dominant      1.03 (0.65, 1.62)      1.10 (0.68, 1.76)       0.76 (0.42, 1.38)      0.79 (0.43, 1.46)

BLOCK #6            No SNPs

BLOCK #7            Single SNP

BLOCK #8
h2224213 (44.7%)    Recessive     0.81 (0.59, 1.12)      0.77 (0.54, 1.08)       0.73 (0.50, 1.09)      0.68 (0.45, 1.04)
                    Dominant      0.91 (0.68, 1.22)      0.90 (0.65, 1.24)       0.86 (0.61, 1.22)      0.84 (0.57, 1.24)
h2424223 (19.5%)    Recessive     0.99 (0.53, 1.83)      1.10 (0.58, 2.11)       1.11 (0.54, 2.29)      1.23 (0.57, 2.62)
                    Dominant      1.10 (0.82, 1.48)      1.09 (0.80, 1.50)       1.04 (0.73, 1.49)      1.05 (0.72, 1.55)
h4242321 (13.3%)    Recessive     0.89 (0.36, 2.19)      0.82 (0.33, 2.07)       0.89 (0.30, 2.69)      0.83 (0.27, 2.53)
                    Dominant      1.13 (0.80, 1.60)      1.09 (0.76, 1.55)       1.30 (0.87, 1.95)      1.22 (0.81, 1.85)
h2244223 (8.8%)     Recessive     1.41 (0.44, 4.53)      1.29 (0.38, 4.34)       1.58 (0.41, 6.01)      1.61 (0.41, 6.35)
                    Dominant      0.75 (0.51, 1.11)      0.73 (0.47, 1.12)       0.91 (0.58, 1.45)      0.92 (0.56, 1.51)
h4242223 (7.0%)     Recessive     4.03 (0.80, 20.46)     3.70 (0.70, 19.66)      6.38 (1.17, 34.92)     5.96 (1.05, 33.75)
                    Dominant      1.02 (0.67, 1.58)      0.99 (0.63, 1.56)       1.14 (0.69, 1.91)      1.12 (0.66, 1.89)

BLOCK #9
h1432 (47.3%)       Recessive     0.79 (0.59, 1.07)      0.76 (0.54, 1.05)       0.72 (0.50, 1.04)      0.67 (0.45, 1.00)
                    Dominant      1.06 (0.80, 1.40)      1.05 (0.77, 1.44)       0.96 (0.68, 1.34)      0.94 (0.65, 1.37)
h3431 (19.7%)       Recessive     0.95 (0.50, 1.79)      0.85 (0.43, 1.65)       0.78 (0.34, 1.81)      0.66 (0.28, 1.56)
                    Dominant      1.01 (0.75, 1.37)      1.02 (0.74, 1.40)       0.92 (0.64, 1.33)      0.92 (0.63, 1.35)
h3411 (16.2%)       Recessive     1.26 (0.57, 2.77)      1.27 (0.56, 2.87)       1.50 (0.61, 3.71)      1.48 (0.59, 3.73)
                    Dominant      1.16 (0.85, 1.57)      1.16 (0.84, 1.60)       1.27 (0.88, 1.84)      1.24 (0.85, 1.81)
h3132 (9.6%)        Recessive     0.82 (0.29, 2.33)      0.79 (0.27, 2.36)       0.89 (0.25, 3.13)      0.97 (0.27, 3.50)
                    Dominant      0.91 (0.62, 1.32)      0.93 (0.61, 1.40)       1.06 (0.68, 1.65)      1.14 (0.70, 1.85)
h3432 (5.8%)        Recessive     3.36 (0.95, 11.82)     4.03 (1.11, 14.60)      4.25 (1.09, 16.61)     5.26 (1.32, 20.94)
                    Dominant      0.85 (0.50, 1.45)      0.87 (0.50, 1.50)       0.92 (0.48, 1.75)      0.96 (0.50, 1.85)

BLOCK #10
h124 (61.5%)        Recessive     1.07 (0.81, 1.42)      1.07 (0.79, 1.46)       0.92 (0.65, 1.30)      0.92 (0.63, 1.33)
                    Dominant      1.01 (0.70, 1.47)      1.02 (0.69, 1.51)       1.05 (0.67, 1.66)      1.08 (0.67, 1.73)
h322 (24.4%)        Recessive     1.08 (0.65, 1.79)      1.06 (0.62, 1.80)       0.85 (0.44, 1.63)      0.81 (0.41, 1.60)
                    Dominant      1.17 (0.88, 1.56)      1.21 (0.89, 1.66)       1.37 (0.97, 1.94)      1.46 (0.98, 2.07)
h342 (11.8%)        Recessive     1.77 (0.58, 5.44)      1.48 (0.45, 4.80)       2.32 (0.67, 7.94)      2.00 (0.56, 7.13)
                    Dominant      0.72 (0.51, 1.00)      0.70 (0.50, 1.00)       0.74 (0.50, 1.13)      0.73 (0.48, 1.12)

BLOCK #11
h13 (59.0%)         Recessive     1.02 (0.76, 1.36)      0.96 (0.70, 1.31)       0.88 (0.62, 1.25)      0.84 (0.57, 1.22)
                    Dominant      1.14 (0.80, 1.63)      1.12 (0.77, 1.63)       1.15 (0.75, 1.77)      1.13 (0.72, 1.78)
h11 (20.7%)         Recessive     0.97 (0.51, 1.83)      0.99 (0.51, 1.94)       1.11 (0.53, 2.31)      1.12 (0.51, 2.44)
                    Dominant      1.08 (0.79, 1.48)      1.19 (0.84, 1.69)       1.18 (0.81, 1.73)      1.28 (0.85, 1.93)
h31 (14.9%)         Recessive     1.05 (0.49, 2.26)      0.97 (0.44, 2.15)       1.10 (0.44, 2.78)      1.10 (0.43, 2.80)
                    Dominant      0.88 (0.62, 1.25)      0.86 (0.60, 1.24)       0.92 (0.60, 1.41)      0.90 (0.59, 1.39)

BLOCK #12
h31231 (55.2%)      Recessive     0.96 (0.72, 1.28)      0.90 (0.65, 1.22)       0.84 (0.59, 1.19)      0.79 (0.54, 1.15)
                    Dominant      0.91 (0.66, 1.25)      0.90 (0.64, 1.26)       0.89 (0.61, 1.30)      0.90 (0.60, 1.35)
h33413 (19.3%)      Recessive     0.88 (0.47, 1.67)      0.78 (0.40, 1.53)       0.87 (0.40, 1.90)      0.74 (0.32, 1.68)
                    Dominant      0.99 (0.74, 1.33)      0.98 (0.72, 1.35)       1.09 (0.76, 1.55)      1.07 (0.74, 1.56)
h11213 (17.4%)      Recessive     1.15 (0.57, 2.33)      1.09 (0.52, 2.25)       1.10 (0.46, 2.61)      1.03 (0.43, 2.50)
                    Dominant      1.06 (0.79, 1.44)      1.11 (0.81, 1.52)       1.08 (0.75, 1.56)      1.13 (0.78, 1.64)

Table 37 (continued).
                                  All Cases Combined                             High Stage Cases
Block / Haplotype   Effect        Crude OR (95% CI)      Adjusted OR (95% CI)    Crude OR (95% CI)      Adjusted OR (95% CI)
(Frequency)

BLOCK #13
h22 (54.9%)         Recessive     0.80 (0.59, 1.07)      0.79 (0.58, 1.07)       0.78 (0.55, 1.13)      0.79 (0.55, 1.14)
                    Dominant      0.94 (0.68, 1.31)      0.94 (0.67, 1.32)       0.98 (0.66, 1.46)      1.00 (0.67, 1.51)
h34 (40.2%)         Recessive     1.13 (0.79, 1.62)      1.13 (0.78, 1.64)       1.08 (0.70, 1.67)      1.06 (0.68, 1.65)
                    Dominant      1.22 (0.91, 1.63)      1.20 (0.89, 1.62)       1.25 (0.88, 1.77)      1.21 (0.85, 1.73)

a Recessive effect fit using the Ind(delta_h(H)) = 2 function.
b Dominant effect fit using the sum of the Ind(delta_h(H)) = 1 and Ind(delta_h(H)) = 2 functions.

Lastly, adjusted risk estimates were constructed using the indicator function described above. In these block-by-block analyses, the reference group was defined as the group homozygous for the most common haplotype. Associations between ESR1 and breast cancer, as well as interaction terms, were constructed for each common haplotype (>5%) in the 13 blocks of ESR1, for all cases combined and for high stage cases only, adjusted for age and ethnicity. A significant association was observed in Block #1 for the h2332 haplotype. Among all cases combined, each additional copy of the haplotype was associated with a risk elevated by a factor of 1.37 above baseline (95% CI: 1.01, 1.84). For high stage cases this factor was 1.41 (95% CI: 1.01, 1.99). Further, in Block #9, an elevated risk was noted for the h3432 homozygote both for all cases combined (OR: 1.96; 95% CI: 1.07, 3.58) and for high stage cases only (OR: 2.24; 95% CI: 1.16, 4.31). Finally, additional positive associations included the h4242223 haplotype in Block #8, the h1432 / h3132 haplotype interaction term in Block #9, and the h124 / h322 interaction in Block #10, all seen among high stage cases only.
Outside of these observations, however, there was no evidence of significant risk modification between any haplotype combination and either all breast cancer cases combined or high stage breast cancer (Table 38).

Table 38. Association between ESR1 haplotypes using an indicator function of E{I(delta_h(H)=i)} and breast cancer. Risk estimates and interaction terms are constructed for each common haplotype (>5%) in the 13 blocks of ESR1, for all cases combined and high stage cases only, adjusted for age and ethnicity^a.

Adjusted Risk Estimate (Age and Ethnicity)

BLOCK / Haplotype          All Cases: OR (95% CI)    High Stage: OR (95% CI)    Minimum Block R_h^2

BLOCK #1
h2412 homozygote           Reference                 Reference
h2332 haplotype^b          1.37 (1.01, 1.84)         1.41 (1.01, 1.99)
h2432 haplotype            1.00 (0.69, 1.45)         0.78 (0.48, 1.26)
h4412 haplotype            1.38 (0.93, 2.07)         1.17 (0.72, 1.90)
h2412 / h2332              0.62 (0.37, 1.02)         0.62 (0.34, 1.11)
h2412 / h2432              0.93 (0.52, 1.66)         1.27 (0.62, 2.60)
h2412 / h4412              0.76 (0.43, 1.35)         0.60 (0.29, 1.26)
h2332 / h2432              0.79 (0.35, 1.78)         0.98 (0.37, 2.57)
h2332 / h4412              0.43 (0.18, 1.02)         0.51 (0.19, 1.40)
h2432 / h4412              0.81 (0.40, 1.62)         0.99 (0.41, 2.37)
Effective Block R_h^2                                                           0.2513

BLOCK #2
Effective Block R_h^2                                                           0.0024

BLOCK #3
h14 homozygote             Reference                 Reference
h32 haplotype              0.99 (0.75, 1.31)         0.97 (0.68, 1.38)
h34 haplotype              1.13 (0.77, 1.66)         1.35 (0.88, 2.07)
h14 / h32                  0.99 (0.95, 1.04)         1.00 (0.95, 1.06)
h14 / h34                  0.73 (0.48, 1.13)         0.82 (0.50, 1.34)
h32 / h34                  0.97 (0.53, 1.76)         0.84 (0.41, 1.72)
Effective Block R_h^2                                                           0.2153

BLOCK #4
h31 homozygote             Reference                 Reference
h33 haplotype              0.91 (0.74, 1.13)         0.84 (0.65, 1.09)
h31 / h33                  1.05 (0.69, 1.58)         0.93 (0.56, 1.55)
Effective Block R_h^2                                                           0.0531

BLOCK #5
h24 homozygote             Reference                 Reference
h42 haplotype              0.77 (0.53, 1.11)         0.76 (0.48, 1.20)
h44 haplotype              0.95 (0.57, 1.57)         0.74 (0.39, 1.42)
h24 / h42                  1.02 (0.83, 1.25)         1.02 (0.79, 1.30)
h24 / h44                  0.93 (0.53, 1.63)         0.82 (0.41, 1.65)
h42 / h44                  2.15 (0.60, 6.63)         1.70 (0.41, 7.11)
Effective Block R_h^2                                                           0.0537

Table 38 (continued).

BLOCK #6                   No SNPs

BLOCK #7                   Single SNP
Effective Block R_h^2                                                           0.0325

BLOCK #8
h2224213 homozygote        Reference                 Reference
h2424223 haplotype         1.18 (0.84, 1.66)         1.31 (0.87, 1.98)
h4242321 haplotype         1.14 (0.73, 1.78)         1.21 (0.71, 2.05)
h2244223 haplotype         1.06 (0.59, 1.92)         1.38 (0.70, 2.73)
h4242223 haplotype         2.15 (0.99, 4.68)         2.66 (1.18, 5.99)
h2224213 / h2424223        0.96 (0.58, 1.59)         0.83 (0.44, 1.54)
h2224213 / h4242321        1.09 (0.58, 2.03)         1.31 (0.64, 2.71)
h2224213 / h2244223        1.32 (0.57, 3.10)         1.58 (0.61, 4.09)
h2224213 / h4242223        0.38 (0.13, 1.07)         0.43 (0.14, 1.35)
h2424223 / h4242321        0.94 (0.37, 2.40)         1.31 (0.46, 3.71)
h2424223 / h2244223        0.51 (0.21, 1.28)         0.57 (0.20, 1.64)
h2424223 / h4242223        0.59 (0.18, 1.93)         0.60 (0.17, 2.19)
h4242321 / h2244223        0.41 (0.09, 1.90)         0.31 (0.04, 2.25)
h4242321 / h4242223        0.24 (0.06, 1.01)         0.24 (0.05, 1.28)
h2244223 / h4242223        0.21 (0.04, 1.16)         0.09 (0.01, 1.12)
Effective Block R_h^2                                                           0.2983

BLOCK #9
h1432 homozygote           Reference                 Reference
h3431 homozygote           1.01 (0.71, 1.41)         0.94 (0.61, 1.46)
h3411 homozygote           1.23 (0.82, 1.86)         1.40 (0.87, 2.25)
h3132 homozygote           1.07 (0.63, 1.84)         1.31 (0.70, 2.45)
h3432 homozygote           1.96 (1.07, 3.58)         2.24 (1.16, 4.31)
h1432 / h3431              1.53 (0.91, 2.57)         1.51 (0.79, 2.23)
h1432 / h3411              1.18 (0.67, 2.09)         1.15 (0.59, 2.23)
h1432 / h3132              2.11 (0.89, 5.00)         2.77 (1.07, 7.15)
h1432 / h3432              0.30 (0.10, 0.89)         0.29 (0.08, 1.10)
h3431 / h3411              1.19 (0.54, 2.62)         1.56 (0.63, 3.87)
h3431 / h3132              0.72 (0.29, 1.80)         0.72 (0.23, 2.23)
h3431 / h3432              0.59 (0.09, 3.97)         0.92 (0.09, 8.89)
h3411 / h3132              0.72 (0.26, 1.99)         0.52 (0.15, 1.80)
h3411 / h3432              0.38 (0.09, 1.64)         0.31 (0.06, 1.72)
h3132 / h3432              0.27 (0.03, 2.66)         0.26 (0.01, 5.11)
Effective Block R_h^2                                                           0.8625

Table 38 (continued).

BLOCK #10
h124 homozygote            Reference                 Reference
h322 homozygote            1.00 (0.75, 1.32)         0.94 (0.66, 1.34)
h342 homozygote            1.09 (0.62, 1.90)         1.29 (0.69, 2.40)
h124 / h322                1.20 (0.81, 1.78)         1.64 (1.03, 2.64)
h124 / h342                0.60 (0.30, 1.20)         0.53 (0.24, 1.17)
h322 / h342                0.82 (0.35, 1.93)         0.92 (0.35, 2.45)
Effective Block R_h^2                                                           0.0393

BLOCK #11
h13 homozygote             Reference                 Reference
h11 homozygote             1.03 (0.73, 1.46)         1.11 (0.74, 1.68)
h31 homozygote             1.03 (0.72, 1.47)         1.06 (0.69, 1.63)
h13 / h11                  1.28 (0.81, 2.01)         1.38 (0.82, 2.33)
h13 / h31                  0.98 (0.53, 1.83)         1.31 (0.64, 2.71)
h11 / h31                  0.69 (0.29, 1.62)         0.57 (0.19, 1.69)
Effective Block R_h^2                                                           0.0456

BLOCK #12
h31231 homozygote          Reference                 Reference
h33413 homozygote          0.97 (0.70, 1.36)         0.95 (0.64, 1.43)
h11213 homozygote          1.02 (0.71, 1.45)         1.04 (0.67, 1.61)
h31231 / h33413            1.01 (0.63, 1.63)         1.28 (0.73, 2.26)
h31231 / h11213            1.17 (0.71, 1.93)         1.22 (0.67, 2.23)
h33413 / h11213            1.24 (0.60, 2.53)         1.47 (0.63, 3.41)
Effective Block R_h^2                                                           0.3209

BLOCK #13
h22 homozygote             Reference                 Reference
h34 homozygote             1.11 (0.91, 1.36)         1.09 (0.85, 1.38)
h22 / h34                  1.13 (0.83, 1.53)         1.22 (0.84, 1.75)
Effective Block R_h^2                                                           0.0654

a Haplotype effects fit using raw probabilistic haplotype estimates (no indicator function). Interaction terms are the product: E{I(delta_h1)=1}*E{I(delta_h2)=1} + E{I(delta_h1)=1}*E{I(delta_h2)=1}, which is equivalent to the indicator = 1 function. All are fit in a SAS MODEL statement adjusting for age, race, and uncommon haplotypes (<5%): uncommon = h_{n+1} + h_{n+2} + ... + h_last, where haplotypes h_{n+1} ... h_last are all the uncommon haplotypes.
b Risk estimate coefficients for the addition of each additional haplotype.
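Footnote b of Table 38 describes the haplotype rows as coefficients for each additional haplotype copy. As a rough illustration only, assuming a log-additive logistic model and ignoring the interaction terms that the fitted model also contains, a per-copy odds ratio compounds multiplicatively with the number of copies. The function name `odds_ratio_for_copies` is hypothetical, and the two-copy values it produces are arithmetic consequences of that assumption, not estimates reported in the text.

```python
import math


def odds_ratio_for_copies(per_copy_or: float, copies: int) -> float:
    """Odds ratio versus zero copies under an assumed log-additive model."""
    if per_copy_or <= 0:
        raise ValueError("an odds ratio must be positive")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    # log(OR) is linear in the number of copies, so the OR itself compounds.
    return math.exp(copies * math.log(per_copy_or))


# Per-copy estimates quoted for the h2332 haplotype in Block #1 (Table 38).
all_cases_two_copies = odds_ratio_for_copies(1.37, 2)
high_stage_two_copies = odds_ratio_for_copies(1.41, 2)
print(round(all_cases_two_copies, 2), round(high_stage_two_copies, 2))
```

Because the fitted model also carries pairwise interaction terms, the odds ratio for a specific genotype would in practice combine the relevant main-effect and interaction coefficients rather than a single per-copy factor.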
7.4.3. ESR2

7.4.3a. SNP by SNP Genotyping Results and Association with Breast Cancer

A previous effort had formally identified 2 ‘blocks’ in ESR2 in a multiethnic panel of unrelated individuals (Chapter 6). The current pilot case/control investigation considered a total of 15 markers contained within these blocks. Table 39 compares the predicted haplotype frequency for each block obtained from the multiethnic panel to that observed when cases and controls are combined in this pilot sample. The effective R_h^2 attained within the first haplotype block was 0.1882 and was 0.1958 for the second block.

The prevalence of the 15 ESR2 variants assayed varied widely by ethnicity across each of the two blocks (Table 40). With the exception of a single variant in Block #2 (rs1256062), which appeared to be protective against high stage breast cancer in the homozygous state, the remaining variants were not over-represented among high stage breast cancer cases as compared to controls (Table 41).

7.4.3b. ‘Phased’ Haplotype Results and Association with Breast Cancer

Having observed four common haplotypes (greater than 5% population frequency for all ethnicities combined) in each block of ESR2, we then constructed raw, unadjusted risk estimates for each haplotype using ‘phased’ data. These raw counts, percentages, and the odds ratio for each haplotype count (i.e., 0, 1 or 2 copies of the haplotype) associated with high stage breast cancer are presented in Table 42. There was no evidence that any given haplotype was differentially represented among high stage breast cancer cases as compared to controls.

Table 39.
Predicted and observed haplotype frequencies in each block of ESR2 for all ethnicities combined (markers retained in bold).

Block (Markers), Size (kb)            Predicted Haplotype    Frequency (%)   Assayed Haplotype   Observed Frequency (%)

Block 1 (Markers 1-19), 63.1 kb       3413124231211214114    35.7            4132432241          31.8
                                      1331342413433212134    26.1            3314214221          39.1
                                      3331342413433422332    11.3            3314214423          8.4
                                      3331342413433212134    10.7            -                   -
                                      3333142413233212134    (3.4)           3334212221          5.4
                                      3333142413213212134    (2.8)           -                   -
                                      3313142413233212134    (2.1)           -                   -
Effective R_h^2: 0.1882

Block 2 (Markers 20-32), 12.5 kb      4131333221331          35.2            43322               33.4
                                      4111111213123          13.1            41121               12.9
                                      2333113213123          12.0            23121               18.7
                                      4131111213123          10.7            43121               25.7
                                      2133113413123          7.3             -                   -
                                      4131113213123          6.4             -                   -
                                      4133113213123          5.9             -                   -
                                      4333113213123          (1.1)           -                   -
                                      2133113213123          (3.9)           -                   -
Effective R_h^2: 0.1958

Table 40. Ethnic-specific distribution of 15 variants in the ESR2 gene among breast cancer cases and controls (n=854). Cells give # (%).

                                  All Ethnicities            African-American           Latina                     Japanese                   Caucasian
SNP rs# (position)         Geno-  Cases        Controls      Cases        Controls      Cases        Controls      Cases        Controls      Cases        Controls
(Stratum n)                type   (n=428)      (n=426)       (n=117)      (n=117)       (n=101)      (n=99)        (n=100)      (n=100)       (n=110)      (n=110)

BLOCK #1
rs1271572 (-839)           GG     161 (44.2)   171 (36.9)    70 (70.0)    80 (79.2)     31 (34.1)    26 (30.2)     33 (37.5)    30 (35.3)     27 (31.8)    35 (37.6)
                           GT     140 (38.5)   139 (38.1)    26 (26.0)    19 (18.8)     35 (38.5)    41 (47.7)     35 (39.8)    41 (48.2)     44 (51.8)    38 (40.9)
                           TT     63 (17.3)    55 (15.1)     4 (4.0)      2 (2.0)       25 (27.5)    19 (22.1)     20 (22.7)    14 (16.5)     14 (16.5)    20 (21.5)
rs1256030 (13908)          GG     128 (32.7)   151 (39.0)    45 (39.8)    61 (54.9)     24 (25.0)    27 (28.4)     32 (35.2)    30 (34.1)     27 (29.3)    33 (35.5)
                           AG     188 (48.0)   169 (43.7)    54 (47.8)    41 (36.9)     47 (45.0)    46 (48.4)     38 (41.8)    39 (44.3)     49 (53.3)    43 (46.2)
                           AA     76 (19.4)    67 (17.3)     14 (12.4)    9 (8.1)       25 (26.0)    22 (23.2)     21 (23.1)    19 (21.6)     16 (17.4)    17 (18.3)
rs1256031 (14899)          AA     97 (24.3)    121 (31.1)    35 (30.7)    48 (42.9)     16 (16.7)    21 (21.9)     22 (23.9)    21 (24.4)     24 (24.7)    31 (32.6)
                           AG     210 (52.6)   173 (44.5)    62 (54.4)    47 (42.0)     48 (50.0)    41 (42.7)     43 (46.7)    42 (48.8)     57 (58.8)    43 (45.3)
                           GG     92 (23.1)    95 (24.4)     17 (14.9)    17 (15.2)     32 (33.3)    34 (35.4)     27 (29.3)    23 (26.7)     16 (16.5)    21 (22.1)
rs960069 (16076)           TT     170 (42.7)   178 (45.1)    75 (64.7)    79 (71.2)     30 (31.6)    27 (28.4)     33 (36.3)    32 (36.8)     32 (33.3)    40 (39.2)
                           CT     165 (41.5)   157 (39.7)    36 (31.0)    27 (24.3)     41 (43.2)    48 (50.5)     37 (40.7)    38 (43.7)     51 (53.1)    44 (43.1)
                           CC     63 (15.8)    60 (15.2)     5 (4.3)      5 (4.5)       24 (25.3)    20 (21.1)     21 (23.1)    17 (19.5)     13 (13.5)    18 (17.7)
rs1269056 (17188)          CC     167 (41.5)   172 (44.3)    75 (64.7)    78 (70.3)     27 (29.3)    27 (28.4)     34 (36.2)    31 (36.1)     31 (31.0)    36 (37.5)
                           CT     166 (41.3)   158 (40.7)    35 (30.2)    28 (25.2)     41 (44.6)    47 (49.5)     39 (41.5)    40 (46.5)     51 (51.0)    43 (44.8)
                           TT     69 (17.2)    59 (14.0)     6 (5.2)      5 (4.5)       24 (26.1)    21 (22.1)     21 (22.3)    15 (17.4)     18 (18.0)    17 (17.7)
rs1256037 (17777)          AA     153 (41.1)   167 (44.8)    67 (64.4)    74 (70.5)     28 (31.8)    25 (28.1)     30 (34.1)    30 (35.7)     28 (30.4)    38 (40.0)
                           AG     156 (41.9)   149 (39.9)    34 (32.7)    27 (25.7)     36 (40.9)    44 (49.4)     39 (44.3)    39 (46.4)     57 (51.1)    39 (41.1)
                           GG     63 (16.9)    57 (15.3)     3 (2.9)      4 (3.8)       24 (27.3)    20 (22.5)     19 (21.6)    15 (17.9)     17 (18.5)    18 (18.9)
rs1256040 (22684)          TT     98 (25.0)    92 (23.9)     20 (17.4)    19 (17.1)     30 (32.6)    33 (34.7)     27 (29.7)    20 (24.1)     15 (16.0)    20 (20.8)
                           CT     202 (51.5)   171 (44.4)    61 (53.0)    46 (41.4)     47 (51.1)    41 (43.2)     40 (44.0)    40 (48.2)     54 (57.4)    44 (45.8)
                           CC     92 (24.5)    122 (31.7)    34 (29.6)    46 (41.4)     15 (16.3)    21 (22.1)     24 (26.4)    23 (27.7)     25 (26.6)    32 (33.3)
rs1256049 (37027), Exon 6  CC     292 (82.9)   293 (82.8)    88 (83.8)    88 (78.1)     77 (90.6)    85 (92.4)     43 (58.9)    40 (50.0)     84 (94.4)    80 (98.8)
                           CT     51 (14.5)    48 (13.6)     15 (14.3)    11 (10.9)     8 (9.4)      7 (7.6)       23 (31.5)    29 (36.3)     5 (5.6)      1 (1.2)
                           TT     9 (2.6)      13 (3.7)      2 (1.9)      2 (2.0)       0            0             7 (9.6)      11 (13.7)     0            0
rs1256056 (48452)          CC     170 (42.8)   169 (43.7)    74 (64.3)    79 (70.5)     30 (30.6)    26 (27.4)     33 (38.8)    27 (32.1)     33 (33.3)    37 (38.5)
                           CT     164 (41.3)   158 (40.8)    36 (31.3)    27 (24.1)     43 (43.9)    48 (50.5)     33 (38.8)    41 (48.8)     52 (52.5)    42 (43.8)
                           TT     63 (15.9)    60 (15.5)     5 (4.3)      6 (5.4)       25 (25.5)    21 (22.1)     19 (22.4)    16 (19.1)     14 (14.1)    17 (17.7)
rs953592 (50329)           AA     336 (84.8)   339 (87.1)    107 (95.5)   109 (97.3)    85 (91.4)    88 (83.6)     51 (54.8)    48 (55.8)     93 (94.9)    94 (96.9)
                           AG     51 (12.9)    42 (10.8)     5 (4.5)      3 (2.7)       8 (8.6)      6 (6.4)       33 (35.5)    30 (34.9)     5 (5.1)      3 (3.1)
                           GG     9 (2.3)      8 (2.1)       0            0             0            0             9 (9.7)      8 (9.3)       0            0

BLOCK #2
rs1256062 (57760)          TT     217 (56.1)   228 (59.8)    44 (39.6)    50 (45.9)     67 (70.5)    71 (75.5)     37 (43.5)    38 (45.8)     69 (71.9)    69 (72.6)
                           CT     143 (36.9)   114 (29.9)    56 (50.5)    45 (41.3)     25 (26.3)    19 (20.2)     37 (43.5)    31 (37.3)     25 (26.0)    19 (20.0)
                           CC     27 (7.0)     39 (10.2)     11 (9.9)     14 (12.8)     3 (3.2)      4 (4.3)       11 (12.9)    14 (16.9)     2 (2.1)      7 (7.4)
rs867443 (60036)           GG     310 (76.9)   278 (71.3)    94 (83.9)    88 (79.3)     81 (81.8)    69 (72.6)     78 (82.1)    72 (80.9)     57 (58.8)    49 (51.6)
                           AG     86 (21.3)    101 (25.9)    17 (15.2)    22 (19.8)     18 (18.2)    25 (26.3)     17 (17.9)    14 (15.7)     34 (35.0)    40 (42.1)
                           AA     7 (1.7)      11 (2.8)      1 (0.9)      1 (0.9)       0            1 (1.1)       0            3 (3.4)       6 (6.2)      6 (6.3)
rs944047 (60786)           AA     165 (42.1)   174 (44.9)    69 (63.3)    79 (71.8)     29 (29.9)    29 (30.4)     34 (37.8)    30 (34.1)     33 (34.4)    36 (37.9)
                           AG     166 (42.3)   153 (39.7)    37 (33.9)    25 (22.7)     44 (45.4)    45 (48.9)     38 (42.2)    43 (48.9)     47 (49.0)    40 (42.1)
                           GG     61 (15.6)    59 (15.3)     3 (2.8)      6 (5.5)       24 (24.7)    19 (20.7)     18 (20.0)    15 (17.0)     16 (16.7)    19 (20.0)
rs944459 (61720)           CC     359 (91.3)   357 (91.3)    112 (100)    111 (100)     93 (96.9)    93 (98.9)     60 (66.7)    55 (62.5)     94 (98.9)    98 (100)
                           CT     31 (7.9)     26 (6.7)      0            0             3 (3.1)      1 (1.1)       27 (30.0)    25 (28.4)     1 (1.1)      0
                           TT     3 (0.8)      8 (2.0)       0            0             0            0             3 (3.3)      8 (9.1)       0            0
rs1256065 (62146)          AA     164 (43.6)   171 (45.8)    70 (63.6)    77 (74.0)     31 (34.8)    29 (32.2)     34 (39.1)    33 (37.5)     29 (32.2)    32 (35.2)
                           AC     150 (39.9)   149 (40.0)    36 (32.7)    23 (22.1)     34 (38.2)    42 (36.7)     32 (36.8)    41 (46.6)     48 (53.3)    43 (47.2)
                           CC     62 (16.5)    53 (14.2)     4 (3.6)      4 (3.9)       24 (27.0)    19 (21.1)     21 (24.1)    14 (15.9)     13 (14.4)    16 (17.6)

Table 41. SNP by SNP association between ESR2 variants and breast cancer risk by stage of disease. Odds ratio compares high stage to controls (95% Confidence Intervals in parentheses).

SNP rs# (position)         Geno-  High Stage     Low Stage      Controls       Relative Risk: High Stage     p-value
                           type   Cases # (%)    Cases # (%)    # (%)          to Control (95% CI)           (trend)

BLOCK #1
rs1271572 (-839)           GG     87 (46.8)      74 (41.6)      171 (46.9)     Reference
                           GT     70 (37.6)      70 (39.3)      139 (38.1)     0.99 (0.67, 1.46)
                           TT     29 (15.6)      34 (19.1)      55 (15.1)      1.04 (0.62, 1.74)             0.9764
rs1256030 (13908)          GG     70 (34.5)      58 (30.7)      151 (39.0)     Reference
                           AG     97 (47.8)      91 (48.1)      169 (43.7)     1.24 (0.85, 1.81)
                           AA     36 (17.7)      40 (21.2)      67 (17.3)      1.16 (0.71, 1.90)             0.4592
rs1256031 (14899)          AA     56 (27.1)      41 (21.3)      121 (31.1)     Reference
                           AG     107 (51.7)     103 (53.7)     173 (44.5)     1.34 (0.90, 1.99)
                           GG     44 (21.3)      48 (25.0)      95 (24.4)      1.00 (0.60, 1.66)             0.9341
rs960069 (16076)           TT     92 (44.0)      78 (41.3)      178 (45.1)     Reference
                           CT     84 (40.2)      81 (42.9)      157 (39.7)     1.04 (0.71, 1.52)
                           CC     33 (15.8)      30 (15.9)      60 (15.2)      1.06 (0.63, 1.79)             0.8352
rs1269056 (17188)          CC     90 (43.5)      77 (39.5)      172 (44.3)     Reference
                           CT     82 (39.6)      84 (43.1)      158 (40.7)     0.99 (0.67, 1.46)
                           TT     35 (16.9)      34 (17.4)      58 (15.0)      1.15 (0.68, 1.93)             0.6932
rs1256037 (17777)          AA     82 (43.2)      71 (39.0)      167 (44.8)     Reference
                           AG     78 (41.1)      78 (42.9)      149 (40.0)     1.07 (0.72, 1.59)
                           GG     30 (15.8)      33 (18.1)      57 (15.3)      1.07 (0.62, 1.84)             0.7874
rs1256040 (22684)          TT     55 (27.2)      43 (22.6)      122 (31.7)     Reference
                           CT     101 (50.0)     101 (53.2)     171 (44.4)     1.31 (0.86, 2.00)
                           CC     46 (22.8)      46 (24.2)      92 (23.9)      1.11 (0.67, 1.83)             0.6408

Table 41 (continued).

BLOCK #1 (cont.)
rs1256049 (37027), Exon 6  CC     153 (82.7)     139 (83.2)     293 (82.8)     Reference
                           CT     27 (14.6)      24 (14.4)      48 (13.6)      1.08 (0.62, 1.84)
                           TT     5 (2.7)        4 (2.4)        13 (3.7)       0.74 (0.20, 2.25)             0.9100
rs1256056 (48452)          CC     92 (44.4)      78 (41.1)      169 (43.7)     Reference
                           CT     83 (40.1)      81 (42.6)      158 (40.8)     0.96 (0.66, 1.42)
                           TT     32 (15.5)      31 (16.3)      60 (15.5)      0.98 (0.57, 1.65)             0.9420
rs953592 (50329)           AA     179 (85.7)     157 (84.0)     339 (87.1)     Reference
                           AG     25 (12.0)      26 (13.9)      42 (10.8)      1.13 (0.64, 1.96)
                           GG     5 (2.4)        4 (2.1)        8 (2.1)        1.18 (0.30, 4.17)             0.6824

BLOCK #2
rs1256062 (57760)          TT     119 (58.9)     98 (53.0)      228 (59.8)     Reference
                           CT     74 (36.6)      69 (37.3)      114 (29.9)     1.24 (0.85, 1.82)
                           CC     9 (4.5)        18 (9.7)       39 (10.2)      0.44 (0.18, 0.97)             0.4254
rs867443 (60036)           AA     155 (74.2)     155 (79.9)     278 (71.3)     Reference
                           AG     49 (23.4)      37 (19.1)      101 (25.9)     0.87 (0.57, 1.31)
                           GG     5 (2.4)        2 (1.0)        11 (2.8)       0.82 (0.22, 2.60)             0.5054
rs944047 (60786)           CC     91 (44.6)      74 (39.4)      173 (44.9)     Reference
                           CT     84 (41.2)      82 (43.6)      153 (39.7)     1.04 (0.71, 1.53)
                           TT     29 (14.2)      32 (17.0)      59 (15.3)      0.93 (0.54, 1.60)             0.9475

Table 41 (continued).
SNP Rs’ G enotype High Stage Cases # (% ) Low Stage Cases # (% ) Controls # (% ) Relative Risk: High Stage to C ontrol (95% C l) p-value (trend) BLOCK #2 (cont.) rs944459 CC 182 (88.8) 177(94.1) 357(91.3) Reference (61720) CT 22(10.7) 9 (4.8) 26 (2.7) 1.66 (0.87,3.14) TT 1 (0.5) 2(1.1) 8 (2.0) 0.25 (0.01, 1.86) 0.8479 rsl256065 AA 88(45.1) 76 (42.0) 171 (45.8) Reference (62146) CA 75 (38.5) 75 (41.4) 149(40.0) 0.98 (0.66, 1.45) CC 32 (16.4) 30(16.6) 53 (14.2) 1.17(0.68,2.01) 0.6896 t o LA R e p r o d u c e d with p e r m i s s i o n of the c o p y r i g h t o w n e r . F u r t h e r r e p r o d u c t i o n p r o h i b i t e d w i t h o u t p e r m i s s i o n . Table 42. Association between ESR2 ‘phased’ haplotype dose and breast cancer risk by stage of disease. Odds ratio compares high stage breast cancer to controls (95% Confidence Intervals in parentheses). Block H aplotype (Frequency) Dose High Stage Cases # (% ) Low Stage Cases # (% ) Controls # (% ) Relative Risk - High Stage to C ontrol (95% C l) p-value (trend) B LO C K #1 h3314214221 0 82 (36.9) 83 (40.3) 162 (38.0) Reference (39.1%) 1 98 (44.1) 98 (47.6) 170 (39.9) 1.14(0.79,1.64) 2 42(18.9) 25(12.1) 94 (22.1) 0.88 (0.56, 1.38) 0.7820 h413243224 0 108(48.6) 92 (44.7) 209(49.1) Reference (31.8%) 1 86 (38.7) 87 (42.2) 161 (37.8) 1.03 (0.73, 1.47) 2 28 (12.6) 27(13.1) 56(13.1) 0.97 (0.58,1.61) 0.9694 h3314214423 0 196 (88.3) 179 (86.9) 376 (88.3) Reference (8.4%) 1 23 (10.4) 23 (11.2) 42 (9.9) 1.05 (0.61, 1.80) 2 3(1.4) 4(1.9) 8(1.9) 0.72(0.19,2.73) 0.9480 h3334212221 0 204 (91.9) 182 (88.3) 384(90.1) Reference (5.4%) 1 17(7.7) 24(11.7) 39 (9.2) 0.82 (0.45, 1.49) 2 1 (0.5) 0 3 (0.7) 0.63 (0.07, 5.95) 1.0 BLOCK #2 h43322 0 100 (45.0) 86 (41.7) 188 (44.1) Reference (33.4%) 1 95 (42.8) 91 (44.2) 183 (43.0) 0.98 (0.69, 1.38) 2 27(12.2) 29(14.1) 55(12.9) 0.92 (0.55, 1.55) 0.8162 h43121 0 110(49.5) 108 (52.4) 231 (54.2) Reference (25.7%) 1 97 (43.7) 89 (43.2) 167 (39.2) 1.22 (0.87, 1.71) 2 15 (6.8) 9 (4.4) 28 
(6.6) 1.13 (0.58,2.19) 0.3771 R e p r o d u c e d with p e r m i s s i o n of the c o p y r i g h t o w n e r . F u r t h e r r e p r o d u c t i o n p r o h i b i t e d w i t h o u t p e r m i s s i o n . Table 42 (continued). Block H aplotype (Frequency) Dose High Stage Cases # (% ) Low Stage Cases # (% ) Controls # (% ) Relative R isk - High Stage to C ontrol (95% C l) p-value (trend) BLOCK #2 (con’t) h23121 0 155 (69.8) 127 (61.7) 294 (69.0) Reference (18.7%) 1 62 (27.9) 68 (33.0) 111 (26.1) 1.06 (0.73, 1.53) 2 5 (2.3) 11(5.3) 21 (4.9) 0.45 (0.17, 1.19) 0.4938 h41121 0 171 (77.0) 170 (82.5) 317(74.4) Reference (12.9%) 1 47 (21.2) 35 (17.0) 101 (23.7) 0.86 (0.58, 1.28) 2 4(1.8) 1 (0.5) 8(1.9) 0.93 (0.28,3.12) 0.5563 N ) L /l U > 254 Risk estimates, adjusted for age at onset and ethnicity, compared all common haplotype combinations to a reference group of individuals carrying two copies of the most common haplotypes in each ESR2 block. Significant associations were not observed between any haplotype combination and either all breast cancer cases combined or high stage breast (Table 43). 7.4.3c. Haplotype Results with an ‘ Indicator ’ Variable and Association with Breast Cancer Alternative analyses examined haplotype ‘dosage’ risks associated with 0, 1 and 2 copies of all common haplotypes (>5%); these are presented in Table 44. Crude and adjusted estimates were calculated both for all cases combined and high stage cases only. A modest protective effect was observed among individuals carrying two copies of the most common (h3314214221) haplotype in Block #1. This effect was diminished, however, when adjusted for age and ethnicity and, further, was not observed among high stage cases. A protective effect of similar magnitude and level of significance was also noted in Block #2 in a dominant model when the least frequently observed haplotype (h41121) was considered. 
Again, this effect was no longer statistically significant when adjusted for age and ethnicity, and it was confined to all cases combined.

Table 43. Risk estimates using ‘phased’ data for each common haplotype (>5%) in the two blocks of ESR2, for all cases combined and high stage cases only - adjusted for age and ethnicity.

                              Adjusted Risk Estimate (Age and Ethnicity)
Block / Haplotype             All Cases:            High Stage:           Minimum
                              OR (95% CI)           OR (95% CI)           Block R_h^2
BLOCK #1
h3314214221 homozygote        Reference             Reference
h4132432241 homozygote        0.94 (0.60, 1.46)     0.92 (0.54, 1.56)
h3314214423 homozygote        0.83 (0.29, 2.39)     0.72 (0.18, 2.79)
h3334212221 homozygote        0.34 (0.03, 3.55)     0.63 (0.06, 6.38)
h3314214221 / h4132432241     1.14 (0.80, 1.62)     1.06 (0.69, 1.62)
h3314214221 / h3314214423     1.19 (0.57, 2.44)     1.06 (0.44, 2.52)
h3314214221 / h3334212221     1.74 (0.77, 3.93)     1.65 (0.64, 4.27)
h4132432241 / h3314214423     0.94 (0.46, 1.93)     1.25 (0.57, 2.72)
h4132432241 / h3334212221     0.47 (0.21, 1.05)     0.35 (0.11, 1.07)
h3314214423 / h3334212221     1.44 (0.41, 5.09)     0.84 (0.15, 4.71)
Effective R_h^2                                                           0.1881
BLOCK #2
h43322 homozygote             Reference             Reference
h43121 homozygote             0.90 (0.48, 1.68)     0.94 (0.36, 1.93)
h23121 homozygote             0.75 (0.36, 1.55)     0.43 (0.15, 1.20)
h41121 homozygote             0.55 (0.17, 1.76)     0.80 (0.23, 2.81)
h43322 / h43121               0.96 (0.64, 1.44)     0.83 (0.51, 1.34)
h43322 / h23121               1.13 (0.66, 1.94)     0.84 (0.43, 1.65)
h43322 / h41121               0.63 (0.37, 1.08)     0.67 (0.35, 1.27)
h43121 / h23121               1.28 (0.78, 2.11)     1.21 (0.68, 2.14)
h43121 / h41121               0.91 (0.50, 1.65)     0.84 (0.41, 1.70)
h23121 / h41121               0.94 (0.47, 1.91)     0.93 (0.40, 2.14)
Effective R_h^2                                                           0.1958

Table 44. Association between ESR2 haplotype dose using an indicator function of E{I(delta_h(H)=i)} and breast cancer. Crude and adjusted (age and ethnicity) odds ratios are presented for all cases combined and high stage breast cancer only (95% Confidence Intervals in parentheses).

                                    All Cases Combined                      High Stage Cases
Block / Haplotype     Effect        Crude OR            Adjusted OR         Crude OR            Adjusted OR
(Frequency)                         (95% CI)            (95% CI)            (95% CI)            (95% CI)
BLOCK #1
h3314214221 (39.1%)   Recessive(a)  0.62 (0.43, 0.91)   0.68 (0.47, 1.00)   0.79 (0.51, 1.22)   0.83 (0.53, 1.29)
                      Dominant(b)   1.00 (0.75, 1.33)   1.07 (0.80, 1.43)   1.09 (0.77, 1.54)   1.15 (0.81, 1.63)
h4132432241 (31.8%)   Recessive     0.99 (0.66, 1.49)   0.92 (0.60, 1.41)   0.95 (0.58, 1.57)   0.92 (0.55, 1.53)
                      Dominant      1.13 (0.86, 1.49)   1.08 (0.81, 1.44)   1.05 (0.75, 1.46)   1.00 (0.71, 1.41)
h3314214423 (8.4%)    Recessive     0.88 (0.31, 2.51)   0.82 (0.28, 2.42)   0.74 (0.19, 2.86)   0.75 (0.19, 2.95)
                      Dominant      1.06 (0.69, 1.62)   1.02 (0.66, 1.58)   1.01 (0.60, 1.69)   0.97 (0.57, 1.64)
h3334212221 (5.4%)    Recessive     0.36 (0.04, 3.27)   0.34 (0.03, 3.50)   0.62 (0.06, 6.11)   0.62 (0.06, 6.36)
                      Dominant      0.93 (0.58, 1.48)   0.82 (0.51, 1.34)   0.78 (0.43, 1.41)   0.73 (0.40, 1.34)
BLOCK #2
h43322 (33.4%)        Recessive     1.01 (0.67, 1.52)   0.95 (0.62, 1.44)   0.95 (0.58, 1.56)   0.91 (0.55, 1.52)
                      Dominant      1.07 (0.81, 1.42)   1.03 (0.77, 1.39)   1.00 (0.71, 1.40)   0.97 (0.68, 1.38)
h43121 (25.7%)        Recessive     0.81 (0.46, 1.42)   0.93 (0.52, 1.67)   1.00 (0.52, 1.92)   1.07 (0.55, 2.09)
                      Dominant      1.19 (0.89, 1.60)   1.27 (0.94, 1.73)   1.23 (0.86, 1.74)   1.27 (0.89, 1.83)
h23121 (18.7%)        Recessive     0.68 (0.34, 1.37)   0.72 (0.35, 1.49)   0.44 (0.16, 1.20)   0.48 (0.17, 1.30)
                      Dominant      1.13 (0.83, 1.54)   1.09 (0.79, 1.52)   0.97 (0.67, 1.41)   0.93 (0.63, 1.38)
h41121 (12.9%)        Recessive     0.55 (0.17, 1.74)   0.52 (0.16, 1.71)   0.90 (0.27, 3.06)   0.87 (0.25, 3.00)
                      Dominant      0.72 (0.51, 1.00)   0.72 (0.51, 1.02)   0.85 (0.57, 1.26)   0.85 (0.57, 1.29)

Lastly, adjusted risk estimates were constructed using the indicator function described above. In these block-by-block analyses, the reference group was defined as the group homozygous for the most common haplotype. Risk estimates for the association between ESR2 and breast cancer, as well as interaction terms, were constructed for each common haplotype (>5%) in the two blocks of ESR2, for all cases combined and for high stage cases only, adjusted for age and ethnicity (Table 45).

Table 45. Association between ESR2 haplotypes using an indicator function of E{I(delta_h(H)=i)} and breast cancer. Risk estimates and interaction terms are constructed for each common haplotype (>5%) in the 2 blocks of ESR2, for all cases combined and high stage cases only - adjusted for age and ethnicity(a).

                              Adjusted Risk Estimate (Age and Ethnicity)
Block / Haplotype             All Cases:            High Stage:           Minimum
                              OR (95% CI)           OR (95% CI)           Block R_h^2
BLOCK #1
h3314214221 homozygote        Reference             Reference
h4132432241 haplotype(b)      1.15 (0.90, 1.47)     1.03 (0.77, 1.38)
h3314214423 haplotype         1.03 (0.61, 1.73)     0.77 (0.38, 1.58)
h3334212221 haplotype         0.70 (0.28, 1.77)     0.82 (0.30, 2.24)
h3314214221 / h4132432241     1.40 (0.95, 2.07)     1.27 (0.80, 2.01)
h3314214221 / h3314214423     1.59 (0.63, 3.96)     1.66 (0.53, 5.23)
h3314214221 / h3334212221     3.10 (0.91, 10.65)    2.22 (0.57, 8.75)
h4132432241 / h3314214423     1.05 (0.44, 2.53)     1.82 (0.64, 5.16)
h4132432241 / h3334212221     0.76 (0.23, 2.55)     0.46 (0.10, 2.07)
h3314214423 / h3334212221     2.47 (0.39, 15.57)    1.30 (0.12, 14.12)
Effective R_h^2                                                           0.1881
BLOCK #2
h43322 homozygote             Reference             Reference
h43121 haplotype              1.03 (0.75, 1.44)     1.10 (0.75, 1.60)
h23121 haplotype              0.82 (0.56, 1.20)     0.63 (0.37, 1.06)
h41121 haplotype              0.72 (0.41, 1.23)     0.91 (0.51, 1.63)
h43322 / h43121               1.01 (0.62, 1.65)     0.77 (0.43, 1.41)
h43322 / h23121               1.25 (0.66, 2.33)     1.23 (0.54, 2.81)
h43322 / h41121               0.90 (0.43, 1.88)     0.81 (0.36, 1.85)
h43121 / h23121               1.61 (0.87, 3.00)     2.04 (0.95, 4.37)
h43121 / h41121               1.15 (0.51, 2.58)     0.82 (0.33, 2.06)
h23121 / h41121               1.58 (0.59, 4.28)     1.64 (0.50, 5.33)
Effective R_h^2                                                           0.1958

a Haplotype effects fit using raw probabilistic haplotype estimates (no indicator function). Interaction terms are the product: E{I(delta_h1)=1}*E{I(delta_h2)=1} + E{I(delta_h1)=1}*E{I(delta_h2)=1} which is equivalent to the indicator = 1 function. All are fit in SAS MODEL statement adjusting for age, race, and uncommon haplotypes (<5%): uncommon = h_{n+1} + h_{n+2} + ... + h_last, where haplotypes h_{n+1} ... h_last are all the uncommons.
b Risk estimate coefficients for the addition of each additional haplotype.

7.5. Discussion

These analyses considered two separate chromosomal regions, encompassing the ESR1 and ESR2 loci on chromosome 6q25 and the long arm of chromosome 14 respectively. Whereas an earlier effort considered the pattern and extent of linkage disequilibrium across these loci (Chapter 6), the focus here instead was to evaluate various methods of haplotype risk estimation in a pilot case/control effort. We took a step-wise approach, beginning with a traditional SNP by SNP analysis which revealed several individual variants in each of the two blocks that appeared to be over-represented among cases as compared to controls. These analyses were then extended to a haplotype-based association study using two distinct approaches - one employing ‘phased’ data and the other probabilistic haplotype determinations.
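The difference between the ‘phased’ and the probabilistic treatment of haplotype dose, and the recessive and dominant codings reported in Table 44, can be illustrated with a minimal sketch. This is not the SAS code used for these analyses: the function names and the pair probabilities are hypothetical, ‘phased’ is taken here to mean the single most likely haplotype pair, and the dominant (one or more copies) and recessive (two copies) codings are assumed to follow the usual convention.

```python
# Illustrative sketch only: function names and the example probabilities
# are hypothetical and are not taken from the thesis analyses.

def dose(pair, hap):
    """Number of copies (0, 1 or 2) of haplotype `hap` in a haplotype pair."""
    return sum(1 for h in pair if h == hap)


def phased_codings(pair, hap):
    """Codings for a subject whose haplotype pair is treated as known."""
    d = dose(pair, hap)
    return {
        "dose": d,
        "dominant": int(d >= 1),   # assumed coding: at least one copy
        "recessive": int(d == 2),  # assumed coding: two copies
    }


def expected_indicator(pair_probs, hap, i):
    """E{I(delta_h(H)=i)}: probability of carrying exactly i copies of `hap`,
    summed over the candidate haplotype pairs and their probabilities."""
    return sum(p for pair, p in pair_probs.items() if dose(pair, hap) == i)


# A subject whose genotypes are compatible with three Block #2 haplotype pairs
# (made-up probabilities that sum to 1).
pair_probs = {
    ("h43322", "h43121"): 0.70,
    ("h43322", "h43322"): 0.20,
    ("h43121", "h41121"): 0.10,
}

# 'Phased' treatment (assumption): keep only the most likely pair.
best_pair = max(pair_probs, key=pair_probs.get)
phased = phased_codings(best_pair, "h43322")

# Probabilistic treatment: carry the uncertainty forward as expectations
# for 0, 1 and 2 copies of the haplotype.
expected = [expected_indicator(pair_probs, "h43322", i) for i in (0, 1, 2)]
```

In this example the phased coding assigns the subject exactly one copy of h43322, whereas the probabilistic coding spreads weight across 0, 1 and 2 copies; the two treatments agree more closely as the most likely pair approaches certainty.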
The two gave rather consistent results when compared directly to one another. Lastly, models of dominant and recessive effects were constructed to suggest whether a single copy of a given haplotype, rather than two, was sufficient to elevate a carrier’s risk of breast cancer.

When considering the association between individual SNPs in ESR1 and the risk of breast cancer, several appeared to share an association with disease. Two that notably did not, however, were the RFLPs located in Intron 1, pPvuII and pXbaI. These two variants have been studied previously as risk predictors for breast cancer but results have not been consistently replicated (Chapter 5, Table 15). On the other hand, we did note that two variants in the ESR1 promoter region were more often seen in the homozygous state among high stage cases as compared to controls. This relationship was also revealed in a haplotype by haplotype analysis. Interestingly, these two variants appeared on different haplotypes, each of which exhibited a recessive effect on disease. One might hypothesize from these results that any perturbation in the promoter region of ESR1 modifies receptor expression in breast tissue, thus influencing the risk of breast cancer.

To date, no studies have considered the role of ESR2 variants in breast cancer susceptibility. Analyses presented here, including both SNP and haplotype-specific approaches, provide no compelling evidence that common variation at this locus modifies the risk of breast cancer.

The primary limitation of this effort is that the htSNPs assayed in each haplotype block were typically not sufficient to faithfully recapitulate the haplotypes of a region of LD.
We measured the efficacy of a given subset of markers in capturing the full haplotype information through the use of the coefficient of determination (R_h^2), which defines the proportion of diversity explained by the set of htSNPs. Only a single block in ESR1 attained our stated minimum threshold (Chapter 6) of 0.7. No blocks in ESR2 exhibited R_h^2 values above this threshold.

A rich geography of phenotypes has already been studied in relation to ER variants generally - these are detailed in Chapter 5 above. A specific htSNP map as well as a larger haplotype map of the human genome currently under construction will allow effective evaluation of the common disease/common variant (CDCV) hypothesis. Although investigations such as that presented in this chapter may not directly discover the causal variant, they do focus subsequent functional studies on a region of interest. And, finally, characterizing haplotypes in multiethnic populations carries with it a significant advantage that cannot be attained in traditional single variant / disease association studies. Namely, ‘trans-racial mapping’ (Todd et al. 1989), utilizing the different LD patterns seen across ethnic groups, may sharpen the focus on which individual variants carry the greatest prior probability of being an etiologic determinant of common diseases.

CHAPTER 8

FUTURE DIRECTIONS IN THE STUDY OF HAPLOTYPIC VARIATION AND THE RISK OF BREAST CANCER: THE MULTIETHNIC COHORT

8.1. Summary

Previous chapters of this thesis have explored the role of genetic variation in three genes, ATM, ESR1, and ESR2, and their influence on the risk of breast cancer in a multiethnic cohort. The analysis of ATM and its association with breast cancer took the form of a more traditional SNP by SNP variant study in which each variant was considered singly.
An appropriate diminution of the level of significance was made to account for multiple comparisons. Subsequent analysis of genetic variation at the ESR1 and ESR2 loci began by describing the ‘block-like’ pattern of LD in these two genomic regions as well as enumerating the haplotype diversity contained within each block. We then outlined and compared case/control analysis techniques that permit the testing of individual haplotypes for association with disease. The data used for these analyses came from a pilot case/control study of n=854 women.

In this chapter, future directions of study are considered. Namely, an experimental design is outlined which would permit a comprehensive haplotype-based analysis of the ATM locus. Included in this design are those missense variants evaluated in the SNP by SNP study which were either polymorphic or showed some relationship with breast cancer, so that their patterns of LD with other, perhaps functional, variants can be detailed. Next, we report the characteristics of a larger multiethnic breast cancer case/control study (n=3936 women) in which genotyping studies of haplotype tagging SNPs (htSNPs) at these and other candidate loci can be performed. This larger study population not only provides sufficient power to evaluate moderate relative risks but also has the potential to yield risk estimates that are ethnic-specific.

8.2. ATM: Haplotype and Linkage Disequilibrium Architecture

The chapters on ATM specifically considered 20 missense variants - the majority of which were rare. There was a modestly elevated risk of breast cancer associated with one of these variants - L546V (Chapter 2).
Further, it appeared that this variant was over-represented among women with a reported first degree family history of breast cancer, and women carrying this variant also had an earlier age of onset of disease as compared to those who did not (Chapter 3). Finally, the relative correlation between this variant and others was explored as a mechanism by which polymorphisms could be distinguished from mutations (Chapter 4).

Utilization of haplotypes in association studies for identification of commonly occurring variants has a demonstrated increase in power over single-allele investigations (Johnson et al. 2001). The data presented in previous chapters included a number of variants, but our primary interest was rare missense mutations, with only some polymorphic sites (D126E and D1853N) included. Exploiting these markers for characterizing LD and constructing haplotypes was not feasible, given that variants with low frequency have been shown to have little power for the detection of LD (Goddard et al. 2000; Lewontin 1995).

Three previous studies of ATM have shown that diversity in this gene is significantly constrained (Thorstenson et al. 2001) and have revealed regions of LD extending beyond 100 kb (Bonnen et al. 2000; Bonnen et al. 2002). The primary limitation of these investigations is the inadequate density of polymorphic markers. Thorstenson et al. (2001) infer haplotypes using genotyping data from 17 polymorphic sites across ATM. Similarly, Bonnen et al.’s first ATM haplotype paper (Bonnen et al. 2000) details only 14 sites; their subsequent study (Bonnen et al. 2002) not only has a target SNP density of 1 SNP every 30 kb but excludes population-specific
SNPs (those with frequency >0.15 in one population and <0.05 in all others) from consideration.

Such coarse assessment of broad patterns of LD has created an overall picture of inconsistency insofar as patterns differ between genomic regions. The suggestion of extremely variable LD patterns (Clark et al. 1998) and/or greatly reduced LD, particularly in African populations (Reich et al. 2001a), could profoundly complicate proposed association studies. However, several lines of evidence argue instead that human LD is highly structured, characterized by stretches of strong LD with intercalated short intervals of meiotic recombination (Daly et al. 2001; Jeffreys et al. 2001). At a high density of polymorphic markers, Gabriel et al. note that the vast majority of sequence examined (<90%) was found in blocks (Gabriel et al. 2002). As this spacing increased, only larger blocks could be captured, thus reducing the percentage of sequence observed in blocks (73% at a 5 kb marker spacing, 41% at 20 kb and 19% at 50 kb). Hence, it is unlikely that an association map based on lower density SNP spacing will capture all common variation within a region of LD.

Given the suggestion, as yet unproven, that the ATM locus may be dominated by a significant stretch of LD, a comprehensive haplotype map of this region would have clear value in future medical genetic studies. Of particular interest would be breast cancer, as a consequence of the functional interaction of ATM with BRCA1 (Cortez et al. 1999) and as yet unresolved family study data suggesting an elevated risk of this phenotype among presumptive ATM heterozygotes (reviewed in Chapter 1). In the table below (Table 46) we outline variants in a proposed SNP genotyping project which will comprehensively define LD patterns across the ATM locus.
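The two distance columns of Table 46 are simple differences of contig coordinates, and the same arithmetic gives a quick check of marker density against a target spacing. The sketch below uses the first four entries of the table. The start-site coordinate (109450412) is not stated in the text and is inferred here from the reported offsets; the function names and the 5 kb threshold are illustrative assumptions.

```python
# Illustrative sketch: positions are the first four Golden Path contig
# coordinates listed in Table 46. START_SITE is inferred (position minus the
# reported distance from the start site), not quoted from the thesis.
START_SITE = 109450412

snps = [
    ("rs673545", 109430902),
    ("rs1263766", 109439901),
    ("rs609557", 109441083),
    ("rs228606", 109444417),
]


def spacing_rows(snps, start_site):
    """Return (name, distance from start site, distance from previous SNP)."""
    rows = []
    previous = None
    for name, position in sorted(snps, key=lambda item: item[1]):
        gap = None if previous is None else position - previous
        rows.append((name, position - start_site, gap))
        previous = position
    return rows


def gaps_over(rows, max_gap):
    """Names of SNPs lying more than `max_gap` bases from the preceding SNP."""
    return [name for name, _, gap in rows if gap is not None and gap > max_gap]


rows = spacing_rows(snps, START_SITE)
wide = gaps_over(rows, 5000)  # 5 kb is chosen only as an example threshold
```

Applied to a full marker list, a check of this kind flags intervals where spacing exceeds the chosen density, which is where additional SNPs would be needed before block structure could be assessed.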
As part of this proposed effort, we have included polymorphic missense variants identified in association with particular population ancestry (e.g. D126E; Thorstenson et al. 2001), across ethnic groups (e.g. D1853N; Thorstenson et al. 2001) and of higher prevalence among breast cancer cases as compared to controls (L546V; Chapter 2).

Table 46. ATM SNP location, and distance between sites: LD mapping project.

#    SNP reference  Distance from  Distance      Location      Golden Path  Alleles
     number         start site     between SNPs  within gene   contig(a)
1    rs673545       -19510                       5' Upstream   109430902    [C/T]
2    rs1263766      -10511         8999          Promoter      109439901    [T/A]
3    rs609557       -9329          1182          Promoter      109441083    [G/T]
4    rs228606       -5995          3334          Promoter      109444417    [G/T]
5    rs228607       -5349          646           Promoter      109445063    [A/G]
6    rs228608       -4942          407           Promoter      109445470    [A/G]
7    rs183459       -4645          297           Promoter      109445767    [G/T]
8    rs183460       -3132          1513          Promoter      109447280    [A/C]
9    rs1442729      -2665          467           Promoter      109447747    [A/C]
10   rs228589       -634           2031          Promoter      109449778    [A/T]
11   rs1800066      132            766           Intron 1      109450544    [A/G]
12   rs228590       2299           2167          Intron 2      109452711    [G/A]
13   rs3218684      4721           2422          Exon 4        109455133    [C/T]
14   rs1800054      4734           13            Exon 4        109455146    [C/G]
15   rs3218690      4750           16            Exon 4        109455162    [C/T]
16   rs641605       8077           3327          Intron 5      109458489    [A/G]
17   rs228595       11751          3674          Intron 5      109462163    [A/G]
18   rs2234996      12484          733           Intron 5      109462896    [A/G]
19   rs2234997      12601          117           Exon 6        109463013    [A/T]
20   rs2234998      12623          22            Exon 6        109463035    [T/C]
21   rs2234999      12803          180           Intron 6      109463215    [A/G]
22   rs228597       13167          364           Intron 6      109463579    [A/G]
23   rs170616       15024          1857          Intron 6      109465436    [A/C]
24   rs676719       18632          3608          Intron 6      109469044    [C/T]
25   rs3218707      20885          2253          Exon 7        109471297    [C/G]
26   rs2235002      20987          102           Exon 7        109471399    [G/T]
27   rs2235003      20998          11            Exon 7        109471410    [G/T]
28   rs2235004      21041          43            Intron 7      109471453    [C/T]
29   rs3218706      21697          656           Exon 8        109472109    [A/G]
30   rs3218674      21745          48            Exon 8        109472157    [C/T]
31   rs1800735      21936          191           Intron 8      109472348    [G/T]
32   rs694376       25206          3270          Intron 9      109475618    [G/T]
33   rs1800727      25928          722           Exon 10       109476340    [C/G]
34   rs1800737      27838          1910          Exon 11       109478250    [C/T]
35   rs2235000      27891          53            Exon 11       109478303    [A/G]
36   AT L546V       28749          858           Exon 12       109479161    [C/G]
37   rs1060788      28774          25            Exon 12       109479186    [A/G]
38   rs2235006      28858          84            Exon 12       109479270    [C/T]
39   rs2235005      29617          759           Intron 12     109480029    [A/G]
40   rs1800701      33123          3506          Exon 15       109483535    [C/G]
41   rs2235010      33303          180           Intron 15     109483715    [G/A]
42   rs2235011      34396          1093          Exon 16       109484808    [A/T]
43   rs641252       34477          81            Exon 16       109484889    [G/T]
44   rs672655       35815          1338          Intron 16     109486227    [A/G]
45   rs1263822      38603          2788          Intron 17     109489015    [G/T]
46   rs228604       42768          4165          Intron 17     109493180    [A/T]
47   rs1800056      44161          1393          Exon 18       109494573    [C/T]
48   rs3218673      44203          42            Exon 18       109494615    [C/T]
49   rs2234994      45278          1075          Intron 18     109495690    [G/T]
50   rs3218687      45341          63            Exon 19       109495753    [A/G]
51   rs3218708      45460          119           Exon 19       109495872    [C/T]
52   rs1064815      48222          2762          Exon 21       109498634    [A/T]
53   rs3092857      49457          1235          Exon 22       109499869    [A/G]
54   rs1800057      49614          157           Exon 23       109500026    [C/G]
55   rs1150198      54247          4633          Intron 23     109504659    [A/G]
56   rs3092851      58081          3834          Exon 25       109508493    [A/C]
57   rs3205813      59690          1609          Exon 26       109510102    [C/T]
58   rs3092841      61302          1612          Exon 27       109511714    [C/G]
59   rs609261       64292          2990          Intron 27     109514704    [A/G]
60   rs3218697      64444          152           Intron 27     109514856    [A/G]
61   rs3092856      65890          1446          Exon 29       109516302    [C/T]
62   rs1800058      66508          618           Exon 30       109516920    [C/T]
63   rs3092849      70209          3701          Exon 32       109520621    [G/T]
64   rs681479       71862          1653          Exon 33       109522274    [A/C]
65   rs681518       71894          32            Exon 33       109522306    [A/C]
66   rs3092872      74165          2271          Intron 33     109524577    [C/T]
67   rs3092906      76531          2366          Intron 34     109526943    [G/T]
68   rs1800059      76664          133           Exon 35       109527076    [A/C]
69   rs3092907      78540          1876          Exon 36       109528952    [C/G]
70   rs3092828      81545          3005          Intron 37     109531957    [G/C]
71   rs1801516      81620          75            Exon 38       109532032    [G/A]
72   rs3218686      84896          3276          Exon 39       109535308    [A/G]
73   rs3092910      87075          2179          Exon 40       109537487    [C/T]
74   rs673281       88227          1152          Intron 40     109538639    [A/G]
75   rs3218676      89324          1097          Exon 41       109539736    [C/T]
76   rs659243       89325          1             Exon 41       109539737    [A/G]
77   rs3092826      92908          3583          Exon 43       109543320    [C/T]
78   rs2515885      95021          2113          Intron 44     109545433    [A/T]
79   rs676570       98796          3775          Intron 46     109549208    [A/T]
80   rs595747       100231         1435          Intron 46     109550643    [C/T]
81   rs3218699      102415         2184          Exon 47       109552827    [C/T]
82   rs3218700      102919         504           Exon 48       109553331    [A/C]
83   rs3092830      104457         1538          Intron 48     109554869    [C/T]
84   rs3092831      104556         99            Exon 49       109554968    [A/C]
85   rs3218675      105930         1374          Exon 50       109556342    [C/T]
86   rs227059       109925         3995          Exon 54       109560337    [A/G]
87   rs227065       114095         4170          Intron 57     109564507    [A/G]
88   rs227068       116265         2170          Intron 57     109566677    [A/G]
89   rs227072       118249         1984          Intron 57     109568661    [A/C]
90   rs3017873      120283         2034          Intron 57     109570695    [C/T]
91   rs588746       121891         1608          Intron 58     109572303    [A/C]
92   rs1657971      123696         1805          Intron 59     109574108    [A/C]
93   rs1784308      125046         1350          Intron 60     109575458    [C/T]
94   rs373759       126813         1767          Intron 60     109577225    [A/G]
95   rs179108       130853         4040          Intron 61     109581265    [C/T]
96   rs664982       131639         786           Intron 61     109582051    [G/A]
97   rs664143       131817         178           Intron 62     109582229    [T/C]
98   rs1150206      136592         4775          Intron 62     109587004    [C/T]
99   rs227093       140095         3503          Intron 62     109590507    [A/T]
100  rs1060795      142313         2218          Exon 64       109592725    [A/G]
101  rs3218711      142420         107           5' Upstream   109592832    [C/G]
102  rs3092834      142627         207           5' Upstream   109593039    [C/T]
103  rs3092835      142762         135           5' Upstream   109593174    [A/G]
104  rs652311       146225         3463          5' Upstream   109596637    [A/G]
105  rs227088       149911         3686          5' Upstream   109600323    [C/T]
106  rs1150213      152393         2482          5' Upstream   109602805    [A/G]
107  rs1150215      153014         621           5' Upstream   109603426    [C/T]
108  rs227078       156537         3523          5' Upstream   109606949    [C/T]
109  rs227077       159408         2871          5' Upstream   109609820    [G/A]
110  rs113995       162992         3584          5' Upstream   109613404    [T/A]

a Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. The Human Genome Browser at UCSC. Genome Research, 12: 996-1006, 2002.

8.3. ESR1 & ESR2: Haplotype-based Association Study Design

One section of this thesis (Chapter 7) examines the relationship between haplotypic variation in ESR1 and ESR2 and the risk of breast cancer in a pilot case/control investigation. Two major limitations exist. Firstly, the attained R_h^2 values generally did not reach the minimum critical value of >0.7 which was used to define a ‘haplotype tagging’ SNP set in Chapter 6. Secondly, the overall sample size was sufficient to detect only variants carrying substantial susceptibility risks (i.e.
odds ratios on the order of 2.5 and above). However, larger sample sizes would have adequate power to detect more moderate risks, assuming that the etiological variant could be defined by htSNPs. The relationship between allele/haplotype frequency and sample-size requirements for a population-based association study is outlined in Figure 23 below. Calculations assume 80% power to detect a disease effect of OR=1.5 at a significance level of p = 10^-4.

[Figure 23: required sample size (vertical axis, 0 to 16000) plotted against allele frequency (horizontal axis, 0.01 to 0.25), with one curve for R_h^2 = 0.7 and one for R_h^2 = 0.9.]

Figure 23. Relationship between allele and/or haplotype frequency and sample-size requirements for population-based association studies. The solid line represents an R_h^2 value of 0.7 and the dashed line an R_h^2 value of 0.9.

Hence, for common haplotypes (frequency >5%), attainable sample sizes would provide sufficient statistical power to detect susceptibility variants or regions with even moderate odds ratios. Outlined in Table 47 below are the characteristics of a multiethnic case/control study of breast cancer from five ethnic groups - African-American, Native Hawaiian, Japanese, Latina, and Caucasian - which includes n=3936 women.

Table 47. Descriptive statistics of subjects (n=3936) stratified by ethnicity: total observations with percent in parentheses(a).

Ethnicity                 African-American        Native Hawaiian         Japanese                Latina                  Caucasian
Variable                  Cases       Controls    Cases       Controls    Cases       Controls    Cases       Controls    Cases       Controls
(Stratum n)               (n=278)     (n=672)     (n=92)      (n=311)     (n=358)     (n=429)     (n=272)     (n=706)     (n=356)     (n=462)
                          # (%)       # (%)       # (%)       # (%)       # (%)       # (%)       # (%)       # (%)       # (%)       # (%)
Age (mean in years)       64.1        64.3        60.7        59.5        64.5        64.2        63.9        62.8        64.6        62.3
Menopausal Status
  Premenopausal           14 (14.4)   76 (11.3)   16 (17.4)   84 (27.0)   45 (12.6)   89 (20.7)   27 (4.9)    77 (10.9)   29 (8.1)    93 (20.1)
  Postmenopausal          154 (55.4)  373 (55.5)  53 (57.6)   162 (52.1)  240 (67.0)  264 (61.5)  180 (66.2)  430 (60.9)  137 (69.6)  280 (60.6)
  Simple Hysterectomy     54 (19.4)   155 (23.1)  12 (13.0)   44 (14.1)   34 (9.5)    43 (10.0)   39 (14.3)   132 (18.7)  62 (17.4)   65 (14.1)
HRT Use
  Never                   153 (55.0)  361 (53.7)  45 (48.9)   176 (56.6)  129 (36.1)  191 (44.5)  127 (46.7)  355 (50.3)  117 (32.9)  216 (46.7)
  Past                    63 (22.7)   131 (19.5)  13 (14.1)   47 (15.1)   48 (13.5)   50 (11.7)   47 (17.3)   113 (16.0)  59 (16.6)   58 (12.6)
  Current                 49 (17.6)   143 (21.3)  29 (31.5)   82 (26.4)   168 (46.9)  175 (40.8)  78 (28.7)   181 (25.6)  173 (48.1)  182 (39.4)
Age at Menarche (years)
  12 or less              149 (53.6)  310 (46.1)  52 (56.5)   185 (59.5)  199 (55.6)  216 (50.3)  131 (48.2)  343 (48.6)  197 (55.3)  224 (48.5)
  13 - 14                 95 (34.2)   260 (38.7)  26 (28.3)   90 (28.9)   116 (32.4)  151 (35.2)  100 (36.8)  274 (38.8)  125 (35.1)  202 (43.7)
  15 and above            32 (11.5)   91 (13.5)   10 (10.9)   34 (10.9)   32 (8.9)    58 (13.5)   34 (12.5)   76 (10.8)   31 (8.7)    36 (7.8)
Number of Children
  None                    32 (11.5)   77 (11.5)   8 (8.7)     27 (8.7)    58 (16.2)   48 (11.2)   27 (9.9)    51 (7.2)    64 (18.0)   72 (15.6)
  1                       56 (20.1)   106 (15.8)  3 (3.3)     31 (10.0)   39 (10.9)   42 (9.8)    23 (8.5)    44 (6.2)    40 (11.2)   43 (8.1)
  2 - 3                   110 (39.6)  268 (39.9)  44 (47.8)   121 (38.9)  194 (54.2)  257 (59.9)  98 (36.0)   253 (35.8)  182 (51.1)  245 (53.0)
  4 or more               70 (25.2)   212 (31.6)  37 (40.2)   132 (42.4)  62 (17.3)   76 (17.7)   122 (44.9)  350 (49.6)  66 (18.5)   100 (21.7)
Age at First Birth (years)(b)
  20 or less              112 (41.8)  297 (45.7)  36 (40.9)   110 (36.7)  25 (7.3)    43 (10.2)   85 (32.4)   272 (39.7)  66 (18.9)   91 (20.0)
  21 - 30                 105 (39.2)  244 (37.5)  44 (50.0)   143 (47.7)  219 (64.0)  285 (67.9)  128 (48.9)  336 (49.0)  190 (54.3)  246 (54.1)
  31 and above            19 (7.1)    32 (4.9)    0           20 (6.7)    40 (11.7)   44 (11.5)   22 (8.4)    27 (3.9)    30 (8.6)    46 (10.1)
Family History of         63 (22.7)   83 (12.3)   14 (15.2)   44 (14.1)   63 (17.6)   47 (11.0)   46 (16.9)   73 (10.3)   55 (15.5)   42 (9.1)
Breast Cancer

a Numbers may not add to total due to missing observations and percentages may not sum to 100% due to rounding.
b Among parous women.

8.4. Conclusions

Haplotype-based studies offer a more comprehensive approach than has been previously achievable for association studies of complex disease (Collins et al. 1999; Jorde 2000; Kruglyak 1999; Ott 2000; Reich et al. 2001a). Evaluation of associations between DNA sequence variants and phenotype has been widely employed in genetic epidemiologic studies to identify regions of the genome contributing to disease.
Whereas in the past there have been relatively few DNA variants available for study, the discovery and deposition in public databases of more than two million Single Nucleotide Polymorphisms (SNPs) has considerably changed the dimensions of this inquiry (Altshuler et al. 2000; Sachidanandam et al. 2001). These variants can be investigated either directly or through their non-random association with neighboring markers - termed linkage disequilibrium (LD). Several studies have independently demonstrated that on typing a large number of SNPs within a small genomic region, a crisp haplotype pattern emerges - regional blocks of variable length over which there is relatively little haplotype diversity (Daly et al. 2001; Jeffreys et al. 2001; Johnson et al. 2001). Such a pattern has important implications for LD as a tool to effectively evaluate and detect disease susceptibility variants in case/control studies. As compared to a series of single allele surveys, methods based on haplotypes composed of multiple SNPs in combination have increased power of detection.

This thesis has summarized the literature to date and begun to outline the risks associated with three candidate genes in the pathogenesis of breast cancer - ATM, ESR1, and ESR2. For the two estrogen receptor genes we began by reporting measurement of LD across these loci using a dense set of polymorphic markers in a multiethnic panel, thus providing a comprehensive description of the haplotype block structure and common haplotypes across these genes. We additionally outline a methodology that will permit the same for the ATM locus. Within blocks, we then attempt to capture and efficiently represent haplotype variation using a reduced number of “haplotype tag” SNPs (htSNPs).
This catalogue of htSNPs provides a useful reference set for our subsequent genotyping study, in which the effect of common variation in candidate genes on the risk of breast cancer can be evaluated. The main challenges for these and other association studies are the inclusion of multiple populations and the development of analysis methods that increase the likelihood of detecting real effects while minimizing false positives. In this way, the genetic characteristics of common disease can be made widely applicable and used to study disease prevalent in any human population.

BIBLIOGRAPHY

Allegra, J.C., Lippmann, M.E., and Green, L. (1979). Estrogen receptor values in patients with benign breast disease. Cancer 44: 228-231.
Altshuler, D., Pollara, V.J., Cowles, C.R., Van Etten, W.J., Baldwin, J., Linton, L., and Lander, E.S. (2000). An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407: 513-516.
Anderson, T.I., Heimdal, K.R., Skrede, M., Tveit, K., Berg, K., and Borresen, A.L. (1994). Oestrogen receptor (ESR) polymorphisms and breast cancer susceptibility. Human Genetics 94: 665-670.
Anderson, T.I., Wooster, R., Laake, K., Collins, N., Warren, W., Skrede, M., Elies, R., Tveit, K.M., Johnston, S.R., Dowsett, M., Olsen, A.O., Moller, P., Stratton, M.R., and Borresen-Dale, A.L. (1997). Screening for ESR mutations in breast and ovarian cancer patients. Human Mutation 9: 531-536.
Appleby, J.M., Barber, J.B., Levine, E., Varley, J.M., Taylor, A.M., Stankovic, T., Heighway, J., and Warren, C.D.S. (1997). Absence of mutations in the ATM gene in breast cancer patients with severe responses to radiotherapy. British Journal of Cancer 76: 1546-1549.
Arlett, C.F., Green, M.H., Priestley, A., Harcourt, S.A., and Mayne, L.V. (1988). Comparative human cellular radiosensitivity I.
The effect of SV40 transformation and immortalisation on the gamma-irradiation survival of skin derived fibroblasts from normal individuals and from ataxia-telangiectasia patients and heterozygotes. International Journal of Radiation Biology 54: 911-928.
Athma, P., Rappaport, R., and Swift, M. (1996). Molecular genotyping shows that ataxia-telangiectasia heterozygotes are predisposed to breast cancer. Cancer Genetics Cytogenetics 92: 130-134.
Baskaran, R., Wood, L.D., Whitaker, L.L., Canman, C.E., Morgan, S.E., Xu, Y., Barlow, C., Baltimore, D., Wynshaw-Boris, A., Kastan, M.B., and Wang, J.Y.J. (1997). Ataxia telangiectasia mutant protein activates c-Abl tyrosine kinase in response to ionizing radiation. Nature 387: 516-519.
Bay, J., Uhrhammer, N., Perain, D., Presneau, N., Tchirkov, A., Vuillaume, M., Laplace, V., Grancho, M., Verrelle, P., Hall, J., and Bignon, Y. (1999). High incidence of cancer in a family segregating a mutation of the ATM gene: possible role of ATM heterozygosity in cancer. Human Mutation 14: 485-492.
Bebb, D.G., Yu, Z., Chen, J., Telatar, M., Gelmon, K., Phillips, N., Gatti, R.A., and Glickman, B.W. (1999). Absence of mutations in the ATM gene in forty-seven cases of sporadic breast cancer. British Journal of Cancer 80: 1979-1981.
Blasina, C., Van de Weyer, I., Laus, M.C., Luyten, W.H.M.L., Parker, A.E., and McGowan, C.H. (1999). The human homologue of the checkpoint kinase Cds1 directly inhibits Cdc25 phosphatase. Current Biology 9: 1-10.
Boder, E. (1985). Ataxia-telangiectasia: An Overview. In Gatti, R.A.S.M. (ed), Ataxia-telangiectasia: Genetics, Neuropathology and Immunology of a Degenerative Disease of Childhood. Alan R. Liss, New York, pp 1-63.
Bonnen, P.E., Story, M.D., Ashorn, C.L., Buchholz, T.A., Weil, M.M., and Nelson, D.L. (2000).
Haplotypes at ATM identify coding-sequence variation and indicate a region of extensive linkage disequilibrium. American Journal of Human Genetics 67: 1437-1451.
Bonnen, P.E., Wang, P.J., Kimmel, M., Chakraborty, R., and Nelson, D.L. (2002). Haplotype and linkage disequilibrium architecture for human cancer-associated genes. Genome Research 12: 1846-1853.
Borresen, A.L., Andersen, T.I., Tretli, S., Heiberg, A., and Moller, P. (1990a). Breast cancer and other cancers in Norwegian families with ataxia-telangiectasia. Genes Chromosomes and Cancer 2: 339-340.
Borresen, A.L., Anderson, T.I., Tretli, S., and Heiber, A.P.M. (1990b). Breast cancer and other cancers in Norwegian families with AT. Genes Chromosomes Cancer 2: 339-403.
Bottema, C.D.K., Ketterlink, R.P., Ii, S., Yoon, H-S., Phillips, J.A., III, and Sommer, S.S. (1991). Missense mutations and evolutionary conservation of amino acids: evidence that many of the amino acids in factor IX function as "spacer" elements. American Journal of Human Genetics 49: 820-838.
Brandi, M.L., Becherini, L., Gennari, L., Racchi, M., Bianchetti, A., Nacmias, B., Sorbi, S., Mecocci, P., Senin, U., and Govoni, S. (1999). Association of the estrogen receptor alpha gene polymorphism with sporadic Alzheimer's Disease. Biochemical and Biophysical Research Communications 265: 335-338.
Breslow, N.E. and Day, N.E. (1980). Statistical Methods in Cancer Research. International Agency for Research on Cancer. Lyon.
Broca, P.P. (1866). Traite des Tumeurs. Asselin, Paris.
Broeks, A., Urbanus, J.H.M., Floore, A.N., Dahler, E.C., Klijn, J.G.M., Rutgers, E.J.T., Devilee, P., Russell, N.S., van Leeuwen, F.E., and van 't Veer, L.J. (2000). ATM-heterozygous germline mutations contribute to breast cancer-susceptibility. American Journal of Human Genetics 66: 494-500.
Brown, K.D., Ziv, Y., Sadanandan, S.N., Chessa, L., Collins, F.S., Shiloh, Y., and Tagle, D.A. (1997). The ataxia-telangiectasia gene product, a constitutively expressed nuclear protein that is not up-regulated following genome damage. Proceedings of the National Academy of Science, USA 94: 1840-1845.
Bunker, J.P., Houghton, J., and Baum, M. (1998). Putting the risk of breast cancer in perspective. British Medical Journal 317: 1307-1309.
Cabana, M.D., Crawford, T.O., Winkelstein, J.A., Christensen, J.R., and Lederman, H.M. (1998). Consequences of the delayed diagnosis of ataxia-telangiectasia. Pediatrics 102: 98-100.
Canman, C.E., Lim, D.S., Cimprick, K.A., Taya, Y., Tamai, K., Sakaguchi, K., Appella, E., Kastan, M.B., and Siliciano, J.D. (1998). Activation of the ATM kinase by ionizing radiation and phosphorylation of p53. Science 281: 1677-1679.
Cao, Q., Martinez, M., Zhang, J., Sanders, A.R., Badner, J.A., Cravchik, A., Markey, C.J., Beshah, E., Guroff, J.J., Maxwell, M.E., Kazuba, D.M., Whiten, R., Goldin, L.R., Gershon, E.S., and Gejman, P.V. (1997). Suggestive evidence for a schizophrenia susceptibility locus on chromosome 6q and a confirmation in an independent series of pedigrees. Genomics 43: 1-8.
Cardon, L.R. and Bell, J.I. (2001). Association study designs for complex diseases. Nature Reviews Genetics 2: 91-99.
Carter, S.L., Negrini, M., Baffa, R., Gillum, D.R., Rosenberg, A.L., Schwartz, G.F., and Croce, C.M. (1994). Loss of heterozygosity at 11q22-q23 in breast cancer. Cancer Research 54: 6270-6274.
Castagnoli, A., Maestri, I., and Del Sanno, L. (1987). PvuII RFLP inside the human estrogen receptor gene. Nucleic Acid Research 15: 866.
Chakraborty, R. and Weiss, K.M. (1988). Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proceedings of the National Academy of Sciences, U.S.A. 85: 9119-9123.
Chappell, S.A., Walsh, T., Walker, R.A., and Shaw, J.A. (1997). Loss of heterozygosity at chromosome 6q in preinvasive and early invasive breast carcinoma. British Journal of Cancer 75: 1324-1329.
Chaturvedi, P., Eng, W.K., Zhu, Y., Mattern, M.R., Mishra, R., Hurle, M.R., Zhang, X., Annan, R.S., Lu, Q., Faucette, L.F., Scott, S.F., Li, X., Carr, S.A., Johnson, R.K., Winkler, J.D., and Zhou, B.B.S. (1999). Mammalian Chk2 is a downstream effector of the ATM-dependent DNA damage checkpoint pathway. Oncogene 18: 4047-4054.
Chen, G. and Lee, E. (1996). The product of the ATM gene is a 370-kDa nuclear phosphoprotein. Journal of Biological Chemistry 271: 33693-33697.
Chen, J., Birkholtz, G.G., Lindblom, P., Rubio, C., and Lindblom, A. (1998). The role of ataxia-telangiectasia heterozygotes in familial breast cancer. Cancer Research 58: 1376-1379.
Chenevix-Trench, G., Spurdle, A.B., Gatei, M., Kelly, H., Marsh, A., Chen, X., Donn, K., Cummings, M., Nyholt, D., Jenkins, M.A., Scott, C., Pupo, G.M., Dork, T., Bendix, R., Kirk, J., Tucker, K., McCredie, M.R.E., Hopper, J.L., Sambrook, J., Mann, G.J., and Khanna, K.K. (2002). Dominant negative ATM mutations in breast cancer families. Journal of the National Cancer Institute 94: 205-215.
Clark, A.G., Weiss, K.M., Nickerson, D.A., Taylor, S.L., Buchanan, A., Stengard, J., Salomaa, V., Vartiainen, E., Perola, M., Boerwinkle, E., and Sing, C.F. (1998). Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. American Journal of Human Genetics 63: 595-612.
Claus, E.B., Schildkraut, J., Iversen, E.S.J., Berry, D., and Parmigiani, G. (1998). Effect of BRCA1 and BRCA2 on the association between breast cancer risk and family history. Journal of the National Cancer Institute 90: 1824-1829.
Cole, J., Arlett, C.F., Green, M.H., Harcourt, S.A., Priestly, A., Henderson, L., Cole, H., James, S.E., and Richmond, F. (1988). Comparative human cellular radiosensitivity II. The survival following gamma-irradiation of unstimulated (G0) T-lymphocytes, T-lymphocyte lines, lymphoblastoid cell lines and fibroblasts from normal donors, from ataxia-telangiectasia patients and from ataxia-telangiectasia heterozygotes. International Journal of Radiation Biology 54: 929-943.
Collins, A., Lonjou, C., and Morton, N.E. (1999). Genetic epidemiology of single-nucleotide polymorphisms. Proceedings of the National Academy of Sciences USA 96: 15173-15177.
Concannon, P. and Gatti, R.A. (1997). Diversity of ATM gene mutations detected in patients with ataxia-telangiectasia. Human Mutation 10: 100-107.
Cortessis, V., Ingles, S., Millikan, R., Diep, A., Gatti, R.A., Richardson, L., Thompson, W.D., Paganini-Hill, A., Sparkes, R.S., and Haile, R.W. (1993). Linkage analysis of DRD2, a marker linked to the ataxia-telangiectasis gene, in 64 families with premenopausal bilateral breast cancer. 53: 5083-5086.
Cortez, D., Wang, Y., Qin, J., and Elledge, S.J. (1999). Requirement of ATM-dependent phosphorylation of Brca1 in the DNA damage response to double-strand breaks. Science 286: 1162-1166.
Cowley, S.M., Hoare, S., Mosselman, S., and Parker, M.G. (1997). Estrogen receptors alpha and beta form heterodimers on DNA. Journal of Biological Chemistry 272: 19858-19862.
Cullen, R., MacGuire, T., Diggin, P., Hill, A., McDermott, E., O'Higgins, N., and Duffy, M.J. (2000). Detection of estrogen receptor-beta mRNA in breast cancer using RT-PCR. International Journal of Biological Markers 15: 114-115.
Curren, J.E., Lea, R.A., Rutherford, S., Weinstein, S.R., and Griffiths, L.R. (2001). Association of estrogen receptor and glucocorticoid receptor gene polymorphisms with sporadic breast cancer. International Journal of Cancer 95: 271-275.
Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., and Lander, E.S. (2001). High-resolution haplotype structure in the human genome. Nature Genetics 29: 229-232.
Danielian, P.S., White, R., Lees, J.A., and Parker, M.G. (1992). Identification of a conserved region required for hormone dependent transcriptional activation by steroid hormone receptors. EMBO Journal 11: 1025-1033.
Devilee, P., van Vliet, M., van Sloun, P., Kuipers Dijkshoorn, N., Hermans, J., Peterson, P.L., and Cornelisse, C.J. (1991). Allelotype of human breast carcinoma: a second major site for loss of heterozygosity is on chromosome 6q. Oncogene 6: 1705-1711.
Dork, T., Bendix, R., Bremner, M., Rades, D., Klopper, K., Nicke, M., Skawran, B., Hector, A., Yamini, P., Steinmann, D., Weise, S., Stuhrmann, M., and Karstens, J.H. (2001). Spectrum of ATM gene mutations in a hospital-based series of unselected breast cancer patients. Cancer Research 61: 7608-7615.
Dotzlaw, H., Leygue, E., Watson, P.H., and Murphy, L.C. (1997). Expression of estrogen receptor-beta in human breast tumors. Journal of Clinical Endocrinology and Metabolism 82: 2371-2374.
Easton, D.F. (1994a). Cancer risks in A-T heterozygotes. International Journal of Radiation Biology 66: S177-S182.
Eastwood, H., Brown, K.M.O., Markovic, D., and Pieri, L.F. (2002). Variation in the ESR1 and ESR2 genes and genetic susceptibility to anorexia nervosa. Molecular Psychiatry 7: 86-89.
Elledge, S.J. (1996). Cell cycle checkpoints: preventing an identity crisis. Science 274: 1664-1671.
Enmark, E., Pelto-Huikko, M., Grandien, K., et al. (1997). Human estrogen receptor β-gene structure, chromosomal localization, and expression pattern. Journal of Clinical Endocrinology and Metabolism 82: 4258-4265.
Evans, R.M. (1988). The steroid and thyroid hormone receptor superfamily. Science 240: 889-895.
Excoffier, L. and Slatkin, M. (1995).
Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Molecular Biology and Evolution 12: 921-927.
Eyra, J.A., Gardner-Medwin, D., and Summerfield, G.P. (1988). Leukoencephalopathy after prophylactic radiation for leukaemia in ataxia telangiectasia. Archives of the Diseased Child 63: 1079-1080.
Farrer, L.A., Cupples, L.A., Haines, J.L., et al. (1997). Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease: A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. Journal of the American Medical Association 278: 1349-1356.
Fawell, S.E., Lees, J.A., White, R., and Parker, M.G. (1990). Characterization and colocalization of steroid binding and dimerization activities in the mouse estrogen receptor. Cell 60: 953-962.
Feigelson, H.S., McKean-Cowdin, R., Pike, M.C., Coetzee, G.A., Kolonel, L.N., Nomura, A.M., LeMarchand, L., and Henderson, B.E. (1999). Cytochrome p450c17alpha gene (CYP17) polymorphism predicts use of hormone replacement therapy. Cancer Research 59: 3908-3910.
Feng, J., Yan, J., Michaud, S., Craddock, N., Jones, I.R., Cook, E.H., Goldman, D., Heston, L.L., Peltonen, L., Delisi, L.E., and Sommer, S.S. (2001). Scanning of estrogen receptor alpha (ERalpha) and thyroid hormone receptor alpha (TRalpha) genes in patients with psychiatric diseases: Four missense mutations identified in ERalpha gene. American Journal of Medical Genetics 105: 369-374.
Ferno, M. (1998). Prognostic factors in breast cancer: a brief review. Anticancer Research 18: 2167-2171.
Fiorilli, M., Antonelli, A., Russo, G., Crescenzi, M., Carbonari, M., and Petrinelli, P. (1985). Variant of ataxia-telangiectasia with low-level radiosensitivity. Human Genetics 70: 274-277.
Fisher, B., Costantino, J.P., Wickerham, L., Redmond, C.K., Kavanah, M., Cronin, W.M., Vogel, V., Robidoux, A., Dimitrov, N., Atkins, J., Daly, M., Wieand, S., Tan-Chiu, E., Ford, L., and Wolmark, N. (1998). Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant Breast and Bowel Project P-1 study. Journal of the National Cancer Institute 90: 1371-1388.
FitzGerald, M.G., Bean, J.M., Hegde, S.R., Unsal, H., MacDonald, D.J., Harkin, D.P., Finkelstein, D.M., Isselbacher, K.J., and Haber, D.A. (1997). Heterozygous ATM mutations do not contribute to early onset of breast cancer. Nature Genetics 15: 307-310.
Ford, D., Easton, D.F., Stratton, M., Narod, S., Goldgar, D., Devilee, P., Bishop, D.T., Weber, B., Lenoir, G., Chang-Claude, J., et al. (1998). Genetic heterogeneity and penetrance analysis of the BRCA1 and BRCA2 genes in breast cancer families. The Breast Cancer Linkage Consortium. American Journal of Human Genetics 62: 676-689.
Fujii, H., Zhou, W., and Gabrielson, E. (1996). Detection of frequent allelic loss of 6q23-q25.2 in microdissected human breast cancer tissues. Gene Chromosomes Cancer 16: 35-39.
Fuqua, S.A., Chamness, G.C., and McGuire, W.L. (1993). Estrogen receptor mutations in breast cancer. Journal of Cellular Biochemistry 51: 135-139.
Fuqua, S.A., Schiff, R., Parra, I., Friedrichs, W.E., Su, J.L., McKee, D.D., Slentz-Kesler, K., Moore, L.B., Willson, T.M., and Moore, J.T. (1999). Expression of wild-type estrogen receptor beta and variant isoforms in human breast cancer. Cancer Research 59: 5425-5428.
Fuqua, S.A., Wiltschke, C., Zhang, Q.X., Borg, A., Castles, C.G., Friedrichs, W.E., Hopp, T., Hilsenbeck, S., Mohsin, S., O'Connell, P., and Allred, D.C. (2000). A hypersensitive estrogen receptor-alpha mutation in premalignant breast lesions. Cancer Research 60: 4026-4029.
Fuqua, S.A.W., Allred, D.C., Elledge, R.M., Krieg, S.L., Benedix, M.G., Nawaz, Z., O'Malley, B.W., Green, G.L., and McGuire, M.L. (1991). The ER-positive/PgR-negative breast cancer phenotype is not associated with mutations within the DNA binding domain. Breast Cancer Research and Treatment 26: 191-202.
Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S.N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E.S., Daly, M.J., and Altshuler, D. (2002). The structure of haplotype blocks in the human genome. Science 296: 2225-2229.
Gambaro, G., Anglani, F., and D'Angelo, A. (2000). Association studies of genetic polymorphisms and complex disease. Lancet 355: 308-311.
Garfinkel, L., Boring, C.C., and Heath, C.W. (1994). Changing trends: an overview of breast cancer incidence and mortality. Cancer 74: 222-227.
Gatti, R. (1998). Ataxia-telangiectasia. In Vogelstein, B., Kinzler, K. (eds), The Genetic Basis of Human Cancer. McGraw-Hill, New York, pp 275-300.
Gatti, R.A., Berkel, I., Boder, E., Braedt, G., Charmley, P., Concannon, P., Ersoy, F., Foroud, T., Jaspers, N.G.J., Lange, K., Lathrop, G.M., Leppert, M., Nakamura, Y., O'Connell, P., Paterson, M., Salser, W., Sanal, O., Silver, J., Sparkes, R.S., Susi, E., Weeks, D.E., Wei, S., White, R., and Yoder, F. (1988). Localization of an ataxia-telangiectasia gene to chromosome 11q22-23. Nature 366: 577-580.
Gatti, R.A., Boder, E., Vinters, H.V., Sparkes, R.S., Norman, A., and Lange, K. (1991). Ataxia-telangiectasia: an interdisciplinary approach to pathogenesis. Medicine 70: 99-117.
Gatti, R.A., Tward, A., and Concannon, P. (1999). Cancer risk in ATM heterozygotes: a model of phenotypic and mechanistic differences between missense and truncating mutations. Molecular Genetics and Metabolism 68: 419-423.
Geoffroy-Perez, B., Janin, N., Ossian, K., Lauge, A., Croquette, M.F., Griscelli, C., Debre, M., Bressac-de-Paillerets, B., Aurias, A., Stoppa-Lyonnet, D., and Andrieu, N. (2001). Cancer risk in heterozygotes for ataxia-telangiectasia. International Journal of Cancer 93: 288-293.
Gilad, S., Khosravi, R., Shkedy, D., Uziel, T., Ziv, Y., Savitsky, K., Rotman, G., Smith, S., Chessa, L., Jorgensen, T.J., Harnik, R., Frydman, M., Sanal, O., Portnoi, S., Goldwicz, Z., Jaspers, N.G.J., Gatti, R.A., Lenoir, G., Lavin, M.F., Tatsumi, J., Wegner, R.D., Shiloh, Y., and Bar-Shira, A. (1996). Predominance of null mutations in ataxia-telangiectasia. Human Molecular Genetics 3: 433-439.
Goddard, K.A., Hopkins, P.J., Hall, J.M., and Witte, J.S. (2000). Linkage disequilibrium and allele-frequency distributions for 114 single-nucleotide polymorphisms in five populations. American Journal of Human Genetics 66: 216-234.
Gotoff, S.P., Amirmokri, E., and Liebner, E.J. (1967). Neoplasia, untoward response to X-irradiation, and tuberous sclerosis. Am J Dis Child 114: 617-625.
Green, S., Walter, P., Kumar, V., Krust, A., Bornert, J.M., Argos, P., and Chambon, P. (1986). Human oestrogen receptor cDNA: sequence, expression and homology to v-erb-A. Nature 320: 134-139.
Greene, G.L., Gilna, P., Waterfield, M., Baker, A., Hort, Y., and Shine, J. (1986). Sequence and expression of human estrogen receptor complementary DNA. Science 231: 1150-1154.
Greenwell, P.W., Kronmal, S.L., Porter, S.E., Gassenhuber, J., Obermaier, B., and Petes, T.D. (1995). TEL1, a gene involved in controlling telomere length in S. cerevisiae, is homologous to the human ataxia telangiectasia gene. Cell 82: 823-829.
Hampton, G.M., Mannermaa, A., Winquist, R., Alavaikko, M., Blanco, G., Taskinen, P.J., Kiviniemi, H., Newshma, I., Cavenee, W.K., and Evans, G.A. (1994).
Loss of heterozygosity in sporadic human breast carcinoma: a common region between 11q22 and 11q23.3. Cancer Research 54: 4586-4589.
Hanahan, D. and Weinberg, R.A. (2000). The hallmarks of cancer. Cell 100: 57-70.
Harris, J.R., Lippman, M.E., Veronesi, U., and Willet, W. (1992). Breast cancer. New England Journal of Medicine 327: 319-328.
Hecht, F. and Hecht, B.K. (1990). Cancer in ataxia-telangiectasia patients. Cancer Genet Cytogenet 46: 9-19.
Hedrick, P.W. (1988). Inference of recombinational hotspots using gametic disequilibrium values. Heredity 60: 435-438.
Henderson, B.E., Ross, R., and Bernstein, L. (1988). Estrogens as a cause of human cancer: the Richard and Hinda Rosenthal Foundation award lecture. Cancer Research 48: 246-253.
Henderson, V.W. (1997). The epidemiology of estrogen replacement therapy in Alzheimer's disease. Neurology 48:(Suppl) 7: S27-S35.
Herrington, D.M., Howard, T.D., Brosnihan, K.B., McDonnell, D.P., Li, X., Hawkins, G.A., Reboussin, D.M., Xu, J., Zheng, S.L., Meyers, D.A., and Bleecker, E.R. (2002). Common estrogen receptor polymorphism augments effects of hormone replacement therapy on E-selectin but not C-reactive protein. Circulation 105: 1879-1882.
Hill, S.M., Fuqua, S.A., Chamness, G.C., Green, G.L., and McGuire, W.L. (1989). Estrogen receptor expression in human breast cancer associated with an estrogen receptor gene restriction fragment length polymorphism. Cancer Research 49: 145-148.
Hill, W.G. and Robertson, A.R. (1968). Linkage disequilibrium in finite populations. Theoretical Applied Genetics 38: 226-231.
Hirschhorn, J.N., Lindgren, C.M., Daly, M.J., Kirby, A., Schaffner, S.F., Burtt, N.P., Altshuler, D., Parker, A., Rioux, J.D., Platko, J., Gaudet, D., Hudson, T.J., Groop, L.C., and Lander, E.S. (2001).
Genomewide linkage analysis of stature in multiple populations reveals several regions with evidence of linkage to adult height. American Journal of Human Genetics 69: 106-116.
Hirschhorn, J.N., Lohmueller, K., Byrne, E., and Hirschhorn, K. (2002). A comprehensive review of genetic association studies. Genetics in Medicine 4: 45-61.
Hollstein, M., Sidransky, D., Vogelstein, B., and Harris, C.C. (1991). p53 mutations in human cancers. Science 253: 49-53.
Horowitz, M.C. (1993). Cytokines and estrogen in bone: anti-osteoporotic effects. Science 260: 626-627.
Hsing, A.W., Tsao, L., and Devesa, S.S. (2000). International trends and patterns of prostate cancer incidence and mortality. International Journal of Cancer 85: 60-67.
Hu, Y.F., Lau, K.M., Ho, S-M., and Russo, J. (1998). Increased expression of estrogen receptor β in chemically transformed human breast epithelial cells. International Journal of Oncology 12: 1225-1228.
Ingman, M., Kaessmann, H., Paabo, S., and Gyllensten, U. (2000). Mitochondrial genome variation and the origin of modern humans. Nature 408: 708-713.
Inskip, H.M., Kinlen, L.J., Taylor, A.M.R., Woods, C.G., and Arlett, C.F. (1999). Risk of breast cancer and other cancers in heterozygotes for ataxia-telangiectasia. British Journal of Cancer 79: 1304-1307.
Ioannidis, J.P., Stavrou, I., Trikalinos, T.A., Zois, C., Brandi, M.L., Gennari, L., Albagha, O., Ralston, S.H., and Tsatsoulis, A. (2002). Association of polymorphisms of the estrogen receptor alpha gene with bone mineral density and fracture risk in women: a meta-analysis. Journal of Bone Mineral Research 17: 2048-2060.
Isoe-Wada, K., Maeda, M., Yong, J., Adachi, Y., Harada, H., Urakami, K., and Nakashima, K. (1999). Positive association between an estrogen receptor gene polymorphism and Parkinson's disease with dementia. European Journal of Neurology 6: 431-435.
Iwao, K., Miyoshi, Y., Egawa, C., Ikeda, N., and Noguchi, S. (2000). Quantitative analysis of estrogen receptor-beta mRNA and its variants in human breast cancers. International Journal of Cancer 88: 733-736.
Iwase, H., Greenman, J.M., Barnes, D.M., Hodgson, S., Bobrow, L., and Mathew, C.G. (1996). Sequence variants of the estrogen receptor (ER) gene found in breast cancer patients with ER negative and progesterone receptor positive tumors. Cancer Letters 108: 179-184.
Izatt, L., Greenman, J., Hodgson, S., Ellis, D., Watts, S., Scott, G., Jacobs, C., Liebmann, R., Zvelebil, M.J., Mathew, C., and Solomon, E. (1999a). Identification of germline missense mutations and rare allelic variants in the ATM gene in early-onset breast cancer. Genes, Chromosomes & Cancer 26: 286-294.
Janin, N., Andrieu, N., Ossian, K., Lauge, A., Croquette, M-F., Griscelli, C., Debre, M., Bressac-de-Paillerets, B., Aurias, A., and Stoppa-Lyonnet, D. (1999). Breast cancer risk in ataxia telangiectasia (AT) heterozygotes: haplotype study in French AT families. British Journal of Cancer 80: 1042-1045.
Jarvinen, T.A., Pelto-Huikko, M., Holli, K., and Isola, J. (2000). Estrogen receptor beta is coexpressed with ERalpha and PR and associated with nodal status, grade, and proliferation rate in breast cancer. American Journal of Pathology 156: 29-35.
Jaspers, N.G.J., Gatti, R.A., Baan, C., Linssen, P.C., and Bootsma, D. (1988). Genetic complementation analysis of ataxia telangiectasia and Nijmegen breakage syndrome: a survey of 50 patients. Cytogenet Cell Genet 49: 259-263.
Jeffreys, A.J., Kauppi, L., and Neumann, R. (2001). Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nature Genetics 29.
Jensen, E.V. (1981). Hormone dependency of breast cancer. Cancer 47: 2319-2326.
Ji, Y., Urakami, K., Isoe-Wada, K., Adachi, Y., and Nakashima, K. (2000).
Estrogen receptor gene polymorphisms in patients with Alzheimer's disease, vascular dementia, and alcohol-associated dementia. Dementia and Geriatric Cognitive Disorders 11: 119-122.
Johnson, G.C., Esposito, L., Barratt, B.J., Smith, A.N., Heward, J., Di Genova, G., Ueda, H., Cordell, H.J., Eaves, I.A., Dudbridge, F., Twells, R.C., Payne, F., Hughes, W., Nutland, S., Stevens, H., Carr, P., Tuomilehto-Wolf, E., Tuomilehto, J., Gough, S.C., Clayton, D.G., and Todd, J.A. (2001). Haplotype tagging for the identification of common disease genes. Nature Genetics 29: 233-237.
Jordan, V.C. (1992). The strategic use of antiestrogen to control the development and growth of breast cancer. Cancer 70: 977-982.
Jorde, L.B. (2000). Linkage disequilibrium and the search for complex disease genes. Genome Research 10: 1435-1444.
Kahlert, S., Nuedling, S., van Eickels, M., Vetter, H., Meyers, R., and Grohe, C. (2000). Estrogen receptor-alpha rapidly activated the IGF-1 receptor pathway. Journal of Biological Chemistry 275: 18447-18453.
Karnik, P.S., Kulkarni, S., Liu, X.P., Budd, G.T., and Bukowski, R.M. (1994). Estrogen receptor mutations in tamoxifen-resistant breast cancer. Cancer Research 54: 349-353.
Kastan, M.B. and Lim, D.S. (2000). The many substrates and functions of ATM. Nature Reviews Molecular and Cellular Biology 1: 179-186.
Kastan, M.B., Zhan, Q., el-Deiry, W.S., Carrier, F., Jacks, T., Walsh, W.V., Plunkett, B.S., Vogelstein, B., and Fornace, A.J. (1992). A mammalian cell cycle checkpoint pathway utilizing p53 and GADD45 is defective in ataxia-telangiectasia. Cell 71: 587-597.
Kato, S., Endoh, H., Masuhiro, Y., Kitamoto, T., Uchiyama, S., Sasaki, H., Masushuge, S., Gotoh, Y., Nishida, E., Kawashima, H., Metzger, D., and Chambon, P. (1995). Activation of the estrogen receptor through phosphorylation by mitogen-activated protein kinase. Science: 1491-1494.
Kawas, C., Resnick, S., Morrison, A., et al. (1997). A prospective study of estrogen replacement therapy and the risk of developing Alzheimer's disease: The Baltimore Longitudinal Study of Aging. Neurology 48: 1517-1521.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. (2002). The Human Genome Browser at UCSC. Genome Research 12: 996-1006.
Khanna, K.K. (2000). Cancer risk and the ATM gene: a continuing debate. Journal of the National Cancer Institute 92: 795-802.
Khanna, K.K., Gatti, R., Concannon, P., Weemaes, C., Hoekstra, M.F., and Lavin, M. (1998). Cellular Responses to DNA Damage and Human Chromosome Instability Syndromes. In Nickloff, J.A., F. H.M. (eds), DNA Damage and Repair. Humana Press, Totowa (NJ), pp 395-442.
Khanna, K.K. and Jackson, S.P. (2001). DNA double-strand breaks: signaling, repair and the cancer connection. Nature Genetics 27: 247-254.
Kolonel, L.N., Henderson, B.E., Hankin, J.H., Nomura, A.M., Wilkens, L.R., Pike, M.C., Stram, D.O., Monroe, K.R., Earle, M.E., and Nagamine, F.S. (2000). A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. American Journal of Epidemiology 151: 346-357.
Komm, B.S., Terpening, C.M., Benz, D.J., Graeme, K.A., Gallegos, A., Korc, M., Greene, G.L., O'Malley, B.W., and Haussler, M.R. (1988). Estrogen binding, receptor mRNA, and biologic response in osteoblast-like osteosarcoma cells. Science 241: 81-84.
Kruglyak, L. (1999). Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genetics 22: 139-144.
Kuiper, G.G., Enmark, E., Pelto-Huikko, M., Nilsson, S., and Gustafsson, J.A. (1996). Cloning of a novel receptor expressed in rat prostate and ovary. Proceedings of the National Academy of Sciences of the United States of America 93: 5925-5930.
Kuller, L.H. and Modan, B. (1991).
Risk of breast cancer in ataxia-telangiectasia. Letter. New England Journal of Medicine 326: 1357.
Laake, K., Odegard, A., Anderson, T.I., Bukholm, I.K., Karesen, R., Nesland, J.M., Ottestad, L., Shiloh, Y., and Borresen-Dale, A.L. (1997). Loss of heterozygosity at 11q23.1 in breast carcinomas: indication for involvement of a gene distal and close to ATM. Genes, Chromosomes & Cancer 18: 175-180.
Laake, K., Vu, P., Anderson, T.I., Erikstein, B., Karesen, R., Lonning, P.E., Skovlund, E., and Borresen-Dale, A.L. (2000). Screening breast cancer patients for Norwegian ATM mutations. British Journal of Cancer 83: 1650-1653.
Lambert, J-C., Harris, J.M., Mann, D., Lemmon, H., Coates, J., Cumming, A., St-Clair, D., and Lendon, C. (2001). Are the estrogen receptors involved in Alzheimer's disease. Neuroscience Letters 306: 193-197.
Lander, E.S. and Schork, N.J. (1994). Genetic dissection of complex traits. Science 265: 2037-2048.
Lange, E., Borresen, A-L., Chen, X., Chessa, L., Chiplunkar, S., Concannon, P., and Dandekar, S. (1995). Localization of an ataxia-telangiectasis gene to an ~500-kb interval on chromosome 11q23.1: linkage analysis of 176 families by an international consortium. American Journal of Human Genetics 57: 112-119.
Lavin, M.F. and Shiloh, Y. (1997). The genetic defect in ataxia-telangiectasis. Ann Rev Immunol 15: 177-202.
Lee, L., Connell, C., and Bloch, W. (1993). Allelic discrimination by nick-translation PCR with fluorogenic probes. Nucleic Acids Research 21: 3761-3766.
Lemieux, P. and Fuqua, S. (1996). The role of estrogen receptor in tumor progression. Journal of Steroid Biochemistry and Molecular Biology 56: 87-91.
Levine, A.J., Momand, J., and Finlay, C.A. (1991). The p53 tumour suppressor gene. Nature 351: 453-456.
Levinson, D.F., Holmans, P., Straub, R.E., Owen, M.J., Wildenauer, D.B., Gejman, P.V., Pulver, A.E., Laurent, C., Kendler, K.S., Walsh, D., Norton, N., Williams, N.M., Schwab, S.G., Lerer, B., Mowry, B.J., Sanders, A.R., Antonarakis, S.E., Blouin, J.L., DeLeuze, J.F., Mallet, J. (2000). Multicenter linkage study of schizophrenia candidate regions on chromosomes 5q, 6q, 10p, and 13q: schizophrenia linkage collaborative group III. American Journal of Human Genetics 67: 652-663.
Lewontin, R.C. (1964). The interaction of selection and linkage. I. general considerations; heterotic models. Genetics 49: 49-67.
Lewontin, R.C. (1995). The detection of linkage disequilibrium in molecular sequence data. Genetics 140: 377-388.
Li, A., Huang, Y.M.S. (1999). Mutation Types and the Survival of A-T Patients. In Gatti, R. and Concannon, P. (eds), Eighth International Workshop on Ataxia-Telangiectasia, Las Vegas, NV.
Lim, D.S., Kim, S.T., Xu, B., Maser, R.S., Lin, J., Petrini, J.H., and Kastan, M.B. (2000). ATM phosphorylates p95/nbs1 in an S-phase checkpoint pathway. Nature 404: 613-617.
Louis-Bar, D. (1941). Sur un syndrome progressif comprenant des telangiectasies capillaries cutanees et conjonctivales symetriques, a disposition naevode et de troubles cerebelleux. Confin Neurol (Basel) 4: 32-42.
Maass, H., Jonat, W., Stolzenbach, G., et al. (1980). The problem of non responding estrogen receptor-positive patients with advance breast cancer. Cancer 46 (12 Suppl): 2835-2837.
Mangelsdorf, D.J., Thummel, C., Beato, M., Herrlich, P., Schütz, G., Umesono, K., Blumberg, B., Kastner, P., Mark, M., Chambon, P., and Evans, R.M. (1995). The nuclear receptor superfamily: The second decade. Cell 83: 835-839.
Maniatis, N., Collins, A., Xu, C-F., McCarthy, L.C., Hewett, D.R., Tapper, W., Ennis, S., Ke, X., and Morton, N.E. (2002). The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis. Proceedings of the National Academy of Sciences, U.S.A. 99: 2228-2233.
Mann, S., Laucirica, R., Carlson, N., Younes, P.S., Ali, N., Younes, A., Li, Y., and Younes, M. (2001). Estrogen receptor beta expression in invasive breast cancer. Human Pathology 32: 113-118.
Manni, A., Arafah, B., and Pearson, O.G. (1980). Estrogen and progesterone receptors in the prediction of response of breast cancer to endocrine therapy. Cancer 46 (12 Suppl): 2835-2841.
Maruyama, H., Toji, H., Harrington, C.R., Sasaki, K., Izumi, Y., Ohnuma, T., Arai, H., Yasuda, M., Tanaka, C., Emson, P.C., Nakamura, S., and Kawakami, H. (2000). Lack of an association of estrogen receptor alpha gene polymorphisms and transcriptional activity with Alzheimer disease. Archives of Neurology 57: 236-240.
Matsuoka, S., Huang, M., and Elledge, S.J. (1998). Linkage of ATM to cell cycle regulation by the Chk2 protein kinase. Science 282: 1893-1897.
Mattila, K.M., Axelman, K., Rinne, J.O., Blomberg, M., Lehtimaki, T., Laippala, P., Roytta, M., Viitanen, M., Wahlund, L-O., Winblad, B., and Lannfelt, L. (2000). Interaction between estrogen receptor 1 and the ε4 allele of apolipoprotein E increases the risk of familial Alzheimer's disease in women. Neuroscience Letters 282: 45-48.
McEwan, B.S. and Alves, S.E. (1999). Estrogen actions in the central nervous system. Endocrinology Reviews 20: 279-307.
McGuire, W.L., Chamness, G.C., and Fuqua, S.A. (1991). Estrogen receptor variants in clinical breast cancer. Molecular Endocrinology 5: 1571-1577.
McGuire, W.L. and Clark, G.M. (1989). Prognostic factors for recurrence and survival in axillary node-negative breast cancer. Journal of Steroid Biochemistry 34: 145-148.
McKean-Cowdin, R., Feigelson, H.S., Pike, M.C., Coetzee, G.A., Kolonel, L.N., and Henderson, B.E. (2001). Risk of endometrial cancer and estrogen replacement therapy history by CYP17 genotype. Cancer Research 61: 848-849.
Mendelsohn, M.E. and Karas, R.H. (1991). The protective effects of estrogen on the cardiovascular system. New England Journal of Medicine 340: 1801-1811.
Meyn, M.S. (1997). The chromosome instability syndromes: lessons for carcinogenesis. Curr Top Microbiol Immunol 221: 71-148.
Modugno, F., Weissfeld, J.L., Trump, D.L., Zmuda, J.M., Shea, P., Cauley, J.A., and Ferrell, R.E. (2001). Allelic variants of aromatase and the androgen and estrogen receptors: Toward a multigenic model of prostate cancer risk. Clinical Cancer Research 7: 3092-3096.
Morgan, J.L., Holcomb, T.M., and Morrissey, R.W. (1968). Radiation reaction in ataxia telangiectasia. American Journal of the Diseased Child 116: 557-558.
Morgan, S.E. and Kastan, M.B. (1997). p53 and ATM: cell cycle, cell death, and cancer. Advances in Cancer Research 71: 1-25.
Morgan, S.E., Lovl, C., Pandita, T.K., Shiloh, Y., and Kastan, M.B. (1997). Fragments of ATM which have dominant-negative or complementing activity. Molecular Cell Biology 17: 2020-2029.
Morrell, D., Chase, C.L., and Swift, M. (1990). Cancers in 44 families with ataxia-telangiectasia. Cancer Genet Cytogenet 50: 119-123.
Morrell, D., Cromartie, E., and Swift, M. (1986). Mortality and cancer incidence in 263 patients with ataxia-telangiectasia. Journal of the National Cancer Institute 77: 89-92.
Mosselman, S., Polman, J., and Dijkema, R. (1996). ERbeta: identification and characterization of a novel human receptor. Federation of European Biochemical Societies Letters 392: 49-53.
Murnane, J.P. and Painter, R.B. (1982). Complementation of the defects of DNA synthesis in irradiated and unirradiated ataxia-telangiectasia cells. Proceedings of the National Academy of Science, USA 79: 1960-1963.
Nichols, K.E., Levitz, S., Shannon, K.E., Wahrer, D.C., Bell, D.W., Chang, G., Hedge, S., Neuberg, D., Shafman, T., Tarbell, N.J., Mauch, P., Ishioka, C., Habner, D.A., and Diller, L. (1999). Heterozygous germline ATM mutations do not contribute to radiation-associated malignancies after Hodgkin's disease. Journal of Clinical Oncology 17: 1259.
Nigro, J.M., Baker, S.J., Preisinger, A.C., Jessup, J.M., Hostetter, R., Cleary, K., Bigner, S.H., Davidson, N., Baylin, S., and Devilee, P. (1989). Mutations in the p53 gene occur in diverse human tumour types. Nature 342: 705-708.
Niino, M., Kikuchi, S., Fukazawa, T., Yabe, I., and Tashiro, K. (2000). Estrogen receptor gene polymorphism in Japanese patients with multiple sclerosis. Journal of the Neurological Sciences 179: 70-75.
Niu, T., Qin, Z.S., Xu, X., and Liu, J.S. (2002). Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. American Journal of Human Genetics 70: 157-169.
Olsen, J., Hahnemann, J., Borresen-Dale, A-L., Brondum-Nielsen, K., Hammarstrom, L., Kleinerman, R., Kaariainen, H., Lonnqvist, T., Sankila, R., Seersholm, N., Tretli, S., Yuen, J., Boice, J., and Tucker, M. (2001). Cancer in patients with ataxia-telangiectasia and their relatives in the Nordic countries. Journal of the National Cancer Institute 93: 121-127.
Omoto, Y., Inoue, S., Ogawa, S., Toyama, T., Yamashita, H., Muramatsu, M., Kobayashi, S., and Iwase, H. (2001). Clinical value of the wild-type estrogen receptor beta expression in breast cancer. Cancer Letters 163: 207-212.
Oppitz, U., Bernthaler, U., Schindler, D., Sobeck, A., Hoehn, H., Platzer, M., Rosenthal, A., and Flentje, M. (1999). Sequence analysis of ATM gene in 20 patients with RTOG grade 3 or 4 acute and/or late tissue radiation side effects. International Journal of Radiation Oncology Biology Physiology 44: 981-988.
Orphanos, V., McGown, G., Hey, Y., Boyle, J.M., and Santibanez-Koref, M. (1995). Proximal 6q, a region showing allele loss in primary breast cancer. British Journal of Cancer 71: 290-293.
Osborne, C.K. (1998). Steroid hormone receptors in breast cancer management. Breast Cancer Research and Treatment 51: 226-238.
Osborne, C.K., Youhmowitz, M.G., Knight, W.A., et al. (1980). The value of estrogen and progesterone receptors in the treatment of breast cancer. Cancer 46: 2829-2834.
Ott, J. (2000). Predicting the range of linkage disequilibrium. Proceedings of the National Academy of Sciences USA 97: 2-3.
Pardee, A.B. (1989). G1 events and regulation of cell proliferation. Science 246: 603-608.
Parl, F.F., Cavener, D.R., and Dupont, W.D. (1989). Genomic DNA analysis of the estrogen receptor gene in breast cancer. Breast Cancer Research and Treatment 14: 57-64.
Pecker, I., Avraham, K.B., Gilbert, D.J., Savitsky, K., Rotman, G., Harnik, R., et al. (1996). Identification and chromosomal localization of Atm, the mouse homolog of the ataxia-telangiectasia gene. Genomics 35: 39-45.
Peterson, R.D., Funkhouser, J.D., Tuck-Muller, C.M., and Gatti, R.A. (1992). Cancer susceptibility in ataxia-telangiectasia. Leukemia 6: 8-13.
Peto, J., Collins, N., Barfoot, R., Seal, S., Warren, W., Rahman, N., Easton, D.F., Evans, C., Deacon, J., and Stratton, M.R. (1999). Prevalence of BRCA1 and BRCA2 gene mutations in patients with early-onset breast cancer. Journal of the National Cancer Institute 90: 1824-1829.
Pettersson, K., Grandien, K., Kuiper, G.G., and Gustafsson, J.A. (1997). Mouse estrogen receptor β forms estrogen response element-binding heterodimers with estrogen receptor α. Molecular Endocrinology 11: 1486-1496.
Pharoah, P.D., Antoniou, A., Bobrow, M., Zimmern, R.L., Easton, D.F., and Ponder, B.A. (2002). Polygenic susceptibility to breast cancer and implications for prevention. Nature Genetics 31: 33-36.
Picard, D., Bunone, G., Liu, J.W., and Donze, O. (1997). Steroid-independent activation of steroid receptors in mammalian and yeast cells and in breast cancer. Biochemical Society Transaction 25: 597-602.
Pike, M.C. (1983). Hormonal risk factors, breast tissue age, and the age-incidence of breast cancer. Nature 303: 767-770.
Pike, M.C., Kolonel, L.N., Henderson, B.E., Wilkens, L.R., Hankin, J.H., Feigelson, H.S., Wan, P.C., Stram, D.O., and Nomura, A.M. (2002). Breast cancer in a multiethnic cohort in Hawaii and Los Angeles: risk factor-adjusted incidence in Japanese equals and in Hawaiians exceeds that in whites. Cancer Epidemiology Biomarkers and Prevention 11: 795-800.
Pippard, E.C., Hall, A.J., Barker, D.J., and Bridges, B.A. (1988). Cancer in homozygotes and heterozygotes of ataxia-telangiectasia and xeroderma pigmentosum in Britain. Cancer Research 48: 2929-2932.
Pritchard, J.J. and Rosenberg, N.A. (1999). Use of unlinked genetic markers to detect population stratification in association studies. American Journal of Human Genetics 65: 220-228.
Pritchard, J.K. and Przeworski, M. (2001). Linkage disequilibrium in humans: models and data. American Journal of Human Genetics 69: 1-14.
Rajan, J.V., Marquis, S.T., Gardener, H.P., and Chodosh, L.A. (1997). Developmental expression of Brca2 colocalizes with Brca1 and is associated with proliferation and differentiation in multiple tissues. Developmental Biology 184: 385-401.
Razandi, M., Pedram, A., and Levin, E.R. (2000). Plasma membrane estrogen receptors signal to antiapoptosis in breast cancer. Molecular Endocrinology 14: 1434-1447.
Reed, W.B., Epstein, W.L., Boder, E., and Sedgwick, R. (1966). Cutaneous manifestations of ataxia-telangiectasia. JAMA 195: 746-753.
Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R., and Lander, E.S. (2001a). Linkage disequilibrium in the human genome. Nature 411: 199-204.
Reich, D.E. and Goldstein, D.B. (2001b). Detecting association in a case-control study while correcting for population stratification. Genetic Epidemiology 20: 4-16.
Resnick, S.M., Metter, E.J., and Zonderman, A.B. (1997). Estrogen replacement therapy and longitudinal decline in visual memory: A possible protective effect? Neurology 49: 1491-1497.
Ries, L.A.G., Kosary, C.L., Hankey, B.F., Miller, B.A., Harras, A., and Edwards, B.K. (1997). SEER cancer statistics review, 1973-1994. National Cancer Institute, Bethesda, MD.
Risch, N. and Merikangas, K. (1996). The future of genetic studies of complex human diseases. Science 273: 1516-1517.
Risch, N.J. (2000). Searching for genetic determinants in the new millennium. Nature 405: 847-856.
Roodi, N., Bailey, L.R., Kao, W.Y., Verrier, C.S., Yee, C.J., Dupont, W.D., and Parl, F.F. (1995). Estrogen receptor gene analysis in estrogen receptor-positive and receptor-negative primary breast cancer. Journal of the National Cancer Institute 87: 446-451.
Rosenkranz, K., Hinney, A., Ziegler, A., Hermann, H., Fichter, M., Mayer, H., Siegfried, W., Young, J.K., Remschmidt, H., and Hebebrand, J. (1998). Systemic mutation screening of the estrogen receptor beta gene in probands of different weight extremes: Identification of several genetic variants. Journal of Clinical Endocrinology and Metabolism 83: 4524-4527.
Rubens, R.D. and Hayward, J.L. (1980). Estrogen receptors and response to endocrine therapy and cytotoxic chemotherapy in advanced breast cancer. Cancer 46: 2924-2992.
Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L., Hunt, S.E., Cole, C.G., Coggill, P.C., Rice, C.M., Ning, Z., Rogers, J., Bentley, D.R., Kwok, P.Y., Mardis, E.R., Yeh, R.T., Schultz, B., Cook, L., Davenport, R., Dante, M., Fulton, L., Hillier, L., Waterston, R.H., McPherson, J.D., Gilman, B., Schaffner, S., Van Etten, W.J., Reich, D., Higgins, J., Daly, M.J., Blumenstiel, B., Baldwin, J., Stange-Thomann, N., Zody, M.C., Linton, L., Lander, E.S., and Altshuler, D. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 822-823.
Sandoval, C. and Swift, M. (1998). Treatment of lymphoid malignancies in patients with ataxia-telangiectasia. Med Pediatr Oncol 31: 491-497.
Sandoval, N., Platzer, M., Rosenthal, A., Dork, T., Bendix, R., Skawran, B., Stuhrmann, M., Wegner, R.D., Sperling, K., Banin, S., Shiloh, Y., Baumer, A., Bernthaler, U., Sennefelder, H., Brohm, M., Weber, B.H., and Schindler, D. (1999). Characterization of ATM gene mutations in 66 ataxia telangiectasia families. Human Molecular Genetics 8: 69-79.
Savitsky, K., Bar-Shira, A., Gilad, S., Rotman, G., Ziv, Y., Vanagaite, L., Tagle, D.A., Smith, S., Uziel, T., Sfez, S., Ashkenazi, M., Pecker, I., Frydman, M., Harnik, R., Patanjali, S.R., Simmons, A., Clines, G.A., Sartiel, A., Gatti, R.A., Chessa, L., Sanal, O., Lavin, M.F., Jaspers, N.G.J., Taylor, M.R., Arlett, C.F., Miki, T., Weissman, S.M., Lovett, M., Collins, F.S., and Shiloh, Y. (1995a). A single ataxia telangiectasia gene with a product similar to PI-3 kinase. Science 268: 1749-1753.
Savitsky, K., Sfez, S., Tagle, D.A., Ziv, Y., Sartiel, A., Collins, F.S., Shiloh, Y., and Rotman, G. (1995b). The complete sequence of the coding region of the ATM gene reveals similarity to cell cycle regulators in different species. Human Molecular Genetics 4: 2025-2032.
Schubert, E.L., Lee, M.K., Newman, B., and King, M.C. (1999). Single nucleotide polymorphisms (SNPs) in the estrogen receptor gene and breast cancer susceptibility. Journal of Steroid Biochemistry and Molecular Biology 71: 21-27.
Scott, S.P., Bendix, R., Chen, P., Clark, R., Dork, T., and Lavin, M.F. (2002). Missense mutations but not allelic variants alter the function of ATM by dominant interference in patients with breast cancer. Proceedings of the National Academy of Science, USA 99: 925-930.
Scully, R., Chen, J., Ochs, R.L., Keegan, K., Hoekstra, M., Feunteun, J., and Livingston, D.M. (1997). Dynamic changes of BRCA1 subnuclear location and phosphorylation state are initiated by DNA damage. Cell 90: 425-435.
Sedgwick, R.P. and Boder, E. (1991). Ataxia-Telangiectasia. In: deJong J (ed) Neuropathies and Spinocerebellar Atrophies. Elsevier, New York, pp 347-423.
Shafman, T., Khanna, K.K., Kedar, P., Spring, K., Kozlov, S., Yen, T., Hobson, K., Gatei, M., Zhang, N., Watters, D., Egerton, M., Shiloh, Y., Kharbanda, S., Kufe, D., and Lavin, M.F. (1997). Interaction between ATM protein and c-Abl in response to DNA damage. Nature 387: 520-523.
Shayeghi, M., Seal, S., Regan, J., Collins, N., Barfoot, R., Rahman, N., Ashton, A., Moohan, M., Wooster, R., Owen, R., Bliss, J.M., Stratton, M.R., and Yarnold, J. (1998). Heterozygosity for mutations in the ataxia telangiectasia gene is not a major cause of radiotherapy complications in breast cancer patients. British Journal of Cancer 78: 922-927.
Sherr, C.J. (1994). G1 phase progression: cycling on cue. Cell 79: 551-555.
Shiloh, Y. (1995). Ataxia-telangiectasia: Closer to unraveling the mystery. Eur. J. Hum. Genetic 3: 116-138.
Skliris, G.P., Carder, P.J., Lansdown, M.R., and Spiers, V. (2001). Immunohistochemical detection of ERbeta in breast cancer: towards more detailed receptor profiling? British Journal of Cancer 84: 1095-1098.
Sluyser, M. (1992). Role of estrogen receptor variants in the development of hormone resistance in advanced breast cancer. Cancer 46: 2924-2992.
Smith, E.P., Boyd, J., Frank, G.R., et al. (1994). Estrogen resistance caused by a mutation in the estrogen-receptor gene in a man. New England Journal of Medicine 331: 1056-1061.
Sommer, S.S., Buzin, C.H., Jung, M., Zheng, J., Liu, Q., Jeong, S.J., Moulds, J., Nguyen, V.Q., Feng, J., Bennett, W.P., and Dritschilo, A. (2002). Elevated frequency of ATM gene missense mutations in breast cancer relative to ethnically matched controls. Cancer Genetics and Cytogenetics 134: 25-32.
Southey, M.C., Batten, L.E., McCredie, M.R.E., Giles, G.G., Dite, G., Hopper, J.L., and Venter, D.J. (1998). Estrogen receptor polymorphism at codon 325 and risk of breast cancer in women before age forty. Journal of the National Cancer Institute 90: 532-536.
Spector, B.D., Filipovick, A.H., Perry, G.S.I., and Kersey, J.H. (1982). Epidemiology of Cancer in Ataxia-telangiectasia. In Bridges, B.A. and Harnden, D.G. (eds). Ataxia-Telangiectasia. John Wiley and Sons Ltd., New York, pp 103-138.
Spiers, V. and Kerin, M.J. (2000). Prognostic significance of oestrogen receptor β in breast cancer. British Journal of Surgery 87: 405-409.
Spiers, V., Parkes, A.T., Kerin, M.J., Walton, D.S., Carleton, P.J., Fox, J.N., et al. (1999). Co-expression of estrogen receptor-alpha and beta: poor prognostic factors in human breast cancer? Cancer Research 59: 525-528.
Spring, K., Ahangari, F., Scott, S.P., Waring, P., Purdie, D.M., Chen, P.C., Hourigan, K., Ramsay, J., McKinnon, P.J., Swift, M., and Lavin, M.F. (2002). Mice heterozygous for mutation in ATM, the gene involved in ataxia-telangiectasia, have heightened susceptibility to cancer. Nature Genetics 32: 185-190.
Srivastava, R.A., Srivastava, N., Averna, M., et al. (1997). Estrogen up-regulates apolipoprotein E (ApoE) gene expression by increasing ApoE mRNA in the translating pool via the estrogen receptor alpha-mediated pathway. Journal of Biological Chemistry 272: 33360-33366.
Stankovic, T., Kidd, A.M., Sutcliffe, A., McGuire, G.M., Robinson, P., Weber, P., Bedenham, T., Bradwell, A.R., Easton, D.F., Lennox, G.G., Haites, N., Byrd, P.J., and Taylor, A.M. (1998). ATM mutations and phenotypes in ataxia-telangiectasia families in the British Isles: expression of mutant ATM and the risk of leukemia, lymphoma and breast cancer. American Journal of Human Genetics 62: 334-345.
Stavrou, I., Zois, C., Ioannidis, J.P., and Tsatsoulis, A. (2002). Association of polymorphisms of the oestrogen receptor alpha gene with the age of menarche. Human Reproduction 17: 1101-1105.
Stein, B. and Yang, M. (1995). Repression of the interleukin-6 promoter by estrogen receptor is mediated by NF-kappa B and C/EBP beta. Molecular and Cellular Biology 15: 4971-4979.
Stone, D.J., Rozovsky, I., Morgan, T.E., et al. (1998). Increased synaptic sprouting in response to estrogen via an apolipoprotein E-dependent mechanism: Implications for Alzheimer's disease. Journal of Neuroscience 18: 3180-3185.
Stram, D.O. (2003). Human Heredity.
Sullivan, J.M., Vander Zwaag, R., Lemp, G.F., Hughes, J.P., Maddock, V., Kroetz, F.W., Ramanathan, K.B., and Mirvis, D.M. (1988). Postmenopausal estrogen use and coronary atherosclerosis. Annals of Internal Medicine 108: 358-363.
Sundarrajan, C., Liao, W.X., Roy, A.C., and Ng, S.C. (2001). Association between estrogen receptor-beta gene polymorphisms and ovulatory dysfunctions in patients with menstrual disorders. Journal of Clinical Endocrinology and Metabolism 86: 135-139.
Swift, M., Morrell, D., Massey, R.B., and Chase, C.L. (1991). Incidence of cancer in 161 families affected by ataxia-telangiectasia. New England Journal of Medicine 325: 1831-1836.
Swift, M., Reitnauer, P.J., Morrell, D., and Chase, C.L. (1987). Breast and other cancers in families with ataxia-telangiectasia. New England Journal of Medicine 316: 1289-1294.
Swift, M., Sholman, L., Perry, M., and Chase, C. (1976). Malignant neoplasms in the families of patients with ataxia-telangiectasia. Cancer Research 36: 209-215.
Tang, K., Fu, D.J., Julien, D., Braun, A., Cantor, C.R., and Koster, H. (1999). Chip-based genotyping by mass spectrometry. Proc Natl Acad Sci U S A 96: 10016-10020.
Tang, M.X., Jacobs, D., Stern, Y., et al. (1996). Effect of oestrogen during menopause on risk and age at onset of Alzheimer's disease. Lancet 348: 429-432.
Tavtigian, S.V., Simard, J., Rommens, J., Couch, F., Shattuck-Eidens, D., Neuhausen, S., Merajver, S., Thorlacius, S., Offit, K., Stoppa-Lyonnet, D., Belanger, C., Bell, R., Berry, S., Bogden, R., Chen, Q., Davis, T., Dumont, M., Frye, C., Hattier, T., Jammulapati, S., Janecki, T., Jiang, P., Kehrer, R., Leblanc, J.F., and Goldgar, D.E. (1996). The complete BRCA2 gene and mutations in chromosome 13q-linked kindreds. Nature Genetics 12: 333-337.
Taylor, A.M. (1992). Ataxia telangiectasia genes and predisposition to leukaemia, lymphoma and breast cancer. Br J Cancer 66: 5-9.
Taylor, A.M.R., Flude, E., Laher, B., Stacey, M., McKay, E., Watt, J., Green, S.H., and Harding, A.E. (1987). Variant forms of ataxia telangiectasia. Journal of Medical Genetics 24: 669-677.
Telatar, M., Teraoka, S., Wang, Z., Chun, H.H., Liang, T., Castellvi-Bel, S., et al. (1998a). Ataxia-telangiectasia: identification and detection of founder-effect mutations in the ATM gene in ethnic populations. American Journal of Human Genetics 62: 86-97.
Telatar, M., Wang, S., Castellvi-Bel, S., Tai, L.Q., Sheikhavandi, S., Regueiro, J.R., et al. (1998b). A model for ATM heterozygote identification in a large population: Four founder-effect ATM mutations identify most of Costa Rican patients with ataxia-telangiectasia. Molecular Genetics and Metabolism 64: 36-43.
Teraoka, S.N., Malone, K.E., Doody, D.R., Suter, N.M., Ostrander, E.A., Daling, J.R., and Concannon, P. (2001). Increased frequency of ATM mutations in breast carcinoma patients with early onset disease and positive family history. Cancer 92: 479-487.
Theile, M., Seitz, S., Arnold, W., Jandrig, B., Frege, R., Schlag, P.M., Haensch, W., Guski, H., and Winzer, K.J. (1996). A defined chromosome 6q fragment (at D6S310) harbors a putative tumor suppressor gene for breast cancer. Oncogene 13: 677-685.
Thomas, H.V., Key, T.J., Allen, D.S., et al. (1997b). A prospective study of endogenous serum hormone concentrations and breast cancer risk in premenopausal women on the island of Guernsey. British Journal of Cancer 75: 1075-1079.
Thomas, H.V., Reeves, G.K., and Key, T.J. (1997a). Endogenous estrogen and postmenopausal breast cancer: a quantitative review. Cancer Causes & Control 9: 922-928.
Thorstenson, Y.R., Shen, P., Tusher, V.G., Wayne, T.L., Davis, R.W., Chu, G., and Oefner, P.J. (2001). Global analysis of ATM polymorphism reveals significant functional constraint. American Journal of Human Genetics 69: 396-412.
Tishkoff, S.A., Dietzsch, E., Speed, W., Pakstis, A.J., Kidd, J.R., Cheung, K., Bonne-Tamir, B., Santachiara-Benerecetti, A.S., Moral, P., and Krings, M. (1996). Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271: 1380-1387.
Tishkoff, S.A., Pakstis, A.J., Ruano, G., and Kidd, K.K. (2000). The accuracy of statistical methods for estimation of haplotype frequencies: An example from the CD4 locus. American Journal of Human Genetics 67: 518-522.
Tishkoff, S.A. and Williams, S.M. (2002). Genetic analysis of African populations: human evolution and complex disease. Nature Reviews Genetics 3: 611-621.
Todd, J.A., Mijovic, C., Fletcher, J., Jenkins, D., Bradwell, A.R., and Barnett, A.H. (1989). Identification of susceptibility loci for insulin-dependent diabetes mellitus by trans-racial gene mapping. Nature 338: 587-589.
van Agthoven, T., Timmermans, M., Foekens, J.A., Dorssers, L., and Henzen-Logmans, S.C. (1994). Differential expression of estrogen, progesterone, and epidermal growth factor receptors in normal, benign, and malignant human breast tissues using dual staining immunohistochemistry. American Journal of Pathology 144: 1238-1246.
Verpillat, P., Bouley, S., Campion, D., Hannequin, D., Dubois, B., Belliard, S., Puel, M., Thomas-Anterion, C., Agid, Y., Brice, A., and Clerget-Darpoux, F. (2001). Use of haplotype information to test involvement of the LRP gene in Alzheimer's disease in the French population. European Journal of Human Genetics 9: 464-468.
Vladusic, E.A., Hornby, A.E., Guerra-Vladusic, F.K., and Lupu, R. (1998). Expression of estrogen receptor beta messenger RNA variant in breast cancer. Cancer Research 58: 210-214.
Vorechovsky, I., Luo, L., Lindblom, A., Negrini, M., Webster, A.B.A., Croce, C.M., and Hammarstrom, L. (1996b). ATM mutations in cancer families. Cancer Research 56: 4130-4133.
Vorechovsky, I., Rasio, D., Luo, L., Monaco, C., Hammarstrom, L., Webster, A.D.B., Zaloudik, J., Barbanti-Brodano, G., James, M., Russo, G., Croce, C.M., and Negrini, M. (1996a). The ATM gene and susceptibility to breast cancer: analysis of 38 breast tumors reveals no evidence for mutation. Cancer Research 56: 2726-2732.
Wang, Y., Cortez, D., Yazdi, P., Neff, N., Elledge, S.J., and Qin, J. (2000). BASC, a super complex of BRCA1-associated proteins involvement in the recognition and repair of aberrant DNA structures. Genes and Development 14: 927-939.
Watters, D., Khanna, K.K., Beamish, H., Birrell, G., Spring, K., Kedar, P., Gatei, M., Stenzel, D., Hobson, K., Kozlov, S., Zhang, N., Farrell, A., Ramsay, J., Gatti, R., and Lavin, M. (1997). Cellular localization of the ataxia-telangiectasia (ATM) gene product and discrimination between mutated and normal forms. Oncogene 14: 1911-1921.
Weiss, K.V. and Terwilliger, J.D. (2000). How many diseases does it take to map a gene with SNPs? Nature Genetics 26: 151-157.
Woods, C.G. and Taylor, A.M. (1992). Ataxia telangiectasia in the British Isles: the clinical and laboratory features of 70 affected individuals. Q J Med 82: 169-179.
Wooster, R., Bignell, G., Lancaster, J., Swift, S., Seal, S., Mangion, J., Collins, N., Gregory, S., Gumbs, C., and Micklem, G. (1995). Identification of the breast cancer susceptibility gene BRCA2. Nature 378: 789-792.
Wooster, R., Ford, D., Mangion, J., Ponder, B.A., Peto, J., Easton, D.F., and Stratton, M.R. (1993). Absence of linkage to the ataxia telangiectasia locus in familial breast cancer. Human Genetics 92: 91-94.
Wright, J., Teraoka, S., Onengut, S., Tolun, A., Gatti, R.A., Ochs, H.D., and Concannon, P. (1996). A high frequency of distinct ATM gene mutations in ataxia-telangiectasia. American Journal of Human Genetics 59: 839-846.
Yaffe, K., Lui, L-Y., Grady, D., Stone, K., and Morin, P. (2002). Estrogen receptor 1 polymorphisms and risk of cognitive impairment in older women. Society of Biological Psychiatry 51: 677-682.
Yaich, L., Dupont, W.D., Cavener, D.R., and Parl, F.F. (1992). Analysis of the PvuII restriction-fragment-length-polymorphism and exon structure of the estrogen-receptor gene in breast-cancer and peripheral-blood. Cancer Research 52: 77-83.
Zakian, V.A. (1995). ATM-related genes: What do they tell us about functions of the human gene? Cell 82: 685-687.
Zuppan, P., Hall, J.M., Lee, M.K., et al. (1991). Possible linkage of the estrogen receptor gene to breast cancer in a family with late-onset disease. American Journal of Human Genetics 48: 1065-1068.
Zuppan, P.J., Hall, J.M., Ponglikitmongkol, M., Spielman, R., and King, M.C. (1989). Polymorphisms at the estrogen-receptor (ESR) locus and linkage relationships on chromosome-6Q. Cytogenetics and Cellular Genetics 51: 1116.

APPENDIX A
VISUAL BASIC PROGRAM CALCULATING D' AND 95% CONFIDENCE INTERVALS

Option Explicit

Sub Dprime_VBA_030104()
    'MCP's macro revised by PB and MCP
    Dim pop(6) As String
    Dim cpop(6) As String
    Application.DisplayAlerts = False
    'Lower-case and upper-case one-letter population codes
    pop(0) = "t"
    pop(1) = "b"
    pop(2) = "h"
    pop(3) = "j"
    pop(4) = "l"
    pop(5) = "w"
    pop(6) = "g"
    cpop(0) = "T"
    cpop(1) = "B"
    cpop(2) = "H"
    cpop(3) = "J"
    cpop(4) = "L"
    cpop(5) = "W"
    cpop(6) = "G"
    Call CombineAndOrder(pop, cpop)
    Call Calc(pop, cpop)
    Call Plot(pop, cpop)
    Windows("Combined.xls").Activate
    ActiveWorkbook.Save
    ActiveWorkbook.Close
End Sub

Sub ACGT(m, mm, k, i)
    Dim Ac As Integer
    Dim Cc As Integer
    Dim Gc As Integer
    Dim Tc As Integer
    Dim Apc As Single
    Dim Cpc As Single
    Dim Blankpc As Single
    Dim Gpc As Single
    Dim Tpc As Single
    Dim Alpc As Single
    Dim HWp As Single
    'Read allele counts and allele shares from the active sheet
    Ac = Cells(mm, k)
    Cc = Cells(mm + 1, k)
    Gc = Cells(mm + 2, k)
    Tc = Cells(mm + 3, k)
    Blankpc = Cells(mm + 4, k)
    Apc = Cells(mm + 6, k)
    Cpc = Cells(mm + 7, k)
    Gpc = Cells(mm + 8, k)
    Tpc = Cells(mm + 9, k)
    'Alpc = smallest share among observed alleles; 0 if it exceeds 0.5
    Alpc = 1#
    If (Ac > 0 And Apc < Alpc) Then Alpc = Apc
    If (Cc > 0 And Cpc < Alpc) Then Alpc = Cpc
    If (Gc > 0 And Gpc < Alpc) Then Alpc = Gpc
    If (Tc > 0 And Tpc < Alpc) Then Alpc = Tpc
    If Alpc > 0.5 Then Alpc = 0#
    HWp = Cells(mm + 16, k)
    'Write the summary row for this SNP
    Windows("SNP_Structure.xls").Activate
    Sheets("SNPs_All").Select
    Cells(i, m) = Blankpc
    Cells(i, m + 1) = Apc
    Cells(i, m + 2) = Cpc
    Cells(i, m + 3) = Gpc
    Cells(i, m + 4) = Tpc
    Cells(i, m + 5) = Alpc
    Cells(i, m + 6) = HWp
End Sub

Sub AddPlotWorkbook(kk)
    Workbooks.Add
    Sheets.Add
    Sheets.Add
    Sheets.Add
    Sheets.Add
    Sheets.Add
    Sheets.Add
    Sheets("Sheet1").Name = "Input"
    Sheets("Sheet2").Name = "Lod"
    Sheets("Sheet3").Name = "r2"
    Sheets("Sheet4").Name = "DPrime"
    Sheets("Sheet5").Name = "DPrimeGe10pc"
    Sheets("Sheet6").Name = "DPrime_ci"
    Sheets("Sheet7").Name = "DPrime_ciGe10pc"
End Sub
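The allele-frequency step in Sub ACGT can be isolated so that it can be checked without the worksheets the macro reads from. The following is a minimal sketch and not part of the original macro: the function name MinorAlleleShare and its two array arguments are hypothetical, and the shares are assumed to be proportions between 0 and 1, as the comparisons against 1 and 0.5 in the listing suggest.

```vba
' Hypothetical helper, not part of the original macro.
' counts and shares are assumed to be arrays of equal size and base,
' ordered A, C, G, T, with shares given as proportions (0 to 1).
' Returns the smallest share among alleles with a non-zero count,
' or 0 when that share exceeds 0.5 (only one allele observed).
Function MinorAlleleShare(counts As Variant, shares As Variant) As Double
    Dim idx As Long
    Dim smallest As Double

    smallest = 1#
    For idx = LBound(counts) To UBound(counts)
        If counts(idx) > 0 And shares(idx) < smallest Then smallest = shares(idx)
    Next idx

    If smallest > 0.5 Then smallest = 0#
    MinorAlleleShare = smallest
End Function
```

Under these assumptions, MinorAlleleShare(Array(30, 0, 70, 0), Array(0.3, 0, 0.7, 0)) gives 0.3, and a SNP with a single observed allele, such as MinorAlleleShare(Array(100, 0, 0, 0), Array(1, 0, 0, 0)), gives 0.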

Sub Calc(pop, cpop)
    Dim i As Integer
    Dim ijk As Integer
    Dim j As Integer
    Dim k As Integer
    Dim kstart As Integer
    Dim kstop As Integer
    Dim m As Integer
    Dim m0 As Integer
    Dim m1 As Integer
    Dim m2 As Integer
    Dim m9 As Integer
    Dim mij As Integer
    Dim mpop As Integer
    Dim mstop As Integer
    Dim n As Integer
    Dim SNPn As Integer
    Dim a As Double
    Dim adplow As Double
    Dim adpmax As Double
    Dim adpupp As Double
    Dim b As Double
    Dim c(1 To 10) As Double
    Dim ca As Double
    Dim caml As Double
    Dim cb As Double
    Dim cp As Double
    Dim crit As Double
    Dim data(1 To 3, 1 To 3) As Double
    Dim dif As Double
    Dim dplow As Double
    Dim dpmax As Double
    Dim dpupp As Double
    Dim lik As Double
    Dim likequ As Double
    Dim liklow As Double
    Dim likmax As Double
    Dim likupp As Double
    Dim oldlik As Double
    Dim lodmax As Double
    Dim pd0 As Double
    Dim pd1 As Double
    Dim p0d As Double
    Dim p1d As Double
    Dim p00 As Double
    Dim p01 As Double
    Dim p10 As Double
    Dim p11 As Double
    Dim p00equ As Double
    Dim p01equ As Double
    Dim p10equ As Double
    Dim p11equ As Double
    Dim p00low As Double
    Dim p01low As Double
    Dim p10low As Double
    Dim p11low As Double
    Dim p00max As Double
    Dim p01max As Double
    Dim p10max As Double
    Dim p11max As Double
    Dim p00upp As Double
    Dim p01upp As Double
    Dim p10upp As Double
    Dim p11upp As Double
    Dim r2 As Double
    Dim temp As Double
    Dim tot As Double
    Dim x01 As Double
    Dim x1 As Double
    Dim x2 As Double
    Dim x3 As Double
    Dim xnew As Double
    Dim yaim As Double
    Dim y1 As Double
    Dim y2 As Double
    Dim y3 As Double
    Dim ynew As Double
    Dim cxpop As String
    Dim DPrime_r As String
    Dim xpop As String
    Windows("Combined.xls").Activate
    ActiveWorkbook.Save
    Sheets("Combined").Select
    i = 1
    'Count the SNP columns (genotypes start in column 4)
    Do While i <= 121
        If (Cells(1, i + 3) = "") Then Exit Do
        i = i + 1
    Loop
    SNPn = i - 1
    For mpop = 0 To 6
        xpop = pop(mpop)
        cxpop = cpop(mpop)
        DPrime_r = "DPrime_" & xpop
        Workbooks.Add
        ActiveWorkbook.SaveAs Filename:=DPrime_r
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Cells(500, 1) = "i"
        Cells(500, 2) = "j"
        Cells(500, 3) = "N"
        Cells(500, 4) = "p00max"
        Cells(500, 5) = "p01max"
        Cells(500, 6) = "p10max"
        Cells(500, 7) = "p11max"
        Cells(500, 8) = "p1d"
        Cells(500, 9) = "pd1"
        Cells(500, 10) = "adpmax"
        Cells(500, 11) = "lod"
        Cells(500, 12) = "r2"
        Cells(500, 13) = "adplow"
        Cells(500, 14) = "adpupp"
        Cells(500, 16) = "data(1,1)"
        Cells(500, 17) = "data(1,2)"
        Cells(500, 18) = "data(1,3)"
        Cells(500, 19) = "data(2,1)"
        Cells(500, 20) = "data(2,2)"
        Cells(500, 21) = "data(2,3)"
        Cells(500, 22) = "data(3,1)"
        Cells(500, 23) = "data(3,2)"
        Cells(500, 24) = "data(3,3)"
        Cells(500, 26) = "alli%"
        Cells(500, 27) = "faili%"
        Cells(500, 28) = "allj%"
        Cells(500, 29) = "failj%"
        Rows("500:500").HorizontalAlignment = xlCenter
        'Find the first and last rows belonging to this population
        kstart = -1
        kstop = -1
        For k = 2 To 350
            If ((Cells(k, 2) = cxpop Or cxpop = "T" _
                Or (cxpop = "G" And Cells(k, 2) <> "B")) _
                And kstart = -1) Then kstart = k
            If ((Cells(k, 2) = cxpop Or cxpop = "T" _
                Or (cxpop = "G" And Cells(k, 2) <> "B")) _
                And kstart > 0) Then kstop = k
        Next k
        ijk = 1
        i = 1
        Do While i <= SNPn - 1
            j = i + 1
            Do While j <= SNPn
                'Tabulate the 3 x 3 genotype counts for SNPs i and j
                m = 1
                Do While m <= 3
                    n = 1
                    Do While n <= 3
                        data(m, n) = 0
                        n = n + 1
                    Loop
                    m = m + 1
                Loop
                k = kstart
                Do While k <= kstop
                    m = Cells(k, i + 3) + 1
                    n = Cells(k, j + 3) + 1
                    If (m <= 3 And n <= 3) Then _
                        data(m, n) = data(m, n) + 1
                    k = k + 1
                Loop
                m = 1
                Do While m <= 3
                    n = 1
                    Do While n <= 3
                        Cells(500 + ijk, 16 + (m - 1) * 3 + (n - 1)) = data(m, n)
                        n = n + 1
                    Loop
                    m = m + 1
                Loop
                c(1) = data(1, 1)
                c(2) = data(1, 2)
                c(3) = data(1, 3)
                c(4) = data(2, 1)
                c(5) = 0.5 * data(2, 2)
                c(6) = c(5)
                c(7) = data(2, 3)
                c(8) = data(3, 1)
                c(9) = data(3, 2)
                c(10) = data(3, 3)
                m = 1
                tot = 0
                Do While m <= 10
                    tot = tot + c(m)
                    m = m + 1
                Loop
                'Marginal allele frequencies and the linkage-equilibrium starting point
                pd0 = (2 * (c(1) + c(4) + c(8)) + (c(2) + c(5) + c(6) + c(9))) / (2 * tot)
                pd1 = 1 - pd0
                p0d = (2 * (c(1) + c(2) + c(3)) + (c(4) + c(5) + c(6) + c(7))) / (2 * tot)
                p1d = 1 - p0d
                p00 = p0d * pd0
                p01 = p0d * pd1
                p10 = p1d * pd0
                p11 = p1d * pd1
                Call LogLikCalc(lik, p00, p0d, pd0, data())
                likequ = lik
                p00equ = p00
                p01equ = p01
                p10equ = p10
                p11equ = p11
L1:
                If (p00 * p11 + p01 * p10 < 0.0000001) Then
                    p00max = p00
                    p01max = p01
                    p10max = p10
                    p11max = p11
                    adpmax = 1
                    likmax = likequ
                    p00low = p00max
                    p01low = p0d - p00low
                    p10low = pd0 - p00low
                    p11low = 1 - p00low - p01low - p10low
                    liklow = likmax
                    adplow = 1
                    p00upp = p00max
                    p01upp = p0d - p00upp
                    p10upp = pd0 - p00upp
                    p11upp = 1 - p00upp - p01upp - p10upp
                    likupp = likmax
                    adpupp = 1
                    GoTo LA
                End If
                'Split the double heterozygotes between the two phases and re-estimate p00
                cp = (p00 * p11) / (p00 * p11 + p01 * p10)
                c(5) = cp * data(2, 2)
                c(6) = (1 - cp) * data(2, 2)
                oldlik = lik
                p00 = (2 * c(1) + c(2) + c(4) + c(5)) / (2 * tot)
                p01 = p0d - p00
                p10 = pd0 - p00
                p11 = 1 - p00 - p01 - p10
                Call LogLikCalc(lik, p00, p0d, pd0, data())
                If (Abs(oldlik - lik) > 0.000001) Then GoTo L1
                likmax = lik
                p00max = p00
                p01max = p01
                p10max = p10
                p11max = p11
                dpmax = 0
                Call DPrimeCalc(p11, p1d, pd1, dpmax)
                adpmax = Abs(dpmax)
                crit = 3.84146
                If (p00equ < p00max) Then
                    dif = 2 * (likmax - likequ)
                    If (dif < crit) Then
                        p00low = p00equ
                        lik = likequ
                        GoTo L3
                    End If
                    x1 = p00equ 'x1 is associated with low lik, viz. y1
                    y1 = likequ
                    x2 = p00max 'x2 is associated with high lik, viz. y2
                    y2 = likmax
                    yaim = likmax - crit / 2 'yaim is the lik that is sought, y1 and y2 must lie either side of it
L2:
                    x3 = (x1 + x2) / 2
                    Call LogLikCalc(lik, x3, p0d, pd0, data())
                    If (Abs(x2 - x1) < 0.00001) Then
                        p00low = x3
                        GoTo L3
                    End If
                    y3 = lik
                    If (y3 <= yaim) Then
                        x1 = x3
                        y1 = y3
                    End If
                    If (y3 > yaim) Then
                        x2 = x3
                        y2 = y3
                    End If
                    GoTo L2
L3:
                    p01low = p0d - p00low
                    p10low = pd0 - p00low
                    p11low = 1 - p00low - p01low - p10low
                    liklow = lik
                    dplow = 0
                    Call DPrimeCalc(p11low, p1d, pd1, dplow)
                    adplow = Abs(dplow)
                    If (adpmax > 0.9999) Then
                        p00upp = p00max
                        p01upp = p0d - p00upp
                        p10upp = pd0 - p00upp
                        p11upp = 1 - p00upp - p01upp - p10upp
                        likupp = likmax
                        adpupp = adpmax
                        GoTo L6
                    End If
                    x1 = pd0
                    If (x1 > p0d) Then x1 = p0d
                    x1 = (p00max + x1) / 2
                    mij = 0
                    Call LogLikCalc(lik, x1, p0d, pd0, data())
                    dif = 2 * (likmax - lik)
                    If (dif < crit) Then
                        p00upp = x1
                        GoTo L5
                    End If
                    temp = x1
                    y2 = lik
                    x1 = p00max
                    y1 = likmax
                    x2 = temp
L4:
                    b = (y2 - y1) / (x2 - x1)
                    a = y2 - b * x2
                    ynew = likmax - crit / 2
                    xnew = (ynew - a) / b
                    Call LogLikCalc(lik, xnew, p0d, pd0, data())
                    If (Abs(xnew - x1) < 0.00001) Then
                        p00upp = xnew
                        GoTo L5
                    End If
                    x2 = x1
                    y2 = y1
                    x1 = xnew
                    y1 = lik
                    GoTo L4
L5:
                    p01upp = p0d - p00upp
                    p10upp = pd0 - p00upp
                    p11upp = 1 - p00upp - p01upp - p10upp
                    likupp = lik
                    dpupp = 0
                    Call DPrimeCalc(p11upp, p1d, pd1, dpupp)
                    adpupp = Abs(dpupp)
L6:
                End If
                If (p00equ >= p00max) Then
                    dif = 2 * (likmax - likequ)
                    If (dif < crit) Then
                        p00upp = p00equ
                        lik = likequ
                        GoTo L13
                    End If
                    x1 = p00equ 'x1 is associated with low lik, viz. y1
                    y1 = likequ
                    x2 = p00max 'x2 is associated with high lik, viz. y2
                    y2 = likmax
                    yaim = likmax - crit / 2 'yaim is the lik that is sought, y1 and y2 must lie either side of it
L12:
                    x3 = (x1 + x2) / 2
                    Call LogLikCalc(lik, x3, p0d, pd0, data())
                    If (Abs(x2 - x1) < 0.00001) Then
                        p00upp = x3
                        GoTo L13
                    End If
                    y3 = lik
                    If (y3 <= yaim) Then
                        x1 = x3
                        y1 = y3
                    End If
                    If (y3 > yaim) Then
                        x2 = x3
                        y2 = y3
                    End If
                    GoTo L12
L13:
                    p01upp = p0d - p00upp
                    p10upp = pd0 - p00upp
                    p11upp = 1 - p00upp - p01upp - p10upp
                    likupp = lik
                    dpupp = 0
                    Call DPrimeCalc(p11upp, p1d, pd1, dpupp)
                    adpupp = Abs(dpupp)
                    If (adpmax > 0.9999) Then
                        p00low = p00max
                        p01low = p0d - p00low
                        p10low = pd0 - p00low
                        p11low = 1 - p00low - p01low - p10low
                        liklow = likmax
                        adplow = adpmax
                        GoTo L16
                    End If
                    x01 = p0d
                    If (x01 > pd1) Then x01 = pd1
                    x1 = p0d - x01
                    x1 = (p00max + x1) / 2
                    mij = 0
                    Call LogLikCalc(lik, x1, p0d, pd0, data())
                    dif = 2 * (likmax - lik)
                    If (dif < crit) Then
                        p00low = x1
                        GoTo L15
                    End If
                    y2 = lik
                    temp = x1
                    x1 = p00max
                    y1 = likmax
                    x2 = temp
L14:
                    b = (y2 - y1) / (x2 - x1)
                    a = y2 - b * x2
                    ynew = likmax - crit / 2
                    xnew = (ynew - a) / b
                    Call LogLikCalc(lik, xnew, p0d, pd0, data())
                    If (Abs(xnew - x1) < 0.000001) Then
                        p00low = xnew
                        GoTo L15
                    End If
                    x2 = x1
                    y2 = y1
                    x1 = xnew
                    y1 = lik
                    mij = mij + 1
                    GoTo L14
L15:
                    p01low = p0d - p00low
                    p10low = pd0 - p00low
                    p11low = 1 - p00low - p01low - p10low
                    liklow = lik
                    dplow = 0
                    Call DPrimeCalc(p11low, p1d, pd1, dplow)
                    adplow = Abs(dplow)
L16:
                End If
LA:
                Cells(500 + ijk, 1) = i
                Cells(500 + ijk, 2) = j
                Cells(500 + ijk, 3) = tot
                Cells(500 + ijk, 4) = p00max
                Cells(500 + ijk, 5) = p01max
                Cells(500 + ijk, 6) = p10max
                Cells(500 + ijk, 7) = p11max
                Cells(500 + ijk, 8) = p1d
                Cells(500 + ijk, 9) = pd1
                Cells(500 + ijk, 10) = adpmax
                lodmax = Log10(Exp(likmax - likequ))
                Cells(500 + ijk, 11) = lodmax
                r2 = (p11max - pd1 * p1d) ^ 2
                temp = ((pd1 - pd1 * pd1) * (p1d - p1d * p1d))
                If (temp > 0) Then r2 = r2 / temp
                Cells(500 + ijk, 12) = r2
                If (adplow > adpupp) Then
                    temp = adplow
                    adplow = adpupp
                    adpupp = temp
                End If
                If (adplow = 1) Then
                    adplow = 0
                    Cells(500 + ijk, 10) = ""
                End If
                Cells(500 + ijk, 13) = adplow
                Cells(500 + ijk, 14) = adpupp
                Range(Cells(500 + ijk, 1), Cells(500 + ijk, 3)).Select
                Selection.NumberFormat = "0"
                Range(Cells(500 + ijk, 4), Cells(500 + ijk, 14)).Select
                Selection.NumberFormat = "0.0000"
                ijk = ijk + 1
                j = j + 1
            Loop
            i = i + 1
        Loop
        'Minor allele frequency and genotyping failure rate for each SNP
        i = 1
        Do While i <= SNPn
            m0 = 0
            m1 = 0
            m2 = 0
            m9 = 0
            k = kstart
            Do While k <= kstop
                m = Cells(k, i + 3)
                If (m = 0) Then m0 = m0 + 1
                If (m = 1) Then m1 = m1 + 1
                If (m = 2) Then m2 = m2 + 1
                If (m = 9) Then m9 = m9 + 1
                k = k + 1
            Loop
            ca = (2 * m0 + m1) / (2 * (m0 + m1 + m2))
            caml = 1 - ca
            If (caml < ca) Then ca = caml
            cb = m9 / (m0 + m1 + m2 + m9)
            m = 1
            mstop = (SNPn * (SNPn - 1)) / 2 + 1
            Do While m <= mstop
                If (Cells(500 + m, 1) = i) Then
                    Cells(500 + m, 26) = ca
                    Cells(500 + m, 27) = cb
                End If
                If (Cells(500 + m, 2) = i) Then
                    Cells(500 + m, 28) = ca
                    Cells(500 + m, 29) = cb
                End If
                Range(Cells(500 + m, 26), Cells(500 + m, 29)).Select
                Selection.NumberFormat = "0.00%"
                m = m + 1
            Loop
            i = i + 1
        Loop
        Cells.EntireColumn.AutoFit
        Range(Cells(500, 1), Cells(500 + ijk - 1, 29)).Cut
        Windows(DPrime_r).Activate
        Range(Cells(1, 1), Cells(1, 1)).Select
        ActiveSheet.Paste
        ActiveWorkbook.Save
        ActiveWorkbook.Close
        Windows("Combined.xls").Activate
    Next mpop
End Sub
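The core of Calc (EM estimation of the haplotype frequency p00 from a 3 x 3 genotype table, followed by D', r2, the LOD score and a likelihood-based 95% interval for |D'|) can be sketched compactly in Python. This is an illustrative sketch under stated assumptions, not the program used in the thesis. The function names are hypothetical. A single floor of 1e-12 replaces the listing's per-cell stand-ins for Log(0). Both interval ends are found by bisection, whereas the listing uses a secant search on the side away from equilibrium. The listing's special case for near-monomorphic pairs (which reports |D'| = 1) is omitted, and the zero-denominator guard in `d_prime` is an added assumption.

```python
import math

FLOOR = 1e-12  # stand-in for Log(0); the listing uses cell-specific multiples


def loglik(table, p00, p0d, pd0):
    """Log-likelihood of a 3x3 two-SNP genotype table for haplotype frequency p00."""
    p01, p10 = p0d - p00, pd0 - p00
    p11 = 1.0 - p00 - p01 - p10
    probs = [
        [p00 * p00, 2 * p00 * p01, p01 * p01],
        [2 * p00 * p10, 2 * (p00 * p11 + p01 * p10), 2 * p01 * p11],
        [p10 * p10, 2 * p10 * p11, p11 * p11],
    ]
    return sum(
        n * math.log(max(p, FLOOR))
        for counts, row in zip(table, probs)
        for n, p in zip(counts, row)
        if n > 0
    )


def d_prime(p11, p1d, pd1):
    """D divided by its largest attainable magnitude (as in DPrimeCalc)."""
    d = p11 - p1d * pd1
    if d >= 0:
        d_max = min(p1d * (1 - pd1), pd1 * (1 - p1d))
    else:
        d_max = min((1 - p1d) * (1 - pd1), p1d * pd1)
    return d / d_max if d_max > 0 else 0.0  # guard is an assumption


def pairwise_ld(table, tol=1e-6, crit=3.84146, max_iter=10000):
    """table[m][n]: count of subjects with genotype m at SNP i and n at SNP j."""
    tot = sum(sum(row) for row in table)
    p0d = (2 * sum(table[0]) + sum(table[1])) / (2 * tot)
    pd0 = (2 * sum(r[0] for r in table) + sum(r[1] for r in table)) / (2 * tot)
    p1d, pd1 = 1 - p0d, 1 - pd0
    p00_eq = p0d * pd0  # linkage-equilibrium starting value
    lik_eq = loglik(table, p00_eq, p0d, pd0)

    p00, lik = p00_eq, lik_eq
    for _ in range(max_iter):
        p01, p10 = p0d - p00, pd0 - p00
        p11 = 1 - p00 - p01 - p10
        denom = p00 * p11 + p01 * p10
        if denom < 1e-7:
            break
        # E-step: expected number of double heterozygotes carrying 00/11
        cis = table[1][1] * p00 * p11 / denom
        # M-step: re-estimate p00 by haplotype counting
        p00 = (2 * table[0][0] + table[0][1] + table[1][0] + cis) / (2 * tot)
        old, lik = lik, loglik(table, p00, p0d, pd0)
        if abs(old - lik) <= tol:
            break

    def dp_at(x):
        return d_prime(1 - p0d - pd0 + x, p1d, pd1)

    target = lik - crit / 2  # log-likelihood at the edge of the 95% interval

    def bound(outside):
        """Bisect between the MLE and `outside` for the p00 reaching target."""
        if loglik(table, outside, p0d, pd0) > target:
            return outside
        inside = p00
        while abs(inside - outside) >= 1e-5:
            mid = (inside + outside) / 2
            if loglik(table, mid, p0d, pd0) > target:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2

    far = min(p0d, pd0) if p00 >= p00_eq else max(0.0, p0d + pd0 - 1)
    ci = sorted(abs(dp_at(x)) for x in (bound(p00_eq), bound(far)))
    d = (1 - p0d - pd0 + p00) - p1d * pd1
    var = p1d * (1 - p1d) * pd1 * (1 - pd1)
    return {
        "p00": p00,
        "d_prime": dp_at(p00),
        "r2": d * d / var if var > 0 else 0.0,
        "lod": (lik - lik_eq) / math.log(10),
        "ci": ci,
    }


complete_ld = pairwise_ld([[25, 0, 0], [0, 50, 0], [0, 0, 25]])
equilibrium = pairwise_ld([[4, 8, 4], [8, 16, 8], [4, 8, 4]])
```

In the first table the two SNPs are perfectly correlated, so the estimate of D' approaches 1 with a narrow interval. The second table is exactly at equilibrium, so D' is 0 and the interval starts at 0.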

Sub CombineAndOrder(pop, cpop)
    'Create Excel file called Combined of all SNPs from all plates
    'The maximum number of SNPs allowed = 120
    Dim i As Integer
    Dim k As Integer
    Dim k1 As Integer
    Dim k2 As Integer
    Dim kk As Integer
    Dim kend As Integer
    Dim m As Integer
    Dim nplates As Integer
    Dim mpop As Integer
    Dim mpop1 As Integer
    Dim mpop7 As Integer
    Dim Blankpc As Single
    Dim cxpop As String
    Dim m0 As String
    Dim m1 As String
    Dim m2 As String
    Dim npath As String
    Dim plate_name As String
    Dim ShN As String
    Dim SNPid As String
    Dim SNP_Structure_r As String
    Dim xpop As String
    npath = InputBox("Enter the directory where files are located", _
        Default:="C:\IGF1")
    ChDir npath
    nplates = Application.InputBox(prompt:="Enter number of plates", _
        Type:=1)
    Workbooks.Add
    Sheets.Add
    Sheets.Add
    Sheets("Sheet1").Name = "Combined"
    Sheets("Sheet2").Name = "Combined_MIT"
    Sheets("Sheet3").Name = "SNP_Structure"
    ActiveWorkbook.SaveAs Filename:="Combined.xls"
    Sheets("Combined").Select
    kk = 1
    Call FormatSheet(kk)
    Sheets("Combined_MIT").Select
    Call FormatSheet(kk)
    Sheets("SNP_Structure").Select
    Call FormatSheet(kk)
    Workbooks.Add
    Sheets("Sheet1").Name = "T"
    Sheets.Add
    Sheets("Sheet2").Name = "Tpc10"
    For mpop = 1 To 6
        cxpop = cpop(mpop)
        Sheets.Add
        mpop1 = mpop * 2 + 1
        ShN = "Sheet" & mpop1
        Sheets(ShN).Name = cxpop
        Call FormatSheet(1)
        Sheets.Add
        mpop1 = mpop1 + 1
        ShN = "Sheet" & mpop1
        Sheets(ShN).Name = cxpop & "pc10"
        kk = 1
        Call FormatSheet(kk)
    Next mpop
    ActiveWorkbook.SaveAs Filename:="MIT_Export"
    ActiveWorkbook.Save
    ActiveWorkbook.Close
    Windows("Combined.xls").Activate
    Sheets("Combined").Select
    Workbooks.Open Filename:="Plate_1.xls"
    Sheets("T").Select
    Range(Cells(1, 1), Cells(491, 7)).Copy
    Windows("Combined.xls").Activate
    Range(Cells(1, 1), Cells(1, 1)).Select
    ActiveSheet.Paste
    Windows("Plate_1.xls").Activate
    ActiveWorkbook.Close
    Windows("Combined.xls").Activate
    Cells(1, 1) = "ID"
    Range("C:C").Insert Shift:=xlToRight
    Cells(1, 3) = "Status"
    For i = 2 To 350
        Cells(i, 3) = 0
    Next i
    For i = 2 To nplates
        plate_name = "Plate_" & i
        k = 1
        Call Nextk(k)
        Workbooks.Open Filename:=plate_name
        Call Insert_SNPs(k)
        Windows(plate_name).Activate
        ActiveWorkbook.Close
    Next i
    Workbooks.Open Filename:="SNP_Structure.xls"
    Sheets("SNPs_All").Select
    Cells.Copy
    Windows("Combined.xls").Activate
    Sheets("SNP_Structure").Select
    Cells.Select
    ActiveSheet.Paste
    Windows("SNP_Structure.xls").Activate
    Sheets("SNPs_All").Select
    For mpop = 0 To 6
        cxpop = cpop(mpop)
        mpop7 = mpop * 7
        Cells(1, 20 + mpop7) = cxpop & "Blanks%"
        Cells(1, 21 + mpop7) = cxpop & "A%"
        Cells(1, 22 + mpop7) = cxpop & "C%"
        Cells(1, 23 + mpop7) = cxpop & "G%"
        Cells(1, 24 + mpop7) = cxpop & "T%"
        Cells(1, 25 + mpop7) = cxpop & "Al%"
        Cells(1, 26 + mpop7) = cxpop & "HWp%"
    Next mpop
    Range(Cells(1, 1), Cells(1, 68)).Select
    Selection.HorizontalAlignment = xlCenter
    'Sort the SNPs in Combined into SNP_Structure order
    Windows("Combined.xls").Activate
    Sheets("Combined").Select
    For k = 4 To 124
        If (Cells(1, k) = "") Then
            kend = k - 1
            Exit For
        End If
    Next k
    k1 = kend + 1
    k2 = k1 + (kend - 4)
    Range(Cells(1, 4), Cells(491, kend)).Cut
    Range(Cells(1, k1), Cells(1, k1)).Select
    ActiveSheet.Paste
    Windows("Combined.xls").Activate
    Sheets("SNP_Structure").Select
    i = 2
    Do While i <= 123
        If (Cells(i, 1) = "") Then GoTo Ld
        SNPid = Cells(i, 2)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        For k = k1 To k2
            If (Cells(1, k) = SNPid) Then
                Range(Cells(1, k), Cells(491, k)).Copy
                Range(Cells(1, i + 2), Cells(1, i + 2)).Select
                ActiveSheet.Paste
                Exit For
            End If
        Next k
        Blankpc = Cells(356, k)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Call ACGT(20, 352, k, i)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Call ACGT(27, 372, k, i)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Call ACGT(34, 392, k, i)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Call ACGT(41, 412, k, i)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Call ACGT(48, 432, k, i)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Call ACGT(55, 452, k, i)
        Windows("Combined.xls").Activate
        Sheets("Combined").Select
        Call ACGT(62, 472, k, i)
        Windows("SNP_Structure.xls").Activate
        Sheets("SNPs_All").Select
        Cells(i, 8) = Blankpc
        Range(Cells(i, 8), Cells(i, 8)).NumberFormat = "0.00%"
        Range(Cells(i, 20), Cells(i, 68)).NumberFormat = "0.00%"
        Windows("Combined.xls").Activate
        Sheets("SNP_Structure").Select
        i = i + 1
    Loop
Ld:
    Sheets("Combined").Select
    Range(Cells(1, k1), Cells(491, k2)).Delete
    For k = 4 To 123
        If (Cells(1, k) = "") Then Exit For
        m = k - 3
        Cells(1, k) = "SNP" & m
    Next k
    Sheets("Combined").Select
    Cells.Copy
    Sheets("Combined_MIT").Select
    Range(Cells(1, 1), Cells(1, 1)).Select
    ActiveSheet.Paste
    'Recode genotypes 0/1/2 as allele pairs (A = 1, C = 2, G = 3, T = 4)
    For k = 4 To 123
        If (Cells(1, k) = "") Then Exit For
        If (Cells(352, k) > 0 And Cells(353, k) > 0) Then
            m0 = "1 1"
            m1 = "1 2"
            m2 = "2 2"
            GoTo Le
        End If
        If (Cells(352, k) > 0 And Cells(354, k) > 0) Then
            m0 = "1 1"
            m1 = "1 3"
            m2 = "3 3"
            GoTo Le
        End If
        If (Cells(352, k) > 0 And Cells(355, k) > 0) Then
            m0 = "1 1"
            m1 = "1 4"
            m2 = "4 4"
            GoTo Le
        End If
        If (Cells(353, k) > 0 And Cells(354, k) > 0) Then
            m0 = "2 2"
            m1 = "2 3"
            m2 = "3 3"
            GoTo Le
        End If
        If (Cells(353, k) > 0 And Cells(355, k) > 0) Then
            m0 = "2 2"
            m1 = "2 4"
            m2 = "4 4"
            GoTo Le
        End If
        If (Cells(354, k) > 0 And Cells(355, k) > 0) Then
            m0 = "3 3"
            m1 = "3 4"
            m2 = "4 4"
            GoTo Le
        End If
Le:
        For i = 2 To 350
            If (Cells(i, k) = 0) Then Cells(i, k + 130) = m0
            If (Cells(i, k) = 1) Then Cells(i, k + 130) = m1
            If (Cells(i, k) = 2) Then Cells(i, k + 130) = m2
            If (Cells(i, k) = 9) Then Cells(i, k + 130) = "0 0"
        Next i
    Next k
    Range(Cells(2, 134), Cells(350, 253)).Cut
    Range(Cells(2, 4), Cells(2, 4)).Select
    ActiveSheet.Paste
    Windows("SNP_Structure.xls").Activate
    ActiveWorkbook.SaveAs Filename:="SNP_Structure_TEMP.xls"
    Call SNP_Structure_eth(27, 68, 0, 0, 0, pop)
    Call SNP_Structure_eth(34, 68, 20, 26, 1, pop)
    Call SNP_Structure_eth(41, 68, 20, 33, 2, pop)
    Call SNP_Structure_eth(48, 68, 20, 40, 3, pop)
    Call SNP_Structure_eth(55, 68, 20, 47, 4, pop)
    Call SNP_Structure_eth(62, 68, 20, 54, 5, pop)
    Call SNP_Structure_eth(20, 61, 0, 0, 6, pop)
    For mpop = 0 To 6
        xpop = pop(mpop)
        SNP_Structure_r = "SNP_Structure_" & xpop & ".xls"
        Workbooks.Open Filename:=SNP_Structure_r
        Sheets("SNPs_All").Select
        Cells.Copy
        Sheets("SNPs_Ge10pc").Select
        Range(Cells(1, 1), Cells(1, 1)).Select
        ActiveSheet.Paste
        'Drop SNPs with minor allele frequency below 10%
        i = 2
        Do While i <= 121
            If (Cells(i, 1) = "") Then Exit Do
            If (Cells(i, 17) >= 0.1) Then GoTo LL1
            If (Cells(i + 1, 1) <> "") Then _
                Cells(i + 1, 4) = Cells(i + 1, 4) + Cells(i, 4)
            Range(Cells(i, 1), Cells(i, 18)).Delete Shift:=xlUp
            i = i - 1
LL1:
            i = i + 1
        Loop
        Cells.EntireColumn.AutoFit
    Next mpop
    'MIT Export
    Workbooks.Open Filename:="MIT_Export.xls"
    Call MIT_Export(pop, cpop)
    Windows("MIT_Export.xls").Activate
    ActiveWorkbook.Save
    ActiveWorkbook.Close
End Sub

Sub DPrimeCalc(p11, p1d, pd1, dp)
    'dp returns D scaled by its maximum attainable magnitude
    Dim a As Single
    dp = p11 - p1d * pd1
    If (dp >= 0) Then
        a = p1d * (1 - pd1)
        If (a > (pd1 * (1 - p1d))) Then a = pd1 * (1 - p1d)
    End If
    If (dp < 0) Then
        a = (1 - p1d) * (1 - pd1)
        If (a > (pd1 * p1d)) Then a = pd1 * p1d
    End If
    dp = dp / a
End Sub

Sub FormatSheet(kk)
    Cells.Select
    With Selection.Font
        .Name = "Arial"
        .Size = 8
    End With
End Sub

Sub Insert_SNPs(k)
    Sheets("T").Select
    Range(Cells(1, 3), Cells(491, 7)).Copy
    Windows("Combined.xls").Activate
    Range(Cells(1, k), Cells(1, k)).Select
    ActiveSheet.Paste
End Sub

Sub LogLikCalc(lik, p00, p0d, pd0, data)
    'Log-likelihood of the 3 x 3 genotype table given p00 and the margins.
    'Cells with zero probability but non-zero count use a small floor in place of Log(0).
    Dim p01 As Single
    Dim p10 As Single
    Dim p11 As Single
    p01 = p0d - p00
    p10 = pd0 - p00
    p11 = 1 - p00 - p01 - p10
    lik = 0
    If (p00 > 0) Then lik = data(1, 1) * Log(p00 ^ 2)
    If (p00 = 0 And data(1, 1) > 0) Then _
        lik = data(1, 1) * Log(0.000000000001)
    If (p00 * p01 > 0) Then lik = lik + data(1, 2) * Log(2 * p00 * p01)
    If (p00 * p01 = 0 And data(1, 2) > 0) Then _
        lik = lik + data(1, 2) * Log(2 * 0.000000000001)
    If (p01 > 0) Then lik = lik + data(1, 3) * Log(p01 ^ 2)
    If (p01 = 0 And data(1, 3) > 0) Then _
        lik = lik + data(1, 3) * Log(0.000000000001)
    If (p10 * p00 > 0) Then lik = lik + data(2, 1) * Log(2 * p10 * p00)
    If (p10 * p00 = 0 And data(2, 1) > 0) Then _
        lik = lik + data(2, 1) * Log(2 * 0.000000000001)
    If (p00 * p11 + p10 * p01 > 0) Then _
        lik = lik + data(2, 2) * Log(2 * (p00 * p11 + p10 * p01))
    If (p00 * p11 + p10 * p01 = 0 And data(2, 2) > 0) Then _
        lik = lik + data(2, 2) * Log(2 * 2 * 0.000000000001)
    If (p01 * p11 > 0) Then lik = lik + data(2, 3) * Log(2 * p01 * p11)
    If (p01 * p11 = 0 And data(2, 3) > 0) Then _
        lik = lik + data(2, 3) * Log(2 * 0.000000000001)
    If (p10 > 0) Then lik = lik + data(3, 1) * Log(p10 ^ 2)
    If (p10 = 0 And data(3, 1) > 0) Then _
        lik = lik + data(3, 1) * Log(0.000000000001)
    If (p10 * p11 > 0) Then lik = lik + data(3, 2) * Log(2 * p10 * p11)
    If (p10 * p11 = 0 And data(3, 2) > 0) Then _
        lik = lik + data(3, 2) * Log(2 * 0.000000000001)
    If (p11 > 0) Then lik = lik + data(3, 3) * Log(p11 ^ 2)
    If (p11 = 0 And data(3, 3) > 0) Then _
        lik = lik + data(3, 3) * Log(0.000000000001)
End Sub

Function Log10(x)
    Log10 = Log(x) / Log(10)
End Function

Sub MIT_Export(pop, cpop)
    'Copy Combined_MIT on Combined datasheet to MIT_Export All Sheet
    'Note: Column headings removed for export to text file
    Dim col As Integer
    Dim endE As Integer
    Dim i As Integer
    Dim j As Integer
    Dim np As Integer
    Dim SNPn As Integer
    Dim SNPn10 As Integer
    Dim startE As Integer
    Dim cxp As String
    Dim cxpc10 As String
    Dim SNP_Structure_r As String
    Dim xp As String
    Windows("Combined.xls").Activate
    Sheets("Combined_MIT").Select
    Range(Cells(2, 1), Cells(350, 150)).Copy
    Windows("MIT_Export.xls").Activate
    Sheets("T").Select
    Cells.Select
    ActiveSheet.Paste
    'Separate sheets by ethnicity
    For np = 1 To 5
        cxp = cpop(np)
        Sheets("T").Select
        Cells.Copy
        Sheets(cxp).Select
        Cells.Select
        ActiveSheet.Paste
        startE = 0
        endE = 0
        i = 1
        Do While i <= 349
            If (Cells(i, 2) = cxp) Then
                endE = i
                If (startE = 0) Then startE = i
            End If
            i = i + 1
        Loop
        If (endE < 349) Then
            Range(Cells(endE + 1, 1), Cells(349, 150)).Select
            Selection.Delete
        End If
        If (startE > 1) Then
            Range(Cells(1, 1), Cells(startE - 1, 150)).Select
            Selection.Delete
        End If
    Next np
    'All but B (=G)
    Sheets("T").Select
    Cells.Copy
    Sheets("G").Select
    Cells.Select
    ActiveSheet.Paste
    i = 1
    Do While i <= 349
        If (Cells(i, 2) <> "B") Then
            startE = i
            Exit Do
        End If
        i = i + 1
    Loop
    Range(Cells(1, 1), Cells(startE - 1, 150)).Select
    Selection.Delete
    'Complete pc10 sheets
    For np = 0 To 6
        xp = pop(np)
        cxp = cpop(np)
        cxpc10 = cxp & "pc10"
        SNP_Structure_r = "SNP_Structure_" & xp & ".xls"
        Windows("MIT_Export.xls").Activate
        Sheets(cxp).Select
        Range(Cells(1, 1), Cells(349, 150)).Copy
        Sheets(cxpc10).Select
        Range(Cells(1, 1), Cells(1, 1)).Select
        ActiveSheet.Paste
        Windows(SNP_Structure_r).Activate
        Sheets("SNPs_All").Select
        i = 2
        Do While i <= 121
            If (Cells(i, 1) = "") Then Exit Do
            i = i + 1
        Loop
        SNPn = i - 1
        Sheets("SNPs_Ge10pc").Select
        i = 2
        Do While i <= 121
            If (Cells(i, 1) = "") Then Exit Do
            i = i + 1
        Loop
        SNPn10 = i - 1
        'Delete the columns of SNPs that are not on the >=10% sheet
        For i = SNPn To 1 Step -1
            Windows(SNP_Structure_r).Activate
            Sheets("SNPs_Ge10pc").Select
            For j = 2 To SNPn10
                If (Cells(j, 1) = i) Then GoTo L1
            Next j
            Windows("MIT_Export.xls").Activate
            Sheets(cxpc10).Select
            col = i + 3
            Range(Cells(1, col), Cells(1, col)).EntireColumn.Delete
L1:
        Next i
        Windows(SNP_Structure_r).Activate
        ActiveWorkbook.Save
        ActiveWorkbook.Close
    Next np
End Sub
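The sheet that MIT_Export copies is prepared in CombineAndOrder, which recodes each genotype (0, 1, 2, or 9 for missing) as a pair of numbered alleles. The Python sketch below illustrates that recoding under stated assumptions: the function name and inputs are hypothetical, and raising an error for a SNP with fewer than two observed alleles is a choice made here; the listing has no explicit branch for that case.

```python
ALLELE_NUMBER = {"A": 1, "C": 2, "G": 3, "T": 4}


def recode_genotypes(genotypes, allele_counts):
    """Recode 0/1/2/9 genotype codes as allele-pair strings such as "1 3".

    genotypes: iterable of integer codes (9 = missing).
    allele_counts: dict of observed counts per base, e.g. {"A": 10, "G": 5}.
    """
    # The listing tests allele pairs in A, C, G, T order, which amounts to
    # taking the first two bases with a non-zero count.
    present = [base for base in "ACGT" if allele_counts.get(base, 0) > 0]
    if len(present) < 2:
        raise ValueError("fewer than two alleles observed (not handled here)")
    first, second = ALLELE_NUMBER[present[0]], ALLELE_NUMBER[present[1]]
    lookup = {
        0: f"{first} {first}",
        1: f"{first} {second}",
        2: f"{second} {second}",
        9: "0 0",  # missing genotype
    }
    # Any other code is left blank, as in the worksheet.
    return [lookup.get(code, "") for code in genotypes]


recoded = recode_genotypes([0, 1, 2, 9], {"A": 10, "G": 5})
```

For an A/G SNP this yields the strings "1 1", "1 3", "3 3" and "0 0".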

Sub Nextk(k)
    'Return in k the first empty SNP column on the active sheet
    k = 4
    Do While k <= 122
        If (Cells(1, k) = "") Then Exit Do
        k = k + 1
    Loop
End Sub

Sub Plot(pop, cpop)
    Dim col As Integer
    Dim i As Integer
    Dim irow As Integer
    Dim j As Integer
    Dim kk As Integer
    Dim lastrow As Integer
    Dim mpop As Integer
    Dim nextSNP As Integer
    Dim SNPm As Integer
    Dim SNPn As Integer
    Dim adplow As Single
    Dim adpupp As Single
    Dim afq As Single
    Dim dis As Single
    Dim dist As Single
    Dim dp As Single
    Dim genopct As Single
    Dim hw As Single
    Dim lods As Single
    Dim r2 As Single
    Dim tlod As Single
    Dim tr2 As Single
    Dim DPrime_Plot_r As String
    Dim DPrime_r As String
    Dim mark As String
    Dim pat As String
    Dim SNP_Structure_r As String
    Dim xpop As String
    For mpop = 0 To 6
        xpop = pop(mpop)
        DPrime_Plot_r = "DPrime_Plot_" & xpop & ".xls"
        DPrime_r = "DPrime_" & xpop & ".xls"
        kk = 1
        Call AddPlotWorkbook(kk)
        ActiveWorkbook.SaveAs Filename:=DPrime_Plot_r
        Workbooks.Open Filename:=DPrime_r
        Windows(DPrime_r).Activate
        Columns("A:N").Copy
        Windows(DPrime_Plot_r).Activate
        Sheets("Input").Select
        Range("A1").Select
        ActiveSheet.Paste
        Cells.EntireColumn.AutoFit
        Call FormatSheet(kk)
        'Paste in SNP structure information
        SNP_Structure_r = "SNP_Structure_" & xpop & ".xls"
        Workbooks.Open Filename:=SNP_Structure_r
        Windows(SNP_Structure_r).Activate
        Sheets("SNPs_All").Select
        Range(Cells(1, 1), Cells(121, 18)).Copy
        Windows(DPrime_Plot_r).Activate
        Sheets("Input").Select
        Range(Cells(1, 18), Cells(1, 18)).Select
        ActiveSheet.Paste
        'Count the number of SNPs; place in Cells(2,38)
        i = 2
        Do While i < 10000
            If (Cells(i, 18) = "") Then Exit Do
            i = i + 1
        Loop
        SNPn = i - 2
        Cells(2, 38) = SNPn
        'Count the number of rows of data
        irow = 2
        Do While irow < 10000
            If (Cells(irow, 1) = "") Then Exit Do
            irow = irow + 1
        Loop
        lastrow = irow - 1
        Range(Cells(2, 11), Cells(10000, 11)).NumberFormat = "0.000"
        Range(Cells(2, 12), Cells(10000, 12)).NumberFormat = "0.000"
        Range(Cells(2, 34), Cells(SNPn + 1, 34)).NumberFormat = "0.000"
        'Dprime
        Sheets("Input").Select
        For irow = 2 To lastrow
            i = Cells(irow, 1)
            j = Cells(irow, 2)
            dp = Cells(irow, 10)
            Cells(j + 10000, i + 1) = dp
            lods = Cells(irow, 11)
            col = 5
            If (dp >= 0.9) Then col = 3
            If (dp >= 0.7 And dp < 0.9) Then col = 2
            If (lods < 2) Then
                Range(Cells(j + 10000, i + 1), Cells(j + 10000, i + 1)).Select
                With Selection.Interior
                    .ColorIndex = col
                    .Pattern = xlGray25
                    .PatternColorIndex = xlAutomatic
                End With
            End If
            If (lods >= 2) Then
                Range(Cells(j + 10000, i + 1), Cells(j + 10000, i + 1)).Select
                With Selection.Interior
                    .ColorIndex = col
                    .Pattern = xlSolid
                    .PatternColorIndex = xlAutomatic
                End With
            End If
        Next irow
        Sheets("Input").Select
        Range(Cells(10002, 2), Cells(10000 + SNPn, SNPn)).Copy
        Sheets("DPrime").Select
        Range("G3").Select
        ActiveSheet.Paste
        Range(Cells(3, 7), Cells(SNPn + 1, SNPn + 5)).NumberFormat = "0.00"
        Sheets("Input").Select
        dist = Cells(2, 20)
        Sheets("DPrime").Select
        Cells(1, 1) = "Loc"
        Cells(1, 2) = "Afq%"
        Cells(1, 3) = "HWp"
        Cells(1, 4) = "Gfail%"
        Cells(1, 7) = dist / 1000
        Range(Cells(1, 7), Cells(1, 7)).Select
        Selection.NumberFormat = "0.0"
        Selection.HorizontalAlignment = xlCenter
        For i = 1 To SNPn
            If (i > 1) Then
                Cells(i + 1, 6) = i
                Range(Cells(i + 1, 6), Cells(i + 1, 6)).Select
                Selection.Font.Bold = True
                Selection.HorizontalAlignment = xlCenter
                With Selection.Interior
                    .ColorIndex = 6
                    .Pattern = xlSolid
                    .PatternColorIndex = xlAutomatic
                End With
            End If
            Cells(i + 1, i + 6) = i
            Range(Cells(i + 1, i + 6), Cells(i + 1, i + 6)).Select
            Selection.NumberFormat = "0"
            Selection.Font.Bold = True
            Selection.HorizontalAlignment = xlCenter
            With Selection.Interior
                .ColorIndex = 6
                .Pattern = xlSolid
                .PatternColorIndex = xlAutomatic
            End With
            Sheets("Input").Select
            dist = Cells(i + 2, 21)
            mark = Cells(i + 1, 22)
            afq = Cells(i + 1, 34)
            hw = Cells(i + 1, 35)
            genopct = Cells(i + 1, 29)
            Sheets("DPrime").Select
            If (i < SNPn) Then Cells(i + 1, i + 7) = dist / 1000
            Range(Cells(i + 1, i + 7), Cells(i + 1, i + 7)).Select
            Selection.NumberFormat = "0.0"
            Selection.HorizontalAlignment = xlCenter
            Cells(i + 1, 1) = mark
            Cells(i + 1, 2) = afq
            Range(Cells(i + 1, 2), Cells(i + 1, 2)).NumberFormat = "0.0%"
            Cells(i + 1, 3) = hw
            Range(Cells(i + 1, 3), Cells(i + 1, 3)).NumberFormat = "0.000"
            Cells(i + 1, 4) = genopct
            Range(Cells(i + 1, 4), Cells(i + 1, 4)).NumberFormat = "0.0%"
        Next i
        Columns("A:D").HorizontalAlignment = xlCenter
        Cells.EntireColumn.AutoFit
        Call FormatSheet(1)
        Range(Cells(1, 6), Cells(1, SNPn + 6)).ColumnWidth = 4.99
        Range(Cells(3, 7), Cells(SNPn + 1, SNPn + 6)).HorizontalAlignment = xlCenter
        'Lod
        Sheets("DPrime").Select
        Range(Cells(1, 1), Cells(SNPn + 1, SNPn + 6)).Copy
        Sheets("Lod").Select
        Range("A1").Select
        ActiveSheet.Paste
        Sheets("Input").Select
        Range(Cells(1, 1), Cells(lastrow, 11)).Copy
        Sheets("Lod").Select
        Range(Cells(201, 1), Cells(201, 1)).Select
        ActiveSheet.Paste
        For irow = 2 To lastrow
            i = Cells(irow + 200, 1)
            j = Cells(irow + 200, 2)
            tlod = Cells(irow + 200, 11)
            If (tlod > 0) Then Cells(j + 1, i + 6) = tlod
        Next irow
        Range(Cells(201, 1), Cells(200 + lastrow, 11)).Clear
        Range(Cells(1, 1), Cells(1, 5)).EntireColumn.AutoFit
        Range(Cells(1, 6), Cells(1, SNPn + 6)).ColumnWidth = 4.99
        Range(Cells(3, 7), Cells(SNPn + 1, SNPn + 6)).HorizontalAlignment = xlCenter
        'r^2
        Sheets("DPrime").Select
        Range(Cells(1, 1), Cells(SNPn + 1, SNPn + 6)).Copy
        Sheets("r2").Select
        Range("A1").Select
        ActiveSheet.Paste
        Sheets("Input").Select
        Range(Cells(1, 1), Cells(lastrow, 12)).Copy
        Sheets("r2").Select
        Range(Cells(201, 1), Cells(201, 1)).Select
        ActiveSheet.Paste
        For irow = 2 To lastrow
            i = Cells(irow + 200, 1)
            j = Cells(irow + 200, 2)
            tr2 = Cells(irow + 200, 12)
            Cells(j + 1, i + 6) = tr2
        Next irow
        Range(Cells(201, 1), Cells(200 + lastrow, 12)).Clear
        Range(Cells(1, 1), Cells(1, 5)).EntireColumn.AutoFit
        Call FormatSheet(1)
        Range(Cells(1, 6), Cells(1, SNPn + 6)).ColumnWidth = 4.99
        Range(Cells(3, 7), Cells(SNPn + 1, SNPn + 6)).HorizontalAlignment = xlCenter
        'DPrime_Ge10pc
        Sheets("DPrime").Select
        Range(Cells(1, 1), Cells(SNPn + 1, SNPn + 6)).Copy
        Sheets("DPrime_Ge10pc").Select
        Range(Cells(1, 1), Cells(1, 1)).Select
        ActiveSheet.Paste
        Cells.Select
        Cells.EntireColumn.AutoFit
        i = 2
        j = 2
        Do While i <= SNPn + 1
            Windows(SNP_Structure_r).Activate
            Sheets("SNPs_Ge10pc").Select
            nextSNP = Cells(j, 1)
            Windows(DPrime_Plot_r).Activate
            Sheets("DPrime_Ge10pc").Select
            mark = Cells(j, 6)
            If (mark = "" And Cells(j, 7) = 1) Then mark = 1
            If (mark = nextSNP) Then
                j = j + 1
                GoTo LGtl
Further reproduction prohibited without permission. 335 Sub Plot(pop, cpop) [continued] End If Range(Cells(j, 1), Cells(j, l)).EntireRow.Delete Range(Cells(l, j + 5), Cells(l, j + 5)).EntireColumn.Delete LGtl: i = i + 1 Loop Range(Cells(2, 6), Cells(2,6)).CIear SNPm = 0 For j = 1 To SNPn Windows(DPrime_Plot_r). Activate Sheets("DPrime_GelOpc"). Select If (CellsQ + 1,1) = "") Then Exit For SNPm = j Windows(SNP_Structure_r).Activate Sheets(" SNPs_Ge 1 Opc"). Select dis = Cells(j + 1,4) W indo ws(DPrime_Plot_r). Activate Sheets("DPrime_Gel Opc").Select Cells(j, j + 6) = dis /1000 Range(Cells(j, j + 6), Cells(j, j + 6)).NumberFormat = "0.0" Next j Range(Cells(l, 1), Cells(l, 5)).Select Selection.EntireColumn.AutoFit Call FormatSheet(l) Range(Cells(l, 6), Cells(l, SNPm + 6)).ColumnWidth = 4.99 Range(Cells(3, 7), Cells(SNPn + 1, SNPn + 6)).HorizontalAlignment = xlCenter 'D Prim eci Sheets(" DPrime"). Select Range(Cells(l, 1), Cells(SNPn + 1, SNPn + 6)).Copy Sheets("DPrime_ci").Select Range("Al").Select ActiveSheet.Paste Sheets(" Input"). Select Range(Cells(l, 1), Cells(lastrow, 14)).Copy Sheets("DPrime_ci").Select Range(Cells(201,1), Cells(201, l)).Select ActiveSheet.Paste For irow = 2 To lastrow i = Cells(irow + 200, 1) j = Cells(irow + 200, 2) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 336 LGtl: [continued] adplow = Cells(irow + 200, 13) adpupp = Cells(irow + 200, 14) If (adplow = 1) Then adplow = 0 adpupp =1 End If col = 2 If (adplow >= 0.7 And adpupp >= 0.98) Then col = 3 If (adpupp < 0.9) Then col = 5 CellsO + 1, i + 6) = "" Range(Cells(j + 1, i + 6), Cells(j + 1, i + 6)).Select With Selection.Interior .Colorlndex = col .Pattern = xlSolid .PattemColorlndex = xlAutomatic End With Next irow Range(Cells(201, 1), Cells(200 + lastrow, 14)).Clear Sheets("DPrime_ci"). 
Select Range(Cells(l, 1), Cells(l, 5)).EntireColumn.AutoFit Call F ormatSheet( 1) Range(Cells(l, 6), Cells(l, SNPn + 6)).ColumnWidth = 4.99 Range(Cells(3, 7), Cells(SNPn + 1, SNPn + 6)).HorizontalAlignment = xlCenter 'DPrime_ci_Ge 1 Opc Sheets("DPrime_ci").Select Range(Cells(l, 1), Cells(SNPn + 1, SNPn + 6)).Copy Sheets("DPrime_ci_GelOpc").Select Range("Al"). Select ActiveSheet.Paste irow = 2 Do While irow <= SNPn + 1 Sheets("DPrime_ci_GelOpc").Select If (Cells(irow, 1) = "") Then Exit Do If (Cells(irow, 2) < 0.1) Then Range(Cells(irow, 1), Cells(irow, 1)).Select Selection.EntireRow.Delete Range(Cells(l, irow + 5), Cells(l, irow + 5)).Select Selection.EntireColumn.Delete irow = irow -1 End If Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 337 Sub Plot(pop, cpop) [continued] irow = irow + 1 Loop irow - 2 Do While irow <= SNPn + 2 Sheets("DPrime_ci_Ge 1 Opc"). Select If (Cells(irow, 1) = "") Then Cells(irow - 1, irow - 1 + 6) = "" Exit Do End If Sheets("DPrime_Gel Opc"). Select dis = Cells(irow, irow + 6) Sheets("DPrime_ci_GelOpc").Select Cells(irow, irow + 6) = dis Range(Cells(irow, irow + 6), Cells(irow, irow + 6)).Select Selection.NumberFormat = "0.0" irow = irow + 1 Loop Cells(2, 6) ="" Range(Cells(2, 6), Cells(2, 6)).Select With Selection.Interior .Colorlndex = 2 .Pattern = xlSolid .PattemColorlndex = xlAutomatic End With Sheets("DPrime_ci_Gel Opc").Select Range(Cells(l, 1), Cells(l, 5)).EntireColumn.AutoFit Call FormatSheet(l) Range(Cells(l, 6), Cells(l, SNPn + 6)).ColumnWidth = 4.99 Range(Cells(3, 7), Cells(SNPn + 1, SNPn + 6)).HorizontalAlignment = xlCenter Windows(DPrime_r).Activate ActiveWorkbook.Save Active W orkbook. Close Windows(DPrime_Plot_r).Activate Active Workbook. Save Acti ve W orkbook. Close W indo ws(SNP_Stmcture_r) .Activate ActiveWorkbook.Close Next mpop End Sub Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
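The colouring rule that the listing above applies to the DPrime_ci sheet can be restated outside Excel. What follows is a minimal Python sketch, not part of the original macro. The function name classify_pair and the string labels are illustrative assumptions: "white", "red" and "blue" stand for ColorIndex values 2, 3 and 5, which the Block Finder listing in Appendix B later counts as W, R and B.

```python
def classify_pair(ci_low, ci_high):
    """Colour one SNP pair from the lower and upper confidence bounds on D'.

    Mirrors the order of the tests in the VBA listing: the default is white,
    a tight high interval becomes red, and a low upper bound becomes blue.
    """
    # The macro resets a lower bound of exactly 1 to the widest interval,
    # which leaves the pair white (uninformative).
    if ci_low == 1:
        ci_low, ci_high = 0, 1

    colour = "white"                          # ColorIndex 2
    if ci_low >= 0.7 and ci_high >= 0.98:
        colour = "red"                        # ColorIndex 3
    if ci_high < 0.9:
        colour = "blue"                       # ColorIndex 5
    return colour


if __name__ == "__main__":
    for bounds in [(0.8, 0.99), (0.2, 0.5), (0.3, 0.95), (1, 1)]:
        print(bounds, classify_pair(*bounds))
```

The two threshold tests cannot both succeed, since an upper bound of at least 0.98 is never below 0.9, so their order does not change the result.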
Sub SNP_Structure_eth(nb1, ne1, nb2, ne2, mpop, pop)
    Dim kk As Integer
    Dim SNP_Structure_r As String
    Dim xpop As String
    xpop = pop(mpop)
    SNP_Structure_r = "SNP_Structure_" & xpop & ".xls"
    Workbooks.Open Filename:="SNP_Structure_TEMP.xls"
    ActiveWorkbook.SaveAs Filename:=SNP_Structure_r
    Sheets("SNPs_All").Select
    Range(Cells(1, nb1), Cells(1, ne1)).EntireColumn.Delete
    If (nb2 > 0) Then _
        Range(Cells(1, nb2), Cells(1, ne2)).EntireColumn.Delete
    Range(Cells(1, 12), Cells(1, 19)).EntireColumn.Delete
    Cells(1, 12) = "Blanks%"
    Cells(1, 13) = "A%"
    Cells(1, 14) = "C%"
    Cells(1, 15) = "G%"
    Cells(1, 16) = "T%"
    Cells(1, 17) = "Al%"
    Cells(1, 18) = "HWp%"
    Range(Cells(1, 12), Cells(1, 18)).HorizontalAlignment = xlCenter
    Cells.EntireColumn.AutoFit
    kk = 1
    Call FormatSheet(kk)
    Windows(SNP_Structure_r).Activate
    ActiveWorkbook.Save
    ActiveWorkbook.Close
End Sub

APPENDIX B

‘BLOCK FINDER’: A VISUAL BASIC PROGRAM EMPLOYING THE HAPLOTYPE ‘BLOCK’ DEFINITION OF GABRIEL ET AL. (2002)
Option Explicit

Sub BlocksVBA_030104()
    'PB's macro revised by PB and MCP
    Application.DisplayAlerts = False
    Dim afquser As Single
    Dim npop As Integer
    Dim pop(6) As String
    Dim SNPnt As Integer
    npop = 6
    pop(0) = "t"
    pop(1) = "b"
    pop(2) = "h"
    pop(3) = "j"
    pop(4) = "l"
    pop(5) = "w"
    pop(6) = "g"
    SNPnt = 0
    Call SetUp(pop, npop, SNPnt)
    Call DrawSummary(pop, npop)
    Call Calcs(pop, npop, SNPnt, afquser)
    Call BlockDraw(pop, npop, SNPnt, afquser)
End Sub

Sub SetUp(pop, npop, SNPnt)
    Dim DPrime_Plot_r As String
    Dim i As Integer
    Dim mpop As Integer
    Dim npath As String
    Dim sn As String
    Dim SNPn As Integer
    Dim xpop As String
    npath = InputBox("Enter directory of D' plots", _
        Default:="C:\Current\MEC\Genetics\Leigh")
    ChDir npath
    Workbooks.Add
    Sheets("Sheet1").Name = "s"
    For mpop = 0 To npop
        Sheets.Add
        i = mpop + 2
        sn = "Sheet" & i
        Sheets(sn).Name = pop(mpop)
    Next mpop
    ActiveWorkbook.SaveAs Filename:="Blocks.xls"
    Sheets("s").Select
    For mpop = 0 To npop
        xpop = pop(mpop)
        DPrime_Plot_r = "DPrime_Plot_" & xpop & ".xls"
        Workbooks.Open Filename:=DPrime_Plot_r
        Sheets("DPrime_ci").Select
        Range(Cells(1, 2), Cells(199, 199)).Copy
        Windows("Blocks").Activate
        Sheets(xpop).Activate
        Cells(1, 2).Select
        ActiveSheet.Paste
        With Selection.Font
            .Name = "Arial"
            .Size = 10
        End With
        Columns("B:GR").EntireColumn.AutoFit
        Windows(DPrime_Plot_r).Activate
        ActiveWorkbook.Close
    Next mpop
    'Determine Number of SNPs
    i = 2
    Do While i < 126
        If (Cells(i, 2) = "") Then Exit Do
        i = i + 1
    Loop
    SNPnt = i - 2
    'Set up Summary Sheet
    Sheets("s").Select
    Cells(5, 1) = "SNPs"
    Cells(6, 3) = "Afq"
    Cells(6, 5) = "Block Threshold"
    Cells(6, 9) = "Directory"
    Cells(7, 7) = "SSS Approach"
    Cells(7, 9) = npath
    For i = 1 To SNPnt
        Cells(5, i + 1) = i
    Next
    Range(Cells(5, 2), Cells(5, SNPnt + 1)).Select
    With Selection.Font
        .Name = "Arial"
        .Size = 18
    End With
    With Selection
        .HorizontalAlignment = xlCenter
    End With
    Workbooks.Open Filename:="SNP_Structure.xls"
    Sheets("SNPs_All").Select
    Range(Cells(1, 3), Cells(SNPnt + 1, 5)).Copy
    Windows("Blocks.xls").Activate
    Sheets("s").Select
    Cells(1, 1).Select
    Selection.PasteSpecial Paste:=xlPasteValues, _
        Operation:=xlNone, SkipBlanks:=False, Transpose:=True
    Columns("A:A").EntireColumn.AutoFit
    Cells(2, 2) = ""
    Range(Cells(3, 2), Cells(SNPnt + 1)).Select
    With Selection
        .HorizontalAlignment = xlCenter
    End With
    Windows("SNP_Structure.xls").Activate
    ActiveWorkbook.Close
End Sub

Sub DrawSummary(pop, npop)
    Dim DPrime_Plot_r As String
    Dim i As Integer
    Dim mpop As Integer
    Dim xpop As String
    Windows("Blocks.xls").Activate
    Sheets("s").Select
    Cells(10, 1) = "T"
    Cells(15, 1) = "B"
    Cells(20, 1) = "H"
    Cells(25, 1) = "J"
    Cells(30, 1) = "L"
    Cells(35, 1) = "W"
    Cells(40, 1) = "G"
    For i = 0 To npop
        Cells(11 + 5 * i, 1) = "Afq"
        Cells(12 + 5 * i, 1) = "HWp"
        Cells(13 + 5 * i, 1) = "GFail"
    Next i
    For i = 5 To 35 Step 5
        Cells(5 + i, 1).Select
        With Selection.Font
            .Name = "Arial"
            .Size = 18
        End With
    Next
    For i = 5 To 35 Step 5
        Range(Cells(5, 2), Cells(5, 200)).Copy
        Range(Cells(5 + i, 2), Cells(5 + i, 200)).Select
        ActiveSheet.Paste
    Next
    For mpop = 0 To npop
        xpop = pop(mpop)
        DPrime_Plot_r = "DPrime_Plot_" & xpop & ".xls"
        Workbooks.Open Filename:=DPrime_Plot_r
        Sheets("DPrime").Select
        Range(Cells(2, 2), Cells(200, 4)).Copy
        Windows("Blocks.xls").Activate
        Sheets("s").Select
        Cells(11 + 5 * mpop, 2).Select
        Selection.PasteSpecial Paste:=xlPasteValues, _
            Operation:=xlNone, SkipBlanks:=False, _
            Transpose:=True
        Selection.NumberFormat = "0.00"
        Windows(DPrime_Plot_r).Activate
        ActiveWorkbook.Close
    Next mpop
End Sub

Sub Calcs(pop, npop, SNPnt, afquser)
    Dim aa As Long
    Dim afq As Single
    Dim afqpercent As Single
    Dim B As Integer
    Dim bb As Long
    Dim Bm(120, 120) As Integer
    Dim c As Integer
    Dim c200 As Integer
    Dim comps As Integer
    Dim d As Integer
    Dim de As Integer
    Dim delta As Long
    Dim Down(120) As Integer
    Dim i As Integer
    Dim ii As Integer
    Dim iii As Integer
    Dim j As Integer
    Dim jj As Integer
    Dim jjj As Integer
    Dim mpop As Integer
    Dim m As Integer
    Dim m200 As Integer
    Dim marknum As Integer
    Dim mc As Integer
    Dim oldm As Integer
    Dim ratiopc As Single
    Dim ratiouser As Single
    Dim R As Integer
    Dim Rn As Integer
    Dim Rm(120, 120) As Integer
    Dim RWBc As Integer
    Dim RWc As Integer
    Dim SNPend As Integer
    Dim SNPn As Integer
    Dim SNPorig(120) As Integer
    Dim SNPstart As Integer
    Dim Up(120) As Integer
    Dim W As Integer
    Dim Wm(120, 120) As Integer
    Dim x As Integer
    Dim xpop As String
    afqpercent = Application.InputBox _
        (prompt:="Enter allele freq threshold %", _
        Title:="Allele Frequency Threshold", Default:="10", Type:=1)
    afquser = afqpercent / 100
    ratiopc = Application.InputBox _
        (prompt:="Enter block threshold %", _
        Title:="Block Definition Threshold", Default:="100", Type:=1)
    ratiouser = ratiopc / 100
    Windows("Blocks.xls").Activate
    Sheets("s").Select
    Cells(7, 3) = afqpercent & "%"
    Cells(7, 5) = ratiopc & "%"
    Cells(7, 3).Select
    With Selection
        .HorizontalAlignment = xlCenter
    End With
    For mpop = 0 To npop
        Windows("Blocks.xls").Activate
        xpop = pop(mpop)
        Sheets(xpop).Select
        SNPn = SNPnt
        'Clear out SNP distances
        For c = 1 To SNPn
            Cells(c, c + 6) = ""
        Next c
        'Using user defined allowable allele frequency,
        'delete appropriate rows/columns and recount
        c = 2
        Do While c <= SNPn + 1
            afq = Cells(c, 2)
            If (afq < afquser) Then
                Range(Cells(c, 1), Cells(c, 1)).EntireRow.Delete
                Range(Cells(1, c + 5), Cells(1, c + 5)).EntireColumn.Delete
                Cells(2, 6) = ""
                c = c - 1
                SNPn = SNPn - 1
            End If
            c = c + 1
        Loop
        'Construct Up and Down arrays to deal with deleted SNPs
        Down(0) = 0
        oldm = 0
        For c = 1 To SNPn
            m = Cells(c + 1, 6)
            If (c = 1) Then m = Cells(2, 7)
            mc = oldm + 1
            Do While mc <= m - 1
                Down(mc) = Down(oldm)
                mc = mc + 1
            Loop
            Down(m) = m
            oldm = m
        Next c
        Up(SNPnt + 1) = SNPnt + 1
        oldm = 0
        For c = SNPn To 1 Step -1
            m = Cells(c + 1, 6)
            If (c = 1) Then m = Cells(2, 7)
            mc = oldm - 1
            Do While mc >= m + 1
                Up(mc) = Up(oldm)
                mc = mc - 1
            Loop
            Up(m) = m
            oldm = m
        Next c
        SNPorig(1) = Cells(2, 7)
        For c = 2 To SNPn
            SNPorig(c) = Cells(c + 1, 6)
        Next
        Cells(200, 1) = "i"
        Cells(200, 2) = "j"
        Cells(200, 3) = "R"
        Cells(200, 4) = "W"
        Cells(200, 5) = "B"
        Cells(200, 6) = "RW/RWB"
        Cells(200, 7) = "RWB"
        x = 1
        For i = 1 To SNPn - 1
            R = 0
            W = 0
            B = 0
            ii = i + 1
            For j = i + 1 To SNPn
                ii = ii + 1
                For jj = i + 6 To j + 5
                    Cells(ii, jj).Select
                    If Selection.Interior.ColorIndex = 3 Then R = R + 1
                    If Selection.Interior.ColorIndex = 2 Then W = W + 1
                    If Selection.Interior.ColorIndex = 5 Then B = B + 1
                Next jj
                Cells(200 + x, 1) = i
                Cells(200 + x, 2) = j
                Cells(200 + x, 3) = R
                Cells(200 + x, 4) = W
                Cells(200 + x, 5) = B
                Cells(200 + x, 6) = (R + W) / (R + W + B)
                Cells(200 + x, 7) = R + W + B
                iii = SNPorig(i)
                jjj = SNPorig(j)
                Rm(iii, jjj) = R
                Wm(iii, jjj) = W
                Bm(iii, jjj) = B
                x = x + 1
            Next j
        Next i
        'Number of comparisons
        comps = x - 1
        'Evaluate block runs based on user definition of RW/RWB ratio
        c = 1
        Do While c <= comps
            c200 = c + 200
            If (Cells(c200, 6) < ratiouser) Then
                Range(Cells(c200, 1), Cells(c200, 1)).EntireRow.Delete
                c = c - 1
                comps = comps - 1
            End If
            c = c + 1
        Loop
        'Delete blocks that contain only one comparison square
        c = 1
        Do While c <= comps
            c200 = c + 200
            If (Cells(c200, 7) <= 1) Then
                Range(Cells(c200, 1), Cells(c200, 1)).EntireRow.Delete
                c = c - 1
                comps = comps - 1
            End If
            c = c + 1
        Loop
        'Insert true SNP #s
        For c = 1 To comps
            c200 = c + 200
            SNPstart = Cells(c200, 1)
            SNPend = Cells(c200, 2)
            If (SNPstart = 1) Then Cells(c200, 8) = Cells(2, 7)
            If (SNPstart > 1) Then Cells(c200, 8) = Cells(SNPstart + 1, 6)
            Cells(c200, 9) = Cells(SNPend + 1, 6)
        Next c
        Range(Cells(201, 8), Cells(200 + comps, 9)).Cut
        Range(Cells(201, 1), Cells(200 + comps, 2)).Select
        ActiveSheet.Paste
        'Delete blocks with no Red
        c = 1
        Do While c <= comps
            c200 = c + 200
            If (Cells(c200, 3) <= 0) Then
                Range(Cells(c200, 1), Cells(c200, 1)).EntireRow.Delete
                c = c - 1
                comps = comps - 1
            End If
            c = c + 1
        Loop
        'Forward Deletion of blocks falling within larger blocks
        c = 1
        Do While c < comps
            c200 = c + 200
            i = Cells(c200, 1)
            ii = Cells(c200 + 1, 1)
            If (ii <= i) Then
                Range(Cells(c200, 1), Cells(c200, 1)).EntireRow.Delete
                c = c - 1
                comps = comps - 1
            End If
            c = c + 1
        Loop
        'Backward Deletion of blocks falling within larger blocks
        c = comps
        Do While c > 1
            c200 = c + 200
            j = Cells(c200, 2)
            jj = Cells(c200 - 1, 2)
            If (jj >= j) Then
                Range(Cells(c200, 1), Cells(c200, 1)).EntireRow.Delete
                comps = comps - 1
            End If
            c = c - 1
        Loop
        'Identify blocks to be chosen
        For c = 1 To comps
            Cells(c + 200, 8) = 0
        Next
        d = 1
        Do While d <= comps
            For c = 1 To comps
                c200 = c + 200
                If (Cells(c200, 8) <= 0) Then
                    m = c
                    RWBc = Cells(c200, 7)
                    Rn = Cells(c200, 3)
                    Exit For
                End If
            Next c
            For c = 1 To comps
                c200 = c + 200
                If (m = c) Then GoTo L01
                If (Cells(c200, 8) > 0) Then GoTo L01
                If (Cells(c200, 7) < RWBc) Then GoTo L01
                If (Cells(c200, 7) > RWBc) Then GoTo L02
                If (Cells(c200, 3) <= Rn) Then GoTo L01
L02:
                m = c
                RWBc = Cells(c200, 7)
                Rn = Cells(c200, 3)
L01:
            Next c
            m200 = m + 200
            Cells(m200, 8) = d
            de = Cells(m200, 2)
            For c = m + 1 To comps
                c200 = c + 200
                If (Cells(c200, 1) <= de) Then
                    Cells(c200, 1) = Up(de + 1)
                    i = Cells(c200, 1)
                    j = Cells(c200, 2)
                    Cells(c200, 3) = Rm(i, j)
                    Cells(c200, 4) = Wm(i, j)
                    Cells(c200, 5) = Bm(i, j)
                    RWc = Rm(i, j) + Wm(i, j)
                    RWBc = RWc + Bm(i, j)
                    Cells(c200, 6) = 0
                    If (RWBc > 0) Then Cells(c200, 6) = RWc / RWBc
                    Cells(c200, 7) = RWBc
                End If
            Next c
            de = Cells(m200, 1)
            For c = m - 1 To 1 Step -1
                c200 = c + 200
                If (Cells(c200, 2) >= de) Then
                    Cells(c200, 2) = Down(de - 1)
                    i = Cells(c200, 1)
                    j = Cells(c200, 2)
                    Cells(c200, 3) = Rm(i, j)
                    Cells(c200, 4) = Wm(i, j)
                    Cells(c200, 5) = Bm(i, j)
                    RWc = Rm(i, j) + Wm(i, j)
                    RWBc = RWc + Bm(i, j)
                    Cells(c200, 6) = 0
                    If (RWBc > 0) Then Cells(c200, 6) = RWc / RWBc
                    Cells(c200, 7) = RWBc
                End If
            Next c
            c = 1
            Do While c <= comps
                c200 = c + 200
                If (Cells(c200, 3) = 0) Then GoTo L03
                If (Cells(c200, 6) < ratiouser) Then GoTo L03
                If (Cells(c200, 7) = 1) Then GoTo L03
                GoTo L04
L03:
                Range(Cells(c200, 1), Cells(c200, 1)).EntireRow.Delete
                c = c - 1
                comps = comps - 1
L04:
                c = c + 1
            Loop
            d = d + 1
        Loop
        For c = 1 To comps
            c200 = c + 200
            i = Cells(c200, 1)
            j = Cells(c200, 2)
            Sheets("s").Select
            Range(Cells(1, i + 1), Cells(1, i + 1)).Copy
            Sheets(xpop).Select
            Range(Cells(c200, 10), Cells(c200, 10)).Select
            ActiveSheet.Paste
            Sheets("s").Select
            Range(Cells(1, j + 1), Cells(1, j + 1)).Copy
            Sheets(xpop).Select
            Range(Cells(c200, 11), Cells(c200, 11)).Select
            ActiveSheet.Paste
            'Difference
            aa = Cells(c200, 10)
            bb = Cells(c200, 11)
            delta = Abs(bb - aa)
            marknum = j - i + 1
            Cells(c200, 12) = delta
            Cells(c200, 13) = marknum
        Next c
    Next mpop
End Sub

Sub BlockDraw(pop, npop, SNPnt, afquser)
    Dim blockend As Integer
    Dim blockstart As Integer
    Dim c As Integer
    Dim c200 As Integer
    Dim comps As Integer
    Dim d As Integer
    Dim mpop As Integer
    Dim xpop As String
    Windows("Blocks.xls").Activate
    Sheets("s").Select
    For mpop = 0 To npop
        d = 10 + mpop * 5
        xpop = pop(mpop)
        Sheets(xpop).Select
        'Determine number of block comparisons
        c = 1
        Do While c < 10000
            If (Cells(c + 200, 1) = "") Then Exit Do
            c = c + 1
        Loop
        comps = c - 1
        For c = 1 To comps
            c200 = c + 200
            Sheets(xpop).Select
            blockstart = Cells(c200, 1)
            blockend = Cells(c200, 2)
            Sheets("s").Select
            Range(Cells(d, blockstart + 1), Cells(d, blockend + 1)).Select
            With Selection.Borders(xlEdgeLeft)
                .LineStyle = xlContinuous
                .Weight = xlThick
                .ColorIndex = xlAutomatic
            End With
            With Selection.Borders(xlEdgeTop)
                .LineStyle = xlDash
                .Weight = xlMedium
                .ColorIndex = xlAutomatic
            End With
            With Selection.Borders(xlEdgeBottom)
                .LineStyle = xlDash
                .Weight = xlMedium
                .ColorIndex = xlAutomatic
            End With
            With Selection.Borders(xlEdgeRight)
                .LineStyle = xlContinuous
                .Weight = xlThick
                .ColorIndex = xlAutomatic
            End With
            Selection.Borders(xlInsideVertical).LineStyle = xlNone
            With Selection.Interior
                .ColorIndex = 3
                .Pattern = xlSolid
                .PatternColorIndex = xlAutomatic
            End With
        Next c
        For c = 1 To SNPnt
            If (Cells(d + 1, c + 1) < afquser) Then
                Range(Cells(d, c + 1), Cells(d, c + 1)).Select
                With Selection.Interior
                    .ColorIndex = 34
                    .Pattern = xlSolid
                    .PatternColorIndex = xlAutomatic
                End With
            End If
        Next c
    Next mpop
End Sub

APPENDIX C

‘PHASING’: A VISUAL BASIC PROGRAM WHICH RESOLVES PHASE OF PROBABILISTIC HAPLOTYPE DATA

Sub phasing()
    i = 3
    Do While i <= 70
        If (Cells(1, i) = "") Then GoTo La
        i = i + 1
    Loop
La:
    iend = i - 1
    j = 2
    Do While j <= 10000
        If (Cells(j, 1) = "") Then GoTo Lb
        i = 70 + 3
        Do While i <= iend + 70
            Cells(j, i) = Cells(j, i - 70)
            Cells(j, i + 70) = 0
            i = i + 1
        Loop
        total = 0
Lc:
        mark = 73
        i = 74
        bigp = Cells(j, 73)
        Do While i <= iend + 70
            If (Cells(j, i) > bigp) Then
                bigp = Cells(j, i)
                mark = i
            End If
            i = i + 1
        Loop
        Cells(j, mark + 70) = Cells(j, mark + 70) + 1
        Cells(j, mark) = Cells(j, mark) - 1
        total = total + 1
        If (total < 2) Then GoTo Lc
        j = j + 1
    Loop
Lb:
    Range(Cells(2, 3), Cells(10000, 142)).Select
    Selection.Delete
End Sub
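The per-row logic of the phasing macro can be restated without the worksheet bookkeeping. The Python sketch below is an illustration under stated assumptions rather than the author's code: the name resolve_phase is hypothetical, and the input is assumed to be one individual's non-empty list of haplotype weights (the values the macro reads from columns 3 onward). As in the macro, the largest weight is chosen twice, with 1 subtracted after each choice, so a sufficiently heavy haplotype can be picked for both copies.

```python
def resolve_phase(weights):
    """Return integer haplotype counts (summing to 2) for one individual.

    `weights` is assumed to be a non-empty list of numeric haplotype
    weights; this mirrors the greedy two-pass selection in the VBA loop.
    """
    remaining = list(weights)          # working copy, like columns 73 onward
    counts = [0] * len(remaining)      # resolved counts, like columns 143 onward

    for _ in range(2):                 # two haplotypes per individual
        best = 0
        for k in range(1, len(remaining)):
            # Strict '>' keeps the first entry when weights tie, as in the macro.
            if remaining[k] > remaining[best]:
                best = k
        counts[best] += 1
        remaining[best] -= 1           # discount the copy already assigned
    return counts


if __name__ == "__main__":
    print(resolve_phase([1.2, 0.8, 0.0]))
    print(resolve_phase([1.9, 0.1]))
```

Subtracting 1 after each pick is what lets the same haplotype be selected twice only when its weight still exceeds every other weight after one copy has been assigned.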
Asset Metadata
Creator
Bretsky, Philip Michael (author)
Core Title
Genetic risk factors in breast cancer susceptibility: The multiethnic cohort
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Epidemiology
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
biology, genetics; health sciences, oncology; health sciences, public health; OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest (provenance)
Advisor
Henderson, Brian (committee chair), Coetzee, Gerhard (committee member), Pike, Malcolm (committee member), Ross, Ronald (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-464247
Unique identifier
UC11341866
Identifier
3197571.pdf (filename), usctheses-c16-464247 (legacy record id)
Legacy Identifier
3197571.pdf
Dmrecord
464247
Document Type
Dissertation
Rights
Bretsky, Philip Michael
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA