Elizabeth Osth Lillie
A Thesis Presented to the
In Partial Fulfillment of the
Requirements for the Degree
May 2004
Copyright 2004 Elizabeth Osth Lillie
I would like to thank my dissertation chairs Drs. Giske Ursin and Theodore Krontiris for
offering their time and energy towards my future success. I could not have completed
this dissertation without their faith in my abilities. While I take pride in cutting my own
path into the forest of breast cancer literature to generate a thesis worthy of the doctoral
degree, it could not have been done without their guidance.
I would also like to acknowledge my dissertation committee members Drs. Leslie
Bernstein, Jim Gauderman and Gerry Coetzee for their additional insight and support.
In addition to my dissertation committee, various individuals have offered their expertise
and moral support that I would like to recognize. A special thanks to Dr. Garry Larson
and Mrs. Cathryn Lundberg at the City of Hope along with Dr. Alpa Patel of the
American Cancer Society.
List of Tables
Chapter 1 REVIEW: The Role of Androgens and Polymorphisms in the Androgen Receptor in the Epidemiology of Breast Cancer
Chapter 2 GRANT PROPOSAL: The Androgen Receptor and Mammographic Density
Chapter 3 DATA ANALYSIS: Polymorphism in the Androgen Receptor and Mammographic Density in Women Taking and Not Taking Estrogen and Progestin Therapy
Chapter 4 DATA ANALYSIS: Androgen Receptor CAG Polymorphism and Breast Cancer Prognosis
Table 1-1 The relative risk of breast cancer associated with testosterone levels in postmenopausal women, results from prospective studies
Table 1-2 The relative risk of breast cancer associated with testosterone levels, results from case-control studies
Table 1-3 Mean levels of testosterone in breast cancer cases and controls
Table 1-4 The relative risk of breast cancer associated with the androgen receptor CAG repeat
Table 2-1 Distribution of breast cancer cases and controls according to mammographic density
Table 2-2 Analysis of androgen receptor and breast cancer tumor grade in breast cancer cases with a family history of breast cancer
Table 3-1 Mean percentage of mammographic density by descriptive characteristics (N = 404)
Table 3-2 Least-squares mean percentage breast density by androgen receptor
Table 3-3 Association for a single quartile increase in percent mammographic density per 'long' androgen receptor allele
Table 4-1 Distribution of cases in each prognostic indicator category stratified by source of data
Table 4-2 Mean number of androgen receptor CAG repeats by breast cancer prognostic indicator
Table 4-3 Association of androgen receptor CAG with breast cancer prognostic indicators
Table 4-4 Mean number of androgen receptor CAG repeats by breast cancer prognostic indicator in cases age > 55 at diagnosis
Table 4-5 Association of androgen receptor CAG with breast cancer prognostic indicators in cases age > 55 at diagnosis
AR androgen receptor
BMI body mass index
CAG cytosine-adenine-guanine
CARE Contraceptive and Reproductive Experiences
CI confidence interval
DHT dihydrotestosterone
E2 estradiol
EPT estrogen progestin therapy
ER estrogen receptor
ET estrogen therapy
GGC guanine-guanine-cytosine
H hours
HT hormone therapy
L long allele
ML medium long allele
MS medium short allele
NS not significant
OR odds ratio
P p-value
RR relative risk
S short allele
SD standard deviation
T testosterone
VL very long allele
VS very short allele
Y years
Y/N yes/no
Testosterone binds to the androgen receptor in target tissue to mediate its effects. Both
testosterone and the androgen receptor have been exposures of interest in the etiology of
breast cancer. A triplet repeat polymorphism (CAG) in exon 1 of the gene for the
androgen receptor that encodes a polyglutamine tract has been observed to correlate with
androgen receptor activity. Paradoxical to observations that endogenous testosterone
increases breast cancer risk, results from observational studies that have examined
polymorphisms in the androgen receptor suggest that the low activity androgen receptor
increases breast cancer risk. I reviewed the quality of this evidence and based on my
findings have formulated various hypotheses to further examine the role of androgens and
the androgen receptor in breast cancer etiology.
Mammographic density is a strong breast cancer risk factor that probably reflects hormone
induced cell proliferation in the breast. I evaluated the association between the androgen
receptor CAG repeat length and mammographic density. My results suggest that the long
CAG repeat length (less active androgen receptor) is associated with increased
mammographic density in women on estrogen progestin therapy.
The short CAG repeat length (more active androgen receptor) has been observed to be
associated with increased breast cancer grade in one published study. I evaluated the
association between polymorphism in the androgen receptor and breast cancer prognosis
using a larger study setting. I observed a significant association when the analysis was
restricted to women over age 55.
In conclusion, my review of the literature and results from my data analyses suggest that
androgens and the androgen receptor play a role in breast cancer etiology. However,
further investigation is clearly needed.
REVIEW: The Role of Androgens and Polymorphisms in the Androgen Receptor in the
Epidemiology of Breast Cancer (Lillie and others 2003)
The role of androgens in breast cancer etiology has been the subject of both curiosity
and confusion. It is still unclear by which mechanisms testosterone (T) exerts its activity
in the female breast, and whether the effects at physiologic levels are predominantly
proliferative or anti-proliferative on breast cells. In this review I evaluate the results
from epidemiologic studies on the role of circulating T and a functional polymorphism in
the androgen receptor (AR) in breast cancer. I also highlight some of the epidemiologic
challenges in addressing these questions.
Sources of Endogenous Testosterone
There are two main sources of androgens in women. T is produced directly by the ovary,
and also by conversion of the adrenal androgens dehydroepiandrosterone and
dehydroepiandrosterone-sulfate into androstenedione, and then further to T in peripheral
tissue (Norman and Litwig 1987). In premenopausal women, approximately 25% of
circulating T is secreted directly from the adrenal gland and 25% from the ovary, while
the remaining 50% is produced by peripheral conversion of androstenedione (Longcope
and others 1986). T levels vary over the menstrual cycle with peak levels mid-cycle, and
diurnally with highest levels in the early morning (Abraham 1974).
T and androstenedione are produced by the interstitial cells of the ovarian stroma and
may continue to respond to gonadotropins and produce T after the menopause (Adashi
1994). In normal postmenopausal women the ovarian vein has been observed to have
higher concentrations of T than is found in peripheral blood (Judd and others 1974);
bilateral oophorectomy results in reductions in T levels by as much as 50% (Judd and
others 1974).
Several smaller cross-sectional studies have found lower T levels in postmenopausal than
premenopausal women (Bancroft and Cawood 1996; Labrie and others 1997; Zumoff and
others 1995) or lower levels in perimenopausal than premenopausal women (Longcope
and others 1986). Large longitudinal studies that have followed women through the
menopausal transition have observed either no significant change in T (Burger and others
2000; Longcope and others 1986) or a 15% decrease in both T and androstenedione at
menopause (Rannevik and others 1995). In one study of women aged 50-89 years, T
levels were lowest at the time of the menopause while women above age 70 or more than
20 years post menopause had levels approximating those of premenopausal women
(Laughlin and others 2000).
In summary, there is increasing evidence that the ovary continues to produce
androstenedione and T in healthy postmenopausal women. Levels may either remain the
same or decrease slightly at menopause. However, women with bilateral oophorectomy
may be androgen deficient.
Testosterone and Breast Cancer Risk
Prospective Studies that Examine the Association Between Testosterone and Breast Cancer Risk
Eight prospective cohort studies have published data on the association between
endogenous T levels and breast cancer risk using T measured from blood samples
gathered at baseline from postmenopausal women (Berrino and others 1996; Cauley and
others 1999; Dorgan and others 1996; Garland and others 1992; Hankinson and others
1998; Thomas and others 1997; Wysowski and others 1987; Zeleniuch-Jacquotte and
others 1997). Six of these studies were nested case-control studies (Berrino and others
1996; Dorgan and others 1996; Hankinson and others 1998; Thomas and others 1997;
Wysowski and others 1987; Zeleniuch-Jacquotte and others 1997); one was a case-cohort
study (Cauley and others 1999) and one a full cohort study (Garland and others 1992).
Only one of these studies published results for premenopausal women (Wysowski and
others 1987).
Six of these eight studies reported a statistically significant increase in postmenopausal
breast cancer risk with increasing levels of endogenous T (Berrino and others 1996;
Cauley and others 1999; Dorgan and others 1996; Hankinson and others 1998; Thomas
and others 1997; Zeleniuch-Jacquotte and others 1997). A recently conducted pooled
analysis (2002) of these eight prospective studies estimated that the relative risk (RR) of
breast cancer in women whose levels of T were in the top quintile compared to women in
the bottom quintile was 2.22 (95% confidence interval (CI), 1.59-3.10). A statistically
significant dose-response relationship was also observed (P trend < 0.001) (2002). Two
of these studies also reported statistically significantly increasing breast cancer risk with
increasing levels of free T (Berrino and others 1996; Cauley and others 1999), a measure
of bioavailable T.
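As a worked check on the pooled estimate quoted above, a confidence interval for a ratio measure is symmetric on the log scale, so the standard error of log(RR) can be recovered from the published bounds. The sketch below assumes a Wald-type 95% interval (multiplier 1.96); the function names are illustrative and are not taken from the pooled analysis.

```python
import math

def se_log_rr(lower, upper, z_crit=1.96):
    """Standard error of log(RR) implied by a Wald-type 95% CI (an assumption)."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def wald_z(rr, lower, upper):
    """Approximate z statistic for the null hypothesis RR = 1."""
    return math.log(rr) / se_log_rr(lower, upper)

# Pooled top-versus-bottom quintile estimate quoted in the text
rr, lower, upper = 2.22, 1.59, 3.10

se = se_log_rr(lower, upper)
z = wald_z(rr, lower, upper)
# On the log scale the point estimate sits midway between the bounds,
# so the geometric mean of the bounds should be close to the RR itself.
midpoint = math.sqrt(lower * upper)

print(f"SE of log(RR): {se:.3f}")
print(f"z statistic: {z:.2f}")
print(f"geometric mean of bounds: {midpoint:.2f}")
```

The geometric mean of the bounds agreeing with the point estimate is a quick consistency check that can be applied to any of the risk estimates in Tables 1-1 and 1-2.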
The study of premenopausal women (Wysowski and others 1987) found no statistically
significant differences between cases and non-cases in mean levels in either pre or
postmenopausal women, but the sample size was small (premenopausal women: 17 cases,
67 controls; postmenopausal women: 22 cases, 88 controls).
Can the Observed Association between Testosterone levels and Breast Cancer Risk be
Due to Bias?
Effects of Measurement Biases:
One limitation to the studies reviewed is that serum T may not be the ideal measure of T.
Total T includes both free T and bound T. Further, serum levels do not take into account
the peripheral conversion of precursor androgens into T in the breast tissue itself. The
effect of this measurement error is most likely to be non-differential, therefore, biasing
results towards the null.
Measurement biases due to use of a single hormone measurement or degradation of
hormones in stored specimens over time would most likely be non-differential, resulting
in attenuated estimates of disease risk. All of the existing prospective studies (Table 1-1)
analyzed T measured from only one blood draw which might not be representative of the
cumulative exposure to T; however, given the prospective design of these studies, any
inaccuracy in measurement would likely bias risk estimates towards the null.
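The direction of this bias can be illustrated with expected counts. The sketch below assumes a dichotomous exposure (high versus low T) and applies the same sensitivity and specificity to cases and controls, which is what non-differential means here. All counts and error rates are hypothetical and are not drawn from any of the studies reviewed.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio from the four cells of a 2x2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed

# Hypothetical true exposure distribution (exposed, unexposed)
true_cases = (60, 40)
true_controls = (40, 60)

# Same error rates in cases and controls: non-differential misclassification
sens, spec = 0.8, 0.8

true_or = odds_ratio(*true_cases, *true_controls)
obs_cases = misclassify(*true_cases, sens, spec)
obs_controls = misclassify(*true_controls, sens, spec)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR: {true_or:.2f}")
print(f"observed OR: {observed_or:.2f}")
```

In this example the observed odds ratio lies between 1 and the true value. The guarantee of attenuation applies to a two-level exposure with independent errors; with quantile categories such as those used in the prospective studies, intermediate categories can behave differently.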
The consideration of time of day that blood was drawn and fasting status can help to
avoid the biases due to using a single hormone measurement. To avoid this bias, three of
the studies either matched on time of blood draw (Dorgan and others 1996; Hankinson
and others 1998) or restricted subjects to having their blood drawn in the morning
(Garland and others 1992). Since the effect of these sporadic variations would be to bias
the results towards the null, this may help to explain one of the null associations observed
(Wysowski and others 1987).
The studies that observed an association between T and breast cancer risk attempted to
reduce measurement bias due to degradation by matching cases to controls on the date of
blood draw (Berrino and others 1996; Dorgan and others 1996; Hankinson and others
1998; Thomas and others 1997; Zeleniuch-Jacquotte and others 1997) and storage
conditions such as sample location/shelf in the freezer (Berrino and others 1996). This
was not done in the studies reporting no association between T and breast cancer risk
(Garland and others 1992; Wysowski and others 1987) or in one of the positive studies
(Cauley and others 1999).
Laboratory assay variation would also most likely be non-differential since cases and
controls were analyzed concurrently in these studies. The intra- and inter-assay
coefficients of variation in these studies were rather good, ranging from 4 to 14 percent.
However, the coefficient of variation was not reported in one of the null studies
(Wysowski and others 1987).
Thus, although there may be some attenuation in effect estimates in all of these studies, it
is not clear whether measurement bias due to degradation can explain the discrepancies
between the two null studies and the positive studies.
Temporal Bias:
If breast cancer development increases T levels, then studies that included subjects
diagnosed shortly after baseline hormone measurement may have artificially elevated
estimates of the risk of breast cancer associated with T levels. Two of the positive
studies and both of the null studies excluded subjects diagnosed 6-24 months after
baseline (Dorgan and others 1996; Garland and others 1992; Wysowski and others 1987;
Zeleniuch-Jacquotte and others 1997). However, the study with the most conservative
cut-point of 24 months (Dorgan and others 1996) reported a significant positive
association between T levels and breast cancer risk. Thus, although it is possible that
temporal bias played a role in the four positive studies with no exclusions, this latter
study suggests that temporal bias cannot explain the association between T and breast
cancer risk.
Effects of Confounding:
Lack of control for body mass index (BMI) or age at menopause could result in a positive
bias away from the null. All of the studies that reported a significant association between
T levels and breast cancer risk included either BMI or height and weight as covariates in
the statistical model. All but one of the studies (the exception being a null study, Garland and others 1992) considered
either the amount of time menopausal (Dorgan and others 1996; Thomas and others
1997; Wysowski and others 1987) or age at menopause (Berrino and others 1996; Cauley
and others 1999; Hankinson and others 1998; Zeleniuch-Jacquotte and others 1997) as a
covariate to control for the effects of menopause on T levels. Thus it is unlikely that
confounding by these variables can explain the associations observed between T levels
and breast cancer risk.
All of the prospective studies that examined the T-breast cancer association were
conducted using cohorts from Caucasian populations. Only two studies (Berrino and
others 1996; Thomas and others 1997) were conducted outside the United States, one in
Italy (Berrino and others 1996) and one on the island of Guernsey (Thomas and others
1997). It is therefore unlikely that differences in the populations studied can explain the
discrepant results between studies. There are, as far as we know, no prospective data
from non-white populations.
Studies of Testosterone Measured Post-Diagnosis that Examine the Testosterone-Breast
Cancer Association:
Several case-control studies published over the past 20 years have evaluated the association
between T levels and breast cancer risk. Comparisons to levels in control subjects have
shown that both postmenopausal breast cancer cases (Adlercreutz and others 1989; Hill
and others 1985; Lipworth and others 1996; Secreto and others 1991; Secreto and others
1983b) and premenopausal cases (Secreto and others 1983a; Secreto and others 1984;
Secreto and others 1989) have significantly elevated T levels. Table 1-2 presents the odds
ratios (OR) of breast cancer associated with categories of serum T, while Table 1-3
presents results of studies that compared mean levels of T. The measures of T in cases
were almost twice those of controls. Although these retrospective studies support the
prospective study results showing an association between increased T and increased
breast cancer risk, these results may not be as readily interpretable as those from the
prospective studies, given the possibility that the presence of cancer, or the treatment for
it, may have increased T levels.
Both retrospective and prospective studies have published statistically significant
associations between increased levels of T and increased breast cancer risk. These
associations are unlikely to be due to measurement biases, the influence of disease, or
lack of adjustment for the confounding effects of BMI or age at menopause.
Table 1-1 The relative risk of breast cancer associated with testosterone levels in postmenopausal women, results from prospective studies
Author, year | Cohort | Age | Sample size | Matching | Adjusted for
Wysowski, 1987 | Washington County, | 36-90 | 39 cases; 155 controls | Race, age, time since last menstrual period | Matching variables only
Garland, 1992 | Rancho Bernardo, | 50-79 | 15 cases; 409 at risk | | Age, BMI (tertiles), smoking at baseline, other hormones
Dorgan, 1996 | Columbia, MO U.S. | 52-73 | 71 cases; 133 controls | Exact age, date (+/- 1 y) and time of blood draw (+/- 2 h) | y since menopause, height, weight, parity, family history; matched analysis
Berrino, 1996 | Study of Hormones and Diet in the Etiology of Breast Tumors, Italy | 40-69 | 24 cases; 87 controls | Recruitment center, recruitment date, daylight saving period at time of recruitment, location of freezer storage | Age at menarche, age at first childbirth, number of births, age at menopause, weight, height, BMI, waist-to-hip ratio, other hormones; matched analysis
Thomas, 1997 | Guernsey, U.K. | mean = 59 | 61 cases; 179 controls | Age (+/- 2 y), date of blood collection (+/- 1 y), number of years menopausal (1-2 y, or 3+) | Age at menarche, parity, number of y post-menopausal, BMI, E2 and SHBG; matched analysis
Zeleniuch-Jacquotte, 1997 | NYU Women's Health Study, U.S. | 49-65 | 85 cases; 163 controls | Age at enrollment (+/- 6 mos), date of initial blood donation (+/- 3 mos), menopausal status | BMI, age at menarche, parity, age at first full-term pregnancy, age at menopause, family history, history of benign breast condition, history of oophorectomy, lifetime mos of lactation, smoking; matched analysis
Hankinson, 1998 | Nurses' Health Study, U.S. | 46-69 | 147 cases; 299 controls | Age (+/- 2 y), month of collection, time of day that blood was drawn (+/- 2 h), fasting status | BMI (at age 18, quartiles), family history, age at menarche (quartiles), parity/age at 1st birth, age at menopause (quartiles); matched analysis
Cauley, 1999 | Study of Fractures, U.S. | 65-75 | 97 cases; 250 | | Age, BMI, age at menarche, first birth, and menopause, surgical menopause (Y/N), nulliparity (Y/N), family history, past estrogen use (Y/N), walking for exercise (Y/N) and alcohol consumption (g/d quintile)
Baseline age distribution
All units converted to pg/mL
Age adjusted model
Y/N, yes/no; NS, not significant; y, years; h, hours
Analysis also conducted excluding past estrogen users
Adjusted for BMI and age only
Adjustment did not change results from the crude analysis
Table 1-1 continued.
Author, year Exposure Cases Controls Categories RR 95% CI P Adjusted RR 95% CI P
Wysowski, 1987 T 39 155 N/A N/A N/A NS N/A N/A NS
Garland, 1992 T 5 132 37-176 1.0
5 137 177-284 1.1 NS
5 140 285-778 1.0 NS NS
Dorgan, 1996 T 9 32 <98 1.0 1.0
13 28 98-169 1.8 0.6-5.0 2.9 0.9-9.4
20 39 170-259 2.1 0.8-5.6 2.9 1.0-8.6
29 34 >259 3.7 1.4-10.0 6.2 2.0-19.0 0.02
Berrino, 1996 free T 24 87 <0.57 1.0
0.57-0.86 1.8 0.4-9.3
>0.86 5.7 1.5-22.2 0.005 4.6 1.1-20.0
T 24 87 <170 1.0
170-250 4.8 0.9-25.1
>250 7.0 1.4-36.4 0.026 11.5^ 1.3-99.6
Thomas, 1997 T 13 59 <210 1.00
22 61 210-360 1.83 0.82-4.12
26 59 >360 2.39 1.01-5.65 0.045
Zeleniuch-Jacquotte, 1997 T 85 163 <210 1.0
210-291 2.4 1.0-5.6
292-415 3.5 1.4-8.4
>415 2.7 1.1-6.8 <0.05
Hankinson, 1998 T 33 75 <160 1.00 1.0
38 79 160-220 1.12 1.12 0.60-2.10
37 78 230-310 1.10 1.07 0.57-2.00
39 67 >310 1.34 0.05 1.40 0.73-2.70 0.04
Cauley, 1999 free T 10 56 <1.6 1 1.0
23 65 1.6-2.4 1.7 0.7-4.2 2.2 0.7-7.1
33 58 2.4-3.8 3.5 1.5-8.2 6.4 2.1-19.6
31 64 >3.8 2.5 1.1-6.0 0.01 3.3 1.1-10.3 0.009
T 10 57 <121 1.0 1.0
25 61 121-176 2.1 0.9-5.0 2.2 0.7-7.1
30 66 177-276 3.0 1.3-6.7 5.5 1.8-17.0
32 60 >276 2.8 1.2-6.5 0.01 3.6 1.1-11.7 0.008
Baseline age distribution
All units converted to pg/mL
Age adjusted model
Analysis also conducted excluding past estrogen users
Adjusted for BMI and age only
Adjustment did not change results from the crude analysis
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Table 1-2 The relative risk of breast cancer associated with testosterone levels, results from case-control studies
Author, year | Population | Exposure | Cases | Controls | Categories (a) | OR | 95% CI | P
Secreto, 1984 (b) | Premenopausal women in Milan | serum T | 13 | 47 | <590 | 1.0 | |
 | | | 4 | 9 | 590-670 | 2.6 | 0.6-10.9 |
 | | | 10 | 6 | 671+ | 10.2 | 2.6-40.0 | 0.0004
 | | urinary T | 9 | 35 | <8.2 | 1.0 | |
 | | | 4 | 7 | 8.2-10.5 | 2.3 | 1.2-12.9 |
 | | | 10 | 5 | 10.6+ | 8.4 | 2.1-33.6 | 0.002
Secreto, 1989 (b) | Women in Milan age 30-49 | serum T | 31 | 51 | <309 | 1.0 | |
 | | | 32 | 17 | 309+ | 3.4 | 1.6-7.3 | 0.05
 | | urinary T | 36 | 50 | <7.6 | 1.0 | |
 | | | 24 | 16 | 7.6+ | 2.1 | 0.9-4.8 | NS
Secreto, 1991 (c) | Postmenopausal women in Milan <69 y of age | serum T | 16 | 40 | <146 | 1.0 | |
 | | | 16 | 40 | 146-212 | 1.2 | 0.5-3.0 |
 | | | 18 | 38 | 213-275 | 1.5 | 0.6-3.7 |
 | | | 25 | 32 | >275 | 2.7 | 1.1-6.7 | 0.03
 | | urinary T (d) | 11 | 43 | <18 | 1.0 | |
 | | | 14 | 38 | 18-31 | 1.2 | 0.5-2.9 |
 | | | 17 | 37 | 32-46 | 2.2 | 0.8-5.7 |
 | | | 30 | 26 | >46 | 4.7 | 1.8-12.1 | 0.001
 | | serum DHT | 15 | 37 | <36 | 1.0 | |
 | | | 20 | 40 | 36-57 | 1.6 | 0.7-3.7 |
 | | | 16 | 38 | 58-82 | 1.3 | 0.5-3.1 |
 | | | 24 | 35 | >82 | 2.0 | 0.8-5.0 | NS
Lipworth, 1996 (e) | Postmenopausal women from Sweden | serum T (f) | 23 | 35 | 260 | 1.00 | |
 | | | 15 | 27 | 350 | 0.75 | 0.33-1.75 |
 | | | 47 | 30 | 470 | 2.64 | 1.27-5.46 |
 | | | 36 | 30 | 700 | 2.30 | 0.97-5.50 | 0.041
(a) Serum T and DHT converted to pg/ml and urinary T converted to µg/24 h
(b) Age adjusted model
(c) Adjusted for age, occupation, number of children
(d) Units in pg/ml
(e) Adjusted for age and residence
(f) Categorized by quartile medians
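To make the odds ratios in Table 1-2 concrete, the sketch below computes a crude odds ratio with a Woolf (log-based) 95% confidence interval from the serum T counts reported for Secreto 1989. This is an illustration only: the published estimate (3.4; 95% CI, 1.6-7.3) is age adjusted, so the crude figure is not expected to match it exactly, and the function name is my own.

```python
import math

def odds_ratio_ci(a, b, c, d, z_crit=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z_crit * se_log)
    upper = math.exp(math.log(estimate) + z_crit * se_log)
    return estimate, lower, upper

# Serum T counts for Secreto 1989 in Table 1-2:
# 309+ pg/ml (exposed): 32 cases, 17 controls
# <309 pg/ml (reference): 31 cases, 51 controls
crude_or, ci_low, ci_high = odds_ratio_ci(a=32, b=31, c=17, d=51)

print(f"crude OR = {crude_or:.2f} (95% CI, {ci_low:.2f}-{ci_high:.2f})")
```

The crude and adjusted estimates point in the same direction, which is what would be expected if age is only a modest confounder in that study population.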
Table 1-3 Mean levels of testosterone in breast cancer cases and controls
Author, year Population Exposure Group N Mean (a) SD
Secreto, 1983 Postmenopausal women Serum T Controls 30 310 110
Carcinoma 28 550 200 0.001
Secreto, 1983 Premenopausal women Urinary T Controls 22 6.25 3.48
Familiality 21 5.41 3.60 NS
Hyperplasia 39 6.97 4.44 NS
Carcinoma 18 11.3 6.78 0.01
Secreto, 1984 Premenopausal women Serum T Controls 55 470 160
Breast Hyperplasia 31 550 200 <0.05
Breast Cancer 23 620 220 <0.005
Hill, 1985 Postmenopausal women Serum T Healthy Caucasian 43 NS
Healthy Japanese 59 NS 0.01
Cases Japanese 33 NS (c) 0.01
Adlercreutz, 1989 Postmenopausal women Serum T Vegetarians 10 172.80 86.40
Omnivores 9 233.28 66.24 <0.05
Cases 8 319.68 132.48 <0.05
(a) Serum T converted to pg/ml and urinary T converted to µg/24 h
(b) Comparisons to mean levels in controls using the t-test
(c) Comparison to Healthy Japanese group
Androgen Receptor and the Role of a Functional Polymorphism in Androgen Receptor
and Breast Cancer Risk
The main receptor for T is the AR. A functional polymorphism in the AR gene has been
examined in female breast cancer and the literature is reviewed to shed light on the
possible mechanisms by which T may affect breast cancer risk.
Androgen Receptor Protein and Breast Cancer
The AR is expressed in the majority of breast cancers (Bayer-Garner and Smoller 2000;
Bryan and others 1984; Brys and others 2002; Hall and others 1996; Kuenen-Boumeester
and others 1992; Lea and others 1989; Soreide and others 1992). Several studies have
been conducted to examine the effects of androgens on the growth of AR positive breast
cancer cell lines. These studies have reported both inhibitory (Ortmann and others 2002;
Poulin and others 1988) and stimulatory (Hackenberg and others 1988; Marugo and
others 1992) effects. These divergent effects have been observed to be specific to the cell
line under study (Birrell and others 1995a).
To my knowledge, the only in vivo study of the effect of T on breast cell proliferation
was conducted in rats and showed that treatment with T results in both tumor regression
and a reduction in estrogen receptor (ER) expression (Zava and McGuire 1977).
However, it is unclear whether T levels used represent physiologic doses. No in vivo or
epidemiologic studies have examined the association between serum or tissue T levels
and breast cell proliferation in tumors with varying degrees of AR expression.
In summary, the effects of androgens on breast cancer cell growth are still unclear. In
contrast to the epidemiologic data, which consistently show an association between
serum T levels and increasing breast cancer risk, in vivo studies report an anti-proliferative
effect and in vitro studies report both proliferative and anti-proliferative effects.
The Androgen Receptor Gene and a Polymorphic CAG Repeat
The AR is coded by a single 90 kb gene on the X chromosome (Xq11-q12) which
encodes an 11 kb mRNA transcript composed of 8 exons (Brown and others 1989; Chang
and others 1988; Lubahn and others 1989; Lubahn and others 1988; Tilley and others
1989). Epidemiologic evidence for a role of the AR gene in breast cancer was first
suggested by studies of male breast cancer patients. A mutation in the AR in the DNA
binding domain resulting in the inability to bind androgens was first reported in a pair of
brothers with breast cancer (Wooster and others 1992). In a study of 13 male breast
cancer cases, 1 case was observed to carry a similar mutation (Lobaccaro and others
1993). Another small study of 11 male breast cancer cases did not observe this mutation
(Hiort and others 1996). These results suggested that the mutation may play a role in the
development of breast cancer in some males.
Within the first exon of the AR lies a polymorphic CAG repeat that encodes a
polyglutamine tract of variable length. The normal size range of these repeats is between
6 and 39 repeats (Edwards and others 1992; Giovannucci and others 1997). Between 40
and 66 repeats have been observed in patients with a rare, neurodegenerative disorder
called spinal and bulbar muscular atrophy (La Spada and others 1991), a disease
characterized by androgen insensitivity with gynaecomastia, testicular atrophy,
oligospermia, azoospermia, and elevated serum gonadotropins.
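To make the notion of repeat length concrete, the following toy sketch counts the longest uninterrupted run of CAG triplets in a DNA string and places it in the ranges quoted above (6 to 39 repeats as the normal range, 40 to 66 as reported in spinal and bulbar muscular atrophy). The sequence is made up for illustration and the function names are hypothetical; the text does not describe the genotyping methods used in the studies reviewed, and this is not meant to represent them.

```python
import re

def longest_cag_run(sequence):
    """Number of CAG triplets in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_repeat_length(n_repeats):
    """Place a repeat count in the ranges cited in the text."""
    if 6 <= n_repeats <= 39:
        return "normal range"
    if 40 <= n_repeats <= 66:
        return "range reported in spinal and bulbar muscular atrophy"
    return "outside the ranges cited"

# Made-up sequence: arbitrary flanking bases around 21 CAG repeats
toy_sequence = "ATGGAA" + "CAG" * 21 + "GGCTAA"

n = longest_cag_run(toy_sequence)
print(n, classify_repeat_length(n))
```

Because each CAG codon encodes one glutamine, the count returned here is also the length of the polyglutamine tract for an uninterrupted repeat.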
AR-CAG Repeat Length and Androgen Receptor Activity
Several studies have observed an association between increasing AR-CAG repeat length
and a linear decrease in AR transactivation activity (Chamberlain and others 1994; Irvine
and others 2000; Kazemi-Esfarjani and others 1995; Tut and others 1997). Consistent
with this, male carriers of the short AR-CAG repeat length are at increased risk of
prostate cancer (Ekman and others 1999; Giovannucci and others 1997; Hakimi and
others 1997; Hsing and others 2000; Ingles and others 1997; Irvine and others 1995;
Stanford and others 1997).
AR-CAG Repeat Length and Breast Cancer
The association between the length of the AR-CAG repeat polymorphism and breast
cancer risk has been examined in several case-control studies (Table 1-4) (Dunning and
others 1999; Giguere and others 2001; Haiman and others 2002b; Kadouri and others
2001; Rebbeck and others 1999; Spurdle and others 1999; Suter and others 2003). The
long AR-CAG repeat, representative of the less active AR, was associated with a
statistically significant increase in breast cancer risk in a population of women from
Quebec (Giguere and others 2001), and in a population of BRCA1 mutation carriers
(Rebbeck and others 1999). Four additional studies (Dunning and others 1999; Kadouri
and others 2001; Spurdle and others 1999; Suter and others 2003) reported slightly
increased risk of breast cancer associated with the long allele, but none of these findings
were statistically significant. A study nested within the Nurses’ Health Study cohort
found no increased breast cancer risk associated with the long AR allele overall, but, an
increased risk was observed when analyses were limited to subjects with a first-degree
family history of breast cancer (OR = 1.70; 95% CI, 1.2-2.4) (Haiman and others 2002b).
Another trinucleotide repeat in the AR, a GGC repeat, has been observed to be associated
with prostate cancer risk (Chang and others 2002; Hakimi and others 1997; Hsing and
others 2000; Platz and others 1998; Stanford and others 1997). One of three studies that
have examined the GGC repeat length and breast cancer (Dunning and others 1999;
Kadouri and others 2001; Suter and others 2003) found a significant association in
women diagnosed before age 45 (Suter and others 2003), but no evidence of an
interaction between the CAG and GGC repeat with breast cancer risk.
The three studies that reported a significant association between long AR-CAG repeat and
breast cancer risk (Giguere and others 2001; Haiman and others 2002b; Rebbeck and
others 1999) included both pre and postmenopausal women. One study stratified on
menopausal status and found that the significant association with the long AR-CAG was
observed in postmenopausal women (OR = 3.22; 95% CI, 1.54-6.75) but not in
premenopausal women (OR = 1.03; 95% CI, 0.43-2.48). If this effect modification is
true then it may explain, at least in part, the non-significant results in the studies
restricted to women under age 40 years (Spurdle and others 1999).
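One way to gauge how strongly the stratum-specific estimates quoted above differ is an approximate test of heterogeneity on the log odds ratio scale. The sketch below is my own back-of-the-envelope calculation, not an analysis reported by the study. It assumes the two strata are independent and that the published intervals are Wald-type 95% intervals, so that each standard error can be recovered from the bounds.

```python
import math

def log_or_and_se(odds_ratio, lower, upper, z_crit=1.96):
    """Log odds ratio and its standard error implied by a Wald-type 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return log_or, se

def heterogeneity_z(stratum_a, stratum_b):
    """z statistic for the difference between two independent log odds ratios."""
    log_a, se_a = log_or_and_se(*stratum_a)
    log_b, se_b = log_or_and_se(*stratum_b)
    return (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# (OR, lower bound, upper bound) as quoted in the text
postmenopausal = (3.22, 1.54, 6.75)
premenopausal = (1.03, 0.43, 2.48)

z = heterogeneity_z(postmenopausal, premenopausal)
p = two_sided_p(z)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the difference between strata is of borderline statistical significance, which is consistent with treating the effect modification as a hypothesis to be confirmed rather than an established finding.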
Issues with the Studies of AR-CAG Repeat Length and Breast Cancer Risk
The gene for the AR lies on the X chromosome; therefore, women carry two alleles while
men carry only a single allele. In general, normal women are a mosaic with one allele
randomly expressed in each cell. A recent study reports that 13% of young (27-45 years
old) breast cancer cases showed preferential activation of one of the AR alleles as
measured by genotyping of peripheral blood DNA, but there was no preference towards
the allele with the longer or shorter CAG repeat (Kristiansen and others 2002). Analyses
of the AR in women that only consider the length of the CAG repeat on one allele assume
that this is the active allele in the breast tissue. Analyses that use the average of the CAG
repeat lengths or the sum of the repeats consider the contributions of both alleles;
however, if only one allele is preferentially expressed then this would result in
misclassification. There is a high rate of heterozygosity in the AR-CAG repeat length,
therefore this is likely to be a major misclassification problem, which should bias the
results towards the null. Genotyping methods can be optimized to better detect if there is
a preferentially active AR allele by either genotyping tumor tissue or serum DNA using
methylation sensitive enzymes (Kristiansen and others 2002).
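The misclassification problem described above can be sketched with a single hypothetical genotype. The example below summarizes a woman's two alleles in the ways used in the literature (shorter allele, longer allele, mean, sum, and S/L genotype) and shows that her exposure category depends on which summary is chosen. The cut point of 22 or more repeats is borrowed from two of the studies in Table 1-4; the allele lengths and function names are assumptions made for illustration.

```python
def summarize_genotype(allele_1, allele_2, long_cutpoint=22):
    """Summaries of a two-allele AR-CAG genotype given as repeat counts."""
    shorter, longer = sorted((allele_1, allele_2))
    n_long = sum(n >= long_cutpoint for n in (shorter, longer))
    return {
        "shorter": shorter,
        "longer": longer,
        "mean": (shorter + longer) / 2,
        "sum": shorter + longer,
        "genotype": ("S/S", "S/L", "L/L")[n_long],
    }

def is_long(repeats, long_cutpoint=22):
    """True if a repeat count meets the (assumed) cut point for 'long'."""
    return repeats >= long_cutpoint

# Hypothetical heterozygous woman with 19 and 25 repeats
woman = summarize_genotype(19, 25)

by_mean = is_long(woman["mean"])             # classification using the mean
if_short_active = is_long(woman["shorter"])  # if only the 19-repeat allele is expressed
if_long_active = is_long(woman["longer"])    # if only the 25-repeat allele is expressed

print(woman)
print(by_mean, if_short_active, if_long_active)
```

If the 19-repeat allele were the one preferentially expressed in breast tissue, an analysis based on the mean would place this woman in the long category even though her active receptor is the short, more active form.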
In summary, the studies conducted so far suggest that the long AR-CAG repeat (less
active AR) may be associated with increased breast cancer risk in women who are
postmenopausal, have a first-degree family history of breast cancer or a known BRCA1
mutation. The location of the AR gene on the X chromosome means that results from
epidemiological studies will be biased towards the null as long as we do not know which
allele is expressed.
Table 1-4 The relative risk of breast cancer associated with the androgen receptor CAG repeat
Author, year Population AR genotype (a) Long allele Cases Controls Adjusted for RR (95% CI)
Rebbeck, 1999
Multi-institutional study of mutation carriers ascertained through families with a history of breast and/or ovarian cancer between 1978 and
28+ repeats 19
1.81 (1.06-3.08) (b)
Spurdle, 1999
Early onset breast cancer (<40) and age matched controls from Australia.
21+ repeats 78
Age, country of birth, state, education, marital status, number of live births, height, weight 1 year ago, age at menarche, OC use, family history, ER polymorphism, mother's country of birth, father's country of birth
Dunning, 1999
Cases from East Anglian region of the UK and random controls from the EPIC cohort.
S/L
23+ repeats 209
0.82 (0.62-1.09)
1.31 (0.87-1.97)
Giguere, 2001
Incident cases from Quebec city and age and area of residency matched controls.
S/L + L/L
21+ repeats 17
Matched analysis
2.14 (1.22-3.73)
Kadouri, 2001
Affected and unaffected BRCA1/2 carriers from two genetic clinics; one in Jerusalem, Israel and the other in London, UK.
S/L + L/L
28+ repeats
BRCA1/2 carriers: 0.80 (0.44-1.46)
non-carriers: 1.27 (0.83-1.96)
Haiman, 2002
Cases and controls from the Nurses' Health Study and controls matched on year of birth, menopausal status, postmenopausal hormone use, and time of day, month, and fasting status at blood draw.
S/L + L/L
22+ repeats 179
Age at menarche, parity, age at first birth, BMI at age 18, weight gain since age 18, benign breast disease, first degree family history, duration of postmenopausal hormone use; matched analysis
Suter, 2003
Cases (<45 y) identified through the Cancer Surveillance System of Western Washington and frequency-matched controls on 5-year age group and reference year.
22+ repeats 121
Age at reference and reference year
1.2 (0.8-1.7)
(a) S, short allele; L, long allele
(b) Analyses using 29 and 30 repeats as the cut point produced progressively higher significant risk estimates and progressively earlier age of onset, no trend test published
(c) Published OR models the S/S genotype as the high-risk allele
If the long AR-CAG repeat (less active AR) is associated with increased breast cancer
risk in postmenopausal women, then how do these results coincide with results showing
that increased T levels increase postmenopausal breast cancer risk?
One hypothesis to explain this apparent paradox is that the less active AR may be
involved in a physiological feedback associated with increased circulating T. However,
the only data available discount this hypothesis. Two studies have examined the
association between the AR-CAG repeat length and circulating T levels in normal
women. AR-CAG repeat length was inversely associated with T levels (Haiman and
others 2002b; Westberg and others 2001). In other words, the less active AR was
associated with lower circulating T levels and results were statistically significant in both
a study of premenopausal women (Westberg and others 2001) and a study of
postmenopausal women (Haiman and others 2002b).
If the AR is not involved in a feedback mechanism to influence T levels in
postmenopausal women then it is possible that the effect of T on the breast epithelium
does not act through binding to the AR. T may exert its effect on breast tissue through
aromatization of T to estradiol (E2) in adipose tissue,
and the increased E2 levels may result in increased breast cell proliferation and breast
cancer risk.
T may also exert an indirect effect on breast cancer proliferation by sequestering sex
hormone binding globulin, leaving more E2 in the non-protein bound state and able to act
on breast tissue (Lipworth and others 1996; Siiteri and others 1981). Approximately 66%
of total T is bound to sex hormone binding globulin, 31% is bound to albumin, and 2% is
bound to cortisol binding protein (Dunn and others 1981). Two of the studies suggesting
an association between T and breast cancer reported that this association disappeared
when adjusting for E2 levels (Thomas and others 1997; Zeleniuch-Jacquotte and others
1997). But, in the pooled analysis the significant association between T and breast cancer
risk remained after adjustment for E2 (2002).
Finally, it is possible that further studies will show that AR-CAG repeat length is not
linked to breast cancer risk.
Prospectively conducted epidemiologic studies have found that increased levels of serum
T are associated with an increase in postmenopausal breast cancer risk. However, a
number of questions remain. Several lines of evidence suggest a role of AR in breast
cancer risk, and sparse epidemiologic data suggest that a long AR-CAG repeat yielding a
less active AR may be associated with increased risk. There still remain a number of
questions on how T increases breast cancer risk. While in vitro studies report both
proliferative and anti-proliferative effects of T on the growth of various breast cancer cell
lines, we still need to understand under which in vivo circumstances T exerts
these effects. Finally, we do not know whether androgens affect breast cancer risk in
premenopausal women. Further analyses of the role of AR-CAG repeat length and breast
cancer using genotyping methods that assess which allele is the active AR allele are
clearly needed. Additional data are also needed to help understand the apparent paradox
between the AR-CAG repeat length, T levels and breast cancer risk.
GRANT PROPOSAL: The Androgen Receptor and Mammographic Density
(Predoctoral Fellowship awarded by California Breast Cancer Research Program 8GB-
Introduction and Hypotheses
High androgen levels, in particular high T levels, have been shown to increase breast
cancer risk in several epidemiologic studies. The AR is expressed in breast tumor tissue
and is involved in feedback mechanisms that regulate T levels. The association between
the CAG repeat length polymorphism in the gene for the AR and breast cancer risk has
been examined in a handful of epidemiologic studies. The results from these studies are
inconsistent and some even suggest that low, rather than high, androgen activity is
associated with increased breast cancer risk.
I propose to further our understanding of the role of AR-CAG length in breast cancer
development by examining the effect of the AR-CAG repeat length on two different
outcomes. I will address the effect of AR-CAG length on mammographic density, a
strong independent breast cancer risk factor and potential marker for breast cell
proliferation. I propose to test the hypothesis that longer repeat length is associated with
higher mammographic density. A corollary of this hypothesis is that AR activity is
associated with more advanced forms of breast cancer. To test this hypothesis, I will also
examine the association between the AR-CAG repeat length and breast cancer tumor
grade. This study will use DNA and mammographic density assessments collected on a
sample of 396 African American and Caucasian cases as part of the population based
Women’s Contraceptive and Reproductive Experiences (CARE) study (PI: Leslie
Bernstein, Ph.D.), the Radiographic Densities and Breast Cancer Prevention study
(BCRP# 2RB 0058, PIs: Malcolm Pike/Giske Ursin) and the Mammographic Density and
Sex Steroid Genes study (BCRP# 6IB 0093, PIs: Sue A. Ingles/Giske Ursin).
Specific Aims
I will test the hypothesis that AR activity influences mammographic density by:
AIM 1: Determining the association between mammographic density and the AR-CAG
repeat polymorphism.
AIM 2: Determining the association between mammographic density and AR-CAG
repeat polymorphism stratified by ethnicity, menopausal status, and use of hormone
replacement therapy.
I will also test the hypothesis that AR activity is associated with more advanced breast
cancers by:
AIM 3: Determining the association between polymorphism in AR and the following
prognostic indicators: tumor size, stage, grade, and nodal status.
Background and Significance
Androgens and Breast Cancer Risk
Six of the eight published studies that examine the influence of T, measured
prospectively, on breast cancer risk report a statistically significant increase in
postmenopausal breast cancer risk with increasing levels of serum T (Berrino and others
1996; Cauley and others 1999; Dorgan and others 1996; Hankinson and others 1998;
Thomas and others 1997; Zeleniuch-Jacquotte and others 1997). While one of these
studies did not publish significant point estimates, a significant trend in risk was reported
(Hankinson and others 1998). Only two of the eight studies failed to demonstrate any
significant association (Garland and others 1992; Wysowski and others 1987).
Biological Rationale for a Role of AR in Breast Cancer
AR expression has been observed in 50-90% of breast cancers; the more recent studies
using more modern assays show the higher levels of expression (Bryan and others 1984;
Kuenen-Boumeester and others 1992; Soreide and others 1992).
An in vitro study by Birrell et al. (Birrell and others 1995a) observed that T
affects breast cancer cell proliferation through binding to the AR. Results are ambiguous
since both proliferative and inhibitory effects of 5α-dihydrotestosterone and a non-metabolizable
androgen, mibolerone, were observed in AR-positive cell lines. These results confirm
that T binding to AR has an influence on breast cancer tumor growth, although the
direction is still unclear.
AR Polymorphism
The AR is a member of the steroid hormone receptor subfamily of the nuclear hormone
receptor superfamily. Like all steroid hormone receptors, it is composed of four
domains: the N-terminal transactivation domain, the DNA binding domain, the
translocation domain, and the C-terminal ligand binding domain (Mangelsdorf and others
1995). Upon binding of the ligands, T or 5α-dihydrotestosterone (DHT), to the
C-terminal domain, the receptor directly targets sequences in DNA called androgen
response elements to regulate gene transcription (Adler and others 1992; Ho and others
1993; Roche and others 1992).
The AR is coded by a single 90 kilobase gene on the X chromosome (Xq11-q12) which
encodes an 11 kilobase mRNA transcript composed of 8 exons (Brown and others 1989;
Chang and others 1988; Lubahn and others 1989; Lubahn and others 1988; Tilley and
others 1985). The entire N-terminal transactivation domain is encoded by the first exon.
Within the first exon lies a polymorphic CAG repeat that encodes a polyglutamine tract
of variable length. The normal size range of these repeats is between 6 and 39 repeats
(Edwards and others 1992; Giovannucci and others 1997). Expansion of the repeat to
between 40 and 66 repeats has been observed to result in an androgen insensitive
phenotype of a rare, neurodegenerative disorder called spinal and bulbar muscular
atrophy (La Spada and others 1991). In addition, several epidemiologic studies have
observed an association between being a carrier of the short AR-CAG repeat length and
increased prostate cancer risk (Ekman and others 1999; Giovannucci and others 1997;
Hakimi and others 1997; Hsing and others 2000; Ingles and others 1997; Irvine and
others 2000; Stanford and others 1997). Based on these observations, functional studies
have identified a relationship between increasing AR-CAG repeat length and a linear
decrease in AR transactivation activity (Chamberlain and others 1994; Irvine and others
2000; Kazemi-Esfarjani and others 1995; Tut and others 1997).
AR-CAG Repeat and Breast Cancer Risk
Several epidemiological studies have been conducted to examine the association between
the length of the AR-CAG repeat polymorphism and breast cancer risk using a case-
control study design (Dunning and others 1999; Giguere and others 2001; Kadouri and
others 2001; Rebbeck and others 1999; Spurdle and others 1999). Two of the five studies
conducted demonstrated that the long AR-CAG repeat, and therefore the less active AR,
is associated with a significantly increased breast cancer risk in a population of BRCA1
mutation carriers (Rebbeck and others 1999) and in a population of non-carriers (Giguere
and others 2001). After stratification by menopausal status, the significant association
observed remained only in postmenopausal women (Giguere and others 2001). These
results suggest that low, rather than high, T activity increases breast cancer risk in
postmenopausal women and in BRCA1 mutation carriers. However, two of the five
studies reported null associations. One of these was restricted to premenopausal women
(Spurdle and others 1999); the other was of a population of Ashkenazi Jewish women
(Kadouri and others 2001). The results reported in the fifth study were supportive of the
observed association but not significant. Insufficient power is an unlikely explanation
since this study was one of the largest. However, the nonsignificant results may be due to
limitations in the analysis of the CAG repeat length as a dichotomous variable using a
cutpoint based on previous studies of prostate cancer risk (Dunning and others 1999).
Both Rebbeck and Giguere categorized the AR variable based on thorough examination
of the AR distribution in their own study populations.
AR-CAG Repeat and Breast Cancer Prognostic Indicators
The study of BRCA1 mutation carriers also showed an association between
increasing CAG repeat length and progressively earlier age of breast cancer diagnosis
(Rebbeck and others 1999). Based on these results, additional case-only studies have
also been conducted to examine the association of the AR-CAG repeat length with age of
onset (Given and others 2000; Menin and others 2001) or various prognostic indicators
such as tumor size, stage, grade, nodal status, ER and progesterone receptor (PR) status
(Yu and others 2000). A significant association between short, rather than long, AR-CAG
repeats and increased breast cancer tumor grade was observed in the only study to
examine this effect (Yu and others 2000). These results contradict case-control study
results showing that the long AR-CAG was associated with increased breast cancer risk.
The study results published by Yu et al. (Yu and others 2000) also contradict results from
studies examining AR expression. AR expression has been observed to correlate with
indicators of good prognosis such as decreased tumor grade (Brentani and others 1986;
Isola 1993; Kuenen-Boumeester and others 1992), increased survival (Bryan and others
1984; Langer and others 1990), and a positive response to tamoxifen (Bryan and others
1984). In addition, it has been shown that expression of AR is correlated with ER
positive status (Brentani and others 1986; Isola 1993; Kuenen-Boumeester and others
1992; Lea and others 1989; Miller and others 1985; Soreide and others 1992; Valyani and
others 1987). The study by Bryan et al. (Bryan and others 1984) confirmed that the
effect of AR on prognosis is independent of that of ER by showing that the addition of
AR significantly improved the response to tamoxifen among subjects who are ER
One of the aims of this study will be to examine the association between the AR-CAG
repeat length and breast cancer tumor size, stage, grade, and nodal status using a
population based study setting. Cases will be selected from the CARE study in Los
Angeles County (PI: Leslie Bernstein, Ph.D.) with information on possible confounders,
and sufficient sample size to stratify on menopausal status.
Mammographic Density and Breast Cancer Risk
Mammographic density, measured as either percent density or using Dr. John Wolfe’s
criteria (Wolfe 1976), has been found to be a strong and independent breast cancer risk
factor in numerous epidemiological studies (Boyd and others 1995; Brisson and others
1982; Byrne and others 2001; Byrne and others 1995; Carlile and others 1985; de Stavola
and others 1990; Krook 1978; van Gils and others 2000; Whitehead and others 1985;
Wolfe and others 1987), (Ursin et al. (in preparation)). It has been suggested that
mammographic density may represent a marker for breast cell proliferation (Spicer and
others 1994). In this study we will use a computer-assisted method of assessing density
(Ursin and others 1998). This method has been shown to correlate well with the expert
outlining method (Ursin and others 1998), and can strongly predict breast cancer risk
(Ursin et al, in preparation). This method was used to measure mammographic density as
part of the Radiographic Densities and Breast Cancer Prevention Study at the University
of Southern California (BCRP #2RB 0058) and will be used in the proposed study.
The AR has been shown to have varying associations with breast cancer risk.
Contradictory to results showing that T increases breast cancer risk, two studies support
the hypothesis that the less active AR (long CAG repeat) increases breast cancer risk. On
the other hand, another study shows that the more active AR (short CAG repeat)
increases risk for more advanced types of breast cancer as measured by breast cancer
tumor grade. These initial studies support the need for further analyses of the role of the
AR in breast cancer etiology. To date, there has been no published examination of the
effects of variability in the AR on mammographic density. Mammographic density is a
strong breast cancer risk factor. It is still unclear whether AR activity may influence
development of more aggressive forms of cancers, and if so, whether this association is
through its influence on mammographic density. If AR-CAG repeat length is associated
with more aggressive forms of breast cancer and mammographic density, then these
observations support interventions to reduce mammographic density in women and,
therefore, increase the ability to detect breast cancer earlier.
Preliminary Results
In this study I will use women who had their mammograms scanned as part of the
Radiographic Densities and Breast Cancer Prevention study (BCRP # 2RB 0058, PIs:
Malcolm Pike/Giske Ursin). Determinations of mammographic densities were conducted
by Dr. Ursin on the craniocaudal image from the cancer free (contralateral) breast of
participating cases using a method previously described (Ursin and others 1998).
Mammograms were read in groups of 30-60, where each reading group consisted of a
proportional number from each ethnic group. Preliminary results on the full sample
are listed in Table 2-1.
The study sample that will be used in the proposed study has been used in the BCRP
funded study, Mammographic Density and Sex Steroid Genes (BCRP# 6IB 0093, PIs:
Sue A. Ingles/Giske Ursin); however, no significant associations between the hormone
metabolic genes CYP17, COMT, 17HSDB1, 3HSDB1 and mammographic density were
observed.
The AR-CAG repeat and Breast Cancer Grade
I have experience examining the AR-CAG repeat in a sample of breast cancer cases at
City of Hope/Beckman Research Institute. My doctoral dissertation will use the analyses
funded by this award to supplement my initial work at City of Hope (Lillie unpublished).
Preliminary analyses (Table 2-2), although not significant, support the association
observed by Yu, that the short AR-CAG repeat length is associated with increased breast
cancer tumor grade. The lack of statistical significance may reflect
insufficient power.
Table 2-1 Distribution of breast cancer cases and controls according to mammographic density

Density   Cases   Controls   Crude OR   OR (a)   95% CI       P
<1        23      36         1.00       1.00
1-9       70      75         1.46       1.47     0.77-2.80
10-24     109     94         1.82       1.58     0.84-2.97
25-49     266     177        2.35       2.04     1.12-3.74
50-60     97      55         2.76       2.55     1.29-5.06
60+       110     51         3.38       3.01     1.50-6.03    0.0002

(a) S, short allele; L, long allele
(b) Analyses using 29 and 30 repeats as the cut point produced progressively higher significant risk estimates
and progressively earlier age of onset, no trend test published
(c) Published OR models the S/S genotype as the high-risk allele
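As a check on the crude odds ratios in Table 2-1, the following minimal sketch recomputes them from the case and control counts, using the lowest density category as the reference. The variable and function names are illustrative only. The recomputed values agree with the printed column to within 0.01; the 10-24 category computes to 1.81 against the printed 1.82.

```python
# Counts (cases, controls) per mammographic density category, from Table 2-1.
TABLE_2_1 = {
    "<1": (23, 36),      # reference category
    "1-9": (70, 75),
    "10-24": (109, 94),
    "25-49": (266, 177),
    "50-60": (97, 55),
    "60+": (110, 51),
}

def crude_odds_ratios(table, reference="<1"):
    """Odds of being a case in each category divided by the odds in the reference."""
    ref_cases, ref_controls = table[reference]
    ref_odds = ref_cases / ref_controls
    return {
        category: (cases / controls) / ref_odds
        for category, (cases, controls) in table.items()
    }

crude = crude_odds_ratios(TABLE_2_1)
for category, value in crude.items():
    print(f"{category:>6}: {value:.2f}")
```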
Table 2-2 Analysis of androgen receptor and breast cancer tumor grade in breast cancer cases with a family
history of breast cancer

AR genotype           Grade <3   Grade 3+   OR     95% CI
Average CAG           22.46      21.68      0.82   0.65-1.04
CAG categorical (a)
  L/L                 21         13         1.00
  S/S+S/L             20         23         1.86   0.74-4.64

(a) S, <21 repeats; L, >21 repeats determined based on median of the distribution
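The categorical odds ratio in Table 2-2 can be recovered from the four cell counts. The sketch below assumes the interval is a Woolf (log-based) 95% confidence interval; the method is not stated in the text, but under this assumption the printed estimate and interval are reproduced. The function name is illustrative.

```python
import math

def odds_ratio_woolf(exposed_high, exposed_low, unexposed_high, unexposed_low, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log-based) confidence interval."""
    odds_ratio = (exposed_high * unexposed_low) / (exposed_low * unexposed_high)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_high + 1 / exposed_low + 1 / unexposed_high + 1 / unexposed_low
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Table 2-2: S/S+S/L (Grade 3+ = 23, Grade <3 = 20) versus L/L (Grade 3+ = 13, Grade <3 = 21).
or_grade, ci_low, ci_high = odds_ratio_woolf(23, 20, 13, 21)
print(f"OR = {or_grade:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```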
Research Design and Methods
1) Genotype all study subjects for the AR-CAG repeat length polymorphism.
2) Conduct analyses to determine the associations between the AR polymorphism and
mammographic density, tumor size, grade, nodal status, and stage.
Subject Identification:
Study subjects will come from the breast cancer case-control study, the Women’s CARE
study in Los Angeles County (PI: Leslie Bernstein, Ph.D.). The case patients were
diagnosed between July 1994 and April 1998, and were identified by the Los Angeles
County Cancer Surveillance Program (LACCSP), a Surveillance, Epidemiology, and End
Results (SEER) registry program. Cases also consented to let us access their
mammograms. We will use women who had their mammograms scanned as part of the
Radiographic Densities and Breast Cancer Prevention study (BCRP # 2RB 0058, PIs:
Malcolm Pike/Giske Ursin). We will use the subset of 396 women who also had a blood
sample; this is the same group of women that was used in the Mammographic Density
and Sex Steroid Genes study (BCRP# 6IB 0093, PIs: Sue A. Ingles/Giske Ursin).
All women who consented to the study underwent a 75 minute interview with trained
interviewers using a questionnaire with a fairly standard set of questions on established
and suspected breast cancer risk factors (demographics, pregnancies, menstruation,
menopause, surgery, hormonal contraception, hormone replacement, medication,
infertility, medical history, mammogram history (for the 5 years preceding the
diagnosis/reference date), physical activity, smoking, body size, alcohol consumption,
family history of cancer, and prenatal exposures).
Laboratory Methods:
Using a standard protocol, genomic DNA was extracted previously from peripheral
blood. The AR-CAG repeat will be genotyped at a laboratory at the City of
Hope/Beckman Research Institute. Genotyping methods for AR-CAG repeat have been
optimized for use as a non-paternity marker in the study of affected sib pairs with cancer,
Mapping Interactive Cancer Susceptibility Loci (PI: Theodore Krontiris, M.D. Ph.D.).
All genotyping costs for the proposed study will be covered by City of Hope funds (see
memo in appendices).
The exon 1 CAG repeat of the AR will be PCR amplified using a FAM labeled forward
primer (5’-TCCAGAATCTGTTCCAGAGCGTGC-3’) and reverse primer (5’-
GCTGTGAAGGTTGCTGTTCCTCAT-3’). All microsatellite genotyping will be
performed using an ABI 377 with Genescan and Genotyper software (PE/Applied
Biosystems). Direct sequencing of a sample of homozygotes for at least three different
lengths will be conducted to determine the exact number of CAG repeats for each size
fragment. All allele calls will be made blinded to outcome data.
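One way the sequenced homozygotes could be used to convert fragment sizes into repeat counts is sketched below. This is an assumption about the calibration step, not a description of the laboratory's software: it supposes that each CAG repeat adds 3 bp to the PCR product and that the flanking length is estimated from the sequenced standards. The fragment sizes and the function names are hypothetical.

```python
from statistics import mean

BASES_PER_REPEAT = 3  # one CAG codon

# Hypothetical sequenced homozygotes: (CAG repeats by sequencing, fragment size in bp).
STANDARDS = [(18, 268.0), (21, 277.0), (24, 286.0)]

def flank_length(standards):
    """Estimate the non-repeat part of the amplicon from the sequenced standards."""
    return mean(size - BASES_PER_REPEAT * repeats for repeats, size in standards)

def call_repeats(fragment_bp, flank_bp):
    """Convert a measured fragment size to the nearest whole number of CAG repeats."""
    return round((fragment_bp - flank_bp) / BASES_PER_REPEAT)

flank = flank_length(STANDARDS)
print(call_repeats(280.2, flank))  # a hypothetical sample fragment
```

Using at least three standards of different lengths, as proposed, would also allow a check that the size-to-repeat relationship is linear over the observed range.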
Statistical Analysis:
Linear regression models to test the association between the AR and mammographic
density will be used. The basic model will have the form
$$MD = \alpha + \beta\,AR + \gamma X + \varepsilon$$
where X is a vector of additional adjustment covariates (e.g. age, ethnicity) and the
residuals ($\varepsilon$) are assumed to be independent and normally distributed. The parameter of
primary interest is $\beta$, which measures the association between the AR-CAG repeat length
and mammographic density. Transformations to mammographic density will be used if
necessary to satisfy the model assumptions. Other outcome variables such as tumor size,
stage, grade, and nodal status can replace mammographic density in the equation.
Categorical outcomes will be modeled using logistic regression techniques.
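A minimal sketch of fitting a regression of mammographic density on AR-CAG length and one adjustment covariate is given below. The analyses themselves are to be run in SAS; this Python version is for illustration only. The subjects are simulated and noise-free, so the fit simply recovers the coefficients used to generate them, and the function name is hypothetical.

```python
def fit_ols(design, outcome):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * y for row, y in zip(design, outcome))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Simulated subjects: (average CAG repeats, age in years).
subjects = [(19, 45), (21, 52), (22, 60), (23, 38), (25, 57), (20, 49)]
# Outcome generated as MD = 80 - 1.0 * AR - 0.5 * age, with no noise.
md = [80 - 1.0 * ar - 0.5 * age for ar, age in subjects]

design = [(1.0, ar, age) for ar, age in subjects]  # intercept, AR, covariate
alpha, beta, gamma = fit_ols(design, md)
print(f"alpha={alpha:.2f}, beta={beta:.2f}, gamma={gamma:.2f}")
```

With real data the coefficient on AR would be accompanied by a standard error and a test of whether it differs from zero, which this sketch omits.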
The AR will be treated as a continuous variable that considers the contributions of both
allele lengths as an average length variable. The difference in the allele lengths will be
included in the model to adjust for the difference in effect of subjects heterozygous for
two alleles of extreme lengths versus subjects homozygous for alleles close to the mean.
This model will test for a linear trend in mammographic density with increasing number
of CAG repeats. In order to test for a threshold type effect in number of CAG repeats on
mammographic density, the AR will be categorized into S/S, S/L and L/L genotypes. The
median of the AR-CAG repeat lengths in this study sample will be used as the cutpoint of
S versus L for each allele.
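The two codings of the AR genotype described above can be sketched as follows. The genotypes are hypothetical, the function name is illustrative, and the treatment of alleles exactly equal to the median (counted here as L) is an assumption, since the text does not specify it.

```python
from statistics import median

def code_ar(genotypes):
    """genotypes: list of (allele_1, allele_2) CAG repeat counts, one pair per woman."""
    # Cutpoint: median of all allele lengths in the sample.
    cutpoint = median(allele for pair in genotypes for allele in pair)
    coded = []
    for a1, a2 in genotypes:
        # Assumption: an allele equal to the cutpoint is counted as long (L).
        n_long = sum(allele >= cutpoint for allele in (a1, a2))
        coded.append({
            "average": (a1 + a2) / 2,       # continuous exposure
            "difference": abs(a1 - a2),     # adjustment term for heterozygosity
            "category": ("S/S", "S/L", "L/L")[n_long],
        })
    return cutpoint, coded

# Hypothetical sample of four women.
cutpoint, coded = code_ar([(19, 21), (22, 22), (18, 26), (23, 24)])
print(cutpoint, [c["category"] for c in coded])
```

The third woman illustrates why the difference term is included: her average (22) equals that of the homozygous second woman, although her two alleles are at opposite extremes.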
Relative amount (% of breast) of mammographic densities will be modeled as the
primary outcome both continuously and categorically. Cutpoints of mammographic
density in this study sample have been previously established as part of the Radiographic
Densities study (2RB 0058). Prognostic indicators will be analyzed as dichotomous
outcomes.
All regression models will be adjusted for age, and if necessary, for family history of
breast cancer. Quality of mammograms, as well as group in which the mammogram was
read will also be evaluated to control for the possible effects of inter and intra assay
measurement errors. In addition, models will be stratified on ethnic group, menopausal
status, and hormone replacement therapy use categories. The RR of having a ‘high risk’
mammographic density or a more advanced stage prognostic indicator associated with
different AR-CAG repeat lengths will be estimated using the odds ratio. Analyses will be
performed using the SAS statistical software package (SAS Institute, Inc. Cary, NC).
Power Calculations:
Since models of the AR genotype as a continuous variable are inherently more powerful
to detect significant effects on the various outcomes being measured, power calculations
were conducted using the AR genotype as a dichotomous categorical exposure variable
(S/S+S/L vs. L/L). The data reported in the preliminary results were used to calculate the
percent difference in mammographic density and RR of a dichotomous outcome that can
be detected with 90% power at the 0.05 level of statistical significance. With a sample
size of 396 cases, a standard deviation of 21.7% mammographic density, and a
distribution of 50% of cases carrying the S/S+S/L genotype (Table 2-2), this study will be
able to detect a 7.1% difference in mammographic density between carriers of the
S/S+S/L genotype and carriers of the L/L genotype. Using the same 50% exposure rate
(AR genotype), and an equal distribution of cases by tumor grade (Table 2-2), this study
will have 90% power to detect an RR of 1.98.
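The detectable difference in mammographic density can be approximated as shown below. The formula used for the original calculation is not stated, so the two-sided, two-sample normal approximation here is an assumption; under it, the stated inputs give a value that rounds to the 7.1% reported above. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(n_total, sd, exposed_fraction=0.5, alpha=0.05, power=0.90):
    """Smallest mean difference detectable by a two-sided, two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n_exposed = n_total * exposed_fraction
    n_unexposed = n_total - n_exposed
    return (z_alpha + z_power) * sd * sqrt(1 / n_exposed + 1 / n_unexposed)

# 396 cases, SD of 21.7% density, half carrying the S/S+S/L genotype.
delta = detectable_difference(396, 21.7)
print(f"Detectable difference: {delta:.1f}% density")
```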
DATA ANALYSIS: Polymorphism in the Androgen Receptor and Mammographic
Density in Women Taking and Not Taking Estrogen and Progestin Therapy (Lillie and
others 2004)
Evidence suggests that circulating androgen levels in postmenopausal women are
associated with breast cancer risk (2002). In a pooled analysis of 9 prospective studies of
postmenopausal women, the trend in breast cancer risk associated with increasing level of
T was statistically significant. The main receptor for T and DHT in breast tissue is the
AR. Within the first exon of the AR gene is a polymorphic CAG repeat. Several
epidemiologic studies have examined the association between the length of the CAG
repeat polymorphism in the AR gene and breast cancer risk (Dunning and others 1999;
Giguere and others 2001; Haiman and others 2002b; Kadouri and others 2001; Liede and
others 2003; Rebbeck and others 1999; Spurdle and others 1999; Suter and others 2003).
Three studies have observed that women with the less active AR allele (long CAG repeat)
have increased breast cancer risk (Giguere and others 2001; Haiman and others 2002b;
Rebbeck and others 1999), although the evidence is not entirely consistent (Dunning and
others 1999; Kadouri and others 2001; Spurdle and others 1999; Suter and others 2003).
Only one study has examined the association between the AR genotype and mammographic
density; no association was observed (Haiman and others 2003). This apparent
contradiction, that high androgen levels but the less active AR genotype are associated
with breast cancer risk, is not well understood. To shed more light on the AR-breast
cancer association, we examined the association between the AR-CAG repeat length and
mammographic density, a strong and independent breast cancer risk factor (Boyd and
others 1995; Byrne and others 1995; Ursin and others 2003). We previously reported
results demonstrating that mammographic density represents a strong risk factor for
breast cancer in this study (Ursin and others 2003), and furthermore, that genes important
in estrogen metabolism do not explain the variation in mammographic density observed
in these women (Haiman and others 2002a).
Materials and Methods
Study Population
Study subjects were women newly diagnosed with a first primary invasive breast cancer
who participated in the Los Angeles part of the Women’s CARE Study (Marchbanks and
others 2002). The Women’s CARE Study was a population-based case-control study of
invasive breast cancer in United States-born African-American and Caucasian women
aged 35-64 years. All of the patient participants were diagnosed between July 1994 and
April 1998. Each participant provided a written informed consent at the time of
interview. All of the women were interviewed in person by trained interviewers using a
standardized questionnaire to collect information on demographic factors and potential
breast cancer risk factors including histories of oral contraceptive use, hormone therapy
(HT) use, family history of breast cancer, age at menarche, complete pregnancy history,
lifetime history of participation in physical exercise activities, and mammogram history
during the five years preceding the patient’s date of diagnosis. We restricted eligibility
for this study to breast cancer patients who reported having a screening mammogram in
the 5 years prior to breast cancer diagnosis. We requested screening mammograms for
60.9% of the patients included in the Los Angeles portion of the CARE study and who
provided written permission for the release of mammograms (N = 755, 426 Caucasian-
Americans and 329 African-Americans). We received mammograms for 71.1% (N =
303) of the eligible Caucasian patients and 69.3% (N = 228) of eligible African-American
patients (Ursin and others 2003). During the conduct of the Women’s CARE Study in
Los Angeles County, we collected a blood sample from all patients willing to provide a
sample. Of the 531 cases for whom we had a screening mammogram, we had a blood
sample for 423 (79.7%) women (Haiman and others 2002a). The study protocol was
approved by the Institutional Review Board at the University of Southern California.
Using a standard protocol, genomic DNA was extracted from peripheral blood (Haiman
and others 2002a). DNA samples were sent to City of Hope Beckman Research Institute
for AR genotyping. The exon 1 CAG repeat of the AR was amplified by PCR using a
fluorescently labeled forward primer (5’-TCCAGAATCTGTTCCAGAGCGTGC-3’) and
reverse primer (5’-GCTGTGAAGGTTGCTGTTCCTCAT-3’). All of the microsatellite
genotyping was performed using an ABI 377 sequencer with Genescan and Genotyper
software (PE/Applied Biosystems, Foster City, California). The number of CAG repeats
was calculated based on direct sequencing results of 8 prostate cancer cases with different
known PCR product lengths.
Mammographic Density
We measured percent mammographic density using the University of Southern California
Madena computer-based threshold method of assessing density (Ursin and others 1998).
Percent mammographic densities in the right and left breasts of both case and control
women have been shown to be highly correlated (Benichou and others 2003). Therefore,
we used screening mammograms of the contralateral (non-diseased) breast obtained
before diagnosis. The cranio-caudal mammographic images were digitized using a high-
resolution Cobrascan CX-312T scanner (Radiographic Digital Imaging, Torrance, CA)
and then viewed on a computer screen. A single reader blinded to all of the patient
characteristics evaluated all images. The reader first defined the total breast area using a
special outlining tool. The density assessments were done as follows. The reader defined
the region of interest (ROI) excluding the pectoralis muscle and other light artifacts
(prominent veins, and so forth). The reader then applied a yellow tint to gray levels
above a selected threshold using a computerized tool. The area highlighted in this
manner was considered to represent dense tissue. The software then calculated the
number of pixels within the entire breast as outlined and the number of pixels tinted
yellow within the ROI. The ratio of the number of tinted pixels within the ROI to the
total number of pixels in the breast represents the measure of percent density.
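The ratio described above can be illustrated with a minimal Python sketch. It is not the Madena software; the function name, the nested-list image representation, the two masks, and the toy pixel values are all assumptions made for illustration. It also assumes that the ROI lies inside the outlined breast and that "above a selected threshold" means strictly greater than the threshold.

```python
def percent_density(gray, breast_mask, roi_mask, threshold):
    """Percent density = tinted pixels inside the ROI / all breast pixels * 100.

    gray        -- 2D list of gray levels (hypothetical digitized image)
    breast_mask -- 2D list, truthy where the reader outlined the breast
    roi_mask    -- 2D list, truthy inside the region of interest
    threshold   -- gray level chosen by the reader; pixels above it are "tinted"
    """
    breast_pixels = 0
    tinted_pixels = 0
    for gray_row, breast_row, roi_row in zip(gray, breast_mask, roi_mask):
        for level, in_breast, in_roi in zip(gray_row, breast_row, roi_row):
            if not in_breast:
                continue
            breast_pixels += 1
            # Only pixels inside the ROI can count as dense tissue.
            if in_roi and level > threshold:
                tinted_pixels += 1
    if breast_pixels == 0:
        raise ValueError("breast outline contains no pixels")
    return 100.0 * tinted_pixels / breast_pixels


# Toy 3 x 4 image: the bright pixel in the top-right corner stands in for
# pectoralis muscle and is excluded from the ROI, so it is never tinted.
gray = [
    [0, 50, 200, 255],
    [0, 120, 180, 90],
    [0, 60, 210, 40],
]
breast_mask = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
roi_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
density = percent_density(gray, breast_mask, roi_mask, threshold=150)
```

In this toy case three of the nine breast pixels are tinted, so the sketch returns one third expressed as a percentage.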
The scanned mammogram files of 15 women could not be assessed for density as the
digitized files were unusable and we were unable to obtain the films a second time. We
excluded the mammograms of 4 women who had only one mammogram and were
pregnant at the time of mammogram. Therefore, we obtained mammographic density
results for 404 breast cancer patients for whom we had a blood sample.
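The sample counts reported above can be cross-checked with simple arithmetic. The Python sketch below uses only numbers stated in the text; the variable names are illustrative, and it assumes that the 15 unusable scans and the 4 excluded women all came from the 423 women with both a mammogram and a blood sample, which is what the final count of 404 implies.

```python
# Counts reported in the text (mammograms requested and received, by race).
requested = {"Caucasian": 426, "African-American": 329}
received = {"Caucasian": 303, "African-American": 228}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


receipt_rates = {race: pct(received[race], requested[race]) for race in requested}

total_requested = sum(requested.values())   # women whose mammograms were requested
with_mammogram = sum(received.values())     # women with a screening mammogram
with_blood = 423                            # of those, women with a blood sample
blood_rate = pct(with_blood, with_mammogram)

unusable_scans = 15      # digitized files that could not be assessed
excluded_pregnant = 4    # single mammogram taken during pregnancy
analytic_n = with_blood - unusable_scans - excluded_pregnant
```

The computed values agree with the percentages and the final sample size given in the text.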
Menopausal Status/Hormone Therapy Use Status at Mammography
Women were assigned a menopausal status at the time of their mammogram. Women
who had menstruated and not used HT within three months before mammography were
defined as premenopausal. Women who had not menstruated within three months prior to
mammogram, who had a bilateral oophorectomy more than three months prior to
mammogram, who had a simple hysterectomy prior to mammogram with the last
menstrual period more than 6 months prior to surgery, who were 50 years of age or older
and were current or past HT users, or who were 60 years of age or older were defined as
postmenopausal. Otherwise, menopausal status was considered as unknown.
Postmenopausal women were further categorized based on their HT usage as never, past
or recent (within the past 5 years). Type of HT used was further specified as estrogen
therapy (ET) or estrogen progestin therapy (EPT). Women who had used both ET and
EPT were assigned their most recent HT regimen.
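The assignment rules above can be written as a small decision function. The Python sketch below is one possible reading and not the study's actual code: the record fields are hypothetical, "within three months" is taken to include exactly three months, and the premenopausal rule is checked before the postmenopausal criteria, an ordering the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MammogramRecord:
    """Hypothetical per-woman fields, all measured relative to the mammogram."""
    age: int
    months_since_last_period: Optional[float] = None             # None = not recorded
    months_since_last_ht: Optional[float] = None                 # None = never used HT
    months_since_bilateral_oophorectomy: Optional[float] = None  # None = no surgery
    months_lmp_before_hysterectomy: Optional[float] = None       # None = no hysterectomy


def menopausal_status(r: MammogramRecord) -> str:
    menstruated_recently = (
        r.months_since_last_period is not None and r.months_since_last_period <= 3
    )
    used_ht_recently = (
        r.months_since_last_ht is not None and r.months_since_last_ht <= 3
    )
    ever_used_ht = r.months_since_last_ht is not None

    # Assumption: the premenopausal definition takes precedence.
    if menstruated_recently and not used_ht_recently:
        return "premenopausal"

    postmenopausal = (
        (r.months_since_last_period is not None and r.months_since_last_period > 3)
        or (r.months_since_bilateral_oophorectomy is not None
            and r.months_since_bilateral_oophorectomy > 3)
        or (r.months_lmp_before_hysterectomy is not None
            and r.months_lmp_before_hysterectomy > 6)
        or (r.age >= 50 and ever_used_ht)
        or r.age >= 60
    )
    return "postmenopausal" if postmenopausal else "unknown"
```

For example, a 45-year-old who menstruated last month while using HT meets neither definition under this reading and is returned as "unknown".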
Statistical Methods
We classified each allele of the AR gene as S or L using the median number of CAG
repeats across all alleles in the study population, 21 repeats, as the cut-point. Each
woman was categorized by genotype as S/S, S/L or L/L. We used least squares linear
regression methods to model the dependence of percent mammographic density on AR
genotypes. All of the models included adjustment for the following set of potential
confounders selected a priori based on their previously reported association with density
and/or the AR genotype: age at mammogram (years): 35-39, 40-44, 45-49, 50-54, 55-59,
60-64; BMI (kg/m²): <22, 22-24.9, 25-29.9, >30; and race: African-American, Caucasian.
In addition to using linear regression, we also modeled percent density as a categorical
outcome using ordinal logistic regression methods. A four-category variable was created
representing quartiles of percent density based on the distribution of percent density
(range, 0.0-85.7%) in the study population. We calculated ORs to estimate the odds of
having a single quartile higher level in percentage of mammographic density associated
with the AR genotype. In addition to analyzing the entire sample, both linear and logistic
regression models were applied to subgroups defined by race, family history of breast
cancer (mother or sister), menopausal status, age at mammogram and HT status to
examine possible effect modification. Data were analyzed using SAS v9 software (SAS
Institute Inc., Cary, North Carolina).
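The S/L classification at the median cut-point can be sketched in a few lines of Python. The allele pairs below are hypothetical, not study data, and the function names are illustrative. The Methods do not say which class an allele of exactly 21 repeats falls into; the sketch defaults to the definition of the long allele as >21 repeats used later in this chapter and exposes the boundary as a parameter.

```python
from statistics import median

GENOTYPE_LABELS = ("S/S", "S/L", "L/L")


def count_long_alleles(allele_1, allele_2, cut_point, long_if_equal=False):
    """Number of alleles (0, 1 or 2) classified as long (L)."""
    def is_long(repeats):
        return repeats > cut_point or (long_if_equal and repeats == cut_point)
    return int(is_long(allele_1)) + int(is_long(allele_2))


def classify_genotype(allele_1, allele_2, cut_point, long_if_equal=False):
    """Map a pair of CAG repeat counts to S/S, S/L or L/L."""
    return GENOTYPE_LABELS[count_long_alleles(allele_1, allele_2, cut_point, long_if_equal)]


# Hypothetical CAG repeat counts for five women (two alleles each).
subjects = [(18, 20), (20, 23), (23, 20), (22, 26), (21, 21)]

# The cut-point is the median over all alleles, not over women.
all_alleles = [repeats for pair in subjects for repeats in pair]
cut_point = median(all_alleles)

genotypes = [classify_genotype(a1, a2, cut_point) for a1, a2 in subjects]
```

The count of L alleles (0, 1 or 2) is also the natural per-allele score for a trend test across S/S, S/L and L/L.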
Our sample included 246 Caucasian and 158 African-American breast cancer patients,
with mean age at the time of mammogram 48.4 years (SD = 8.5). The mean percent
mammographic densities of African-American women (34.1%) and Caucasian women
(34.3%) did not differ (Table 3-1). Mean percent density did vary substantially by age at
mammogram, menopausal status and HT use, BMI, pregnancy history, and age at
menarche, but not by family history of breast cancer.
Table 3-1 Mean percentage of mammographic density by descriptive characteristics (N = 404)
Characteristic N % Mean % density Pᵃ
Race
African-American 158 39.1 34.1
Caucasians 246 60.9 34.3 0.95
Menopausal status/HT use
Premenopausal 191 47.2 35.6
Never used HT 56 13.9 30.6
Past user of EPT 25 6.2 43.2
Past user of ET 21 5.2 29.4
Recent user of EPT 51 12.6 39.3
Recent user of ET 42 10.4 34.8
Unknown menopause 18 4.5 37.7 0.01
Age at mammogram (years)
35-39 92 22.8 45.1
40-44 84 20.8 41.2
45-49 42 10.4 37.3
50-54 76 18.8 34.0
55-59 70 17.3 27.9
60-64 40 9.9 26.7 <0.0001
BMI (kg/m²)"
<22 114 28.2 49.7
22-24.9 108 26.8 38.2
25-29.9 114 28.2 28.5
>30 68 16.8 20.5 <0.0001
Parity/age at first full term pregnancy (years) “
Nulliparous 85 21.0 44.3
1-2/<24 80 19.8 34.0
1-2/>24 97 24.0 40.2
>3/<24 116 28.7 27.5
>3/>24 26 6.4 32.1 <0.0001
Age at menarche (years)'
<12 108 26.7 32.1
12 108 26.7 35.4
13 99 24.5 34.0
>13 89 22.0 40.8 0.04
First-degree family history of breast cancer ‘
No 326 80.7 35.2
Yes 57 14.1 37.3
Unknown 21 5.2 33.2 0.72
ᵃ Analysis of covariance
ᵇ Adjusted for age and BMI
ᶜ Adjusted for age at mammogram
For all of the subjects, the number of AR-CAG repeats ranged from 10 to 35 (median:
21). Stratification by racial group indicated that the AR-CAG repeat distribution for
African-Americans (range, 10-32; median, 20) was statistically significantly lower than
that of Caucasians (range, 10-35; median, 22; P <0.0001).
Mean percent mammographic density was compared across the three AR genotype
categories S/S, S/L, and L/L (Table 3-2). We observed no statistically significant
associations between percent mammographic density and AR genotype in all of the
subjects. On average, subjects with the L/L genotype had higher percent mammographic
density (35.6%) than subjects with the S/S or S/L genotype (33.5%), but this difference
was not statistically significant. There were also no statistically significant differences in
mean percent mammographic density within subgroups defined by race, family history or
menopausal status.
We stratified postmenopausal women according to their histories of HT use (never HT,
ever HT, and within the group who had used HT, according to whether their most recent
use was ET or EPT). Among the EPT users, carriers of the L/L genotype had a
statistically significantly higher percent mammographic density (41.4%) than carriers of the
S/S genotype (25.7%; P = 0.04). The results were comparable when this analysis was
restricted to recent (within the last year) EPT users (P = 0.03). The results were also
comparable when this analysis was restricted to Caucasians (68% of ever HT users; data
not shown). Adjustment for other covariates, such as family history of breast cancer,
parity, age at first full term pregnancy, and age at menarche did not change the results (P
= 0.01). In ET users, L/L carriers had a similar percent density (32.8%) to that of the S/S
carriers (34.5%, P = 0.83).
Table 3-2 Least-squares mean percentage breast density by androgen receptor genotype
                      AR Genotype
Characteristic        S/S     S/L     L/L     L/L vs S/S   P       P trend
All subjects
  N                   74      185     145
  %densityᵃ           34.8    33.0    35.6    0.79         0.50    0.64
Caucasian
  N                   24      97      125
  %densityᵇ           36.2    33.6    35.3    0.84         0.78    0.86
African-American
  N                   50      88      20
  %densityᵇ           35.4    33.9    39.8    0.40         0.48    0.62
Premenopausal
  N                   35      90      66
  %densityᵃ           39.3    36.8    41.1    0.71         0.44    0.55
Postmenopausal
  N                   34      85      76
  %densityᵃ           34.0    33.0    34.2    0.95         0.92    0.89
Ever HT
  N                   22      65      52
  %densityᵃ           31.4    33.5    37.3    0.29         0.50    0.25
Never HT
  N                   12      20      24
  %densityᵃ           38.2    33.6    26.0    0.11         0.27    0.11
ET users
  %densityᵃ                           32.8    0.83         0.92    0.86
EPT users
  N                   10      38      27
  %densityᵃ           25.7    35.7    41.4    0.04         0.11    0.04
Current EPT usersᶜ
  N                   9       32      22
  %densityᵃ           24.6    37.9    41.8    0.03         0.10    0.05
ᵃ Adjusted for race: African-American, Caucasian; age at mammogram (years): 35-39, 40-44, 45-49, 50-54,
55-59, 60-64; BMI (kg/m²): <22, 22-24.9, 25-29.9, >30
ᵇ Adjusted for age at mammogram and BMI
ᶜ Current EPT = used EPT within 1 year prior to mammogram
In assessing the association between the AR genotype and a higher quartile in percentage
mammographic density for each subgroup of HT usage (Table 3-3), we found that among
women with a history of HT use, there were greater odds of having a single quartile
increase in percent density with each L allele; however, this was not statistically
significant in the multivariate model (adjusted OR = 1.55; 95% CI, 0.93-2.60; P trend =
0.10). In never HT users, there was a statistically significant protective effect of the L
allele on increasing density; however, this was not observed in the generalized linear
model or in the model adjusting for family history of breast cancer, parity, age at first full
term pregnancy, and age at menarche. After stratifying by type of HT use we found
statistically significantly higher odds for each L allele among EPT users (adjusted OR =
2.59; 95% CI, 1.20-5.60; P trend = 0.02), but not among ET users (adjusted OR = 1.02;
95% CI, 0.46-2.25; P trend = 0.97).
A formal test of interaction between EPT use and the AR genotype was not statistically
significant on a multiplicative scale (P = 0.10).
Because no standard exists for classifying AR-CAG repeat lengths into S and L alleles,
we categorized the AR genotype into ten genotypes using the cut-points of 19, 21, and 24
to generate the alleles: very short (VS), medium short (MS), medium long (ML), and
very long (VL). Sample sizes were small in the ten strata; however, density was lowest
in the VS/VS group of EPT users (13.0%) and increased as number of repeats increased
up to the ML/ML group (43.2%) and then remained high in the ML/VL (39.9%) and
VL/VL (56.4%) genotypes suggesting that the findings were similar regardless of
threshold value.
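The count of ten genotypes follows from pairing four allele classes without regard to order. The short Python sketch below enumerates them and classifies a pair of repeat counts. The function names are illustrative, and the treatment of a repeat count that equals a cut-point (assigned here to the shorter class) is an assumption, since the text does not state it.

```python
from itertools import combinations_with_replacement

ALLELE_CLASSES = ("VS", "MS", "ML", "VL")   # very short ... very long
CUT_POINTS = (19, 21, 24)                   # cut-points named in the text

# Unordered pairs of four classes: 4 * 5 / 2 = 10 genotypes.
GENOTYPES = ["/".join(pair) for pair in combinations_with_replacement(ALLELE_CLASSES, 2)]


def allele_class(repeats):
    """Assign one allele to VS, MS, ML or VL.

    Assumption: a count equal to a cut-point falls in the shorter class.
    """
    for label, cut in zip(ALLELE_CLASSES, CUT_POINTS):
        if repeats <= cut:
            return label
    return ALLELE_CLASSES[-1]


def four_class_genotype(allele_1, allele_2):
    """Genotype label with the shorter class written first."""
    classes = sorted(
        (allele_class(allele_1), allele_class(allele_2)),
        key=ALLELE_CLASSES.index,
    )
    return "/".join(classes)
```

Under these assumptions a woman with 30 and 15 repeats is labeled VS/VL, and every label the function can return appears in the list of ten genotypes.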
Table 3-3 Association for a single quartile increase in percent mammographic density per ‘long’ androgen
receptor allele
% Densityᵃ Unadjusted Adjustedᵇ
AR genotype <17 18-37 37-54 55+ OR 95% CI P trend OR 95% CI P trend
Ever HT
SS 8 7 4 3
SL 15 23 17 10
LL 10 15 14 13 1.54 1.00-2.39 0.05 1.55 0.93-2.60 0.10
Never HT
SS 3 6 0 3
SL 11 3 5 1
LL 10 8 2 4 0.85 0.46-1.59 0.61 0.38 0.15-0.95 0.04
ET user
SS 5 3 2 2
SL 8 10 7 2
LL 7 7 7 4 1.30 0.71-2.40 0.39 1.02 0.46-2.25 0.97
EPT user
SS 3 4 2 1
SL 7 13 10 8
LL 3 8 7 9 1.85 0.98-3.46 0.06 2.59 1.20-5.60 0.02
Past EPT userᶜ
SS 0 0 1 0
SL 0 3 1 2
LL 0 3 0 2 0.80 0.14-4.50 0.80
Current EPT userᶜ
SS 3 4 1 1
SL 7 10 9 6
LL 3 5 7 7 2.05 1.03-4.06 0.04
ᵃ Frequency of subjects in each subgroup defined by AR genotype and percent mammographic density
ᵇ Adjusted for race: African-American, Caucasian; age at mammogram (years): 35-39, 40-44, 45-49, 50-54,
55-59, 60-64; BMI (kg/m²): <22, 22-24.9, 25-29.9, >30
ᶜ Current EPT = used EPT within 1 year prior to mammogram
In this study the number of CAG repeats in the AR gene was not associated with
mammographic density in all women combined or in subgroups based on race,
menopausal status or family history of breast cancer. The long CAG repeat (>21 repeats)
was strongly associated with increased percent mammographic density in
postmenopausal women who were EPT users.
An increased breast cancer risk has been observed among women with the long CAG
allele in three previous studies (Giguere and others 2001; Haiman and others 2002b;
Rebbeck and others 1999), whereas only a weak, non-significant association was
observed in four studies (Dunning and others 1999; Kadouri and others 2001; Spurdle
and others 1999; Suter and others 2003). The three positive studies were conducted
among BRCA1 mutation carriers (Rebbeck and others 1999), postmenopausal women
(Giguere and others 2001), and women with a first-degree family history of breast cancer
(Haiman and others 2002b). Our observed association in EPT users is compatible with
the study that found an association in postmenopausal women (Giguere and others 2001).
Mammographic density may have a strong genetic component. A study of twins reported
significantly higher correlation in mammographic density between monozygotic twins than
dizygotic twins (Boyd and others 2002). No data were presented on whether this
association was stronger in twins who were both EPT users. Genes involved in estrogen
metabolism have not been shown to be associated with mammographic density in this study
population (Haiman and others 2002a). Using data from the Nurses’ Health Study,
Haiman et al. (Haiman and others 2003) observed no association between
mammographic density and the AR genotype. However, no analyses stratified by type of
HT were reported in either of these studies.
Prior epidemiological studies have found that EPT is associated with higher percent
mammographic density (Greendale and others 1999; Greendale and others 2003; Laya
and others 1995; Lundstrom and others 1999; Persson and others 1997; Stomper and
others 1990). In the placebo controlled Postmenopausal Estrogen and Progesterone
Intervention trial, women in the EPT treatment group underwent on average a five
percent increase in mammographic density (Greendale and others 2003). However, the
factors that determine the variation in the percent density response in the EPT arm of this
trial are not known. Our results suggest that AR could be one such factor.
Ligands that bind AR with high affinity include not only T and DHT, but also progestins
(Poulin and others 1989). Human clinical data suggest that the AR may mediate the
antiproliferative effects of high-dose synthetic progestins such as medroxyprogesterone
acetate on breast cancer cells. In one study, the response rate to medroxyprogesterone
acetate was significantly associated with the presence of AR {P <0.001) with a shorter
progression free interval in subjects with higher AR content (Birrell and others 1995b).
No data exist on how the much less potent EPT, known to stimulate breast cell
proliferation (Hofseth and others 1999), affects AR.
Our analysis has some limitations that must be considered in interpreting results. First,
we included a subset of breast cancer patients from the Women’s CARE Study who had
provided both a mammogram and a blood sample. This introduces the possibility of
selection bias. However, for this to occur and explain the EPT findings, EPT-using
patients with long AR-CAG repeats and high mammographic density would have had to
be more likely to have participated. This seems unlikely.
Second, we assessed the relationship between germ-line genotypes and mammographic
density of breast cancer patients only. Although we used mammograms of the
contralateral (unaffected) breast obtained before or at the time of diagnosis, these may not
be comparable to mammograms from women without breast cancer. If there are other,
unknown cofactors that are independent risk factors for breast cancer, and that also
interact with the AR-CAG repeat length to increase mammographic density in EPT users,
then our estimates of the impact of the AR-CAG repeat on mammographic density may
not represent that of healthy women.
Third, a proportion of the heterozygotes for S/L may be misclassified. The AR gene is
located on the X chromosome, and approximately 10% of breast cancer cases between the ages of 27
and 65 show preferential X-inactivation (Kristiansen and others 2002). Because only the
S or the L allele is expressed in subjects that have one X chromosome inactivated, the
heterozygote subjects with preferential X-inactivation would be more accurately grouped
as the S/S or the L/L genotype. The effect of such misclassification would be a bias
towards the null, and this may explain why some of our results for the S/L genotype are
weaker than those of the L/L genotype alone.
A final limitation in the analysis of the AR genotype is the method of dichotomizing the
allele length into S versus L. The number of CAG repeats in the AR alleles ranged from
10 to 35 in our study. It has been observed that with increasing length, AR activity
decreases (Chamberlain and others 1994). However, to our knowledge no threshold
number of repeats has been reported. Our observation of a shorter repeat distribution in
African-Americans is concordant with observations reported previously in studies of this
polymorphism and prostate cancer (Hardy and others 1996; Stanford and others 1997).
In the absence of prior information on how to dichotomize allele length, we selected the
median of the distribution of the AR repeat lengths as our cut-point. However, our
sensitivity analysis using three successive cut-points supported the relevance of the 21
repeat cut-point.
Our results may help explain the mechanism by which EPT use increases breast cancer
risk (Rossouw and others 2002). Our results suggest that AR genotype modifies
hormone-induced cell proliferation as reflected in percent mammographic density.
Additional results are needed to determine whether knowledge of AR genotype will be
helpful to clinicians in advising patients when making decisions on whether to use EPT.
DATA ANALYSIS: Androgen Receptor CAG Polymorphism and Breast Cancer Prognosis
A recent meta-analysis of six prospective studies confirms the association between
increased serum T levels and breast cancer risk in postmenopausal women (2002). In
addition, in both pre (Secreto and others 1983a; Secreto and others 1984; Secreto and
others 1989) and postmenopausal women (Adlercreutz and others 1989; Hill and others
1985; Lipworth and others 1996; Secreto and others 1991; Secreto and others 1983b), T
levels have been observed to be significantly higher in cases post-diagnosis compared to controls.
T binds to the AR in target tissue to mediate its effects. The AR is expressed in the
majority of breast cancers (Bayer-Garner and Smoller 2000; Bryan and others 1984; Brys
and others 2002; Hall and others 1996; Kuenen-Boumeester and others 1992; Lea and
others 1989; Soreide and others 1992). Several studies have been conducted to examine
the effects of androgens on the growth of AR positive breast cancer cell lines. These
studies have reported both inhibitory (Ortmann and others 2002; Poulin and others 1988)
and stimulatory (Hackenberg and others 1988; Marugo and others 1992) effects. These
divergent effects have been observed to be specific to the cell line under study (Birrell
and others 1995a).
The AR is encoded by a single 90 kb gene on the X chromosome (Xq11-q12) that
specifies an 11 kb mRNA transcript composed of 8 exons (Brown and others 1989;
Chang and others 1988; Lubahn and others 1989; Lubahn and others 1988; Tilley and
others 1989). The entire N-terminal transactivation domain is encoded by the first exon.
Within the first exon lies a polymorphic CAG repeat that encodes a polyglutamine tract
of variable length. The normal size range of these repeats is between 6 and 39 repeats
(Edwards and others 1992; Giovannucci and others 1997). Several studies have
observed an association between increasing AR-CAG repeat length and a linear decrease
in AR transactivation activity (Chamberlain and others 1994; Irvine and others 2000;
Kazemi-Esfarjani and others 1995; Tut and others 1997). Consistent with this, male
carriers of the short AR-CAG repeat length are at increased risk of prostate cancer
(Ekman and others 1999; Giovannucci and others 1997; Hakimi and others 1997; Hsing
and others 2000; Ingles and others 1997; Irvine and others 1995; Stanford and others 1997).
The CAG repeat polymorphism in AR has been examined in relation to breast cancer risk
showing that longer repeat lengths are associated with an increase in risk in BRCA1
mutation carriers (Rebbeck and others 1999), women with a first degree family history of
breast cancer (Haiman and others 2002b), postmenopausal women (Giguere and others
2001) and women of the Philippines (Liede and others 2003). One study has examined
the association between this repeat length and breast cancer prognosis showing that the
short repeat length and, therefore, more active AR is associated with higher breast cancer
tumor grade (Yu and others 2000) in a group of premenopausal and postmenopausal
breast cancer patients.
We have used the ongoing Mapping Interactive Cancer Susceptibility Loci Study, a
multi-institutional study conducted as part of the Eastern Cooperative Oncology Group
(ECOG) to assess the relationship between the AR-CAG repeat length polymorphism and
breast cancer tumor grade. The parent study recruits ECOG patients diagnosed with
breast, prostate, lung, or colon cancer who have a living sibling diagnosed with the same
cancer; our study utilizes the index breast cancer case patient from each sibling pair.
Materials and Methods
Subject Identification
We included 415 breast cancer patients participating in the multi-institutional Mapping
Interactive Cancer Susceptibility Loci Study that is coordinated at the City of Hope
Beckman Research Institute. All subjects were selected for this study if they were
diagnosed with invasive breast cancer and had a living sibling with breast cancer at study
entry. Only the index patient of each sibling pair was included in this analysis. All
women included in this analysis were Caucasian and between the ages of 35 and 80 years
at diagnosis.
Data on breast cancer prognostic indicators were obtained from a combination of sources.
For 189 cases, data were collected from the hospital tumor registry where we accessioned
the subject. If the subject was not initially diagnosed at the recruiting hospital, data were
extracted from the pathology report establishing the patient’s breast cancer diagnosis (N
= 226) by a single investigator (Elizabeth Lillie) who was blinded to the AR genotype.
This diagnosing pathology report was obtained for all index cases at the time of entry
onto the study to confirm the diagnosis. All data, both from the tumor registry and from
pathology report extraction, were coded following the methods outlined by the American
Joint Committee on Cancer (Greene and others 2002). A subset of tumor registry reports
(10% from each hospital) was randomly selected and compared with the pathology
reports to confirm consistency across data sources.
Stage was classified as I, IIA, IIB, IIIA, IIIB, IV using the TNM classification system.
Tumor size is defined as the size of invasive tumor (cm) at its maximum dimension.
Histologic grade of tumor was coded as grade 1 (well differentiated), grade 2 (moderately
differentiated), grade 3 (poorly differentiated) and grade 4 (undifferentiated). Nodal
status was coded as either positive or negative for metastasis for patients who had nodes
resected. Finally, ER and PR status were recorded as positive or negative when results
were provided on the pathology report.
AR Genotyping
All subjects consented to give 10 ml of blood for DNA analysis. Blood samples were
drawn at the recruiting institution and were sent via overnight mail to the City of Hope
Beckman Research Institute. Using a standard protocol genomic DNA was extracted
from peripheral blood. The exon 1 CAG repeat of the AR was PCR amplified using a
fluorescently labeled forward primer (5’-TCCAGAATCTGTTCCAGAGCGTGC-3’) and
reverse primer (5’-GCTGTGAAGGTTGCTGTTCCTCAT-3’). All microsatellite
genotyping was performed using an ABI 377 with Genescan and Genotyper software
(PE/Applied Biosystems, Foster City, California). The number of CAG repeats was
calculated based on direct sequencing results of 8 prostate cancer cases with different
known PCR product lengths.
Statistical Analyses
All outcome variables were categorized using methods comparable to those used in the
previous study of these associations (Yu and others 2000). Grades 1-2 were compared to
grade 3. Stage was grouped into three categories I, IIA and IIB combined, and IIIA, IIIB,
and IV combined. ER, PR and nodal status were all categorized as positive or negative.
The tumor sizes of our subjects were, on average, smaller (mean, 2.0 cm) than those
reported by Yu et al. (mean, 2.7 cm) (Yu and others 2000); therefore, tumor size was
dichotomized using the median (1.6 cm) of the distribution in our study population as
well as the cut-point of 2.5 cm used by Yu et al. (Yu and others 2000).
The initial analyses were designed to replicate the analyses of the association between
AR-CAG repeat length and breast cancer tumor grade reported by Yu et al. (Yu and
others 2000). The AR-CAG repeat was analyzed as a continuous variable coded in three
ways: the number of CAG repeats in the short allele, the sum of the CAG repeats in both
alleles, and the absolute value of the difference in the number of CAG repeats between
alleles. CAG repeat lengths were compared among the various prognostic indicator
variables using analysis of variance.
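The three continuous codings of the AR-CAG repeat described above can be derived from a pair of allele lengths as in the following Python sketch. The function and key names are illustrative and do not come from the study's analysis code.

```python
def cag_codings(allele_1, allele_2):
    """Return the three continuous codings of a pair of CAG repeat counts."""
    return {
        # Number of CAG repeats in the shorter of the two alleles.
        "short_allele": min(allele_1, allele_2),
        # Sum of the CAG repeats in both alleles.
        "allele_sum": allele_1 + allele_2,
        # Absolute difference in repeat number between the alleles.
        "abs_difference": abs(allele_1 - allele_2),
    }


# Hypothetical subject with alleles of 24 and 20 repeats.
example = cag_codings(24, 20)
```

Because the codings use the minimum, the sum, and the absolute difference, the result does not depend on the order in which the two alleles are recorded.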
The length of the short allele, and the sum of the allele lengths were also dichotomized
using the cut-points of 21 and 44 repeats respectively from Yu et al. (Yu and others
2000). These cut-points were the same as the means in our study population.
Associations between the categorical CAG variables and the prognostic indicator
variables were compared using the chi-square test. For dichotomous outcomes, the odds
ratio and corresponding 95% confidence limits were estimated.
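For a dichotomous outcome cross-classified against a dichotomized CAG variable, the odds ratio and an approximate 95% confidence interval can be computed from the four cell counts. The text does not state which interval method was used; the Python sketch below assumes the common logit (Woolf) method, and the cell counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (logit) confidence limits for a 2 x 2 table.

    a, b -- outcome present / absent among subjects with the genotype of interest
    c, d -- outcome present / absent among the reference genotype
    z    -- normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("the logit method needs all four cells to be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_hat, ci_lower, ci_upper = odds_ratio_ci(10, 20, 5, 40)
```

With sparse cells, as in several of the stratified tables, an exact method would usually be preferred over this large-sample approximation.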
Unlike the previous study, we also assessed the length of the long allele as a continuous
variable and as a categorical variable dichotomized at the mean of the CAG repeat length
of the longer of the two alleles (24 repeats) to further evaluate the association between
AR genotype and breast cancer prognosis. In addition, due to the larger study setting, all
analyses were stratified on age of diagnosis categorized into tertiles <45, 45-55, >55.
Age categories were selected a priori to assess groups that were premenopausal and
postmenopausal with reasonable certainty. Data were analyzed using SAS v9 software
(SAS Institute Inc., Cary, North Carolina).
Of the 415 breast cancer patients who were genotyped for the AR-CAG repeat, 316 had
complete data on TNM stage, 339 on tumor size, 315 on histologic grade, 343 on nodal
status, 234 on ER status and 223 on PR status. Table 4-1 shows the distributions of
patients in each prognostic indicator category stratified by source of data. Patients with
data available from the tumor registry had more tumors of less than 1.6 cm than patients
whose data were extracted from pathology reports (P = 0.01).
In all subjects, we observed no statistically significant association between advanced
stage, increased tumor size, advanced histologic grade, node positive, ER positive, or PR
positive status and the length of the short AR allele, the length of the long allele, the sum
of the AR alleles, or the difference in length between the AR alleles (Table 4-2). We
observed an association of borderline statistical significance (P = 0.06) between the
difference in length of the two alleles and tumor size when using the 2.5 cm cut-point,
showing a larger difference between the repeat lengths of smaller sized tumors.
However, this was not observed when the tumor size was categorized using the 1.6 cm
cut-point. Additional analyses were conducted adjusting for the source of data, i.e. tumor
registry vs. pathology report, but this had no effect on the AR associations.
Modeling the AR genotype categorically did not alter these findings (Table 4-3). We
found no difference in the number of subjects with a higher histologic grade when using
the cutpoint of 21 repeats in at least one allele (50% for <21 repeats vs. 51% for 21+
repeats; P = 0.87) or the cutpoint of 44 for the sum of AR repeats (53% for <44 repeats
versus 51% for 44+ repeats; P = 0.67).
Table 4-1 Distribution of cases in each prognostic indicator category stratified by source of data
All (%) (N = 415)   Registry (%) (N = 189)   Path (%) (N = 226)
155 (37)
144 (35)
99 (24)
97 (51)
76 (40)
58 (26)
68 (30)
Tumor size (cm)
246 (59)
93 (22)
137 (73)
38 (20)
109 (48)
55 (24)
62 (27)
165 (40)
174 (42)
95 (50)
80 (42)
70 (31)
94 (42)
62 (27)
186 (45)
129 (31)
90 (48)
60 (32)
39 (21)
96 (42)
69 (31)
61 (27)
Nodal status
209 (50)
134 (32)
66 (35)
93 (41)
68 (30)
65 (29)
181 (44)
37 (20)
37 (20)
67 (30)
144 (64)
145 (35)
89 (47)
54 (29)
46 (24)
56 (25)
24 (11)
146 (65)
134 (32)
205 (49)
65 (34)
92 (49)
69 (31)
ᵃ Cutpoint used by Yu, 2000
ᵇ Median cutpoint
Table 4-2 Mean number of androgen receptor CAG repeats by breast cancer prognostic indicator
Feature   Sample size   Mean CAG repeats in allele 1 (SD)   Mean CAG repeats in allele 2 (SD)   Mean CAG repeats in both alleles (SD)   Mean CAG difference between alleles (SD)
20.40 (2.69)
20.65 (2.37)
23.60 (2.72)
44.00 (4.67)
43.90 (4.23)
2.90 (2.42)
3.59 (2.67)
Tumor size (cm)
20.41 (2.50)
20.61 (2.36)
23.62 (2.60)
44.03 (4.38)
43.87 (4.47)
3.22 (2.60)
2.65 (2.27)
<1.6ᵇ
20.53 (2.55)
20.40 (2.38)
23.58 (2.65)
23.47 (2.57)
43.86 (4.26)
3.05 (2.51)
3.07 (2.55)
20.56 (2.62)
20.37 (2.48)
23.59 (2.90)
23.64 (2.30)
44.01 (3.77)
3.03 (2.54)
3.26 (2.94)
Nodal status
20.44 (2.55)
20.49 (2.51)
23.56 (2.60)
23.73 (2.62)
44.00 (4.34)
44.22 (4.44)
3.11 (2.77)
3.24 (2.58)
20.40 (2.58)
20.42 (2.19)
23.58 (2.65)
43.85 (3.70)
3.00 (2.22)
20.39 (2.53)
20.44 (2.44)
23.52 (2.61)
23.59 (2.43)
43.90 (4.49)
44.03 (4.27)
3.13 (2.50)
ᵃ Cutpoint used by Yu, 2000
ᵇ Median cutpoint
Table 4-3 Association of androgen receptor CAG with breast cancer prognostic indicators
CAG repeats in allele 1 CAG repeats in allele 2 CAG repeats in both alleles
Feature Sample size <21 >21 >24 <24 <44 >44
Tumor size (cm)'
OR (95%CI)
0.79 (0.49-1.27)
0.65 (0.38-1.10)
OR(95% CI)
OR(95% CI)
0.91 (0.58-1.42)
Nodal status
OR(95% CI)
0.93 (0.60-1.43)
OR(95% CI)
0.71 (0.36-1.39)
1.01 (0.54-1.87)
OR(95% CI)
1.34 (0.77-2.34)
1.33 (0.76-2.30)
ᵃ Cutpoint used by Yu, 2000
ᵇ Median cutpoint
After stratification by age at diagnosis, no statistically significant results were observed
in either the <45 years or the 45-55 years age category. However, in women diagnosed
after the age of 55 years, higher histologic grade was associated with a smaller length of
the shorter AR allele (P = 0.04) (Table 4-4). Analyzed categorically, subjects with grade
3 tumor were more likely than those with grades 1 or 2 tumor to have a short allele (61%
for <21 repeats vs. 47% for 21+ repeats; P = 0.08) (Table 4-5). Among the older women,
analysis of the long allele also revealed an association between carrying a long AR allele
of >24 CAG repeats and having node positive breast cancer (OR = 2.26; P = 0.02).
Table 4-4 Mean number of androgen receptor CAG repeats by breast cancer prognostic indicator in cases age > 55 at diagnosis
Feature  Number of cases  Mean CAG repeats in allele 1 (SD)  Mean CAG repeats in allele 2 (SD)  Mean CAG repeats in both alleles (SD)  Mean CAG difference between alleles (SD)
1 82 20.18 (2.61) 23.35 (2.56) 43.54 (4.48) 3.17 (2.59)
2 66 20.62 (2.42) 23.02 (2.70) 43.64 (4.66) 2.39 (2.14)
3-4 8 20.00 (2.27) 24.00 (2.83) 44.00 (4.24) 4.00 (2.88)
P 0.53 0.52 0.96 0.07
Tumor size (cm)
<2.5* 124 20.25 (2.37) 23.42 (2.50) 43.67 (4.18) 3.17 (2.49)
>2.5 45 20.64 (2.56) 23.04 (3.11) 43.69 (5.16) 2.40 (2.42)
P 0.35 0.42 0.98 0.08
<1.6** 83 20.34 (2.26) 23.25 (2.52) 43.59 (4.21) 2.92 (2.29)
>1.6 86 20.37 (2.57) 23.38 (2.82) 43.76 (4.70) 3.01 (2.67)
P 0.93 0.75 0.81 0.80
1-2 100 20.78 (2.51) 23.64 (3.03) 44.42 (5.01) 2.86 (2.43)
3 62 19.90 (2.70) 23.35 (2.52) 43.26 (4.06) 3.45 (3.28)
P 0.04 0.54 0.13 0.19
Nodal status
Negative 109 20.22 (2.56) 23.29 (2.58) 43.51 (4.27) 3.07 (2.87)
Positive 61 20.62 (2.62) 23.57 (2.89) 44.20 (4.93) 2.95 (2.47)
P 0.33 0.52 0.35 0.78
Positive 93 20.40 (2.38) 23.43 (2.52) 43.83 (4.28) 3.20 (2.46)
Negative 20 20.00 (2.08) 23.20 (2.19) 43.20 (3.49) 3.03 (2.40)
P 0.49 0.71 0.54 0.78
Positive 64 20.31 (2.36) 23.19 (2.49) 43.50 (4.23) 2.88 (2.37)
Negative 46 20.33 (2.36) 23.65 (2.41) 43.98 (4.16) 3.33 (2.34)
P 0.98 0.33 0.56 0.32
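The P values in Table 4-4 can be checked approximately from the tabulated means, standard deviations and group sizes. The sketch below does this for the allele 1 comparison with P = 0.04, which the text identifies as histologic grade (grades 1-2 vs. grade 3). It assumes a pooled-variance two-sample comparison of means with a normal approximation for the P value; the test actually used is not stated here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_test_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample statistic computed from summary data.

    The two-sided P value uses a normal approximation, which is an
    assumption that is reasonable at these sample sizes.
    """
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    stat = (mean1 - mean2) / se
    p = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p

# Allele 1 rows from Table 4-4: grades 1-2 (n = 100) vs. grade 3 (n = 62)
stat, p = two_sample_test_from_summary(20.78, 2.51, 100, 19.90, 2.70, 62)
print(f"statistic = {stat:.2f}, two-sided P = {p:.3f}")
```

Under these assumptions the P value falls between 0.03 and 0.04, consistent with the tabulated 0.04.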
Table 4-5 Association of androgen receptor CAG with breast cancer prognostic indicators in cases age > 55 at diagnosis
CAG repeats in allele 1 CAG repeats in allele 2 CAG repeats in both alleles
Feature Sample size <21 >21 >24 <24 <44 >44
Tumor size (cm)
OR (95% CI)
OR (95% CI)
OR (95% CI)
0.75 (0.38-1.48)
Nodal status
OR (95% CI)
OR (95% CI)
1.13 (0.41-3.12)
0.90 (0.34-2.36)
OR (95% CI)
Our study shows no convincing evidence of an overall association between the AR
genotype and breast cancer histologic grade; thus, we were not able to replicate the
findings of Yu et al. (Yu and others 2000) despite examining the outcome in several
ways. However, we did observe an association among cases with an older age at
diagnosis. Yu et al. observed the association in a population of consecutive cases
between the ages of 25 and 93 years who had undergone either radical or modified
mastectomy at the Department of Gynecologic Oncology, University of Turin, Italy.
While the age range was similar to ours, our observed association was limited to cases
diagnosed after the age of 55. The sample in which Yu et al. analyzed this association
was smaller than ours (N = 92) and, therefore, analyses stratified by age were not reported.
We are the first to report an association between the long AR-CAG repeat and node
positive breast cancer. The long CAG repeat of the AR gene has been observed to be
associated with increased breast cancer risk in several studies (Giguere and others 2001;
Haiman and others 2002b; Liede and others 2003; Rebbeck and others 1999). Our
observation of an association between carrying a long allele and having node positive
breast cancer in women diagnosed after age 55 years is compatible with these results.
However, this association was observed only when the long AR allele was modeled
categorically and may, given the large number of tests performed, simply be a type I error.
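The concern about multiple testing can be made concrete with a short calculation. The sketch below counts only the 28 P values in Table 4-4 (7 comparisons for each of 4 allele measures) and treats the tests as independent. Independence is a simplifying assumption that does not hold here, because the allele measures are correlated and come from the same subjects. The function names are hypothetical.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that holds the family-wise rate at alpha."""
    return alpha / n_tests

alpha = 0.05
n_tests = 28  # 7 rows of P values x 4 allele measures in Table 4-4

print(f"expected false positives: {n_tests * alpha:.1f}")
print(f"family-wise error rate:   {familywise_error(n_tests, alpha):.2f}")
print(f"Bonferroni threshold:     {bonferroni_threshold(n_tests, alpha):.4f}")
```

With 28 independent tests at the 0.05 level, at least one nominally significant result would be expected by chance about three times out of four, and neither P = 0.04 nor P = 0.02 would pass a Bonferroni-adjusted threshold.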
The age cut-points used in our study were determined a priori to define groups that are
premenopausal and postmenopausal with reasonable certainty, because menopausal status
was not ascertained at the time of patient recruitment to the study. The restriction of our
observed significant associations to women diagnosed after age 55, who are presumably
postmenopausal, is concordant with studies showing that higher androgen levels are
associated with breast cancer risk among postmenopausal women (2002). It is also
compatible with results suggesting that the association between the AR genotype and
breast cancer risk is limited to postmenopausal women (Giguere and others 2001).
The AR is expressed in a large percentage of breast cancers (Bayer-Garner and Smoller
2000; Bryan and others 1984; Brys and others 2002; Hall and others 1996;
Kuenen-Boumeester and others 1992; Lea and others 1989; Soreide and others 1992). In vitro
studies show that androgens can inhibit (Ortmann and others 2002; Poulin and others
1988) or stimulate (Hackenberg and others 1988; Marugo and others 1992) breast cancer
cell proliferation. Recent evidence of the role of androgen-regulated genes in breast
cancer prognosis supports our observations of a potential role of AR activity as measured
by the AR genotype. Kallikrein 15 expression is mainly under androgen regulation
through the AR and has been shown to be an independent marker of favorable prognosis
in breast cancer (Yousef and others 2002b). Kallikrein 14 is also under androgen
regulation through the AR and has been shown to be an independent marker of
unfavorable prognosis in breast cancer (Yousef and others 2002a).
The majority of cases in this study were initially diagnosed at major cancer centers that
are part of ECOG. The patient populations that these institutions serve have been
reported to have a poorer prognosis. Further, the cases included in this study were
not all incident cases. Cases were selected because they had a sibling with breast cancer
who was still alive on the date of study entry. Therefore, the generalizability of these
results is limited, and the results are subject to prevalence bias. The magnitude and
direction of this bias are unclear.
Our study was also limited by our inability to evaluate the role of possible
confounders in our results. The parent study was designed as a genetic linkage study and,
therefore, did not collect complete data on established breast cancer risk factors. The
observations reported by Yu et al. (Yu and others 2000) likewise did not include adjustment
for any covariates.
There are limitations in the measure of the AR genotype that may subject this genetic
variable to measurement error. A recent study reports that 13% of young (27-45 years
old) breast cancer patients show preferential activation of one of the AR alleles as
measured by genotyping of peripheral blood DNA, but there is no preference for the
allele with the longer or shorter CAG repeat (Kristiansen and others 2002). In our study,
87% of cases were AR heterozygotes. To address this issue, we also analyzed the difference
in CAG repeat number between the two AR alleles and observed no significant association
between a larger difference and either higher grade tumors or node positive status.
Assuming that preferential activation is random with respect to outcome category, the
resulting misclassification would be non-differential, and the associations observed
may, therefore, be underestimated.
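The attenuation expected from non-differential misclassification can be illustrated with expected counts. All numbers in the sketch below are hypothetical and are not study data: a binary genotype category is assumed to be misclassified with the same sensitivity and specificity in both outcome groups, and the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed, unexposed in the outcome group;
    c, d = exposed, unexposed in the comparison group."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed

# Hypothetical true counts (exposed, unexposed); not taken from the study
true_high_grade = (60, 40)
true_low_grade = (40, 60)

# Hypothetical error rates, identical in both groups (non-differential)
sens, spec = 0.9, 0.9

obs_high_grade = misclassify(*true_high_grade, sens, spec)
obs_low_grade = misclassify(*true_low_grade, sens, spec)

true_or = odds_ratio(*true_high_grade, *true_low_grade)
obs_or = odds_ratio(*obs_high_grade, *obs_low_grade)
print(f"true OR = {true_or:.2f}, observed OR = {obs_or:.2f}")
```

In this hypothetical example the observed odds ratio lies between 1 and the true value, which is the sense in which non-differential misclassification of a binary exposure tends to underestimate an association.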
The length of the AR-CAG repeat correlates with AR activity (Chamberlain and others
1994; Irvine and others 2000; Kazemi-Esfarjani and others 1995; Tut and others 1997).
Activity decreases with increasing number of repeats. Therefore, the observed
association between the short CAG repeat and increased tumor grade suggests that
increased AR activity results in a poorer prognosis. However, the
association between the long CAG repeat and node positive cancer indicates the opposite.
Further study of the association between the AR genotype and the actual percent of AR
expression in the tumor tissue is warranted to clarify the role of the AR in
breast cancer prognosis. In addition, genotyping methods that consider which AR allele
is active in the tumor tissue are necessary. A broader understanding of this relationship
may prove useful to clinicians for assessing breast cancer prognosis and identifying
subgroups of patients that may respond better to certain therapeutic regimens.
