DNA methylation groups determined by GATA5 gene methylation level are correlated with tumor subtype, sex, smoking status, and body mass index in esophageal and gastric adenocarcinoma
(USC Thesis Other)
DNA METHYLATION GROUPS DETERMINED BY GATA5 GENE
METHYLATION LEVEL ARE CORRELATED WITH TUMOR SUBTYPE, SEX,
SMOKING STATUS, AND BODY MASS INDEX IN ESOPHAGEAL AND
GASTRIC ADENOCARCINOMA
by
Xinhui Wang
______________________________________________________________________
A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
December 2007
Copyright 2007 Xinhui Wang
Acknowledgements
I would like to express my gratitude to all those who helped me with this thesis. I
gratefully acknowledge the support of my advisor, Dr. Kim Siegmund, who offered
valuable and instructive advice during the entire research and writing process. This thesis
was further improved with the help of Dr. Anna Wu and Dr. Peter Laird. As my
committee members, they gave me numerous beneficial suggestions. I would like to
thank Dr. Mark Krailo and Dr. Kiros Berhane for our helpful discussions about survival
analysis. Mary Ann Murphy, Professor of English, helped correct the grammatical errors and
improve the fluency of this thesis. I appreciate her efforts. I would like to give special
thanks to my husband, Rui, whose patient love enabled me to complete this work.
Table of Contents
Acknowledgements ii
List of Tables iv
Abbreviations v
Abstract vi
1. Introduction 1
1.1. Esophageal and stomach cancers 1
1.2. DNA methylation in cancer 2
1.3. Epidemiological risk factors for esophageal and gastric adenocarcinomas 3
1.4. Using cluster analysis to summarize DNA methylation profiles 4
2. Materials and Methods 5
2.1. DNA Methylation Data Set 5
2.2. Epidemiological and Clinical Data 6
2.3. Cluster Analysis 8
2.4. Statistical analysis 9
3. Results 12
4. Discussion 23
References 27
List of Tables
Table 1. Characteristics of DNA Methylation Groups 15
Table 2. Distribution of Tumor Characteristics Between DNA Methylation Groups 17
Table 3. Distribution of Disease Characteristics Between DNA Methylation Groups 18
Table 4. The Association Between DNA Methylation Group and the Variables 19
Table 5. Survival Analysis 21
Abbreviations
BMI: Body Mass Index
GEE: Generalized Estimating Equation
HR: Hazard Ratio
PMR: Percent of Methylated Reference
SD: Standard Deviation
95% CI: 95% Confidence Interval
Abstract
Our primary objective is to analyze the association between DNA methylation profiles,
epidemiological risk factors, and clinical outcomes for gastroesophageal
adenocarcinoma. We applied cluster analysis to classify tumor samples using DNA
methylation profiles. We then characterized the resulting DNA methylation subgroups
using logistic and Cox regression, and robust variance estimates to allow for multiple
tissue samples from individual patients. Two major sample groups were generated based
on DNA methylation profiles. Compared to Group 2, after adjusting for sex and tumor
subtypes, subjects in Group 1 had higher GATA5 DNA methylation values, higher BMIs
at diagnosis, and were more likely to have smoked. However, subjects in Group 1
showed better survival than those in Group 2, after adjusting for tumor differentiation.
Cluster analysis is effective for classifying samples based on DNA methylation profiles.
Smoking and BMI are two significant factors associated with DNA methylation group,
which itself is associated with survival.
Key Words: Gastroesophageal Adenocarcinoma, DNA methylation Group, GATA5,
Tumor Site, Sex, BMI
1. Introduction
1.1. Esophageal and stomach cancers
Esophageal and stomach cancers are among the most common cancers and leading
causes of mortality in the world. Although approximately 80% of the esophageal
cancers worldwide are squamous cell carcinoma (SCC), the incidence of
adenocarcinoma of the esophagus (ACE) has been rapidly increasing, especially among
white men (Parkin et al. 1993; Devesa et al. 1998; Blot et al. 1999). As a result, ACE is
now more prevalent than SCC among white men in western countries (Guo et al. 2006;
Holmes et al. 2007). The descriptive epidemiology and risk factor patterns differ for
these two cell types of esophageal cancer (Holmes et al. 2007). However, the reasons
for the increasing incidence of ACE are not known (Blot et al. 1999; Siewert et al. 2007;
Holmes et al. 2007).
Depending on the location of the tumor, stomach cancer (gastric cancer) can be divided
into two subtypes: gastric cardia (the proximal part of the stomach in the
gastroesophageal junction) and distal gastric adenocarcinoma (the lower part of the
stomach). Although the incidence of distal gastric adenocarcinoma has decreased
dramatically during the past five decades, the incidence of gastric cardia has remained
relatively stable or increased somewhat, especially among men in western countries
(Blot et al. 1991; Brown et al. 2002). It appears that cancers that arise in different
locations of the stomach (i.e., proximal vs. distal gastric cancers) display different
pathology and have different epidemiological risk factors associated with them (Terry et
al. 2002; Crew et al. 2006).
Esophageal and stomach cancers are both aggressive diseases that may have different
molecular profiles, including both genetic and epigenetic, of disease progression
(Lagarde et al. 2007). A better understanding of the underlying molecular bases and their
associations with epidemiological risk factors may support earlier diagnosis and
improved prognosis.
1.2. DNA methylation in cancer
Genetic alterations are very common in tumors. However, heritable changes that do not
alter the gene sequence can also be maintained throughout tumor cell propagation
(Holliday 1987). Previous studies have shown that abnormal DNA methylation,
including hypermethylation and hypomethylation, of CpG islands located in the
promoter regions of certain genes is associated with tumor progression (Baylin et al.
1998; Jones et al. 1999; Frigola et al. 2005; Estécio et al. 2007). Abnormal DNA
methylation can inactivate critical housekeeping genes, such as tumor suppressor
genes and DNA replication mismatch repair genes, through transcriptional
silencing in many cancers, including esophageal and gastric adenocarcinoma (Wong
et al. 1997; Klump et al. 1998; Jones et al. 1999; Eads et al. 2000; Lee et al. 2005;
Vasavi et al. 2006). Therefore, alterations in DNA methylation in tissue may be seen as
initial events induced by endogenous or exogenous factors and may play an important role in
tumor progression.
1.3. Epidemiological risk factors for esophageal and gastric adenocarcinomas
Many differences in the epidemiology, underlying pathogenesis, and potential risk
factors have been found for esophageal and gastric adenocarcinomas (Holmes et al.
2007; Siewert et al. 2007). For example, the SCC type of esophageal cancer is most
prevalent in black males, with tobacco and alcohol consumption as the major risk
factors (Holmes et al. 2007; Siewert et al. 2007). The incidences of esophageal and
gastric cardia adenocarcinomas, however, dominate in white men and have increased
more sharply over the past several decades. History of gastroesophageal reflux disease,
high body mass index (BMI), and certain dietary factors are shown to be consistently
associated with the risk of esophageal and gastric cardia adenocarcinomas (Holmes et al.
2007; Siewert et al. 2007). The incidence of distal gastric adenocarcinoma, on the other
hand, is high among African Americans, Hispanics, and certain Asian groups. History of
Helicobacter pylori (H. pylori) infection, smoking, and certain dietary factors are
important risk factors associated with this cancer subtype (Crew et al. 2006).
In order to gain a better understanding of the molecular mechanisms and tumor
prognosis for esophageal and stomach adenocarcinomas, we analyzed the relationships
between a number of epidemiological risk factors and the epigenetic changes in gene
DNA methylation. Furthermore, we investigated their associations with certain clinical
outcomes.
1.4. Using cluster analysis to summarize DNA methylation profiles
A typical analysis of multiple genes and multiple exposures may start by looking at the
quantitative associations of methylation for each gene with each exposure. However,
analyzing the DNA methylation data one gene at a time is poorly suited to assessing the
statistical significance of multiple DNA methylation markers in relation to esophageal
and gastric adenocarcinomas. DNA methylation data provide important information for
investigating tumor
heterogeneity, and this information can be used for identifying clusters of samples. We
applied cluster analysis to our tumor samples in order to summarize the variation in
DNA methylation profiles across samples and identified two main clusters. Then, we
investigated the associations between epidemiological factors, certain clinical outcomes
and the DNA methylation subgroups that we generated.
2. Materials and Methods
2.1. DNA Methylation Data Set
We used a subset of tumor samples from a population-based, multi-ethnic esophageal
and gastric adenocarcinoma study that was conducted in Los Angeles County between
1992 and 1997 (Wu et al., 2001). This study collected questionnaire data from 942 patients
with histologically confirmed esophageal or gastric adenocarcinoma (222
esophageal adenocarcinomas, 277 gastric cardia, and 443 distal gastric cancers) (Wu et
al., 2001). A total of 440 tissue samples from 308 patients (32.7% of 942 patients) were
included in the DNA methylation analysis.
A total of 79 genes were selected for study using MethyLight technology. Before
analyzing the samples for DNA methylation of the targeted genes, tissue samples were
evaluated for DNA quantity. Those with the greatest quantity were analyzed at all 79
genes; those with the lowest quantity were analyzed at only 9 genes; and those with
intermediate quantities were analyzed at subsets of either 19 or 39 genes. In total, 440
tissue samples were evaluated for 9 genes; 220 tissue samples were evaluated for an
additional 10 genes (19 genes total); 130 tissue samples were evaluated for an additional
20 genes (39 genes total); and 50 tissue samples were evaluated for an additional 40
genes (79 genes total). The 9 genes analyzed for all samples were selected based on
their biological significance in tumor development, or their variability across tissue
types observed from earlier studies.
Among these 440 samples for which DNA methylation data were available, 20 were
excluded from the final analysis: one sample was from a patient who had a previous
cancer, 13 samples did not have gastric/esophageal tissues, and 6 samples were further
omitted because the reported dietary intake of these patients was either too low (<=500
calories per day) or too high (>7000 calories per day). Thus, 420 samples remained,
which included 317 tumor tissue samples, 79 normal tissue samples, and 24 metaplasia
or dysplasia tissue samples. In this study, we analyzed these 317 tumor tissue samples
(from 278 patients) only.
2.2. Epidemiological and Clinical Data
We used a special structured questionnaire for this study (Wu et al., 2001). It contained
questions on lifetime smoking and alcohol drinking habits, body size characteristics at
20 and 40 years old as well as on the reference date of diagnosis (1 year before
diagnosis), personal and family history of diverse diseases and cancers, use of selected
medications, diet history and lifetime occupational history.
We considered a number of variables as possible predictors of DNA methylation
profiles. Demographic characteristics included: age at diagnosis (continuous), race
(white/non-white), and gender. Personal exposures were obtained from an in-person
interview that asked about smoking history, dietary intake, height, and body weight at
age 20, age 40, and at diagnosis, history of reflux diseases/symptoms and other medical
conditions. Smoking history was evaluated in a number of different ways. First, we
considered a categorical variable, classifying individuals as never, former, or current
smokers. We also investigated the average number of cigarettes smoked per day, the age
when subjects started smoking and stopped smoking, the total years of smoking, and the
number of pack-years of cigarettes smoked. The nutrients we investigated included total
calories (in kcal), folate acid (in grams), total fat (in grams), and dietary fiber (in grams).
Medical history and history of various symptoms prior to diagnosis of cancer/interview
were taken as part of our structured questionnaire. We evaluated four types of
symptoms: sour stomach, gas pain, heartburn, and swallow symptoms (none, 1, 2, 3 or
4); as well as the history of several medical conditions that were diagnosed by a
physician, including the presence of any one of the three metabolic syndromes (diabetes,
hypertension, heart disease) (no/yes). For patients who responded that they had been
diagnosed with these conditions, we also asked about treatment (no disease, yes without
treatment, yes with treatment). We considered the time between the first diagnosis of
each condition and the diagnosis of cancer (no disease, 3-10 years, 11-20 years, 20+
years, yes with unknown years).
Information on tumor characteristics was routinely collected by the Los Angeles County
Cancer Surveillance Program (CSP), a population-based cancer registry that is a
member of the National Cancer Institute’s Surveillance, Epidemiology, and End Results
(SEER) program and the statewide California Cancer Registry (CCR). For this analysis,
we considered tumor site (esophageal adenocarcinoma; gastric cardia cancer; distal
gastric cancer) and several other tumor characteristics at diagnosis: stage (in situ or
localized; regional, direct extension, or lymph nodes only; regional, direct extension,
and lymph nodes; distant metastases; unstageable), tumor size (in millimeters), and
tumor differentiation (well differentiated; moderately differentiated; poorly
differentiated; undifferentiated; unknown differentiation). Information on vital status
(dead/alive) was also obtained from the CSP and we calculated the time of follow-up
since diagnosis (in years).
2.3. Cluster Analysis
Two-dimensional hierarchical cluster analysis provides a powerful visualization tool to
display DNA methylation profiles and their correlation to sample and/or gene
characteristics. It generates clusters of genes on one hand and clusters of samples on the
other hand, based on the similarity of DNA methylation profiles. In this study, we used
the Ward method which applies an analysis of variance criterion in evaluating the
distance between any two hypothetical clusters. This hierarchical cluster method starts
with each observation as its own cluster and combines two into one cluster such that we
have minimal increase in error sum of squares within clusters at each step until one big
cluster is formed (Ward, 1963). Therefore, samples were ordered near each other
according to their similarity in DNA methylation profiles. The genes were ordered in
the same way.
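
The merging rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the thesis does not name the software it used, the function names `ward_cost` and `ward_cluster` are hypothetical, and the six three-gene profiles are made-up values rather than study data. The sketch clusters samples only; the two-dimensional analysis would repeat the same procedure on the transposed matrix to order the genes.

```python
from itertools import combinations

def ward_cost(a, b):
    """Increase in within-cluster error sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    dim = len(a[0])
    mean_a = [sum(p[k] for p in a) / na for k in range(dim)]
    mean_b = [sum(p[k] for p in b) / nb for k in range(dim)]
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return na * nb / (na + nb) * sq_dist

def ward_cluster(points, n_clusters):
    """Start with singletons and merge the cheapest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [points[r] for r in clusters[ij[0]]],
                [points[r] for r in clusters[ij[1]]],
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]

# Hypothetical ln(PMR + 1) profiles: 6 samples x 3 genes (not thesis data).
profiles = [
    [5.1, 4.8, 0.2], [5.4, 5.0, 0.0], [4.9, 5.2, 0.3],
    [0.5, 0.2, 3.9], [0.1, 0.4, 4.2], [0.3, 0.0, 4.0],
]
groups = ward_cluster(profiles, 2)  # lists of sample indices, one list per cluster
```

The naive pairwise search is quadratic per merge, which is acceptable for 45 samples; a library routine would normally be used for larger data.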
DNA methylation was measured as a non-negative continuous value, the percent
of methylated reference (PMR). Its transformation on the natural log scale (ln(PMR+1))
was used in the above cluster analysis. Cluster analysis was applied to the 45 tumor
tissue samples for which all 79 genes were measured and two major clusters of samples
were formed. A classifier was then built using one of the nine genes measured on all
tissue samples, so that all 317 tissues could be studied for their associations between
exposures and DNA methylation group.
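
As a small illustration of the ln(PMR+1) transformation, the sketch below applies it to a list of values. The function name and the sample values are hypothetical; adding 1 before taking the log keeps a PMR of zero defined and maps it to zero.

```python
import math

def log_transform(pmr_values):
    """Map non-negative PMR values to ln(PMR + 1); a PMR of 0 stays 0."""
    return [math.log1p(v) for v in pmr_values]

# Hypothetical PMR values for one gene across four samples (not thesis data).
pmr = [0.0, 5.0, 50.0, 300.0]
transformed = log_transform(pmr)
```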
2.4. Statistical analysis
After two major groups of our samples were determined, we performed logistic
regression using Generalized Estimating Equations (GEE) to evaluate the associations
between the DNA methylation group variable and all other variables of interest. This
approach allows us to analyze all tumor samples, accounting for the correlation in
outcome among tissue samples obtained from a single patient. In addition, we
investigated whether the relationship between each variable and DNA methylation
group differed by sex, by introducing product terms into the regression model. P-values
were reported both with and without adjustment for other variables. In the analyses with
adjustment, we controlled for sex and/or tumor site. Adjusted odds ratios and 95%
confidence intervals (95% CI) are reported.
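
For readers less familiar with how a logistic regression coefficient becomes a reported odds ratio, the sketch below shows only the final conversion step. It assumes a coefficient and a standard error are already available (in a GEE fit the standard error would be the robust one); the numeric inputs are hypothetical and the GEE fitting itself is not reproduced here.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and robust standard error (not taken from the thesis).
odds_ratio, lower, upper = odds_ratio_ci(beta=0.75, se=0.33)
```

Because the interval is symmetric on the log scale, the odds ratio is the geometric mean of the two limits.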
Patient survival was analyzed using a Cox regression model, with DNA methylation
group as a predictor variable. For subjects with more than one tissue sample, the
average group value was used as a summary for DNA methylation group membership.
Tumor stage, differentiation, site and sex were tested as potential important predictors.
Hazard ratios (HR) with 95% CIs and p-values were calculated and reported for the
above continuous and categorical variables. Trend effects for the ordinal categorical
variables in relation to survival were also evaluated using ordinal coding for the
variables in the Cox regression model. Furthermore, we analyzed the relationship
between the summary group variable and survival after adjusting for either tumor stage
or tumor differentiation. The association between the summary DNA methylation group and
the subjects’ survival times was assessed using the log-rank test comparing
Group 1 with Group 2. All p-values reported are two-sided and are evaluated at the 0.05
significance level.
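
A minimal sketch of a two-group log-rank test is given below for orientation. It is written from the standard textbook definition, not from the thesis's own code; the function name and the toy follow-up times are hypothetical.

```python
import math

def log_rank_test(times, events, groups):
    """Two-group log-rank test. events: 1 = death, 0 = censored; groups: 1 or 2."""
    observed = expected = variance = 0.0
    # Loop over the distinct times at which at least one death occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [(g, ti, e) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, ti, e in at_risk if ti == t and e == 1)
        d1 = sum(1 for g, ti, e in at_risk if ti == t and e == 1 and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value

# Hypothetical follow-up times in years (not thesis data).
times = [2.0, 3.0, 4.0, 5.0, 0.5, 1.0, 1.5, 2.5]
events = [1, 1, 0, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 1, 2, 2, 2, 2]
chi2, p_value = log_rank_test(times, events, groups)
```

The statistic does not depend on which group is labeled 1, since the observed-minus-expected difference only changes sign.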
3. Results
In our 317-tumor sample dataset, there were 102 (32.2%) esophageal adenocarcinomas,
121 (38.2%) gastric cardia cancer, and 94 (29.6%) distal gastric cancer samples
(numbers of the three tumor subtypes are shown separated by “slashes” in the order of
“Esophageal /Cardia/Distal cancer” in this paper). 242 subjects (76/76/90) had only one
tumor sample, 33 subjects (10/21/2) had 2 tumor samples, and 3 subjects (2/1/0) had 3
tumor samples in this dataset.
Among the 50 tissue samples with data for all 79 genes, 5 were normal tissues and were
excluded. The remaining 45 tumor tissue samples had methylation data for all 79 genes;
5 of these genes were non-informative (zero for all samples) and were omitted from the
cluster analysis.
Therefore, we did the cluster analysis using the methylation data of the other 74 genes
for the 45 gastroesophageal tumor samples.
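
The gene filter described above (dropping genes whose value is zero in every sample) could look like the following sketch. The function name, gene labels, and matrix are hypothetical placeholders.

```python
def drop_uninformative(matrix, gene_names):
    """Remove genes (columns) whose value is zero in every sample (row)."""
    keep = [j for j in range(len(gene_names)) if any(row[j] != 0 for row in matrix)]
    kept_names = [gene_names[j] for j in keep]
    kept_matrix = [[row[j] for j in keep] for row in matrix]
    return kept_matrix, kept_names

# Hypothetical 3-sample x 4-gene PMR matrix; "GENE_B" is zero everywhere.
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pmr = [[12.0, 0.0, 0.0, 3.1], [0.0, 0.0, 45.2, 0.0], [7.5, 0.0, 0.0, 0.0]]
filtered, kept = drop_uninformative(pmr, names)
```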
The cluster analysis identified two major subgroups of samples with distinct DNA
methylation profiles for these 74 genes (Figure 1). Group 1 has 27 samples, and Group
2 has 18 samples. Among these 74 genes, the methylation value of GATA5 differs most
significantly between the two groups (p=0.005), and this value alone predicts fairly
accurately which group a sample belongs to (the area under the receiver operating
characteristic (ROC) curve for GATA5 was 0.97).
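
The area under the ROC curve for a single marker can be read as the probability that a randomly chosen Group 1 sample has a higher marker value than a randomly chosen Group 2 sample. The sketch below computes it that way; the function name and the PMR values are hypothetical, not the study's measurements.

```python
def auc(scores_pos, scores_neg):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical GATA5 PMR values for samples in each cluster (not thesis data).
group1 = [310.0, 150.2, 95.0, 60.0]
group2 = [5.0, 22.4, 70.0, 0.0]
area = auc(group1, group2)  # 15 of the 16 pairs are ordered correctly
```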
Figure 1. Cluster Analysis
Figure 1 shows the heatmap of the standardized (log-transformed PMR values) DNA methylation values for
74 genes (columns) in 45 gastroesophageal tumor samples (rows). The Ward method was used to do the
clustering. Similar methylation patterns were clustered closely, and two major sample groups were formed
based on methylation profiles of these 74 genes. Based on the Ward method (W, left color bar), the lower
27 samples (dark green) were assigned to Group 1, and the upper 18 samples (light green) were assigned
to Group 2. On the other hand, based on gene GATA5 methylation criterion (G, right color bar), 26 samples
(dark green) belonged to Group 1, and 19 samples (light green) belonged to Group 2.
The arrow at the bottom of the diagram shows the GATA5 methylation column.
Red color in the diagram represents high standardized methylation value.
Yellow color in the diagram represents low standardized methylation value.
The logistic regression model for predicting DNA methylation group by GATA5
methylation value is:
logit P(group=1) = -2.9554 + 0.0363*GATA5
The optimal cut point of the GATA5 methylation value for classifying a sample into
Group 1 or 2 is 81.4. If the GATA5 methylation value is greater than 81.4, then the
sample is assigned to Group 1; otherwise, it is assigned to Group 2. Only 3 samples out
of 45 samples (6.7%) are misclassified by the GATA5 methylation value (Fig 1). This is
not improved by the addition of any other variable to the model.
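
The reported cut point can be checked against the fitted model. The sketch below assumes, as one reading consistent with the reported numbers, that 81.4 is the GATA5 value at which the fitted probability of Group 1 equals 0.5 (logit = 0); the thesis does not state how the cut point was chosen, and the function names are hypothetical.

```python
import math

INTERCEPT = -2.9554  # fitted intercept reported in the text
SLOPE = 0.0363       # fitted coefficient for the GATA5 value reported in the text

def prob_group1(gata5):
    """Fitted probability that a sample belongs to Group 1."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * gata5)))

# Assumption: the cut point is where the logit is zero, i.e. probability 0.5.
cut_point = -INTERCEPT / SLOPE  # rounds to the reported 81.4

def assign_group(gata5):
    """Group 1 if the GATA5 value exceeds the cut point, otherwise Group 2."""
    return 1 if gata5 > cut_point else 2
```

Using the unrounded cut point rather than 81.4 could only change the assignment of a sample whose value falls between the two, so the rounded threshold from the text may be preferable when reproducing the published grouping.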
Since GATA5 is one of those 9 genes measured for all 440 samples, we further apply
this classification criterion to the entire collection of 317 tumor samples. This final
classification assigns 186 tumor samples (160 subjects) to Group 1 and 131 tumor
samples (120 subjects) to Group 2 (Table 1).
Table 1 shows the characteristics of the DNA methylation groups. There is a statistically
significant difference in the proportion of males between the groups (87% in Group 1 vs.
71% in Group 2, p = 0.003). Whites account for 75.8% of the samples in Group 1 and
63.4% of the samples in Group 2 (p = 0.03). The mean age at diagnosis is 61.3 (standard
deviation (SD) = 8.6) for Group 1 and 60.2 (SD = 10.9) for Group 2. The mean BMI at
age 20 and at age 40 do not differ significantly between the groups; however, the mean
BMI at age of diagnosis shows a statistically significant association with group, with
27.7 kg/m² in Group 1 and 25.8 kg/m² in Group 2 (p < 0.05).

Table 1. Characteristics of DNA Methylation Groups†
Group 1: 186 tissues/160 patients. Group 2: 131 tissues/120 patients.

Characteristic                    Group 1            Group 2            p-value*
GATA5 methylation                 286.9 (235.6)      22.0 (23.1)
Subject Characteristics
  Sex: Male                       161 (86.6%)        93 (71.0%)         0.003
  Race: White                     141 (75.8%)        83 (63.4%)         0.03
  Age at diagnosis (in years)     61.3 (8.6)         60.2 (10.9)        0.35
  BMI at age 20 (in kg/m²)        22.2 (3.2)         21.7 (3.1)         0.25
    Missing (count)               14                 26
  BMI at age 40 (in kg/m²)        25.7 (4.4)         24.7 (4.1)         0.10
    Missing (count)               14                 26
  BMI at diagnosis (in kg/m²)     27.7 (5.2)         25.8 (6.0)         <0.05
    Missing (count)               7                  15
Smoking Factors
  Smoking status                                                        0.0008
    Never                         41 (22.0%)         51 (38.9%)
    Former smoker                 95 (51.1%)         38 (29.0%)
    Current smoker                50 (26.9%)         42 (32.1%)
  Number of cigarettes per day    43.3 (33.5)        53.3 (38.8)        0.03
  Among smokers
    Age start smoking             17.6 (4.9)         18.6 (5.6)         0.23
    Age stop smoking              50.6 (12.1)        52.9 (13.1)        0.24
    Total years of smoking        33.0 (12.8)        34.3 (13.8)        0.50
    Number of pack-years          47.9 (36.2)        46.0 (37.4)        0.72
    Missing (count)               1                  1
Nutrient Factors
  Total calories (in kcal)        2736.1 (1157.4)    2523.6 (1065.5)    0.12
  Folate acid density             172.6 (55.3)       180.9 (64.8)       0.26
  Total fat density               43.1 (7.7)         40.4 (9.1)         0.01
  Dietary fiber density           10.9 (3.8)         11.5 (4.5)         0.22
† Except for those specified, the information in this table means Count (%) or Mean (SD).
* For categorical and continuous variables, p-values were computed using GEE.
Smoking status is found to be statistically significantly correlated with DNA
methylation group (p = 0.0008) (Table 1). The proportion of never smokers was lower
in Group 1 than in Group 2 but this excess of smokers in Group 1 is due mainly to an
excess of former smokers. More than half (51.1%) of the subjects in Group 1 are former
smokers compared to 29.0% in Group 2. Surprisingly, the average number of cigarettes
smoked per day was higher among smokers in Group 2 than in Group 1 (p = 0.03).
However, among smokers, the age at which the subject started smoking, the age at which
the subject stopped smoking, the total years of smoking, and the number of pack-years
did not differ between Group 1 and Group 2 (all p > 0.05). Of the 4 nutrients we
investigated in
this analysis, only total fat differs between the groups, with lower levels seen in Group 2
compared to Group 1 (p = 0.01) (Table 1).
Table 2 summarizes the descriptive statistics of tumor characteristics between DNA
methylation groups. Tumor subsite differed between Group 1 and Group 2 (p = 0.0003).
Esophageal adenocarcinoma accounts for 39.3% and gastric cardia cancer accounts for
39.8% of the samples in Group 1, while the largest share of the samples in Group 2
(42.0%) is distal gastric cancer. However, there are no differences in tumor stage,
tumor size, or tumor differentiation degree between groups (all p > 0.05) (Table 2).
Table 2. Distribution of Tumor Characteristics Between DNA Methylation Groups†
Group 1: 186 tissues/160 patients. Group 2: 131 tissues/120 patients.

Tumor Characteristics                                 Group 1        Group 2        p-value*
Tumor site / Cancer subtype                                                         0.0003
  Esophageal Cancer                                   73 (39.3%)     29 (22.1%)
  Gastric Cardia Cancer                               74 (39.8%)     47 (35.9%)
  Distal Gastric Cancer                               39 (21.0%)     55 (42.0%)
Tumor stage                                                                         0.18
  1) In situ or localized                             38 (20.4%)     22 (16.8%)
  2) Regional, direct extension or lymph nodes only   32 (17.2%)     20 (15.3%)
  3) Regional, direct extension & lymph nodes         62 (33.3%)     34 (26.0%)
  4) Distant metastases or unstageable                54 (29.0%)     55 (42.0%)
Tumor size                                            50.2 (22.5)    52.8 (28.5)    0.51
  Size unknown (count)                                58             41
Tumor differentiation                                                               0.20
  Well or Moderately differentiated                   50 (26.9%)     34 (25.9%)
  Poorly differentiated                               123 (66.1%)    79 (60.3%)
  Undifferentiated                                    8 (4.3%)       7 (5.3%)
  Differentiation not known                           5 (2.7%)       11 (8.4%)
† The information in this table: Count (%) or mean (SD).
* p-values were computed using GEE.
The frequencies of other medical histories by DNA methylation group are shown in
Table 3. The distributions of the number of symptoms (sour stomach, gas pain,
heartburn, and swallow symptoms) that subjects had differ between the groups (p =
0.01). However, history of diabetes, history of any one disease (diabetes, hypertension,
heart disease) and whether or not the patient received treatment, as well as the lag time
between the earliest diagnosis of any one of the 3 diseases and cancer diagnosis are not
statistically significantly different between groups (all p > 0.05).
Table 3. Distribution of Disease Characteristics Between DNA Methylation Groups†
Group 1: 186 tissues/160 patients. Group 2: 131 tissues/120 patients.

Disease Characteristics                               Group 1        Group 2        p-value*
Presence of symptoms: sour stomach, gas pain,
heartburn, and swallow symptoms                                                     0.01
  None                                                85 (48.0%)     77 (62.1%)
  One                                                 30 (17.0%)     21 (16.9%)
  Two                                                 44 (24.9%)     13 (10.5%)
  Three or four                                       18 (10.2%)     13 (10.5%)
  Missing                                             9              7
Presence of Diabetes                                                                0.57
  No, 0-2 lag years                                   170 (91.4%)    116 (89.2%)
  Yes                                                 16 (8.6%)      14 (10.8%)
  Don’t know**                                        0              1
Presence of any one disease: Diabetes,
Hypertension, Heart disease                                                         0.63
  No, 0-2 lag years                                   98 (52.7%)     73 (55.7%)
  Yes                                                 88 (47.3%)     58 (44.3%)
Presence of any one of the 3 diseases and
whether received treatment                                                          0.37
  No, 0-2 lag years                                   98 (52.7%)     73 (55.7%)
  Yes, Treatment -                                    8 (4.3%)       10 (7.6%)
  Yes, Treatment +                                    80 (43.0%)     48 (36.6%)
Lag time of the 3 diseases from the earliest
one to cancer diagnosis                                                             0.81
  No, 0-2 years                                       98 (52.7%)     73 (56.6%)
  3-10 years                                          37 (19.9%)     27 (20.9%)
  11-20 years                                         28 (15.1%)     17 (13.2%)
  20+ years                                           23 (12.4%)     12 (9.3%)
  Yes, don’t know year**                              0              2
† The information in this table: Count (%).
* p-values were computed using GEE.
** Observations were deleted when calculating the p-values.
Table 4. The Association Between DNA Methylation Group and the Variables

                                                 Adjusted Odds                     p-value for
Variable                                         Ratio (95% CI)*       p-value*    interaction†
Sex: Male vs. Female                             2.12 (1.11-4.03)**    0.03**      0.90**
Race: White vs. Non-White                        0.96 (0.49-1.86)      0.90        0.20
BMI at diagnosis                                 1.06 (1.00-1.13)      0.04        0.97
Number of cigarettes per day                     0.99 (0.99-1.00)      0.14        0.18
Total fat density                                1.02 (0.99-1.05)      0.26        0.88
Smoking status                                                         0.04        0.26
  Former vs. Never smoker                        2.26 (1.17-4.38)
  Current vs. Never smoker                       1.27 (0.65-2.49)
Number of the four symptoms: Sour stomach, Gas
pain, Heartburn, and Swallow symptoms                                  0.07        0.14
  One vs. None                                   1.18 (0.57-2.45)
  Two vs. None                                   2.40 (1.18-4.84)
  Three/Four vs. None                            0.97 (0.39-2.37)
* Odds ratios, 95% CI’s and p-values were calculated for the variables using GEE after adjusting for tumor
site and sex.
** Odds ratio, 95% CI and p-values were calculated for the sex variable using GEE after adjusting for tumor
site.
† p-values were calculated for interaction between the variable and sex in terms of the relationship to DNA
methylation group after adjusting for tumor site.
Table 4 summarizes the associations between DNA methylation group and the variables
of interest after adjusting for sex and/or tumor site. The association between sex and
DNA methylation group does not vary by tumor site (interaction p = 0.90). Furthermore,
there are no interactions between sex and race, the number of cigarettes smoked per day,
or the intake of total fat after adjusting for tumor site (all p > 0.05). Males are
twice as likely to be in Group 1 as females (95% CI = 1.11-4.03) after controlling for
the effect of tumor site. BMI at diagnosis is still statistically significantly
associated with group after adjusting for sex and tumor site. The subject is 1.34 times as
likely to be in Group 1 with a 5 kg/m² increase in BMI at diagnosis (p = 0.04). However,
after adjusting for sex and tumor site, race, the number of cigarettes smoked per day, and
intake of total fat are no longer statistically significantly correlated with group (all p >
0.05).
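
The figure of 1.34 for a 5 kg/m² increase follows from the per-unit odds ratio of 1.06 in Table 4, because odds ratios multiply across units on the log-odds scale. A short check, with a hypothetical helper name:

```python
import math

def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, given the odds ratio per one unit."""
    return math.exp(units * math.log(or_per_unit))

# 1.06 per kg/m^2 is the adjusted estimate for BMI at diagnosis in Table 4.
or_per_5 = scale_odds_ratio(1.06, 5)  # equals 1.06 ** 5
```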
Table 4 also shows that with the adjustment for sex and tumor site, smoking status is
statistically significantly associated with group (p = 0.04), but the number of symptoms
that the subject had among sour stomach, gas pain, heartburn, and swallow symptoms is
not (p = 0.07). Compared to never smokers, former smokers are 2.3 times as likely to be
in Group 1 (95% CI = 1.17-4.38), while current smokers are 1.3 times as likely to be in
Group 1 (95% CI = 0.65-2.49). Moreover, after adjusting for tumor site, no interactions
between sex and these two variables were detected (both p > 0.05) (Table 4).
Survival analysis results are shown in Table 5. The median survival time for subjects in
DNA methylation Group 1 (1.5 years) is longer than that for subjects in Group 2 (0.9
years) without controlling for any covariate (log rank p = 0.04). As we expected, tumor
stage is associated with survival time (heterogeneity p < 0.0001). Compared to subjects
diagnosed with earlier stage (In situ or localized) tumors, the mortality risk of those
diagnosed with more advanced stage tumors (either regional, direct extension or lymph
nodes only) is 1.4-fold higher (95% CI = 0.86-2.26); the mortality risk of those with
tumors in both regional, direct extension and lymph nodes is 2.5-fold higher (95% CI =
1.65-3.73); the mortality risk of those with tumors in the last stage (distant metastases or
unstageable) is 6.5-fold higher (95% CI = 4.35-9.86) (Table 5).

Table 5. Survival Analysis

Variable                                              Number of   Number of   Hazard Ratio        p-value*
                                                      patients    deaths      (95% CI)*
                                                      (N=278)     (N=239)
DNA methylation Group                                                                             0.04
  Group 1                                             160**       135**       1.00
  Group 2                                             118         104         1.31 (1.01-1.69)
Sex                                                                                               0.06
  Male                                                221         198         1.00
  Female                                              57          41          0.73 (0.52-1.02)
BMI at diagnosis                                                              0.99 (0.96-1.01)    0.29
Smoking status                                                                                    0.90
  Never                                               82          70          1.00
  Former                                              116         102         0.97 (0.72-1.32)
  Current                                             80          67          1.05 (0.75-1.47)
Tumor site                                                                                        0.42
  Esophageal Cancer                                   88          79          1.00
  Gastric Cardia                                      98          87          1.09 (0.80-1.47)
  Distal Gastric                                      92          73          0.88 (0.64-1.21)
Tumor size                                                                    1.01 (1.00-1.01)    0.06
Tumor stage                                                                                       <0.0001
  1) In situ or localized                             55          36          1.00
  2) Regional, direct extension or lymph nodes only   42          31          1.39 (0.86-2.26)
  3) Regional, direct extension & lymph nodes         82          73          2.48 (1.65-3.73)
  4) Distant metastases or unstageable                99          99          6.54 (4.35-9.86)
Tumor differentiation                                                                             0.04
  Well- or Moderately-differentiated                  71          59          1.00
  Poorly-differentiated                               176         154         1.21 (0.90-1.64)
  Undifferentiated                                    15          14          1.88 (1.05-3.37)
  Differentiation not known†                          16          12          0.67 (0.36-1.25)

* Hazard ratios, 95% CIs, and p-values of Wald tests were calculated using Cox regression.
** Two subjects with multiple tissue samples that have different DNA methylation group values were
included in Group 1 here. They both died before the end of the study.
† This category was not included in the trend test.

The increase in risk with
increase in tumor stage is statistically significant (HR = 1.93, trend p < 0.0001). Tumor
differentiation is also associated with survival time (heterogeneity p = 0.05). Compared
to patients with well- or moderately-differentiated tumors, the less differentiated the
tumors, the higher the rates of death (HR = 1.29, trend p = 0.05). Interestingly, the
summary DNA methylation group variable also seems to be a good predictor of survival
times; subjects in Group 2 were 1.3-fold more likely than those in Group 1 to die from
the cancer (p = 0.04). Furthermore, compared to subjects in Group 1, after adjusting for
tumor differentiation, subjects in Group 2 were 1.5 times as likely to die from the cancer
(95% CI = 1.14-1.93, p = 0.004); whereas after adjusting for tumor stage, subjects in
Group 2 were 1.2 times as likely to die from the cancer (95% CI = 0.88-1.50, p = 0.30).
Apart from these three variables (DNA methylation group, tumor stage, and tumor
differentiation), we did not find any other statistically significant predictors of
survival time (all p > 0.05, as shown in Table 5).
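
As a rough consistency check on the hazard ratios reported above, a Wald p-value can be approximately recovered from a hazard ratio and its 95% confidence interval, assuming the interval was computed as exp(log HR ± 1.96 × SE). The Python sketch below is illustrative only and is not the code used for the analysis. The function name `wald_from_ci` is hypothetical, the adjusted hazard ratios 1.48 and 1.15 are the unrounded values quoted in the Discussion, and because all inputs are rounded to two decimals the recovered p-values are only approximate.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate the log-scale SE, Wald z, and two-sided p from an HR and its 95% CI.

    Assumes the interval is symmetric on the log scale: exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Hazard ratios for Group 2 vs. Group 1 as reported in the text (rounded values).
reported = {
    "unadjusted": (1.31, 1.01, 1.69),
    "adjusted for tumor differentiation": (1.48, 1.14, 1.93),
    "adjusted for tumor stage": (1.15, 0.88, 1.50),
}

for label, (hr, lower, upper) in reported.items():
    se, z, p = wald_from_ci(hr, lower, upper)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the recovered p-values fall close to the reported values of 0.04, 0.004, and 0.30, which suggests that the text and Table 5 are internally consistent.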
4. Discussion
In our esophageal and gastric adenocarcinoma study, we had DNA methylation data for
9 genes from all 440 micro-dissected tissue samples, including both normal and tumor
tissues from the esophageal, gastric cardia, or distal gastric regions. Although a total of
79 genes were selected for our study, only 45 tumor tissue samples had enough DNA
for analysis of all 79 genes. However, a cluster analysis of all genes on the subset of 45
samples identified two subgroups that were best predicted by GATA5 alone, one of the 9
genes measured on all 440 samples. As a result, we were able to classify all samples in
order to study the associations between DNA methylation group and environmental
exposures and clinical outcome.
Individually, GATA5, ITGA4, and RUNX3 were three genes for which the methylation
data showed statistically significant correlations with certain exposures, such as
smoking status (all p<0.05, no correction for multiple testing). In addition to the above
three genes, methylation for genes PRKAR1A and CACNA1G also showed statistically
significant associations with the aforementioned three different tumor subtypes (data
not shown). Because such a gene-by-gene analysis is exploratory, it requires adjusting for
multiple comparisons in order to control the overall significance level (false-positive
rate) of the study. As an alternative, we used the DNA methylation groups identified via our cluster
analysis as a single summary of DNA methylation profiles to explore associations
between DNA methylation, environmental exposures and clinical outcome.
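
To illustrate the multiple-comparison issue mentioned above, the sketch below applies a Bonferroni correction, which is one common adjustment and not necessarily the one that would have been chosen for this study. It assumes one test per gene for the 9 genes measured on all samples. The function names and the example p-values are hypothetical placeholders, not results from this thesis.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Assumption: one test per gene for the 9 genes measured on all samples.
n_genes = 9
threshold = bonferroni_threshold(0.05, n_genes)
print(f"Per-gene threshold for 9 tests: {threshold:.4f}")

# Hypothetical p-values for illustration only (not study results).
example_p = [0.004, 0.03, 0.20]
print(bonferroni_adjust(example_p))
```

With 9 tests the per-gene threshold drops to roughly 0.0056, so an unadjusted p-value just under 0.05 would no longer count as significant. Using a single summary variable, as done here with the DNA methylation group, avoids this penalty.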
As the results showed, group membership based on DNA methylation profile was
correlated with sex and tumor subtype (both p <0.05). Furthermore, BMI at diagnosis
and smoking status were statistically significantly correlated with group after adjusting
for both tumor subtype and sex. Group membership was a significant predictor for
survival time. This significance was maintained after adjusting for tumor differentiation,
while it disappeared with the adjustment for tumor stage. Compared to subjects in
Group 2, subjects in Group 1 had higher DNA methylation of GATA5 (PMR > 81.4),
higher BMIs at diagnosis, and were more likely to be former or current smokers.
Interestingly, compared to subjects in Group 2, subjects in Group 1 tended to live longer.
Smoking status and BMI at diagnosis were not statistically significantly correlated with
survival times themselves, which means that these potential factors for high GATA5
DNA methylation in gastroesophageal adenocarcinoma may not be risk factors for death
from these cancers. Nevertheless, a higher level of GATA5 gene methylation is
correlated with a higher probability of survival, although this association is
confounded by tumor stage (adjusted HR = 1.15, 95% CI = 0.88-1.50) and tumor
differentiation (adjusted HR = 1.48, 95% CI = 1.14-1.93).
Abnormal methylation of the GATA gene family has been found in many cancers,
especially in gastrointestinal cancer (Akiyama et al. 2003; Guo et al. 2004; Wakana et al.
2006). Among its family members, GATA5 plays an important role in inducing the
expression of certain anti-tumor genes. Hypermethylation in the promoter region of the
GATA5 gene suppresses its transcription and thus results in decreased expression of those
anti-tumor genes, which are critical in tumor suppression and cell differentiation
(Akiyama et al. 2003). In our gastroesophageal adenocarcinoma study, we also found
that the median methylation level of the GATA5 gene was much higher in tumors than in
normal tissues after adjusting for tumor site (esophageal adenocarcinoma, gastric cardia
and distal gastric) (data not shown), which was consistent with previous studies.
However, among tumors, different levels of GATA5 methylation showed statistically
significant associations with smoking status and BMI at the age of cancer diagnosis. As
mentioned before, the cut point for the GATA5 methylation value was 81.4, which may
also be considered as the cut point for GATA5 methylation in normal vs. tumor tissues
of gastroesophageal adenocarcinoma (the median PMR for GATA5 is 22, with an
inter-quartile range of 3-57, in normal tissue, while it is 127, with an inter-quartile
range of 20-251, in tumor tissue). Therefore, it was not surprising to see different characteristics between
subjects with high levels of GATA5 methylation (PMR > 81.4) and those with low levels
of GATA5 methylation (PMR < 81.4). Paradoxically, our results showed that subjects with
higher levels of GATA5 methylation had higher probability of survival, although they
were more likely to have been smokers and had higher BMIs at the age of their cancer
diagnosis after adjusting for sex and their tumor subtypes. Tumor stage and tumor
differentiation probably explained most of the differences in the rate of survival
between subjects with different methylation levels of GATA5, although these two
variables did not differ statistically significantly between groups. This deserves more
investigation.
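
The group assignment by GATA5 cut point discussed above can be sketched as a simple threshold rule. This is an illustration under stated assumptions, not the procedure used to derive the groups, which came from the cluster analysis. The function name `methylation_group` is hypothetical, and the handling of a PMR exactly equal to 81.4 (assigned to Group 2 here) is an assumption, since the text only defines PMR > 81.4 and PMR < 81.4.

```python
GATA5_PMR_CUT_POINT = 81.4  # cut point reported in the text


def methylation_group(gata5_pmr, cut_point=GATA5_PMR_CUT_POINT):
    """Return 1 for high GATA5 methylation (PMR > cut point), otherwise 2.

    Assumption: a PMR exactly equal to the cut point is placed in Group 2.
    """
    return 1 if gata5_pmr > cut_point else 2


# Median PMR values reported in the text for tumor and normal tissue.
for label, pmr in [("tumor median", 127), ("normal median", 22)]:
    print(f"{label}: PMR = {pmr}, group = {methylation_group(pmr)}")
```

The reported tumor median falls on the high-methylation side of the cut point and the normal median on the low side, in line with the observation that 81.4 roughly separates tumor from normal tissue.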
Overall, similar results were found using the cluster analytic approach we described
in this paper. Thus the cluster analysis approach we took can be described as a method
to identify DNA methylation profiles in the samples, which are then correlated with
information on exposures and clinical data.
References
Akiyama Y, Watkins N, Suzuki H, Jair KW, van Engeland M, Esteller M, Sakai H, Ren
CY, Yuasa Y, Herman JG, Baylin SB. (2003) GATA-4 and GATA-5 transcription factor
genes and potential downstream antitumor target genes are epigenetically silenced in
colorectal and gastric cancer. Mol Cell Biol. Dec; 23(23):8429-39.
Baylin SB, Herman JG, Graff JR, Vertino PM, Issa JP. (1998) Alterations in DNA
methylation: a fundamental aspect of neoplasia. Adv Cancer Res. 72:141–196.
Blot WJ, Devesa SS, Kneller RW, Fraumeni JF Jr. (1991) Rising incidence of
adenocarcinoma of the esophagus and gastric cardia. JAMA 265: 1287-1289.
Blot WJ, McLaughlin JK. (1999) The changing epidemiology of esophageal cancer.
Semin Oncol 26 (5 Suppl 15): 2-8.
Brown LM, Devesa SS. (2002) Epidemiologic trends in esophageal and gastric cancer
in the United States. Surg Oncol Clin N Am 11: 235-256.
Crew KD, Neugut AI. (2006) Epidemiology of gastric cancer. World J Gastroenterol.
Jan 21;12(3):354-62.
Devesa SS, Blot WJ, Fraumeni JF Jr. (1998) Changing patterns in the incidence of
esophageal and gastric carcinoma in the United States. Cancer 83 (10): 2049-53.
Eads CA, Lord RV, Kurumboor SK, Wickramasinghe K, Skinner ML, Long TI, Peters
JH, DeMeester TR, Danenberg KD, Danenberg PV, et al. (2000) Fields of aberrant CpG
island hypermethylation in Barrett's esophagus and associated adenocarcinoma. Cancer
Res. 60(18):5021–5026.
Estécio MR, Gharibyan V, Shen L, Ibrahim AE, Doshi K, He R, Jelinek J, Yang AS, Yan
PS, Huang TH, Tajara EH, Issa JP. (2007) LINE-1 hypomethylation in cancer is highly
variable and inversely correlated with microsatellite instability. PLoS ONE. May
2;2(5):e399.
Frigola J, Solé X, Paz MF, Moreno V, Esteller M, Capellà G, Peinado MA. (2005)
Differential DNA hypermethylation and hypomethylation signatures in colorectal cancer.
Hum Mol Genet. Jan 15;14(2):319-26.
Guo M, Akiyama Y, House MG, Hooker CM, Heath E, Gabrielson E, Yang SC, Han Y,
Baylin SB, Herman JG, Brock MV. (2004) Hypermethylation of the GATA genes in lung
cancer. Clin Cancer Res. Dec 1;10(23):7917-24.
Guo M, Ren J, House MG, Qi Y, Brock MV, Herman JG. (2006) Accumulation of
promoter methylation suggests epigenetic progression in squamous cell carcinoma of
the esophagus. Clin Cancer Res. Aug 1;12(15):4515-22.
Holliday R. (1987) The inheritance of epigenetic defects. Science. Oct 9; 238(4824):
163-70.
Holmes RS, Vaughan TL. (2007) Epidemiology and pathogenesis of esophageal cancer.
Semin Radiat Oncol. Jan;17(1):2-9.
Jones PA, Laird PW. (1999) Cancer epigenetics comes of age. Nat Genet. 21(2):
163–167.
Klump B, Hsieh CJ, Holzmann K, Gregor M, Porschen R. (1998) Hypermethylation of
the CDKN2/p16 promoter during neoplastic progression in Barrett's esophagus.
Gastroenterology. 115(6):1381–1386.
Lagarde SM, ten Kate FJ, Richel DJ, Offerhaus GJ, van Lanschot JJ. (2007) Molecular
prognostic factors in adenocarcinoma of the esophagus and gastroesophageal junction.
Ann Surg Oncol. Feb;14(2):977-91.
Lee OJ, Schneider-Stock R, McChesney PA, Kuester D, Roessner A, Vieth M,
Moskaluk CA, El-Rifai W. (2005) Hypermethylation and Loss of Expression of
Glutathione Peroxidase-3 in Barrett's Tumorigenesis. Neoplasia. September; 7(9):
854–861.
Parkin DM, Pisani P, Ferlay J. (1993) Estimates of the worldwide incidence of eighteen
major cancers in 1985. Int J Cancer. Jun 19;54(4):594-606.
Siewert JR, Ott K. (2007) Are squamous and adenocarcinomas of the esophagus the
same disease? Semin Radiat Oncol. Jan;17(1):38-44.
Terry MB, Gaudet MM, Gammon MD. (2002) The epidemiology of gastric cancer.
Semin Radiat Oncol. Apr;12(2):111-27.
Vasavi M, Ponnala S, Gujjari K, Boddu P, Bharatula RS, Prasad R, Ahuja YR, Hasan Q.
(2006) DNA methylation in esophageal diseases including cancer: special reference to
hMLH1 gene promoter status. Tumori. 92(2):155-62.
Wakana K, Akiyama Y, Aso T, Yuasa Y. (2006) Involvement of GATA-4/-5
transcription factors in ovarian carcinogenesis. Cancer Lett. Sep 28;241(2):281-8.
Ward JH. (1963) Hierarchical groupings to optimize an objective function. Journal of
the American Statistical Association. 58:234-244.
Widschwendter M, Jones PA. (2002) DNA methylation and breast carcinogenesis.
Oncogene. Aug 12;21(35):5462-82.
Wong DJ, Barrett MT, Stoger R, Emond MJ, Reid BJ. (1997) p16INK4a promoter is
hypermethylated at a high frequency in esophageal adenocarcinomas. Cancer Res.
57(13):2619–2622.
Wu AH, Wan P, Bernstein L. (2001) A multiethnic population-based study of smoking,
alcohol and body size and risk of adenocarcinomas of the stomach and esophagus
(United States). Cancer Causes Control. Oct;12(8):721-32.
Abstract
Our primary objective is to analyze the association between DNA methylation profiles, epidemiological risk factors, and clinical outcomes for gastroesophageal adenocarcinoma. We applied cluster analysis to classify tumor samples using DNA methylation profiles. We then characterized the resulting DNA methylation subgroups using logistic and Cox regression, and robust variance estimates to allow for multiple tissue samples from individual patients. Two major sample groups were generated based on DNA methylation profiles. Compared to Group 2, after adjusting for sex and tumor subtypes, subjects in Group 1 had higher GATA5 DNA methylation values, higher BMIs at diagnosis, and were more likely to have smoked. However, subjects in Group 1 showed better survival than those in Group 2, after adjusting for tumor differentiation. Cluster analysis is effective for classifying samples based on DNA methylation profiles. Smoking and BMI are two significant factors associated with DNA methylation group, which itself is associated with survival.
Asset Metadata
Creator
Wang, Xinhui (author)
Core Title
DNA methylation groups determined by GATA5 gene methylation level are correlated with tumor subtype, sex, smoking status, and body mass index in esophageal and gastric adenocarcinoma
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Preventive Medicine (Health Behavior)
Publication Date
11/30/2007
Defense Date
12/01/2007
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
BMI,DNA methylation group,gastroesophageal adenocarcinoma,GATA5,OAI-PMH Harvest,Sex,tumor site
Language
English
Advisor
Siegmund, Kimberly (committee chair), Laird, Peter W. (committee member), Wu, Anna (committee member)
Creator Email
xinhuiwa@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m952
Unique identifier
UC1207407
Identifier
etd-Wang-20071130 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-593857 (legacy record id),usctheses-m952 (legacy record id)
Legacy Identifier
etd-Wang-20071130.pdf
Dmrecord
593857
Document Type
Thesis
Rights
Wang, Xinhui
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu