Functional role of a Ewing-sarcoma-specific vlncRNA in tumor growth and progression
FUNCTIONAL ROLE OF A EWING-SARCOMA-SPECIFIC VLNCRNA IN TUMOR
GROWTH AND PROGRESSION
By
Yang Liu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(GENETICS, MOLECULAR AND CELLULAR BIOLOGY)
May 2016
Dedication
To my parents, Xingping Liu and Donghong Su,
for their unconditional love, support, and encouragement,
To my wife, Xiaoping Yu,
for her endless love, companionship, and devotion,
and
To my little son, Andrew Liu,
whose innocent smile lights up my world.
Acknowledgements
I sincerely appreciate the support and guidance of my mentor, Dr. Timothy J. Triche, whose
professional perseverance has greatly contributed to my ongoing, personal development of
the character and qualities of a scientist. His foresight in science has helped me to deeply
understand the importance of combining molecular biologic and computational techniques
in cancer research. I am also thankful to him for encouraging me and providing me with
concrete opportunities to develop hands-on skills in these research areas.
I would like to offer my gratitude to my dissertation committee members, Dr. David Hinton
and Dr. Shahab Asgharzadeh, for their guidance and opinions concerning my project and
for their great help toward the completion of my studies. Their
constructive suggestions regarding my presentation and dissertation have helped me to
present my study more logically and comprehensibly.
In my years spent as a graduate student, I had the pleasure of working with a number of
lab mates, including Sheetal Mitra, Anirban Mitra, Hyung-Gyoo Kang, Cindy Cronin-
Yoder and Violette Shahbazian. They gave great help to my project and contributed to my
growth as a scientist. I would also like to express my appreciation to collaborators Dr.
Philipp Kapranov and Dr. Laura Li; their suggestions on my work in this dissertation were
extremely helpful. I am grateful to Dr. Jonathan Buckley for providing technical support
to my project, as well. I would like to thank Yu Zhan, Seda Mkhitaryan, Gevorg
Karapetyan, Michael Sheard, Alexander Navarro and Karen Miller, who also made very
useful and much appreciated contributions to my project.
Last but not least, I am greatly thankful to Bami Andrada and Dr. Ite A. Laird for their
guidance and help towards achieving the completion of my studies. Many thanks to all my
lab mates and classmates who made my years at USC an enjoyable and unforgettable
experience.
Table of Contents
Dedication
Acknowledgements
List of Figures
Abstract
Chapter 1: Introduction
1.1 Ewing sarcoma family of tumors (ESFTs)
1.2 Non-coding RNAs in Ewing sarcoma
1.2.1 Open Reading Frame (ORF)
1.2.2 Categorizations of ncRNAs
1.3 Very Long Non-coding RNAs (vlncRNAs)
1.4 VlncRNAs in Ewing sarcoma
1.5 Spliced variants of EW-vlncRNA
Chapter 2: EW-vlncRNA in Ewing sarcoma
2.1 Introduction
2.2 Materials and methods
2.3 Results
2.3.1 A highly and specifically expressed transcript in Ewing sarcoma
2.3.2 The non-coding EW-vlncRNA is part of the 381.37 kb transcript
2.3.3 EW-vlncRNA is expressed independently of CCDC171 in Ewing sarcoma
2.4 Discussion
Chapter 3: A spliced ncRNA sEWVLNC of EW-vlncRNA
3.1 Introduction
3.2 Materials and methods
3.3 Results
3.3.1 sEWVLNC is spliced from EW-vlncRNA
3.3.2 sEWVLNC is more stable than EW-vlncRNA
3.3.3 sEWVLNC is localized in the nucleus
3.4 Discussion
Chapter 4: Down-stream genes of sEWVLNC
4.1 Introduction
4.2 Materials and methods
4.3 Results
4.3.1 Altered expression of sEWVLNC affects many down-stream genes
4.3.2 Down-stream signaling pathway analysis of sEWVLNC
4.3.3 Correlation analysis of the gene network regulated by sEWVLNC
4.3.4 WGCNA analysis of sEWVLNC down-stream genes
4.3.5 sEWVLNC down-regulates the HGF/MET signaling pathway by suppressing MET expression
4.4 Discussion
Chapter 5: sEWVLNC delays Ewing sarcoma progression
5.1 Introduction
5.2 Materials and methods
5.3 Results
5.3.1 Ewing sarcoma invasion in vitro delayed by sEWVLNC
5.3.2 Ewing sarcoma growth and progression in vivo suppressed by sEWVLNC
5.3.3 sEWVLNC is a potential variable in Ewing sarcoma prognosis
5.3.4 MET expression is down-regulated by sEWVLNC potentially by binding to transcription factor c-Jun
5.4 Discussion
Chapter 6: Conclusions and Perspectives
Bibliography
List of Figures
Figure 1.1: The classification of lncRNAs by GENCODE
Figure 1.2: EW-vlncRNA in Ewing sarcoma
Figure 2.1: The 381.37 kb transcript is specifically expressed in Ewing sarcomas
Figure 2.2: The 381.37 kb transcript is highly expressed in most Ewing sarcoma
Figure 2.3: Other vlncRNAs are not highly expressed in Ewing sarcoma
Figure 2.4: The 381.37 kb transcript in human melanoma (NHEM)
Figure 2.5: RNA sequencing data of CHLA-9 and CHLA-10 cells
Figure 2.6: Long-range overlapping PCRs
Figure 2.7: MeDIP-Seq data for CHLA-9 and CHLA-10 cells
Figure 2.8: PCR and qPCR results of the CCDC171 gene and EW-vlncRNA
Figure 2.9: EW-vlncRNA knockdown with ASOs
Figure 3.1: The spliced variant sEWVLNC derived from EW-vlncRNA
Figure 3.2: Pfam results for sEWVLNC
Figure 3.3: The stability of EW-vlncRNA and sEWVLNC
Figure 3.4: The locations of EW-vlncRNA and sEWVLNC
Figure 4.1: EW-vlncRNA knockdown and sEWVLNC overexpression
Figure 4.2: Signaling pathway analysis of genes regulated downstream of sEWVLNC
Figure 4.3: Genes correlated with sEWVLNC
Figure 4.4: WGCNA analysis of the sEWVLNC transcript
Figure 4.5: MET gene regulation by sEWVLNC
Figure 4.6: Regulation of the MET gene by sEWVLNC
Figure 5.1: Depiction of the cell invasion assay
Figure 5.2: Tumor cell motility in vitro reduced by sEWVLNC
Figure 5.3: Effect of inhibition of c-Met on tumor cell motility
Figure 5.4: Effect of sEWVLNC on Ewing sarcoma growth and progression in vivo
Figure 5.5: Tumor growth delayed by sEWVLNC
Figure 5.6: Association between expression level of sEWVLNC and survival rate of Ewing sarcoma patients
Figure 5.7: Survival analysis based on MET expression
Figure 5.8: Transcription efficiency affected by sEWVLNC
Figure 5.9: RNA FISH of sEWVLNC and c-Jun
ABSTRACT
The Ewing sarcoma family of tumors (ESFTs) is the second most common group of bone and soft
tissue tumors in children, occurring most commonly in adolescence. A chromosomal translocation,
t(11;22), generating the EWSR1-FLI1 fusion gene is well known as a molecular pathologic
marker of Ewing sarcoma. Recent work has identified several specific gene targets of
the chimeric protein EWS-FLI1, all possessing multiple GGAA
repeats in their promoter regions, but until now, no non-coding RNA targets have been
found.
Over 60% of all known human genes are now documented to encode non-coding RNA
transcripts, and many of these have been shown to have functions that were not previously
appreciated. In the past, studies of non-coding RNA focused mainly on small non-coding
RNAs; however, recent studies have revealed that long non-coding RNAs (lncRNAs,
over 200 nucleotides) perform a wide range of functions in cellular development,
proliferation, and survival. Moreover, current research on lncRNAs has uncovered a new
category of ncRNAs that span more than 50 kb, termed very long non-coding
RNAs (vlncRNAs) in this study. This new category is unusually difficult to study due to its
large size and lack of a conventional promoter. Thus, even though this class of ncRNA has
been known for several years, few publications have characterized its functions. In this
study, one such vlncRNA is shown to be expressed exclusively in the Ewing sarcoma
family of tumors, and a function for this vlncRNA is identified.
Unlike the fusion gene EWSR1-FLI1, which drives Ewing sarcoma aggression, sEWVLNC,
a 2 kb spliced variant of this Ewing-sarcoma-specific vlncRNA, dramatically delays tumor
progression in vitro and in vivo. Moreover, survival and relapse analyses of 52
Ewing sarcoma patients from a COG database indicate that sEWVLNC enhances patient
survival rate. Based on signaling pathway analysis, gene cluster selection, and down-stream
gene prediction in this study, the results indicate that sEWVLNC might regulate
tumor growth and progression by decreasing expression of the proto-oncogene MET, thereby
suppressing the HGF/MET signaling pathway. Corresponding mechanistic investigations
suggest a potential interaction between sEWVLNC and the transcription factor c-Jun, which
together with its binding partner c-Fos forms the heterodimeric AP-1 complex
that serves as a major transcriptional activator of the MET gene. These findings are the
first to document that a tumor-specific vlncRNA is spliced into a functional variant that
suppresses expression of the proto-oncogene MET. In conclusion, these results suggest that
sEWVLNC plays an important role in Ewing sarcoma suppression and prognosis.
Chapter 1: Introduction
1.1 Ewing sarcoma family of tumors (ESFTs)
Based on histologic and molecular criteria, the Ewing sarcoma family of tumors
(ESFTs) is typically described as consisting of small, round, blue cell tumors which commonly
arise in bone or nearby soft tissue (Zhang et al. 2004). ESFTs were first documented in
1921 by Ewing, who distinguished the disease from osteosarcoma and other types of
cancer (Ewing 1921); in 1975, this disease was reported in soft tissue by Angervall and
Enzinger (Angervall et al. 1975). ESFTs are classified by the American Cancer Society
into the following three main groups: Ewing sarcoma of bone, extraosseous Ewing tumor
(EOE), and peripheral primitive neuroectodermal tumor (PNET). Other tumors historically
incorporated in the group include adult neuroblastoma and the Askin tumor of the chest wall,
but together they are better described as the Ewing sarcoma family of tumors, or ESFTs.
More recently, they are commonly referred to simply as Ewing sarcoma, as over 95% of
these tumors contain a common chromosomal translocation, t(11;22)(q24;q12), or variants
thereof, which results in transcription and translation of the EWSR1-FLI1 fusion gene
(Turc-Carel et al. 1988). Other similar translocations, including the t(21;22)
translocation (EWSR1-ERG) and the t(7;22) translocation (EWSR1-ETV1), are found in over
5% of ESFT cases (Im et al. 2000). Some rare translocations have also been reported, but
all generate a functionally similar gene product. Moreover, previous studies have shown
that ESFTs display positive immunostaining for the transmembrane glycoprotein CD99
(MIC2) and negative staining for PTPRC (protein tyrosine phosphatase, receptor type C)
(Steven et al. 2010). MIC2 is in fact one of several direct targets of the chimeric protein
EWS-FLI1 or equivalent oncoprotein, and is therefore expressed on essentially every
Ewing sarcoma with a translocation.
1.2 Non-coding RNAs in Ewing sarcoma
Even though almost a century has passed since ESFTs were first described, Ewing
sarcoma is still poorly understood. Previous studies focused mainly on gene fusion and
gene expression in Ewing sarcoma, but the results did not provide as much information
about this type of tumor as anticipated. Thus, current studies extend beyond the traditional
field of cancer research to include the non-coding RNA sub-field. The importance of some
lncRNAs in tumor metastasis and tumor growth has been demonstrated in previous work
(Rinn et al. 2012, Derrien et al. 2012).
In 1965, the first non-coding RNA was documented to be an alanine transfer RNA in yeast
(Holley et al. 1965); subsequently, over ten thousand various non-coding RNA genes have
been defined over the past fifty years. Among over sixty thousand human genes annotated
by ENCODE (the Encyclopedia of DNA Elements), over two thirds were described as non-
coding RNA genes, of which most are speculated to be highly functional (ENCODE
Project Consortium 2012). However, unlike protein-coding genes, non-coding RNA genes
are difficult to study because of their varied structures and tissue-specific expression. By
2014, a GENCODE catalog of 40,944 non-coding RNA transcripts was supported by solid
evidence, a number still much lower than previously anticipated (Harrow et al. 2012).
1.2.1 Open Reading Frame (ORF)
Identifying non-coding RNAs is a prerequisite for understanding them.
When the definitions of non-coding RNAs in GENCODE and other resources are compared,
the open reading frame (ORF) emerges as an essential molecular genetic unit for
distinguishing non-coding RNA genes from protein-coding genes (Derrien
et al. 2012). ORFs have the potential to code for proteins or peptides. In general, an ORF
begins with a start codon (usually ATG) and is translated continuously until a stop codon
(in most cases TAA, TAG, or TGA) is encountered, and the corresponding transcript
encompasses the complete ORF. Every nucleotide triplet in the transcribed ORF is a codon
of the standard genetic code and encodes one amino acid. Six translation frames originating
from the two anti-parallel DNA strands are possible, but only one is typically used for a given
gene. However, the presence of an ORF does not always result in production of a protein
or peptide; some candidate ORFs contain multiple stop codons, which preclude effective
translation into protein from that site. Moreover, ORFs consist of exons of protein-coding
genes, but exons are not equal to ORFs. Many genes contain untranslated exons, or portions
thereof (usually 5’ and 3’). Because most exons do not themselves contain start codons,
they cannot be translated into proteins or peptides. In fact, some ncRNAs share exons
or DNA regions with nearby coding genes on the same strand. This type of ncRNA is
referred to as a processed transcript in GENCODE annotation (Derrien et al. 2012) and is
called sense lncRNA in the current lncRNA classification published by Kapranov and
colleagues (St-Laurent et al. 2015).
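The ORF definition given above (a start codon followed by in-frame codons up to the first stop codon, searched in six frames) can be illustrated with a minimal sketch. The function names, the minimum-length cutoff, and the toy sequence are assumptions made for illustration only; this is not the procedure used by GENCODE or in this study to assess coding potential.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[str, int, str]]:
    """Scan all six frames and return (strand, frame, orf_sequence) tuples.

    An ORF here runs from an ATG to the first in-frame stop codon, inclusive.
    min_codons (a hypothetical cutoff) counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the currently open start codon, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None  # close the ORF and keep scanning this frame
    return orfs


# Toy sequence (invented): one complete ORF in forward frame 2.
print(find_orfs("CCATGAAATGGTAACC"))
```

A start codon that is never followed by an in-frame stop codon is not reported, which mirrors the point made above that a candidate start site alone does not define a complete ORF.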
1.2.2 Categorizations of ncRNAs
ORFs provide important information for understanding ncRNAs, but a clear classification
framework for lncRNAs is also critical for annotating and identifying non-coding RNAs.
Moreover, functional roles of ncRNAs can be predicted from their classification; for
example, most antisense ncRNAs suppress expression or enhance degradation of
the coding transcript of the associated gene. Most studies to date categorize ncRNAs into
two groups: small non-coding RNAs (< 200 nucleotides) and long non-coding RNAs (≥
200 nucleotides, lncRNAs) (Rinn et al. 2012). The latter, lncRNAs, are the more important
non-coding RNA species, yet remain the most poorly understood (Derrien et al. 2012).
Based on GENCODE human annotations, lncRNAs are categorized into the following five
types according to their locations relative to nearby protein-coding genes (Derrien et al.
2012): (1) antisense ncRNAs, which intersect nearby protein-coding genes on the
opposite strand; (2) lincRNAs (long intergenic non-coding RNAs), which are located in
intergenic loci and do not overlap nearby genes; (3) sense
overlapping transcripts, in which a protein-coding gene is enveloped within an intron of
the lncRNA in the same direction and which do not contain any ORF after RNA splicing; (4) sense
intronic transcripts, which are located within an intron of a protein-coding gene but do not
overlap any exon; (5) processed transcripts, which do not include any ORF and do not
belong to any of the other classifications above. Figure 1.1 shows the relationships between
lncRNAs and nearby protein-coding genes. New lncRNA classification schemes
have been developed in recent years, and lncRNAs have been separated into
ten subgroups by some authors (St-Laurent et al. 2015). Interestingly, in contrast to the
somewhat ambiguous definition of processed transcripts given by GENCODE, Kapranov’s
group has provided a precise description of sense ncRNAs as being equivalent to processed
transcripts. Sense ncRNAs are defined as ncRNAs that partially intersect
nearby protein-coding genes on the same strand and are read in the same direction. In other
words, sense ncRNAs can share some exons or DNA sequence with nearby genes, but they
do not contain any ORF and cannot encode any protein or peptide.
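The five positional categories described above can be expressed as a short decision procedure. The sketch below is a simplified illustration under stated assumptions: each transcript is reduced to a strand and a sorted list of exon intervals, only a single neighboring protein-coding gene is considered, and the absence of an ORF is taken as given rather than tested. The `Transcript` class, the `classify` function, and all coordinates are hypothetical and are not part of GENCODE.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive coordinates, start <= end


@dataclass
class Transcript:
    strand: str             # "+" or "-"
    exons: List[Interval]   # sorted by start coordinate, non-overlapping

    @property
    def span(self) -> Interval:
        return (self.exons[0][0], self.exons[-1][1])

    def introns(self) -> List[Interval]:
        """Gaps between consecutive exons."""
        return [(a[1] + 1, b[0] - 1) for a, b in zip(self.exons, self.exons[1:])]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify(lnc: Transcript, gene: Transcript) -> str:
    """Assign an lncRNA (assumed ORF-free) to one of the five positional types."""
    if not overlaps(lnc.span, gene.span):
        return "lincRNA"
    if lnc.strand != gene.strand:
        return "antisense"
    if any(contains(intron, gene.span) for intron in lnc.introns()):
        return "sense overlapping"
    if any(contains(intron, lnc.span) for intron in gene.introns()):
        return "sense intronic"
    return "processed transcript"


# Invented coordinates for a two-exon gene on the plus strand.
gene = Transcript("+", [(100, 200), (400, 500)])
print(classify(Transcript("+", [(150, 180), (300, 320)]), gene))
```

In this toy case the lncRNA shares part of an exon with the gene on the same strand, so it falls through to the residual category, which corresponds to the sense lncRNA described by Kapranov's group.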
Figure 1.1 The classification of lncRNAs by GENCODE. (A) Antisense ncRNAs
intersect nearby protein-coding genes in the opposite direction. LincRNAs (long
intergenic non-coding RNAs) are located in intergenic loci and do not have any
overlapping region with nearby genes. Sense overlapping transcripts involve a protein-coding
gene that is enveloped by an intron of the lncRNA in the same direction but which
does not contain an ORF after RNA splicing. Sense intronic transcripts are located within
an intron of a protein-coding gene but do not overlap with any exon. (B) Processed
transcripts, also named sense lncRNAs, do not include any ORF and do not belong
to any of the other classifications above.
1.3 Very Long Non-coding RNAs (vlncRNAs)
Although non-coding RNAs have been classified into sncRNAs and lncRNAs for many
years, a novel type of non-coding RNA has recently emerged and is now recognized in the
human genome, termed very long non-coding RNAs (> 50 kb, vlncRNAs) in this study.
These are also named macro ncRNAs in other studies; however, the definition of macro ncRNAs
is ambiguous and overlaps with that of lncRNAs (Koemer et al. 2009, Rinn et al. 2012,
St-Laurent et al. 2015). Their extremely large size, lack of conventional promoters, and
absence of poly-A tails have impeded structural and functional studies. Furthermore, the
difficulty of vlncRNA investigation is dramatically increased by the relative inefficiency of
conventional gene over-expression or traditional knockdown methods. As vlncRNAs are by
definition over 50 kb in size, it is not feasible to generate over-expression vectors; similarly,
most vlncRNAs are thought to remain in the nucleus after transcription, where shRNA or
siRNA cannot suppress them efficiently. Thus, even though the existence of vlncRNA
transcripts has been described for several years, there are very few published studies
describing their functions.
The earliest vlncRNA research was documented in the 1990s, but there was no equivalent
classification of ncRNAs at that time. Therefore, these ncRNAs are most commonly known
as lncRNAs. Studies of these ncRNAs demonstrated that most of them play antisense
inhibitory roles in cellular activities. Among them, Kcnq1ot1 and Air are very well known
and have been shown to function by interfering with nearby genes (Pandey et al. 2008,
Seidl et al. 2006). Both Kcnq1ot1 and Air can be categorized as vlncRNAs, since both are
well over 50 kb in size: Kcnq1ot1 is more than 91 kb and Air is over 108 kb.
Furthermore, studies have demonstrated that Kcnq1ot1 is located in an
intronic locus between exon 10 and exon 12 of the Kcnq1 gene, but is transcribed from the
opposite strand. Kcnq1ot1 mainly inhibits the nearby Kcnq1 gene by suppressing the
chromatin modifications of Kcnq1 through lineage-specific binding. The vlncRNA Air is
located in the intronic space between exon 2 and exon 3 of the Igf2r gene, and it is also
transcribed from the opposite strand. Similar to Kcnq1ot1, the main function of Air is to
interfere with the Slc22a3, Slc22a2, and Igf2r genes. Recent advances have shown that Air
down-regulates Slc22a3 by interacting with the H3K9 histone methyltransferase G9a to
induce methylation at the Slc22a3 promoter. Although previous studies reported that Air
exists and functions as a full-length transcript, more recent reports have
demonstrated several splice variants of the Air ncRNA, and the functions of those spliced
variants remain unknown.
Besides antisense vlncRNAs, very long intergenic non-coding RNAs (vlincRNAs)
were first described in 2011 by Kapranov, St-Laurent, and Triche (Kapranov et al. 2011),
and corresponding studies with more comprehensive evidence were published in 2013
(St-Laurent et al. 2013). So far, two vlincRNAs have been reported to play
functional roles: (1) HELLP (hemolysis, elevated liver enzymes, low platelets), which
is about 205 kb in size, is located between PMCH and IGF1, and is involved in cell
cycling and extravillous trophoblast invasion (van-Dijk et al. 2012); and (2) VAD
(vlincRNA antisense to DDAH1), which is over 200 kb in size and is located in the DDAH1
locus but on the opposite strand of the DDAH1 gene. VAD functions in
modulating chromatin and preventing histone variant H2A.Z from interfering with the
INK4 gene promoter (Lazorthes et al. 2015). Although the study of vlincRNAs is still in its
infancy, their importance is now increasingly accepted (de-Hoon et al. 2015).
1.4 VlncRNAs in Ewing sarcoma
Thousands of tissue-specific vlncRNAs have been identified in previous studies (St-Laurent
et al. 2013). One 381.37 kb transcript is specifically and highly expressed in
virtually all Ewing sarcoma primary tissues and cell lines. Total RNA sequencing data
showed that this transcript is located at 9q22.3 (Figure 1.2A), and HuEx (Affymetrix
human exon array) data of 178 Ewing sarcoma samples showed that this transcript is present
in the large majority (over 90%) of Ewing sarcoma samples. Comparison with other
pediatric cancers, bone cancers, and soft tissue cancers in this study
indicates that this 381.37 kb transcript is specifically expressed in Ewing sarcoma.
Total RNA-Seq of Ewing sarcoma cells showed that this transcript appears to lack
conventional gene splicing, since the 381.37 kb genetic locus is seemingly completely
filled with overlapping reads (Figure 1.2A). However, the GENCODE consortium
annotation displays a transcript, ENST00000486641, that is located at the same
genomic locus as this 381.37 kb transcript. The main expressed part of the 381.37 kb
transcript is occupied by ENST00000486641, which is 187.12 kb in size and
is named EW-vlncRNA in this study (Figure 1.2B); RNA sequencing showed
that the remaining part of the 381.37 kb transcript was expressed at an extremely low level,
and it was speculated to be 3’ run-off transcription. A previous study revealed that there
is an overlap between EW-vlncRNA and the nearby gene CCDC171, but EW-vlncRNA
does not contain any open reading frame (ORF) and was identified as a non-coding
processed transcript by the GENCODE consortium (Harrow et al. 2012, ENCODE Project
Consortium 2012). The data from this study also indicate that EW-vlncRNA is transcribed
specifically and independently in Ewing sarcomas. Further, the unique expression of this
specific vlncRNA, alone among all vlncRNAs identified to date, suggests it might be a
useful, tumor-specific biomarker of Ewing sarcoma.
Figure 1.2 EW-vlncRNA in Ewing sarcoma. (A) Based on total RNA sequencing data,
RNA-Seq read alignments of this 381.37 kb transcript fill the genetic locus completely;
almost no classic splicing gaps were found in this transcript. (B) Based on
GENCODE, one 187.12 kb transcript, ENST00000486641, named EW-vlncRNA in this
study, is located at the same genomic locus as this 381.37 kb transcript. Based
on RNA-Seq data, EW-vlncRNA occupies the main expressed segment of this 381.37 kb
transcript; the remaining part of the 381.37 kb transcript was speculated to represent 3’
run-off transcription.
1.5 Spliced variants of EW-vlncRNA
As discussed above, the prevailing theory of vlncRNAs suggests that this new category of
ncRNAs functions mainly as full-length transcripts, though some spliced RNAs might also
be transcribed within the loci, as with Air and VAD, which include spliced variants.
Nevertheless, there is no experimental evidence to support the idea that unspliced
vlncRNAs might be more stable than their spliced variants.
In this study, RNA splicing and the stability of EW-vlncRNA were evaluated. The data
show that several short variants (~2 kb) are spliced from EW-vlncRNA (~200 kb) by
spliceosome-mediated excision via canonical splice junction recognition (5’- GU…AG -3’).
Based on PCR, sub-cloning, and DNA sequencing data, three major splice variants with
poly-A tails were aligned and identified as derived from EW-vlncRNA. The largest
variant is 2 kb in size and is named sEWVLNC (spliced variant of EW-vlncRNA).
Actinomycin D treatment was used to analyze the stability of EW-vlncRNA and sEWVLNC.
The half-life of sEWVLNC is much greater than that of unspliced EW-vlncRNA. This result
suggests that the spliced variant is more stable than the corresponding unspliced vlncRNA.
Therefore, it is reasonable to hypothesize that the short sEWVLNC is the functional form
and the large EW-vlncRNA is the unspliced precursor.
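The stability comparison above rests on estimating RNA half-life after transcription is blocked. A common way to do this, assumed here for illustration and not confirmed as the calculation used in this study, is to model the remaining transcript as first-order decay, N(t) = N0·exp(−kt), fit ln(N/N0) against time by least squares, and report t½ = ln 2 / k. The function name and the numbers below are invented for the sketch.

```python
import math
from typing import Sequence


def half_life(times_h: Sequence[float], fraction_remaining: Sequence[float]) -> float:
    """Estimate half-life (hours) from a log-linear least-squares fit.

    Assumes first-order decay: ln(fraction) = intercept - k * t.
    """
    ys = [math.log(f) for f in fraction_remaining]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    k = -sxy / sxx  # decay constant is the negative slope
    if k <= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / k


# Invented time course (hours after actinomycin D) for two hypothetical transcripts.
times = [0, 1, 2, 4]
fast = [1.0, 0.5, 0.25, 0.0625]         # halves every hour
slow = [2 ** (-t / 4) for t in times]   # halves every four hours
print(half_life(times, fast), half_life(times, slow))
```

With real qPCR data the fractions would be noisy, so the fitted half-life should be reported together with the replicate variation rather than as a single exact value.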
Chapter 2: EW-vlncRNA in Ewing sarcoma
2.1 Introduction
Non-coding RNA studies, such as tRNA studies, can be traced back to the 1960s (Holley et al.
1965), but the capacity to perform large-scale exploration of the undefined regions of the
human genome has not been available until recently. Unlike mRNAs, which contain poly-A
tails and 5’ caps, many non-coding RNAs do not include any 5’ or 3’ modification.
Without those modifications, it is difficult to identify transcribed fragments of ncRNAs via
traditional Sanger sequencing. Some ncRNAs, such as Xist and miRNAs, were identified
in the 1990s (Rinn et al. 2012), and tiling microarrays, which allow the simultaneous survey
of 20,000 genes or genomic loci, came into widespread use by 1999 (Dumham et al. 1999).
However, the technology for genome-wide RNA sequencing did not emerge until the
twenty-first century. For this reason, knowledge of ncRNAs was extremely poor prior to
the year 2000. In the first decade of the 2000s, human total RNA sequencing (RNA-seq)
became feasible via next-generation sequencing technology (Mortazavi et al. 2008,
Marioni et al. 2008), allowing for comprehensive identification and annotation of ncRNAs.
To date, over 40,000 well-characterized non-coding genes have been identified in the
human genome (Harrow et al. 2012), and a burgeoning number of researchers have
recognized the importance of ncRNAs in cancer and cell development (Fatica et al. 2014,
Hung et al. 2011, Kretz et al. 2012, Nagano et al. 2011, Ponting et al. 2009, Qureshi et al.
2011). Total RNA sequencing also allowed identification of many variant forms of
ncRNAs, and vlncRNAs were categorized as one such new type of ncRNA.
Although thousands of vlncRNAs have now been found, most are not specifically
associated with particular cell types, and it is a challenge to determine their functions in
development and disease. The discovery of a Ewing-sarcoma-specific vlncRNA expressed
at extraordinarily high levels in this type of tumor affords the opportunity to study its
potential role in the etiology of Ewing sarcoma. This consequently became the primary aim
of this research. For this purpose, RNA-Seq and exon microarrays were used to study
hundreds of tumor tissues and cell lines from over a dozen different pediatric cancers. The
results indicated that tens of vlncRNAs are expressed in Ewing sarcoma. Among them, a
381.37 kb transcript was found to be expressed highly and specifically in Ewing sarcoma.
By comparing RNA sequencing data of this transcript to GENCODE annotation (Harrow et
al. 2012, ENCODE Project Consortium 2012, Cunningham et al. 2015), EW-vlncRNA was
consequently identified as uniquely expressed in Ewing sarcoma and so named. Extensive
research on this transcript is reported herein.
Most ncRNA studies have focused on intergenic ncRNAs or antisense ncRNAs, due to
concerns that an overlap between an ncRNA and a protein-coding gene on the same strand
might contain a potential ORF, which would result in protein coding and undermine the
premise of the study (Pelechano et al. 2013, Rinn et al. 2007, Satpathy et al. 2015, Wilusz et al.
2008). Therefore, knowledge of sense ncRNAs is fragmentary at best. In this study,
EW-vlncRNA is shown to be a sense ncRNA that overlaps the nearby gene
CCDC171; in addition, the GENCODE database has characterized EW-vlncRNA as a
non-coding processed transcript (Harrow et al. 2012, ENCODE Project Consortium 2012,
Cunningham et al. 2015). Furthermore, in the GENCODE database, EW-vlncRNA was
thought to be simply run-off transcription of the CCDC171 gene. However, ASO knockdown
results, HuEx data, and MeDIP analyses in the present study clearly document the
independence and integrity of this vlncRNA, which is unrelated to CCDC171 transcription.
2.2 Materials and Methods
Cell culture
The Ewing sarcoma cell lines CHLA-9, CHLA-10, and A4573 were cultured in DMEM,
high glucose, HEPES medium (Life Technologies, Carlsbad, CA) supplemented with 20%
fetal bovine serum (FBS, Life Technologies & Sigma, St. Louis, MO) as well as 100 U/mL
penicillin and streptomycin (Life Technologies). All cells were expanded in 100 mm
culture dishes (Corning Life Sciences, Corning, NY) at 37°C in a humidified incubator
using room air (Thermo Fisher Scientific, Waltham, MA) with 5% CO2. Cells were
passaged before reaching 90% confluency using 0.05% Trypsin-EDTA (Life
Technologies), and the viability of cells was consistently observed to be over 85%.
HuEx Array, RNA-Seq, and MeDIP-Seq
HuEx array data of 178 Ewing sarcoma primary tissues and cell lines were made available
by the laboratory of Dr. Timothy Triche from a variety of sources (vide infra). The Ewing
sarcoma dataset (GSE63157 in GEO) included the following three groups: (1) 38 HuEx
CEL files from Ewing sarcoma cell lines; (2) 75 CEL files from Euro-EWING
(European Intergroup Cooperative Ewing’s Sarcoma Study) trial 92 and Euro-Ewing trial
99 case material (kindly provided by Dr. Uta Franke); and (3) 65 CEL files from
COG cases from clinical trials INT-0154 and AEWS0031. For HuEx array analysis of total
genomic transcription, identified as 1,400,000 independent transcripts, total RNA was
isolated from the starting tissue to generate biotin-labeled cRNA, which was hybridized to the
Affymetrix GeneChip® Human Exon 1.0 ST Array (Affymetrix, Santa Clara, CA) to
estimate gene or ncRNA expression. Quality control for the HuEx array was assessed with
Affymetrix Expression Console Software (Affymetrix) following the manufacturer’s
protocol. Core and extended probes were used for expression analyses (Langer et al. 2010).
GEO GSE63157: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63157
Total RNA sequencing libraries were prepared from total RNA after ribosomal depletion
(RiboMinus v2, Life Technologies Inc., Carlsbad, CA); the RNA was reverse transcribed with
random hexamers to construct a cDNA library of all RNA transcripts, coding or non-coding,
followed by sequencing on the next-generation Personal Genome
Machine (PGM) or Proton on the Ion Torrent™ Platform (Thermo Fisher, Carlsbad, CA)
using the 318 and P1 chips. Poly-A tail RNA sequencing employed a similar system, but
total RNA was replaced by poly-A tailed RNA. In this study, two Ewing sarcoma cell lines,
CHLA-9 and CHLA-10, were sequenced from both total RNA and poly-A tailed RNA. RNA-Seq
data were evaluated using Genetrix 3.63 software (Epicenter Software, Pasadena, CA).
Methylated DNA immunoprecipitation and high-throughput sequencing (MeDIP-Seq) was
performed using preparations of methylated DNA from CHLA-9 and CHLA-10 cells, which
were analyzed by methylation sequencing (Illumina, San Diego, CA). In addition, an
alternative method, HpaII DNA methylation analysis, was utilized with CHLA-9 cells. The
MspI methylase activity of MeDIP methylates the outer cytosine residue by recognizing
the sequence CCGG, while the HpaII methylase methylates the inner cytosine residue via
the same sequence. MeDIP-Seq data and HpaII sequencing data were analyzed using
Genetrix 3.63 software (Epicenter Software).
ASO, DsiRNA, and shRNA
Antisense oligonucleotides (ASOs) are commonly 20-25 nucleotides in size, and consist of
2’-O-methyl RNA and phosphorothioate DNA to resist nuclease degradation.
The ASOs used here were synthesized by IDT (Integrated DNA Technologies, Coralville,
IA). In this study, four ASOs were designed to knock down EW-vlncRNA or the 381.37
kb transcript: ASO Exon 1 complementarily binds to exon 6 of EW-vlncRNA, ASO Exon
2 complementarily binds to exon 8, ASO Intron complementarily binds to intron 2 between
exon 2 and exon 3 of EW-vlncRNA 3’ to the end of the CCDC171 gene, and ASO 3
complementarily binds to the 3’ extended region beyond the body of EW-vlncRNA.
CHLA-9 cells were transfected with ASOs using Cell Line Nucleofector Kit R (Lonza,
Basel, Switzerland) using the Nucleofector instrument (Lonza) by electroporation,
according to the manufacturer’s protocol. Unlike traditional siRNAs, which are delivered
with liposomal reagents and cannot efficiently target nuclear RNAs, ASOs knock down
target RNA not only in the cytoplasm but also in the nucleus via RNase H. Expression of
the target RNA is effectively reduced by about 70% or more within 3 hours after
transfection, and the efficiency of the inhibition decreases by 72 hours (Ideue et al. 2009).
ASO Exon 1: 5'- mG*mC*mC*mA*mG*mG*A*A*T*C*C*A*T*G*T*C*C*A*A*T*mG*mC*mA*mG*mC*mC -3',
ASO Intron: 5'- mC*mC*mU*mU*mC*mU*T*T*C*T*T*A*T*C*T*T*G*T*T*G*mC*mC*mC*mA*mC*mC -3',
ASO 3: 5'- mU*mC*mU*mC*mA*mU*A*A*T*T*C*C*C*T*C*T*T*C*T*T*mU*mC*mU*mC*mC*mC -3',
ASO negative control: 5'- mA*mA*mG*mC*mG*C*G*C*A*C*C*A*G*C*G*mC*mC*mU*mC*mC -3'.
Knockdown by small interfering RNAs (siRNAs) is dependent on the RNA-Induced
Silencing Complex (RISC), and target RNAs are degraded by Dicer. Dicer-substrate RNAs
(DsiRNAs) are cleaved into two complementary siRNAs by the endoribonuclease activity
of Dicer, and siRNAs interfere with the target RNAs via the RISC-Dicer system (Bennasser
et al. 2011, Scherer et al. 2003). A 27-mer-duplex DsiRNA was designed and was
synthesized by IDT (Integrated DNA Technologies). This DsiRNA targeted the same guide
sequence as ASO Exon 1. The DsiRNA was transfected into CHLA-9 using DharmaFECT
Transfection Reagents (Thermo Fisher Scientific), following the manufacturer’s protocol.
The small hairpin RNA (shRNA) was constructed using a pcDNA 6.2-GW/EmGFP-miR
shRNA vector of BLOCK-iT Pol II miRNA RNAi Expression Vector Kit (Invitrogen,
Carlsbad, CA), which targeted the same guide sequence as ASO Exon 1 and DsiRNA. The
21-mer-miR-RNAi sequence, inserted in the pcDNA 6.2-GW/EmGFP-miR shRNA vector
and used to knock down the target RNA, was designed by Invitrogen’s RNAi Designer
(www.invitrogen.com/rnai). CHLA-9 cells were transfected with shRNA using Cell Line
Nucleofector Kit R (Lonza) using a Nucleofector instrument (Lonza) through
electroporation, following the manufacturer’s protocol. Cells were cultured in medium
with 10 µg/ml blasticidin for two to four weeks to form a stable cell line, which was then
sorted by flow cytometry according to EmGFP fluorescence intensity, following the
manufacturer’s protocol (Xu et al. 2009). All ASO, DsiRNA, and shRNA knockdowns
included corresponding negative controls, and knockdown efficiency was estimated using
qPCR. DsiRNA target sequence: 5’- GCTGCATTGGACATGGATTCCTGGC -3’,
shRNA target sequence: 5’- CTGGCTGCATTGGACATGGATTC -3’.
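The statement that the DsiRNA and shRNA target the same site as ASO Exon 1 can be checked directly from the listed sequences. The following is a minimal sketch, not part of the original methods: it strips the modification marks from ASO Exon 1 (read here with the line-break hyphen removed, which is an assumption about the printed sequence), takes the reverse complement to obtain the targeted RNA site in DNA letters, and compares it with the two target sequences. The helper names are illustrative only.

```python
import re

def strip_modifications(oligo: str) -> str:
    """Remove 2'-O-methyl ('m') and phosphorothioate ('*') marks; report U as T."""
    return re.sub(r"[^ACGTU]", "", oligo).replace("U", "T")

def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))

def longest_common_substring(a: str, b: str) -> str:
    """Classic dynamic-programming search for the longest shared stretch."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

# Sequences as listed in this section.
ASO_EXON_1 = "mG*mC*mC*mA*mG*mG*A*A*T*C*C*A*T*G*T*C*C*A*A*T*mG*mC*mA*mG*mC*mC"
DSIRNA_TARGET = "GCTGCATTGGACATGGATTCCTGGC"
SHRNA_TARGET = "CTGGCTGCATTGGACATGGATTC"

# Site on the target transcript that ASO Exon 1 is complementary to.
aso_site = reverse_complement(strip_modifications(ASO_EXON_1))

print("ASO Exon 1 target site:", aso_site)
print("DsiRNA target lies within it:", DSIRNA_TARGET in aso_site)
print("Stretch shared with shRNA target:",
      longest_common_substring(aso_site, SHRNA_TARGET))
```

Under these assumptions the DsiRNA target is fully contained in the ASO Exon 1 site, while the shRNA target shares a 21-nucleotide stretch with it and extends two nucleotides further 5'.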
PCRs, qPCRs, and long-range overlapping PCRs
Human total RNA was purified using an RNeasy Mini Kit (Qiagen, Hilden, Germany) with
DNase treatment by an RNase-Free DNase Set (Qiagen) following the manufacturer’s
protocol. The quality of RNA was determined by a NanoDrop 1000 Spectrophotometer
(Thermo Fisher Scientific), according to the manufacturer’s guide. The cDNA library for
total RNA was reverse transcribed using a SuperScript® III First-Strand Synthesis
System for RT-PCR (Invitrogen) with random hexamer primers, while the cDNA library
for poly-A tail RNA was synthesized using an iScript cDNA Synthesis Kit (Bio-Rad) with
oligo(dT) primers. Generally, each 10 µl reverse transcription reaction used up to
500 ng of RNA.
PCR and long-range overlapping PCR products were amplified with Platinum Taq DNA
Polymerase High Fidelity (Life Technologies). The products of PCRs were commonly in
the size range of 1 kb to 3 kb; the average size of long-range overlapping PCR products
was 5 kb. Fifty-six overlapping strand-specific PCRs were performed to cover the genomic
region chr9: 15,971,658 - 16,255,911. Because the overlapping PCRs test the integrity of
the transcript, whether the transcript was judged continuous or discontinuous depended on
the length of the maximum-size product of the overlapping PCR. Quantitative PCRs (qPCRs) were
performed using a QuantiTect SYBR Green PCR Kit (Qiagen); based on normalization by
GAPDH, the $2^{-\Delta\Delta C_T}$ method was used to analyze the expression of each
target gene. SYBR
Green signals were detected using an Applied Biosystems 7900HT Fast Real-Time PCR
System (Applied Biosystems, Foster City, CA), following the manufacturer’s protocol.
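The comparative threshold-cycle calculation named above can be written out as a short sketch. It assumes the standard form of the method: the target Ct is first normalized to the GAPDH Ct within each sample, and the treated sample is then compared with the control sample. The Ct values in the example are hypothetical and are not data from this study.

```python
def fold_change_ddct(ct_target_treated: float, ct_gapdh_treated: float,
                     ct_target_control: float, ct_gapdh_control: float) -> float:
    """Relative expression of a target gene by the comparative Ct method."""
    delta_treated = ct_target_treated - ct_gapdh_treated   # normalize to GAPDH
    delta_control = ct_target_control - ct_gapdh_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values for illustration only.
relative_expression = fold_change_ddct(
    ct_target_treated=26.0, ct_gapdh_treated=18.0,
    ct_target_control=24.0, ct_gapdh_control=18.0,
)
knockdown_percent = (1.0 - relative_expression) * 100.0

print(f"relative expression: {relative_expression:.2f}")
print(f"knockdown: {knockdown_percent:.0f}%")
```

In this made-up case the target crosses threshold two cycles later in the treated sample, which corresponds to one quarter of the control expression.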
2.3 Results
2.3.1 A highly and specifically expressed transcript in Ewing sarcoma
As Ewing sarcoma tumors can be confused with many other pediatric cancers due to similar
histopathology, six types of cancers (Ewing sarcoma, osteogenic sarcoma, fibrosarcoma,
medulloblastoma, neuroblastoma, and rhabdomyosarcoma, each commonly observed in
similar tissues) were examined to identify a Ewing-sarcoma-specific ncRNA. Based on
HuEx results of thirty samples (five samples per type of cancer), one 381.37 kb transcript
was found to be specifically expressed in Ewing sarcoma (Figure 2.1A). Based on log2
expression values from the exon array, the expression of this transcript in
Ewing sarcoma was at least 16 times higher than that in other cancer types (Figure 2.1B).
Moreover, appreciable expression of the transcript has not been observed in any other tumor
type so far (data not shown). To confirm the high expression of this transcript in most
Ewing sarcoma samples, HuEx data of 178 Ewing sarcoma samples were evaluated. HuEx
heat-map analysis demonstrated that the 381.37 kb transcript is highly expressed in over
90% of Ewing sarcoma samples: the heat-map in Figure 2.2A suggested that 162 out of
178 cases express this transcript at a high level, and others at detectable levels. The average
log2 expression value of this transcript reached about 10 among the 178 cases of Ewing
sarcoma (Figure 2.2B). This degree of specificity coupled with high expression levels is
unprecedented in similar studies.
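Because exon array values are reported on a log2 scale, fold differences and log2 differences convert into one another by simple arithmetic. The sketch below only illustrates that conversion and the sample fraction quoted above; the function names are illustrative, and no array data are reproduced.

```python
import math

def fold_from_log2_difference(delta_log2: float) -> float:
    """A difference of d log2 units corresponds to a 2**d fold change."""
    return 2.0 ** delta_log2

def log2_difference_from_fold(fold: float) -> float:
    return math.log2(fold)

# A 16-fold difference corresponds to 4 log2 units.
print(log2_difference_from_fold(16.0))
print(fold_from_log2_difference(4.0))

# 162 of 178 cases with high expression, expressed as a percentage.
high_fraction_percent = round(100 * 162 / 178, 1)
print(high_fraction_percent)
```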
Besides this EW-vlncRNA, other vlncRNAs were also expressed in Ewing sarcoma, but
their expression levels were very low and were far less than that of the 381.37 kb transcript.
Furthermore, they were not uniquely expressed by Ewing sarcoma cells (data not shown).
In Figure 2.3A, a tree diagram of 178 Ewing sarcoma cases was generated in order of
vlncRNA expression level, and the corresponding heat-map displays the expressions of
twelve different vlncRNAs. Among them, EW-vlncRNA demonstrated the highest
expression level in most Ewing sarcomas. When the expression of the twelve vlncRNAs was
quantified, the expression level of EW-vlncRNA exceeded the threshold of 7 (the median
log2 value of the gene expression range, from 2 to 12) in 165 out of 178 Ewing sarcoma
samples (Figure 2.3B).
Figure 2.1 The 381.37 kb transcript is specifically expressed in Ewing sarcomas. (A)
A heat-map of HuEx data from six different types of tumors (Osteogenic sarcoma (OS), Ewing
sarcoma (EWS), Fibrosarcoma (FS), Medulloblastoma (MB), Neuroblastoma (NB), and
Rhabdomyosarcoma (RMS)) showed that the expression level
of the 381.37 kb transcript in Ewing sarcoma (red squares) is much higher than that in other
cancer types (blue squares); (B) The log2 values of the HuEx array demonstrated
that expression of the 381.37 kb transcript in Ewing sarcoma (EWS (blue)) was
16 times higher than that in other cancer types (OS (orange), FS (green), MB (purple), NB
(grey), and RMS (yellow)).
Figure 2.2 The 381.37 kb transcript is highly expressed in most Ewing sarcoma. (A)
The heat-map based on HuEx data of 178 Ewing sarcoma cases indicated that the 381.37
kb transcript is highly expressed in most Ewing sarcoma samples. Red color represents
high expression, while blue is low expression; (B) Using the log2 values of the HuEx data
to estimate expression levels, the expression of the 381.37 kb transcript, EW-vlncRNA,
and sEWVLNC was much higher than that of the CCDC171 residuals, the difference
reaching around eight fold (p-value = $2.2 \times 10^{-16}$). A significant difference in expression level
between EW-vlncRNA and the 381.37 kb transcript was also noted, presumably because
one half of the 381.37 kb transcript is run-off transcription, and EW-vlncRNA is the main
expression segment of the 381.37 kb transcript. In the exon array data, 26 probes were used
to calculate the CCDC171 residuals, 40 targeted probes were used for the whole 381.37 kb
transcript, 17 out of 40 probes were used to analyze EW-vlncRNA expression, and 9 of 40
probes to estimate sEWVLNC.
Figure 2.3 Other vlncRNAs are not highly expressed in Ewing sarcoma. (A) Besides
the EW-vlncRNA, eleven other vlncRNAs were detected using the HuEx array, and a
dendrogram of the 178 Ewing sarcoma cases was clustered by vlncRNA expression levels.
In the heat-map, each row represents one vlncRNA, and every column represents one
Ewing sarcoma sample. Thus, each colored square corresponds to the expression of one
vlncRNA in one Ewing sarcoma case. Red means high expression, while white
means low expression; (B) Analysis of the expression of the 12 different vlncRNAs
in the 178 Ewing sarcoma cases suggested that only the EW-vlncRNA is highly
expressed in most Ewing sarcomas (> 90% of cases).
Additional analysis of human RNA-Seq data obtained from the Gene Expression Omnibus
(GEO) database showed that the 381.37 kb transcript is not expressed in most samples of 111
different types of tissues and cell lines (Roadmap Epigenomics Consortium 2015) (data
not shown here), although one of three human melanoma (NHEM) cases expressed this
transcript at high levels, the only tissue in the database to do so (Figure 2.4). These findings
further supported the conclusion that the 381.37 kb transcript is specifically expressed in
Ewing sarcoma. Interestingly, CHLA-9 cells, derived from a primary Ewing sarcoma,
showed high expression, while CHLA-10 cells, derived from the same patient at relapse,
showed much lower expression. The findings are consistent with the hypothesis that
expression of the EW-vlncRNA is ‘protective’, or associated with less aggressive primary
tumors, while loss of expression is associated with clinically aggressive, recurrent,
treatment resistant tumors with a dismal prognosis.
Figure 2.4. The 381.37 kb transcript in human melanoma (NHEM). Human RNA-Seq
data were obtained from more than 200 cases of different tissues or cell lines, and almost
none expressed this transcript. Only NHEM #2 expressed this transcript at a high level;
a second melanoma case (NHEM #3) expressed the transcript at a low level.
2.3.2 The non-coding EW-vlncRNA is part of the 381.37 kb transcript.
Although the 381.37 kb transcript was found in most Ewing sarcomas, the main expression
segment of this transcript is located in the 5’ section. The 3’ part of this transcript is at an
extremely low expression level and is speculated to be the product of run-off transcription,
as is often observed in ncRNAs. Based on Havana and GENCODE databases (Harrow et
al. 2012), a 187.12 kb EW-vlncRNA, named transcript ENST00000486641 in GENCODE,
is localized at the same genetic locus as the 5’ half of the 381.37 kb transcript. Based on
poly-A tail RNA sequencing and total RNA sequencing data in Figure 2.5A and 2.5B, EW-
vlncRNA represented a dominant expression region of the 381.37 kb transcript. RNA
sequencing data further supported the previous assumption that the low expression segment
of this transcript is a 3’ extra extension in run-off transcription. To validate this conclusion,
HuEx data from 178 Ewing sarcoma cases were inspected as log2 expression values, and
the results showed that the expression of EW-vlncRNA was significantly higher than that of
the 381.37 kb transcript (Figure 2.2B; p-value < 0.0005). These data in aggregate
supported the hypothesis that EW-vlncRNA is the main expressed segment of this transcript.
To confirm that both the 3’ extension in run-off transcription and EW-vlncRNA were
derived from an integral transcript, long-range overlapping strand-specific PCR was
employed to detect the integrity of the transcript. Since this method is applied to first-strand
cDNAs synthesized from full-length transcripts, it could test for single unspliced RNAs
with or without poly-A tails (van-Dijk et al. 2012). For this purpose, 56 long-range
overlapping PCRs, the average size of which was around 5 kb, were designed to cover the
genomic region chr9: 15,971,658 - 16,255,911 (Figure 2.6). The overlapping PCR products
demonstrated that EW-vlncRNA and the 3’ extra extension in run-off transcription are
contiguous, and that some PCR products span the junction between the two parts. Thus, they
were derived from a single transcript. However, the overlapping PCRs cannot be applied to the overlap
region between EW-vlncRNA and the nearby CCDC171 gene, since the transcript of the
CCDC171 gene would interfere with the PCR products of EW-vlncRNA.
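The geometry of such a tiling can be sketched in a few lines. The region coordinates and the number of amplicons are taken from the text; the overlap length between neighbouring amplicons is not reported, so the 200 bp used here is purely an assumption, as are the function name and the choice of equal-length amplicons. Primer design itself is not modelled.

```python
def tile_region(start: int, end: int, n_amplicons: int, overlap: int):
    """Tile [start, end] (1-based, inclusive) with equal-length amplicons
    in which each amplicon overlaps the next by `overlap` bases."""
    length = end - start + 1
    # n * L - (n - 1) * overlap must cover the region; round L up.
    amplicon_len = -(-(length + (n_amplicons - 1) * overlap) // n_amplicons)
    step = amplicon_len - overlap
    tiles = []
    for i in range(n_amplicons):
        tile_start = start + i * step
        tile_end = min(tile_start + amplicon_len - 1, end)
        tiles.append((tile_start, tile_end))
    return tiles

# chr9: 15,971,658 - 16,255,911 covered by 56 amplicons; overlap is hypothetical.
tiles = tile_region(15_971_658, 16_255_911, n_amplicons=56, overlap=200)

print("region length (bp):", 16_255_911 - 15_971_658 + 1)
print("first amplicon:", tiles[0])
print("last amplicon:", tiles[-1])
print("amplicon length (bp):", tiles[0][1] - tiles[0][0] + 1)
```

With these inputs each amplicon is a little over 5 kb, which is in line with the average product size stated above.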
Figure 2.5 RNA sequencing data of CHLA-9 and CHLA-10 cells. (A) Total RNA
sequencing data of CHLA-9 (blue) and CHLA-10 (green). (B) Poly-A tail RNA sequencing
data of CHLA-9 (blue) and CHLA-10 (green).
Figure 2.6 Long-range overlapping PCRs. (A) Fifty-six long-range overlapping PCR
primers were designed to cover the genomic region of chr9: 15,971,658 - 16,255,911. (B)
The average size of these overlapping PCR products was around 5 kb.
2.3.3 EW-vlncRNA is expressed independently of CCDC171 in Ewing
sarcoma.
Because an overlap exists between EW-vlncRNA and the CCDC171 gene, it is possible
that EW-vlncRNA is a variant of the CCDC171 transcript. However, HuEx data showed that
the expression of EW-vlncRNA is much higher than that of the CCDC171 gene
(Figure 2.2). In addition, RNA-Seq data from total RNAs and poly-A tail RNAs also
supported the expression difference (Figure 2.5). Thus, EW-vlncRNA was speculated to
be expressed independently in Ewing sarcoma. As noted earlier, overlapping coding and
non-coding transcripts are well documented in GENCODE database. CCDC171 and EW-
vlncRNA appear to be another example of coding and non-coding transcripts being
transcribed from overlapping loci.
To further investigate the independence of EW-vlncRNA, HuEx data of 178 EWS samples
were analyzed. Exon array probes of the CCDC171 gene, excluding the probes in the
overlap region, were grouped as the CCDC171 residual. Based on HuEx data, the expression
level of EW-vlncRNA was much higher than that of the CCDC171 residual in Figure 2.2B
(p-value = $2.2 \times 10^{-16}$). HuEx log2 expression values demonstrated that a 10-fold difference in
expression existed between EW-vlncRNA and CCDC171. If EW-vlncRNA were an extension
of the CCDC171 gene, the two should have similar expression levels, or at least levels of the
same order of magnitude.
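The construction of the CCDC171 residual, in which probes falling in the overlap region are set aside before the gene-level signal is summarized, amounts to a simple interval filter. The sketch below uses invented probe coordinates and log2 values solely to show the logic; the real probe positions, the overlap boundaries, and the summarization used by the array software are not given in the text and are not reproduced here.

```python
from statistics import mean

def overlaps(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """True when two closed intervals share at least one position."""
    return start_a <= end_b and start_b <= end_a

def residual_probes(probes, overlap_region):
    """Keep only probes that do not touch the overlap region."""
    region_start, region_end = overlap_region
    return [p for p in probes
            if not overlaps(p["start"], p["end"], region_start, region_end)]

# Hypothetical probes (coordinates and log2 signals are made up).
probes = [
    {"id": "p1", "start": 100, "end": 124, "log2": 3.1},
    {"id": "p2", "start": 500, "end": 524, "log2": 3.4},
    {"id": "p3", "start": 900, "end": 924, "log2": 9.8},   # inside the overlap
    {"id": "p4", "start": 990, "end": 1014, "log2": 9.5},  # straddles its edge
]
overlap_region = (800, 1000)  # hypothetical shared interval

kept = residual_probes(probes, overlap_region)
kept_ids = [p["id"] for p in kept]
residual_log2 = mean(p["log2"] for p in kept)

print(kept_ids)
print(residual_log2)
```

Excluding probes that even partly touch the shared interval is the conservative choice, since any such probe could report signal from either transcript.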
Based on HuEx data, expression of the CCDC171 gene was extremely low in most Ewing
sarcomas (Figure 2.2A). The findings implied that the CCDC171 gene might be repressed
in Ewing sarcoma. To validate this assumption, CpG methylation status at the CCDC171
gene promoter in CHLA-9 and CHLA-10 cells was assessed by MeDIP-Seq (Clark et al.
2012, Deaton et al. 2011). Promoter methylation is a common mechanism of regulating gene
expression, and hyper-methylation of a gene promoter generally inhibits the binding of
transcription factors and represses gene expression. The MeDIP-Seq data showed that
expression of the CCDC171 gene was suppressed by promoter hyper-methylation
(Figure 2.7). CpG island methylation signals were observed in the promoter
region of the CCDC171 gene, from -2.5 kb to 1 kb relative to the transcription start site (TSS).
In addition, CpGs in the potential enhancer region of the CCDC171 gene, from -25 kb to -2.5
kb relative to the TSS, were also hyper-methylated in CHLA-9 and CHLA-10 cells, and this
finding correlated with the observed suppression of CCDC171 expression in both cell lines.
These results were validated by qPCR results of the CCDC171 gene which demonstrated
that the expression level of CCDC171 was extremely low in CHLA-9 and CHLA-10 cells.
Moreover, another methylation sequencing method, HpaII DNA methylation
analysis (Mi et al. 1992), was used to assess CpG methylation status at the CCDC171
promoter, and the result led to the same conclusion as the MeDIP-Seq data in Figure 2.7.
In contrast to the CCDC171 gene, no methylation signals were found around the TSS of EW-
vlncRNA, and the expression of EW-vlncRNA is quite high in CHLA-9 cells (Figure 2.8).
These findings suggested that EW-vlncRNA is transcribed independently of the CCDC171
gene; otherwise, hyper-methylation of CpGs at the CCDC171 promoter would be expected
to suppress the expression of EW-vlncRNA as well. Indeed, RNA-Seq data from total
RNA and poly-A RNA of Ewing sarcoma indicated that there is almost no expression of
CCDC171 in CHLA-9 and CHLA-10 cells (Figure 2.5); in contrast, EW-vlncRNA
expression was clearly observed in CHLA-9 cells as well as in CHLA-10 cells, albeit the
latter at a much lower level. Taken together, these observations suggest that EW-vlncRNA
expression does not represent background transcription of the CCDC171 gene.
Although the different expression levels and promoter methylation status of CCDC171 and
EW-vlncRNA supported the independent transcription of EW-vlncRNA, more direct
evidence was sought to distinguish EW-vlncRNA from the CCDC171 transcript. Three
different types of knockdowns, DsiRNA, shRNA, and antisense oligonucleotides, were
applied in this study: DsiRNA and shRNA cleave RNAs in cytoplasm via the RISC system
involving the Dicer enzyme, whereas ASOs function mainly in nuclei by complementarily
binding to targets and degrading RNAs via RNase H (Aartsma-Rus et al. 2009, Ideue et al.
2009). In this study, four ASOs were designed using putative guide sequences of EW-
vlncRNA (Figure 2.9A). Two of these sequences targeted exons of EW-vlncRNA (ASO
Exon 1 complementarily hybridized a region in exon 6 of EW-vlncRNA, and ASO Exon 2
in exon 8); while the other two separately bound to intron 2 (ASO Intron) or the 3’ run-off
transcription extension (ASO 3). DsiRNA as well as shRNA targeted the same exon locus
as ASO Exon 1. Based on qPCR results in Figure 2.9B, EW-vlncRNA expression was
decreased by about 70% with ASO Exon 1 (p-value < 0.005), which was much more efficient
than ASO Exon 2 (data not shown here). Similarly, the expression of EW-vlncRNA was
reduced by around 60% by ASO Intron knockdown (p-value < 0.005), and by over 20%
by ASO 3 knockdown (p-value < 0.05). However, neither DsiRNA
nor shRNA effectively interfered with EW-vlncRNA, even though both of them target the
same sequence as ASO Exon 1. The results also indicated that EW-vlncRNA transcription
is unrelated to CCDC171 expression, since none of those knockdowns effectively affected
expression of the CCDC171 gene (Figure 2.9C). Together, these knockdown experiments
directly demonstrate the independence of EW-vlncRNA.
Figure 2.7 MeDIP-Seq data for CHLA-9 and CHLA-10 cells. CpG methylation signals
were observed in the promoter region of the CCDC171 gene, from -2.5 kb to 1 kb of the
transcription start site (TSS). Moreover, CpGs in the potential enhancer region of the
CCDC171 gene, from -25 kb to -2.5 kb away from the TSS, were also hyper-methylated
in CHLA-9 and CHLA-10 cells.
Figure 2.8 PCR and qPCR results for the CCDC171 gene and EW-vlncRNA. PCR and
qPCR results for the CCDC171 gene (black) and EW-vlncRNA (green) indicated that the
CCDC171 gene is expressed at an extremely low level in CHLA-9 and CHLA-10 cells.
Figure 2.9 EW-vlncRNA knockdown with ASOs. (A) Four ASO knockdowns were
designed according to putative guide sequences of EW-vlncRNA: ASO Exon 1 (dark blue)
complementarily hybridized a region in exon 6 of EW-vlncRNA, ASO Exon 2 (dark blue)
targeted exon 8, ASO Intron (light blue) targeted intron 2 of EW-vlncRNA beyond the
overlap region, and ASO 3 (light blue) targeted the 3’ run-off transcription extension past
EW-vlncRNA. DsiRNA as well as shRNA (light green) targeted the same exon locus as
ASO Exon 1. (B) qPCR results of different knockdowns showed that the expression of
EW-vlncRNA was reduced about 70% by ASO Exon 1 knockdown, around 60% by ASO
Intron knockdown, and over 20% by ASO 3 knockdown. No significant difference in EW-
vlncRNA expression was found after DsiRNA or shRNA knockdown. (C) None of the above
knockdowns dramatically decreased the expression of the CCDC171 gene.
2.4 Discussion
As vlncRNAs have only been known for several years, our understanding of their structure,
processing, and function, if any, is extremely limited. Previous data suggested that
retrotransposons usually flank very long intergenic ncRNAs in most cases (St-Laurent et
al. 2013), suggesting that vlincRNAs might have transferred between human genomes via
retrotransposons at some point in the past. However, retrotransposons were not found within
or flanking the EW-vlncRNA locus on either the sense or the antisense strand. Based on human
GENCODE annotation, other repeating elements that might represent ancient viral footprints
(LINEs, SINEs, etc.)
were also sought. No short interspersed nuclear elements (SINEs), satellite repeats, simple
repeats (also named micro-satellites), or other repeats were identified. Only long
interspersed nuclear elements (LINEs) were found at the EW-vlncRNA locus; this type of
retrotransposon is widespread in most eukaryotic genomes with more than 850,000 copies
in the human genome (Lander et al. 2001) and is not associated with prior retrotransposition
of the locus.
Another characteristic of vlncRNAs is lack of a conventional promoter, and EW-vlncRNA
also exhibited no promoter. The traditional promoter marker H3K4Me3, which is
specifically associated with many gene promoters, is not found in vlncRNAs. Therefore,
recognizing TSSs of vlncRNAs by transcription factors is currently a great challenge. Since
EW-vlncRNA does not include a promoter, ChIP-Seq analysis of H3K4Me3 is not suitable for
distinguishing EW-vlncRNA from the CCDC171 gene. However, CpG methylation status
at the CCDC171 promoter can be assessed by MeDIP-Seq data, and the results obtained
indirectly supported the independence of EW-vlncRNA.
Unfortunately, the Ensembl database fails to distinguish EW-vlncRNA from CCDC171
and interprets it as a processed transcript of CCDC171 named CCDC171-004 (Harrow et
al. 2012). Even if EW-vlncRNA were an extension of the CCDC171 gene, EW-vlncRNA
is also defined as a non-coding processed transcript by GENCODE. However, the MeDIP-Seq
data, RNA-Seq data, as well as HuEx data in this study, are all consistent with
EW-vlncRNA being unrelated to the CCDC171 gene. Thus, even though there is an overlap
between the transcript and the gene, EW-vlncRNA should still be considered an
independent transcript. Processed transcripts have been reported for years, but most
ncRNA studies avoid this terminology for overlapping coding/non-coding RNAs. At least
500 such ‘processed transcript’ genes have been documented.
The data from the ASO knockdowns also support another conclusion. Not only do the
ASO knockdowns support the independence of EW-vlncRNA, they also show that
EW-vlncRNA is an integral part of the 381.37 kb transcript. ASO 3 targeted the 3’ extra
extension in run-off transcription, far from EW-vlncRNA, but ASO 3 knockdown
effectively decreased the expression of EW-vlncRNA. Thus, the 3’ run-off transcription is
an integral extension of the EW-vlncRNA. Further, each of the different ASOs
efficiently reduced the expression of the EW-vlncRNA, indicating that EW-vlncRNA is
contiguous. Moreover, the knockdown results showed that DsiRNA and shRNA cannot
interfere with EW-vlncRNA expression through the RISC/Dicer system in the cytoplasm, but
ASO Exon 1, guided by the same sequence as DsiRNA and shRNA, can repress EW-
vlncRNA expression in the nucleus by RNase H. These findings indicate that the main
location of EW-vlncRNA might be in the nucleus, and this hypothesis was validated in
subsequent qPCR and RNA FISH experiments.
Chapter 3: A spliced ncRNA sEWVLNC of EW-vlncRNA
3.1 Introduction
Messenger RNA splicing is a universal post-transcriptional modification in eukaryotic cells.
Similarly, most non-coding RNAs also undergo RNA splicing (Affymetrix ENCODE
Transcriptome Project 2009). However, recent studies found that some non-coding RNAs can
resist RNases and retain their unspliced form. Kcnq1ot1 and Air are the best-known
examples of these unspliced ncRNA transcripts (Seidl et al. 2006, Pandey et al. 2008).
Although Kcnq1ot1 is 91 kb in size, its half-life can reach 3.5 hours with actinomycin D
treatment. More importantly, studies have shown that Kcnq1ot1 inhibits Kcnq1 gene
expression by repressing chromatin modifications through lineage-specific binding, and
Kcnq1ot1 can only perform its known functions in its full-length form. The other ncRNA,
Air, is 108 kb in size and is an antisense transcript of the Igf2r gene. In contrast to Kcnq1ot1,
Air has been shown in recent studies to have spliced variants as well as an unspliced form.
Unspliced Air is relatively unstable, with a half-life of no more than 2.5 hours. In mouse
placenta, non-coding RNA Air can interfere with the cis-linked genes Slc22a3, Slc22a2,
and Igf2r. However, the functional roles of Air are implemented only in its full-length form.
Thus, the antisense type of vlncRNAs, such as Kcnq1ot1 and Air, have to maintain their
unspliced patterns to perform their functions.
Studies of vlincRNAs have revealed that, based on RNA-Seq data, most very long
intergenic ncRNAs retain their full-length forms (St-Laurent et al. 2013). Although
previously published articles have reported that vlincRNAs might play functional roles via
their unspliced forms, no reports to date have estimated the stability of vlincRNAs. Even
the half-lives of the well-known HELLP and VAD ncRNAs are still unknown. Interestingly,
GENCODE human annotation has shown that some spliced variants exist in the vlincRNA
VAD locus, but the expression levels of these spliced variants are very low, and most VAD
vlncRNA transcripts maintain their unspliced form in cells (Lazorthes et al. 2015).
In this study, the half-lives of unspliced EW-vlncRNA and the spliced variant sEWVLNC
were analyzed with actinomycin D treatment. The RNA-stability results as well as the
expression analyses of EW-vlncRNA and sEWVLNC indicated that the spliced sEWVLNC
was more stable than unspliced EW-vlncRNA, and was the major form in Ewing
sarcoma. Kcnq1ot1, Air, and VAD are primarily localized in the nucleus and perform
functions involving regulation of transcription, and the potential functions of sEWVLNC
might be implemented in the same manner. Most ncRNAs function in the nucleus and
regulate gene expression by governing transcription in contrast to cytoplasmic ncRNAs
that silence mRNA by direct hybridization (Zhang et al. 2014). Therefore, the subcellular
locations of EW-vlncRNA and sEWVLNC are very important for inferring their functional
mechanisms.
3.2 Materials and Methods
Cell culture
The human rhabdomyosarcoma (RMS) cell line Rh-28 was cultured in IMDM,
GlutaMAX™ Supplement (Life Technologies) plus 20% fetal bovine serum (FBS, Life
Technologies & Sigma) as well as 100 U/mL penicillin and streptomycin (Life
Technologies). Similar to Ewing sarcoma cell lines CHLA-9 and CHLA-10, Rh-28 cells
were cultured in 100 mm culture dishes (Corning Life Sciences) at 37°C in a humidified
incubator (Thermo Fisher Scientific) with 5% CO2. Cells with 85% viability were passaged
before 90% confluency using 0.05% Trypsin-EDTA (Life Technologies).
Nuclear and Cytoplasmic RNAs Isolation
Because the lysis buffer from the nuclear and cytoplasmic RNA isolation kit specifically
disrupts cellular membranes but does not attack nuclear membranes, the nucleus and cytoplasm
were fractionated by centrifugation at 14,000 × g. The nuclear fraction and its
RNA are found in the pellet, while the cytoplasmic RNA fraction is present in
the supernatant. Nuclear and cytoplasmic RNAs were separately purified using Fisher
BioReagents SurePrep Nuclear or Cytoplasmic RNA Purification Kits (Thermo Fisher
Scientific), according to the manufacturer’s protocol. RNase-Free DNase (Qiagen) was
also applied to nuclear and cytoplasmic RNAs to remove any DNA contamination,
following the manufacturer’s protocol. RNA quality was determined using a NanoDrop
1000 Spectrophotometer (Thermo Fisher Scientific).
Cloning and Expression Vectors
PCR products in agarose gel were extracted and then purified using GenElute™ Gel
Extraction Kit (Sigma), following the manufacturer’s protocol. DNA concentration was
assessed using NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific). The 3’ ends
of the DNA fragments were modified with Taq polymerase to add a single deoxyadenosine.
Using TOPO® TA Cloning® Kit (Invitrogen) for sequencing, DNA fragments with
deoxyadenosine modifications were inserted into pCR™ 4-TOPO® vector by
topoisomerase which can incorporate DNA fragments containing deoxyadenosine
modification via a phospho-tyrosyl bond that is formed through the breaking of the
phosphodiester backbone after 5’-CCCTT. Next, the cloning vectors with the target PCR
products were transformed into competent One Shot® TOP10 E. coli cells (Life
Technologies), and the transformants were incubated on LB plates at 37°C overnight.
Several single E. coli colonies were selected and cultured in LB medium for further
analysis. Plasmid DNA was purified using the QIAprep® Spin Miniprep Kit (Qiagen),
according to the manufacturer’s protocol. The target DNA fragment was amplified by PCR,
and the PCR products were extracted and purified again. The resultant DNA preparation
was then sequenced (Genewiz, Berkeley, CA).
After sequencing, target DNA fragments containing deoxyadenosine modifications from a
single-colony were inserted into pEF6/V5-His-TOPO® expression vector using pEF6/V5-
His TOPO® TA Expression Kit (Invitrogen), following the manufacturer’s protocol.
As with the cloning vector, the target DNA fragment was integrated into the expression
vector by topoisomerase, and the vectors were transformed into One Shot® TOP10 E. coli
cells. Colonies were chosen and cultured in LB medium following the manufacturer’s
protocol, and the plasmids containing the target DNA fragment were extracted and purified
using the QIAGEN® Plasmid Plus Midi Kit (Qiagen), according to the manufacturer’s
protocol. Finally, the expression vectors were transfected into Ewing sarcoma cells using
Cell Line Nucleofector Kit R (Lonza) and a Nucleofector instrument (Lonza), according
to the manufacturer’s protocol. Stable cell lines were generated by blasticidin selection.
The efficiency of expression vector modulation of gene expression was estimated by
qPCRs and human exon microarray expression profiling.
RNA Half-Life Measurements
Cells were cultured and, when 90% confluent, passaged into several dishes with fresh
DMEM medium containing 20% fetal bovine serum overnight (to increase RNA products).
Cells in different dishes were treated with 5 µg/ml actinomycin D for different time periods:
0, 0.5, 1, 2, 4, 8, and 12 hours. After actinomycin D treatment, RNA was isolated from
cells and purified using the RNeasy Mini Kit (Qiagen) and the RNase-Free DNase Set
(Qiagen), following the manufacturer’s protocol. Finally, RNA quantities were measured
by qPCR.
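The text does not state how a half-life was derived from the qPCR time course. One common approach, sketched here under the assumption of first-order decay, fits the natural log of the remaining RNA fraction against time and converts the slope to a half-life. The function name and the synthetic values are hypothetical and are not data from this study.

```python
import math

def estimate_half_life(hours, fraction_remaining):
    """Fit ln(fraction) = intercept - k * t by least squares; return ln(2) / k."""
    # Skip points that cannot be log-transformed.
    pts = [(t, math.log(f)) for t, f in zip(hours, fraction_remaining) if f > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx  # equals -k for a decaying transcript
    return math.log(2) / -slope

# Time points (hours) follow the treatment schedule in the text; the fractions
# are synthetic and generated from an exact 2.5-hour half-life.
times = [0, 0.5, 1, 2, 4, 8, 12]
fractions = [0.5 ** (t / 2.5) for t in times]
half_life = estimate_half_life(times, fractions)
```

Real qPCR data rarely follow a single exponential over the full 12 hours, so in practice the fit is often restricted to the early, log-linear part of the curve.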
Pfam ORF Analyses
The DNA sequence from the target RNA was uploaded into the Pfam website
(http://pfam.xfam.org/) of The European Bioinformatics Institute. The Pfam database is
based on multiple sequence alignments from widespread protein families; when there is a
recognized ORF, Pfam provides detailed protein domain information.
3.3 Results
3.3.1 sEWVLNC is spliced from EW-vlncRNA
Although RNA-Seq read alignments of CHLA-9 and CHLA-10 cells demonstrated no
RNA splicing in EW-vlncRNA, the PCR products, from the first exon of EW-vlncRNA to
the 3’ poly-A tail, showed three major spliced variants (2 kb, 1.6 kb, and 1.3 kb in size)
that were derived from EW-vlncRNA (Figure 3.1A and 3.1B). Interestingly, based on
GENCODE human annotation, one 2.3 kb non-coding RNA is spliced from EW-vlncRNA,
as shown in Figure 3.1A (Harrow et al. 2012, ENCODE Project Consortium 2012,
Cunningham et al. 2015). Therefore, EW-vlncRNA is more likely to be processed by the
spliceosome to generate these specific splice variants. To further investigate these variants
of EW-vlncRNA, sub-cloning and DNA sequencing were applied to the three dominant
splice variants. Aligning the DNA sequence with GENCODE human annotation
documented skipped exons in all three splice variants in Figure 3.1A. The largest 2 kb
spliced RNA, named sEWVLNC, was localized at the same locus as the 2.3 kb ncRNA
noted in GENCODE database, with two skipped exons, and was examined by PCR and
DNA sequencing (Figure 3.1A and 3.1B). Comparing RNA-Seq data of sEWVLNC to EW-
vlncRNA showed that sEWVLNC occupied the main expression peaks of EW-vlncRNA
(Figure 3.1C). Thus, even though splicing was not evident from the EW-vlncRNA
RNA-Seq data, it is clearly present in the overall primary transcripts, presumably as a small
proportion of the total. The spliced variants are clear in poly-A RNA-Seq data (Figure
3.1C), indicating that the primary 381.37 kb transcript is most likely processed by the
spliceosome to form spliced isoforms that are subsequently poly-adenylated.
Although EW-vlncRNA is a non-coding processed transcript, and the 2.3 kb RNA is also
annotated as non-coding, there was no direct evidence that sEWVLNC is a non-coding
RNA as well. To examine this question, Pfam was applied to search for ORFs in
sEWVLNC, since Pfam covers a large collection of known protein families and domains (Finn et al.
2014). The result showed that there are no recognized ORFs but many stop codons in sEWVLNC. These
multiple stop codons would prevent potential translation of any candidate ORF and are a
common feature of ncRNAs (Figure 3.2).
Figure 3.1 The spliced variant sEWVLNC derived from EW-vlncRNA. (A) Three
major splice variants were derived from EW-vlncRNA (2 kb, 1.6 kb, and 1.3 kb
in size, respectively). The 2.3 kb ncRNA is defined as a splice variant of EW-vlncRNA by GENCODE.
The skipped exons of three spliced variants in the study were identified by aligning DNA
sequence with GENCODE human annotation. (B) PCR products from the first exon of
EW-vlncRNA to the 3’ poly-A tail indicated RNA splicing in EW-vlncRNA. (C) Based on
RNA-Seq data, sEWVLNC occupied the main expression peaks of EW-vlncRNA.
Figure 3.2 Pfam results for sEWVLNC. No recognized ORFs were found in sEWVLNC
by Pfam analysis; instead, multiple stop codons were observed (red squares).
3.3.2 sEWVLNC is more stable than EW-vlncRNA
Previous studies suggested that vlncRNAs in general perform functions as full-length
transcripts, and documented their structural integrity as single continuous transcripts.
However, unspliced vlncRNAs are prone to degradation due to their very large sizes, and
degradation of vlncRNAs would destroy their functions. In this study, the half-life of EW-
vlncRNA and sEWVLNC was estimated by qPCRs with actinomycin D treatment, since
actinomycin D can stop transcription initiation by occupying an active site of the minor
groove of the DNA helix to interfere with topoisomerase (Leclerc et al. 2002). EW-
vlncRNA-rich CHLA-9 cells were treated with actinomycin D for different time periods,
and the results showed that the half-life of EW-vlncRNA was less than 1.5 hours; in contrast,
that of sEWVLNC was approximately 2.5 hours (Figure 3.3). Nearly 40% of sEWVLNC
remained detectable after 12 hours of actinomycin D treatment; in contrast, no more
than 10% of EW-vlncRNA was found after 2 hours of actinomycin D treatment (p <
0.0005). To validate this finding, the experiment was repeated with Rh-28 cells, which do
not express native EW-vlncRNA or sEWVLNC. With actinomycin D treatment, the
half-life of artificial sEWVLNC in Rh-28 cells was about 2.3 hours, similar to that of native
sEWVLNC in CHLA-9.
Figure 3.3 The stability of EW-vlncRNA and sEWVLNC. With actinomycin D treatment,
the half-life of sEWVLNC was 2.5 hours for CHLA-9 cells and 2.3 hours for Rh-28 cells.
In contrast, 50% of EW-vlncRNA was degraded in CHLA-9 cells in less than 1.5 hours.
3.3.3 sEWVLNC is localized in the nucleus
The subcellular locations of ncRNAs are another important mechanistic characteristic of
ncRNA functions. Cytoplasmic ncRNAs typically function by direct hybridization to
mRNAs, while nuclear ncRNAs typically regulate gene transcription, often in conjunction
with a transcription factor complex. Therefore, the locations of EW-vlncRNA and
sEWVLNC were examined in this study. Nuclear and cytoplasmic RNAs were separately
extracted and purified from CHLA-9 cells, and the quantity of RNA was measured using
qPCR. Using pre-RNA of GAPDH as a positive control (most pre-RNAs are located in
the nucleus), the results when normalized to GAPDH demonstrated that about 90% of EW-
vlncRNA resides in the nucleus, as does over 75% of sEWVLNC (Figure 3.4). These findings
implied that EW-vlncRNA or sEWVLNC, as nuclear ncRNAs, might perform functions in
the transcription process.
Figure 3.4 The locations of EW-vlncRNA and sEWVLNC. About 90% of EW-vlncRNA
and over 75% of sEWVLNC are located in the nucleus, indicating that both EW-vlncRNA
and sEWVLNC are primarily nuclear ncRNAs.
3.4 Discussion
Unlike most mRNAs or lncRNAs, vlncRNAs do not contain obvious RNA splice isoforms
in the RNA-Seq data (Mortazavi et al. 2008, Ozsolak et al. 2011). Analyzing RNA-Seq
read alignments from the ENCODE database, common exon-exon alignments were not
found in the EW-vlncRNA locus, and RNA sequencing data of CHLA-9 and CHLA-10
cells supported this conclusion. However, the classic splice junction sequence 5’- GU…AG
-3’ was found in exon-intron boundaries of EW-vlncRNA. Considering that many spliced
fragments exist in the long-range overlapping PCR products of EW-vlncRNA, it is likely
that the EW-vlncRNA genetic locus contains multiple irregularly spliced variants, of which
sEWVLNC is just the largest one with a poly-A tail, as supported by poly-A RNA
sequencing. This is similar to the ncRNAs Air, Kcnq1ot1, and VAD, but the mechanism to
generate these specific isoforms from such large precursor transcripts remains poorly
understood.
Although the half-lives of both EW-vlncRNA and sEWVLNC can be measured in CHLA-9
cells with actinomycin D treatment, additional assessment of the induced sEWVLNC in Rh-28
cells was necessary to confirm the half-life of sEWVLNC. In CHLA-9 cells, EW-vlncRNA
continuously undergoes RNA splicing and generates sEWVLNC, which counter-balances the
degradation of sEWVLNC and interferes with the RNA-stability test. Employing artificial
sEWVLNC in Rh-28 cells removed this interference. Notably, the stability of sEWVLNC
is much higher than that of EW-vlncRNA, and therefore sEWVLNC is more likely to
perform long-term functions in cell activities. Although previous studies of vlncRNAs
underlined the importance of the unspliced form, there is little or no direct evidence to
confirm that the spliced variants of vlncRNAs are non-functional. On the contrary, the
spliced variants of vlncRNAs might still have functions that differ from the unspliced
parental forms (Quinn et al. 2014).
More than one method can be applied to estimate the locations of ncRNAs, including RNA
FISH and Northern blotting. However, detecting the locations of RNAs using qPCR is a
feasible and accurate alternative (Wang et al. 2006). Using qPCR, most EW-vlncRNA and
sEWVLNC were observed to be localized in the nucleus. As nuclear and cytoplasmic
ncRNAs employ different functional mechanisms (Holoch et al. 2015, Lee 2012),
identifying EW-vlncRNA and sEWVLNC as nuclear ncRNAs is important for predicting
their potential functional roles. Similar to other vlncRNAs and nuclear lncRNAs, it is likely
that EW-vlncRNA and sEWVLNC regulate the transcription process.
Chapter 4: Down-stream genes of sEWVLNC
4.1 Introduction
Statistics has been developed over several centuries, but the foundation of biostatistics was
not formally reported until the 20th century. Prior to the 21st century, biostatistics was
applied mostly in correlation analyses between genotypes and phenotypes. For gene co-
expression analyses, the classic method of the Pearson product-moment correlation
coefficient, which was first documented in the 1880s, has been used to estimate the
association between two paired candidate genes in recent years. With the
development of microarray and next-generation sequencing technology, current biological
studies are faced with the need to analyze thousands of genes from hundreds of samples
simultaneously (Dudoit et al. 2002). Without biostatistics, especially Pearson correlation
analysis, it is impractical to find intrinsic associations among massive numbers of genes in
an efficient manner. Current research shows that genes and transcripts, as well as ncRNAs,
are important factors in cell activities (Geisler et al. 2013). The existence of
so many factors means that over ten thousand variables must be analyzed per study. For this
reason, bioinformatics, or the application of statistical methods to high-dimensional data
sets, has become an indispensable method in modern biology.
However, simple Pearson correlation coefficient analysis alone might result in some
important biological information being overlooked. For example, current knowledge
shows that gene function is best assessed in pathway networks rather than in isolation, since
any one gene is likely to be regulated by more than one factor (Miller et al. 2011). Thus, a
simple correlation between two genes cannot fully describe the corresponding gene
network. To solve this problem, biologists have tried to supplement statistics with analyses
of gene networks and signaling pathways. With this approach, genes with altered
expression are clustered into signaling pathways; if most genes in one signaling pathway
are similarly regulated by the target gene (a large proportion of them down-regulated or
up-regulated by the target gene), the functions of this target gene are likely to be related to this
signaling pathway (Kan et al. 2013). For instance, if increasing expression of gene A
reduces expression of most genes in signaling pathway B, pathway B is considered to be
negatively regulated by gene A. Signaling pathway analysis has been developed for many
years, and some efficient software programs have been developed, such as Ingenuity
Pathway Analysis (IPA) which mainly focuses on signaling pathway analysis based on
microarray or sequencing data (Jones et al. 2012). However, this method also has some
disadvantages: primarily, current studies only uncover parts of signaling pathways, while
many signaling pathways remain undetected, such that some genes cannot be assigned to
the correct pathway. Moreover, interactions among signaling pathways have been observed
for many years, but gene network analysis of signaling pathways cannot fully reveal
downstream effects. Further, signaling pathway analysis can be confounded by feedback
inhibition which is an important characteristic of signaling pathways. Therefore, signaling
pathway analysis cannot completely replace gene co-expression analysis, although it can
provide a crucial supplement to the latter.
Another deficiency of gene co-expression analysis via the Pearson correlation coefficient
is false positivity (Storey et al. 2002). Without signaling pathway analysis, gene candidates
can be correlated in pairs by systematic error or random error. Thus, a more comprehensive
analysis could substantially improve the accuracy of the down-stream gene prediction. With
this goal in mind, a method named Gene Set Enrichment Analysis (GSEA) was developed
over the past decade. GSEA determines whether a predefined set of genes is significantly
changed in paired samples; therefore, this method is very effective at detecting the down-
stream gene network caused by altered expression of a gene of interest. However, this
method also increases the rate of false negatives, as predefined sets of genes cannot cover
all possible gene networks, and unintended omissions unfortunately exclude some
important gene candidates.
As an alternative, an efficient method termed Weighted Gene Co-Expression Network
Analysis (WGCNA) was introduced more recently to address this problem. WGCNA was
mainly developed by Horvath and Langfelder (Langfelder et al. 2008, Langfelder et al.
2011), and the theory underlying this method has been steadily elaborated for more than
ten years. Differing from signaling pathway analysis or gene set enrichment analysis,
WGCNA is based solely on statistics. Gene co-expression correlations are first
calculated by traditional correlation analysis, and then the corresponding genes are
divided into different modules based on connection strength, which is estimated
topologically from the gene co-expression correlations (Langfelder et al. 2011). Subsequently,
associations between modules and biological states (the expression changes of genes or the
changes of phenotypes) are measured using Student's t-test or the Pearson correlation
coefficient (Langfelder et al. 2008). The results of gene co-expression correlations and
module associations indicate downstream target genes. WGCNA is now widely
applied in biology, especially in identifying complicated gene networks
such as the human brain transcriptome and genetic programs in embryonic development.
Relevant ongoing studies are introducing this method into cancer research, since tumor
cells also contain complicated and irregular gene networks.
It is commonly recognized that statistical predictions cannot replace molecular
experimentation. An increasing number of factors have been incorporated in gene
network analysis, although the complexity of biological systems presumably continues to
be underestimated in most current studies. Unfortunately, incomplete data in biostatistics
can lead to serious mistakes in assessment (Storey et al. 2004). Adding to the problem,
knowledge of most cell activities, like signaling pathways, gene expression regulation, and
epigenomic modification, is far from complete. In other words, the algorithms in biology
do not yet accurately describe the mechanisms of cell activities. Thus, statistical studies
should be confirmed by molecular results as a mandatory complement. In this study,
several relevant molecular experiments were undertaken to validate gene network
statistical analyses.
4.2 Materials and Methods
Gene Co-Expression Correlation
Correlations of paired genes were based on log2 expression values of genes from exon
array data, and the quality of all data was assessed by Affymetrix Expression Console
Software (Affymetrix) following the manufacturer’s protocol. Among sample quality
metrics, assessment of the quality of HuEx data was mainly dependent on the following
three factors: the pos_vs_neg_auc (area under the curve for positive versus negative
controls), the all probe set mean (the signal mean of all probe sets), and the all probe set
rle mean (the relative-log-expression mean of all probe sets). HuEx data from 178 validated
specimens were used in this study.
A histogram of sEWVLNC expression was assessed using the formula:
$$N = \sum_{i=1}^{k} m_i$$
where N is the total number of observations, k is the number of categories, and $m_i$ is the
number of observations falling into category i.
Correlations between sEWVLNC and other genes were assessed using the Pearson product-
moment correlation coefficient, specifically the formula:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
$$\operatorname{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
where ρ is the Pearson correlation coefficient, cov is the covariance, σ is the standard
deviation, µ is the mean, E is the expected value of a random variable, and X and Y are
variables.
The thresholds for the correlations were 0.5 and -0.5; all genes with absolute correlation values equal to
or larger than 0.5 were listed in the down-stream gene network (R ≥ 0.5 or R ≤ -0.5).
Correlation analysis was implemented using the Pearson product-moment correlation
coefficient command in the package ‘stats’ for the R language and environment.
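As an illustration of the correlation filter described above, the following sketch computes the Pearson coefficient from its definition and keeps genes with |R| ≥ 0.5. It is written in Python rather than R purely for illustration; the function names, gene names, and expression values are hypothetical, and the study itself used the ‘stats’ package in R.

```python
import math

def pearson(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def correlated_genes(target, expression, threshold=0.5):
    """Return {gene: r} for genes with |r| >= threshold against the target."""
    result = {}
    for gene, values in expression.items():
        r = pearson(target, values)
        if abs(r) >= threshold:
            result[gene] = r
    return result

# Hypothetical log2 expression values across five samples.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],   # rises with the target
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as the target rises
    "geneC": [1.0, -1.0, 0.0, -1.0, 1.0],  # no linear trend
}
hits = correlated_genes(target, expression)
```

In this toy input, geneA and geneB pass the threshold with opposite signs, while geneC is excluded.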
Signaling Pathway Analysis
Signaling pathway analyses were mainly based on HuEx data of the altered expressions of
sEWVLNC. Down-stream genes of sEWVLNC were identified by fold changes in gene
expression, and a threshold of 1.5 was used for the fold change. Downstream genes were
divided into different gene sets by signaling pathways using QIAGEN’s Ingenuity®
Pathway Analysis (IPA®, QIAGEN Redwood City, www.qiagen.com/ingenuity),
according to the manufacturer’s protocol. In IPA analysis, the Fisher's exact test was used
to determine the significance of signaling pathway candidates with the p-values below 0.05,
using the formula:
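$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$
where p is the probability; n is the sum of a, b, c, and d; and a, b, c, and d are the counts
for different types of observations.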
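The Fisher's exact test mentioned above can be sketched for a single 2 × 2 table as follows. The sketch assumes a right-tailed test (over-representation of altered genes in a pathway), which the text does not specify; the function names and the counts are hypothetical.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_right_tailed(a, b, c, d):
    """Sum the probabilities of tables with at least `a` in the top-left cell."""
    row1, row2, col1 = a + b, c + d, a + c
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # The remaining cells follow from the fixed row and column totals.
        p += table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
    return p

# Hypothetical counts: 8 of 10 pathway genes are altered,
# versus 2 of 10 genes outside the pathway.
p_value = fisher_right_tailed(8, 2, 2, 8)
```

With these counts the p-value falls below the 0.05 cutoff used in the text, so the hypothetical pathway would be retained as a candidate.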
Besides the significance, the up-regulation or down-regulation of signaling pathways was
estimated by the standard score (z-score), using the formula below:
$$z = \frac{x - \mu}{\sigma}$$
where x is the observed value, µ is the mean, and σ is the standard deviation.
Moreover, upstream factors and the regulators of the signaling pathway candidates were
also evaluated using the z-score. Signaling pathways with a z-score over 1 were assumed to
be up-regulated by sEWVLNC, while those with a z-score below -1 were considered
down-regulated by sEWVLNC. Finally, dozens of potential signaling pathways were displayed
by IPA, and the top ten candidates were chosen for further analysis.
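The z-score cutoffs described above amount to a simple three-way classification, sketched below. The function names, pathway names, and z-score values are hypothetical; the scores in the study were produced by IPA.

```python
def z_score(x, mean, sd):
    """Standard score of x given a mean and a standard deviation."""
    return (x - mean) / sd

def classify_pathway(z, cutoff=1.0):
    """Apply the cutoffs from the text: z > 1 up, z < -1 down, else associated only."""
    if z > cutoff:
        return "up-regulated"
    if z < -cutoff:
        return "down-regulated"
    return "associated only"

# Hypothetical pathway z-scores.
scores = {"pathway_1": 2.1, "pathway_2": -1.4, "pathway_3": 0.3}
labels = {name: classify_pathway(z) for name, z in scores.items()}
```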
Weighted Gene Co-Expression Network Analysis
All HuEx data of Ewing sarcomas were pre-processed to delete missing values and outliers
before Weighted Gene Co-Expression Network Analysis (WGCNA); outliers were
identified by clustering of samples depending on the set of dissimilarities for all objects.
Unlike the quality control of HuEx data which estimates the accuracy of the exon array,
outlier analysis is designed to minimize the side effects of abnormal values on total data.
The expression levels of all genes constituted the main factor for measuring the set of
dissimilarities. Hierarchical cluster analysis used Ward's minimum variance method to
develop the clusters and the updated Lance–Williams dissimilarity formula was used to
calculate the stage distances between clusters. To implement this step, the cluster command
in the package ‘stats’ and the height cut command of the package ‘WGCNA’ were used.
The formula used for Ward's Minimum Variance method was the following:
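$$D_{KL} = B_{KL} = \frac{\lVert \bar{X}_K - \bar{X}_L \rVert^2}{\frac{1}{N_K} + \frac{1}{N_L}}$$
where $D_{KL}$ is the distance between clusters K and L, $B_{KL}$ is the between-cluster sum of
squares, $\bar{X}_K$ and $\bar{X}_L$ are the mean vectors of the observations in clusters K and L, and
$N_K$ and $N_L$ are the numbers of observations in clusters K and L.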
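A minimal sketch of Ward's distance between two clusters is given below, assuming the form in which the squared distance between the cluster means is divided by (1/N_K + 1/N_L). The function names and the two-dimensional observations are hypothetical.

```python
def mean_vector(cluster):
    """Column-wise mean of a list of equal-length observation vectors."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def ward_distance(cluster_k, cluster_l):
    """||mean_K - mean_L||^2 / (1/N_K + 1/N_L)."""
    mk, ml = mean_vector(cluster_k), mean_vector(cluster_l)
    squared = sum((a - b) ** 2 for a, b in zip(mk, ml))
    return squared / (1 / len(cluster_k) + 1 / len(cluster_l))

# Hypothetical two-dimensional observations.
cluster_k = [[0.0, 0.0], [2.0, 0.0]]  # mean (1, 0), N_K = 2
cluster_l = [[4.0, 4.0]]              # mean (4, 4), N_L = 1
d_kl = ward_distance(cluster_k, cluster_l)
```

At each step of the hierarchical clustering, the pair of clusters with the smallest such distance is merged.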
The algorithm used for the updated Lance–Williams dissimilarity formula:
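$$d_{(ij)k} = \alpha_i d_{ik} + \alpha_j d_{jk} + \beta d_{ij} + \gamma \lvert d_{ik} - d_{jk} \rvert$$
where $d_{ij}$, $d_{ik}$, and $d_{jk}$ represent the pairwise distances among clusters i, j, and k, respectively;
and $\alpha_i$, $\alpha_j$, β, and γ are all parameters based on the numbers of observations in clusters.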
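The Lance–Williams update can be sketched with the standard parameter choice for Ward's method, in which α_i = (n_i + n_k)/(n_i + n_j + n_k), α_j = (n_j + n_k)/(n_i + n_j + n_k), β = −n_k/(n_i + n_j + n_k), and γ = 0. These parameter values are the textbook ones for Ward's method and are an assumption here, since the text does not list them; the helper names and the one-dimensional example are hypothetical.

```python
def lance_williams_ward(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    """Distance from the merged cluster (i, j) to cluster k under Ward's method."""
    total = n_i + n_j + n_k
    alpha_i = (n_i + n_k) / total
    alpha_j = (n_j + n_k) / total
    beta = -n_k / total
    gamma = 0.0  # Ward's method does not use the absolute-difference term
    return alpha_i * d_ik + alpha_j * d_jk + beta * d_ij + gamma * abs(d_ik - d_jk)

def ward(mean_a, n_a, mean_b, n_b):
    """Ward distance between two clusters described by a 1-D mean and a size."""
    return (mean_a - mean_b) ** 2 / (1 / n_a + 1 / n_b)

# Hypothetical one-dimensional singleton clusters i, j, k at 0, 2, and 6.
d_ij, d_ik, d_jk = ward(0, 1, 2, 1), ward(0, 1, 6, 1), ward(2, 1, 6, 1)
updated = lance_williams_ward(d_ik, d_jk, d_ij, 1, 1, 1)
direct = ward(1.0, 2, 6, 1)  # merged cluster (i, j) has mean 1 and size 2
```

The update reproduces the directly computed distance, which is why the recurrence lets the clustering proceed without revisiting the raw observations after each merge.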
After updating the cluster distances with the Lance–Williams dissimilarity formula, the soft
thresholding power, to which the gene co-expression similarities are raised, was calculated.
The soft thresholding power (β) is based on scale-free topology (SFT), which measures the
frequency distribution of the connectivity, also known as cluster adjacency. Generally, the
results of SFT contain the Exponentially Truncated SFT R² and the Log Log SFT R², both of
which were over 0.85 in this study. In other words, the SFT models explained over 85% of the
variance in the connectivity distribution of the gene co-expression network. Moreover, the
SFT value is not fixed; it can be adjusted according to gene connectivity in different studies.
The formula used for Exponentially Truncated SFT R²:
$$\log(p(k)) = C_0 + C_1 \log(k) + C_2 k$$
The formula used for Log Log SFT R²:
$$\log(p(k)) = C_0 + C_1 \log(k) + C_2 \log(\log(k))$$
where k is the number of gene paired nodes, p(k) is the frequency of that connectivity, $C_0$ is the intercept, and $C_1$ and $C_2$ are slopes.
Subsequently, an adjacency matrix was generated via soft thresholding power. An
adjacency matrix detects the connection strength between paired genes. Usually, gene
connectivity involves two types of networks: unweighted network and weighted network.
The former represents direct neighbors, while the latter represents all connection strengths
of paired gene nodes, which were measured using the adjacency matrix.
The formula used for the adjacency matrix:
$$k_i = \sum_{j \neq i} a_{ij}$$
where $k_i$ is the connectivity of gene node i, and $a_{ij}$ is the connection strength between nodes i and j.
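The construction of a weighted adjacency matrix by soft thresholding can be sketched as follows. The sketch assumes an unsigned network in which a_ij = |s_ij|^β, where s_ij is the co-expression similarity (for example, a Pearson correlation); the text does not state whether a signed or unsigned network was used, and the function names and values are hypothetical.

```python
def soft_threshold_adjacency(similarity, beta):
    """a_ij = |s_ij| ** beta for i != j; the diagonal is set to 0 in this sketch."""
    n = len(similarity)
    return [[abs(similarity[i][j]) ** beta if i != j else 0.0
             for j in range(n)] for i in range(n)]

def connectivity(adjacency):
    """k_i = sum of a_ij over all j other than i."""
    return [sum(row[j] for j in range(len(row)) if j != i)
            for i, row in enumerate(adjacency)]

# Hypothetical correlation matrix for three genes.
similarity = [[1.0, 0.9, -0.5],
              [0.9, 1.0, 0.1],
              [-0.5, 0.1, 1.0]]
adjacency = soft_threshold_adjacency(similarity, beta=2)
k = connectivity(adjacency)
```

Raising the similarities to a power keeps strong correlations relatively large while pushing weak ones toward zero, which is what distinguishes the weighted network from a hard-thresholded, unweighted one.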
However, the adjacency matrix includes false positives and spurious associations, so the
gene adjacencies were transformed into a Topological Overlap Matrix (TOM) which is
used to estimate the similarity of interconnectedness of all paired gene nodes. In short,
TOM separates paired gene nodes into gene modules via co-regulation or
interconnectedness. The paired gene nodes that could not be incorporated into gene modules
were labeled as outliers and were deleted in subsequent analyses. In addition, the sizes of
gene modules were limited in the study. As gene modules are mainly based on the
interconnectedness of paired gene nodes, the size can fluctuate dramatically. An extremely
large size reduces the accuracy of a gene module, while a small size increases the
complexity of gene co-expression analysis. The size of each gene module was restricted to between
25 and 200 protein-coding genes in the present study.
The formula used for general TOM:
$$t(i, j) = \frac{\lvert N_1(i) \cap N_1(j) \rvert + a_{ij}}{\min(\lvert N_1(i) \rvert, \lvert N_1(j) \rvert) + 1 - a_{ij}}$$
where $N_1(i)$ represents the set of neighbors of node i, and $a_{ij}$ is the adjacency between nodes
i and j.
The formula for TOM in gene co-expression network analysis was:
$$\mathrm{TOM}_{ij} = \frac{\sum_{u \neq i,j} a_{iu} a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$$
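A minimal sketch of the topological overlap calculation for a weighted network is shown below. It assumes the form TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), with the sum running over nodes u other than i and j; the function name and the adjacency values are hypothetical.

```python
def topological_overlap(adjacency):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adjacency)
    # Connectivity of each node, excluding the diagonal.
    k = [sum(adjacency[i][j] for j in range(n) if j != i) for i in range(n)]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adjacency[i][u] * adjacency[u][j]
                         for u in range(n) if u not in (i, j))
            a_ij = adjacency[i][j]
            tom[i][j] = (shared + a_ij) / (min(k[i], k[j]) + 1 - a_ij)
    return tom

# Hypothetical symmetric adjacency matrix for three genes (diagonal ignored).
adjacency = [[0.0, 0.8, 0.5],
             [0.8, 0.0, 0.2],
             [0.5, 0.2, 0.0]]
tom = topological_overlap(adjacency)
dis_tom = [[1 - v for v in row] for row in tom]  # dissimilarity: 1 - TOM
```

Two genes receive a high overlap when they are strongly connected to each other and also share strong connections to the same third genes, which makes the measure less sensitive to isolated spurious correlations than the raw adjacency.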
When all gene modules were clustered by the corresponding dissimilarity of TOM
(disTOM = 1 – TOM value), the correlations between gene modules and sEWVLNC
expression were calculated to detect the diverse effects of sEWVLNC on different gene
modules (R over 0.5 was considered a significant positive correlation; conversely, R
below -0.5 was considered a significant negative correlation). The WGCNA was implemented using
the package ‘WGCNA’ for the R language and environment. The relevant gene network of
each module was visualized using VisANT 5.0 software which determines the connection
strength between each paired gene and the whole topological overlap plot of each module
(Hu et al. 2013).
Statistical Analysis
All data were described as means ± standard deviation (SD). p-values less than 0.05 were
considered as significant in the Student's t-test and the Wilcoxon signed-rank test. All plots
were implemented using the packages ‘RColorBrewer’, ‘pheatmap’, ‘gplots’, and ‘plotrix’
for the R language and environment (Kolde 2015, Lemon 2006, Neuwirth 2014, R
Development Core Team 2008, Warnes et al. 2015).
4.3 Results
4.3.1 Altered expression of sEWVLNC affects many down-stream genes
Based on HuEx data from EW-vlncRNA knockdown in CHLA-9 cells, approximately
5,857 protein-coding genes were significantly affected by EW-vlncRNA. Among them,
3,132 genes were down-regulated, and 2,725 genes were up-regulated. As sEWVLNC is
spliced from EW-vlncRNA, the down-stream genes of EW-vlncRNA knockdown should
include all candidate genes that are regulated by sEWVLNC. Similarly, HuEx data from
sEWVLNC overexpression in CHLA-10 cells indicated that 2,725 protein-coding genes
were in the down-stream gene network of sEWVLNC, in which the expression of 1,530
genes was decreased by sEWVLNC, while the expression of 1,195 genes was increased by
sEWVLNC. Comprehensive combined analysis between EW-vlncRNA knockdown and
sEWVLNC overexpression indicated that 362 genes had a positive correlation with
sEWVLNC, and 341 genes a negative correlation (Figure 4.1A, 4.1B, 4.1C). To confirm
some important proto-oncogenes and oncogenes among those 703 gene candidates, qPCR was
applied in this study (data not shown); not surprisingly, the results showed some
genes to be false positives. Further investigation showed that there were also false
negatives. Given that there is a small but not negligible degree of random effects of ASO
knockdown and degradation of EW-vlncRNA over time, it should be kept in mind that
results which are dependent solely on altered expression of sEWVLNC cannot be expected
to fully characterize down-stream gene networks. Nonetheless, these findings suggest that
sEWVLNC affects the expression of several hundred downstream genes.
Figure 4.1 EW-vlncRNA knockdown and sEWVLNC overexpression. (A) The
corresponding expression values of probes in HuEx data showed that EW-vlncRNA was
effectively knocked down by ASOs in CHLA-9 cells (NC, negative control; #1 and #2
represent ASO_Exon 1 and ASO_Exon 2, respectively), and that sEWVLNC was artificially
over-expressed in CHLA-10 cells (#1 and #2 are repeats of sEWVLNC expression vectors).
(B) 5,857 protein-coding genes were affected by EW-vlncRNA knockdown, while 2,725
genes were affected in the sEWVLNC down-stream gene network. (C) Among 703 gene
candidates, 362 genes were up-regulated by sEWVLNC, while 341 genes were
down-regulated.
4.3.2 Down-stream signaling pathway analysis of sEWVLNC
Although comprehensive analysis between EW-vlncRNA knockdown and sEWVLNC
overexpression cannot necessarily accurately characterize the down-stream gene network
of sEWVLNC, the results implicated down-stream signaling pathway candidates which
were regulated by sEWVLNC. Analyzing the corresponding down-stream genes using IPA,
the results showed that sEWVLNC was involved in dozens of signaling pathways. In this
study, the first six signaling pathways were chosen for further characterization. Based on
IPA analyses, the results indicated that one out of six signaling pathways was up-regulated
by sEWVLNC, one was down-regulated, while the other four were only associated with
sEWVLNC (Figure 4.2A). However, the analysis suggested that, although it is extremely
unlikely that false positives from the EW-vlncRNA knockdown would incorrectly lead to
identification of false signaling pathway candidates (there is an extremely low probability
that all false positive genes would belong to a small number of signaling pathways), the
false positives could affect the order of the downstream signaling pathways by weakening
the scores of the candidate pathways in IPA analysis. Therefore, comprehensive analysis
using IPA cannot precisely estimate the importance of these signaling pathways. To
address this shortcoming, the candidate signaling pathways were ranked according to
stand-alone HuEx data from sEWVLNC overexpression. The sEWVLNC overexpression
vector can generate stable cell lines, such that the rank of these signaling pathways can
demonstrate the effects of sEWVLNC on tumor activities. The results showed that, among
the first six signaling pathway candidates, two were down-regulated, while the rest were
only associated (Figure 4.2B). The HGF/MET signaling pathway resided in the fifth
position of the comprehensive analysis, and also ranked first in the sEWVLNC
overexpression data. Thus, the HGF/MET signaling pathway was identified as the most
important signaling pathway candidate of sEWVLNC.
Figure 4.2 Signaling pathway analysis of genes regulated downstream of sEWVLNC.
(A) A comprehensive analysis: the first six signaling pathways were selected by a
comprehensive analysis between sEWVLNC knockdown and overexpression. Among them,
one was up-regulated by sEWVLNC (purple), one was down-regulated (blue), and the other
four were only associated with sEWVLNC (black). (B) Based on stand-alone HuEx data
from sEWVLNC overexpression, the first six signaling pathways were identified: two of
them were down-regulated by sEWVLNC (dark blue), while the rest were only associated
(black).
4.3.3 Correlation analysis of the gene network regulated by sEWVLNC
Besides traditional analyses, correlative statistical methods were also applied. Based on
HuEx data from 178 Ewing sarcoma samples, 456 protein-coding genes were significantly
associated with sEWVLNC. Among them, 279 genes were identified to be positively
correlated, and 177 genes were negatively correlated (Figure 4.3A). Most of them were
regularly expressed in Ewing sarcoma, while very few showed low expression (Figure
4.3B). This observation lent further support to the accuracy of the correlation analysis.
Figure 4.3 Genes correlated with sEWVLNC. (A) 456 protein-coding genes were
significantly associated with sEWVLNC: 279 genes were identified to be positively
correlated, and 177 genes were negatively correlated. (B) The expression levels of 456
correlative genes were analyzed with HuEx.
4.3.4 WGCNA analysis of sEWVLNC down-stream genes
Although co-expression gene correlation analyses demonstrated that sEWVLNC was
significantly associated with 456 genes, this result also contained false positive candidates,
as is typical of complex data sets like this. To overcome this flaw, sEWVLNC downstream
genes were analyzed in gene clusters. However, the previous data from signaling pathway
analyses only assessed the sEWVLNC down-stream gene network through estimation of
biological signal communications. There was no data to address the relationship between
sEWVLNC and gene clusters that was completely dependent on statistical analyses. Unlike
signaling pathways analysis which is mainly based on well-known biological signals,
statistical calculation should display the potential connections between sEWVLNC and the
corresponding gene clusters, and might cover unknown gene networks downstream of
sEWVLNC. In the study, gene clustering analysis showed that 449 out of 456 genes (98.5%)
can be incorporated into six groups by connection strength between paired genes, with the
connection strength completely based on adjacent matrix values which assemble genes by
correlations between each other (Figure 4.4A). The remaining 7 genes have a high
possibility of being false positive, since they do not form any topological connection with
other genes. Analyzing the six gene clusters in Figure 4.4B, two of them showed a negative
correlation with sEWVLNC (R < -0.5), while the other four were positively correlative (R >
0.5). Of the two negatively correlated groups, one contained 30 genes while the other
contained 60 genes. The four positively correlated groups contained 185 genes,
25 genes, 106 genes, and 43 genes, respectively. Unlike signaling pathway analysis, which depends on
biological communications, WGCNA analysis is based on a statistical adjacency matrix
estimation that assesses the connection strength between paired genes and is calculated
from co-expression gene correlations. Therefore, it is critical to estimate whether the
correlation between a module and sEWVLNC coincides with the correlations between the
genes in that module and sEWVLNC. For instance, if module X is positively correlated
with sEWVLNC, and gene Y, which is clustered in module X, also displays a positive correlation
with sEWVLNC, then gene Y is more likely to be a reliable downstream target of sEWVLNC,
based on WGCNA analysis.
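The consistency check between a module and its member genes can be illustrated with a minimal Python sketch. This is not the analysis code used in this study (WGCNA itself is an R package). The soft-thresholding power of 6, the use of a mean z-scored profile as a stand-in for a module eigengene, the toy expression values, and all function and gene names are assumptions made only for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(gene_i, gene_j)| ** beta.
    beta=6 is an assumed power; the text does not state the value used."""
    genes = sorted(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta
            for g in genes for h in genes if g < h}

def module_profile(expr, members):
    """Per-sample mean of z-scored member profiles (simplified module summary)."""
    zscored = []
    for g in members:
        v = expr[g]
        m = sum(v) / len(v)
        sd = sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))
        zscored.append([(a - m) / sd for a in v])
    return [sum(col) / len(col) for col in zip(*zscored)]

def sign_consistent(expr, members, gene, lnc):
    """True when the gene and the module correlate with the lncRNA in the same direction."""
    r_module = pearson(module_profile(expr, members), lnc)
    r_gene = pearson(expr[gene], lnc)
    return r_module * r_gene > 0

# Hypothetical toy data: six samples, one lncRNA profile, three genes.
lnc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expr = {
    "geneY": [6.1, 5.2, 3.9, 3.1, 2.2, 0.8],
    "geneZ": [5.8, 5.0, 4.2, 2.9, 2.1, 1.2],
    "geneW": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
}
module_x = ["geneY", "geneZ"]   # a negatively correlated toy module
adj = adjacency(expr)
```

In this toy example, geneY agrees in sign with its module, so `sign_consistent(expr, module_x, "geneY", lnc)` holds, whereas a gene such as geneW, which rises with the lncRNA, would fail the same check against this module.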
Besides module analysis, a visual topological matrix was implemented in this study to
examine the connection strengths among every gene cluster. The results showed the major
genes and gene connections in gene clusters. Among them, several important genes were
found, such as MET and FOXD1, and this statistical model also supported the hypothesis
that sEWVLNC might be involved in many biologically important, functional roles.
Figure 4.4 WGCNA analysis of the sEWVLNC transcript. (A) sEWVLNC co-expression
gene dendrogram: 449 out of 456 genes (98.5%) were incorporated into six modules
(separated by colors) according to connection strength between paired genes, while the
remaining 7 genes (grey) might be false positives. (B) Among the six modules, two
were negatively correlated with sEWVLNC (R ≤ -0.5), while the other four showed positive
correlations (R ≥ 0.5). Of the two negatively correlated groups, module A contains 30
genes and module B contains 60 genes. The four positively correlated groups
contain 185 (module C), 25 (module D), 106 (module E), and 43 genes (module
F), respectively.
4.3.5 sEWVLNC down-regulates the HGF/MET signaling pathway by
suppressing MET expression
Although dozens of signaling pathway or gene cluster candidates were selected by IPA
analysis and WGCNA analysis, the main functional roles of sEWVLNC were not
determined by these approaches. In signaling pathway analysis, the HGF/MET signaling
pathway was ranked fifth in the comprehensive analysis between EW-vlncRNA
knockdown and sEWVLNC overexpression, and this signaling pathway was also first in the
stand-alone sEWVLNC overexpression analysis. Therefore, the HGF/MET signaling
pathway is likely to be one of the major down-stream pathways of sEWVLNC, which
altogether contains about 30 down-stream gene candidates by statistical analysis. To
validate this assumption, the correlation between MET and sEWVLNC was calculated, and
the result showed that MET expression was negatively associated with sEWVLNC
expression (R = -0.575) in the group of 178 EWS cases (Figure 4.5A). These findings,
specifically the negative correlation between MET and sEWVLNC, further supported the
hypothesis that sEWVLNC might play an important functional role in repressing the
HGF/MET signaling pathway. Interestingly, when examining the genes in the modules of
the WGCNA result (Figure 4.5B), MET was found in module A which was negatively
correlated with sEWVLNC (R = -0.717). Although the other 29 genes of this module do not
belong to the HGF/MET signaling pathway, this topological result, measured by adjacency
matrix estimation, independently highlighted the important, negative correlation between
sEWVLNC and the MET gene.
Figure 4.5 MET gene regulation by sEWVLNC. (A) A negative association between the
MET gene and sEWVLNC (R = -0.575) in 178 EWS cases by Pearson correlation analysis.
(B) WGCNA analysis. The MET gene was found in module A which is negatively
correlated to sEWVLNC (R = -0.717). The other 29 genes of module A do not belong to
the HGF/MET signaling pathway, but they display strong connection strength with the
MET gene through statistical analysis using adjacency matrix estimation. Therefore, these
30 genes, including MET, were incorporated into one module.
However, statistical predictions alone cannot guarantee experimental results, so the
suppression of MET expression by sEWVLNC was evaluated by qPCR. In CHLA-10 cells,
expression of MET was dramatically reduced by sEWVLNC overexpression. This result
confirmed the previous hypothesis that sEWVLNC represses MET expression in Ewing
sarcoma (Figure 4.6). Furthermore, since HGF is an upstream initiator of the HGF/MET
signaling pathway, it is possible that sEWVLNC suppresses the HGF/MET signaling
pathway by inducing a decrease not in MET expression but in HGF expression. However,
based on the correlation analysis and the WGCNA analysis, only the MET gene in the
HGF/MET signaling pathway was found to correlate with sEWVLNC. There is no evidence
that HGF is also repressed by sEWVLNC. To clarify this question, qPCRs were also
employed to analyze the change in expression of HGF in Ewing tumors upon sEWVLNC
overexpression. The results showed that sEWVLNC does not affect HGF expression (data
not shown). In summary, sEWVLNC repressed the HGF/MET signaling pathway by
interfering with MET expression. This conclusion is not only supported by the mathematical
model of 178 EWS cases, but also confirmed by biological experiments.
Figure 4.6 Regulation of the MET gene by sEWVLNC. In CHLA-10 and A4573 cells,
expression of the MET gene was dramatically reduced by sEWVLNC overexpression, as
measured by qPCR. However, MET expression was not affected in Rh-28 cells by
sEWVLNC, most likely due to the difference in genetic background between Ewing sarcoma
and rhabdomyosarcoma.
4.4 Discussion
Statistical methods have been used to study genetic functions and genetic connections for
decades, and the underlying mathematical models have improved considerably over that time. In
the past, they were used to estimate simple genotype-phenotype relationships, whereas
current state-of-the-art methods effectively detect complicated, multi-factorial down-stream
gene networks, as illustrated here. However, false positives and false negatives
continue to be a major disadvantage of statistical prediction, as is undoubtedly the case
herein (Storey et al. 2002, Storey et al. 2004).
In this study, signaling pathway analysis using IPA software and WGCNA analysis were
the two main methods used to minimize false positives, since there is an extremely low probability
that false signaling pathway candidates will be generated by random errors or systematic errors
of the statistical analyses. In addition, the two methods rest on different information: signaling pathway analysis focuses on
biological signal communications, while WGCNA analysis emphasizes the topological
connection strengths between paired genes based solely on statistical calculations. As both
methods supported a correlation between MET and sEWVLNC, it is extremely unlikely that
MET gene regulation is a false positive. Moreover, signaling pathway analysis ranked the
MET gene first in importance among sEWVLNC functions. Furthermore, of the 191
Ewing sarcoma samples examined in this study, 178 samples were judged to be valid for
gene co-expression analysis after outliers were excluded. As the sample size
was sufficiently large, the mathematical analysis and statistical model are
likely to be reliable.
False negatives remain a substantial challenge in gene co-expression analysis, and even
though several statistical methods were used in the study, none of them could fully avoid
false negatives. Furthermore, not all genes are
activated under normal micro-environmental conditions. In other words, some genes are only
expressed at specific stages of Ewing sarcoma progression. In these cases, the
correlations between sEWVLNC and the corresponding genes may be weak or inaccurate under certain cellular
conditions. Therefore, statistical predictions need to be supplemented by confirmatory
experimental methods such as qPCR, western blot, and/or immunoprecipitation.
The false positive down-stream genes derived from the non-specific effects of ASO knockdown
represent another biologic complication in this study. Unlike false positives caused by
random errors or systematic errors of statistical calculations, the non-specific knockdown by
ASOs disturbed the down-stream gene network analysis of sEWVLNC by confounding the altered
expression of the corresponding down-stream genes. Thus, downstream gene analysis
based exclusively on ASO knockdown is not reliable. In this case, the qPCR results, which
showed that sEWVLNC down-regulated MET expression in Ewing sarcoma, are very important
in supporting the negative correlation of sEWVLNC with MET expression based on the
statistical analyses.
Chapter 5: sEWVLNC delays Ewing sarcoma progression
5.1 Introduction
Non-coding RNAs play multiple functional roles in cell development and in specific tissues.
They are also highly expressed and presumably mediate similar functions in cancer,
paralleling the well-known oncofetal relationship recognized between cancer and normal
development. In this study, statistical modeling and signaling pathway analysis indicated
that sEWVLNC performs several functions in Ewing sarcoma cells as well. Because
sEWVLNC is specifically expressed in Ewing sarcoma, the aforementioned findings
underline the importance of sEWVLNC in this type of childhood cancer. Mathematical
analysis showed that sEWVLNC suppresses the HGF/MET signaling pathway by
interfering with the expression of the MET gene which has been recognized as a proto-
oncogene highly related to tumor cell motility, tumor invasion, and metastasis for many
years. Therefore, given the well-documented role of the MET gene in tumor metastasis, this
study has mainly focused on the functional role of sEWVLNC in suppressing tumor
progression by down-regulation of MET expression.
The receptor tyrosine kinase c-Met is encoded by the MET gene, which is located at the 7q31
genetic locus; c-Met is a cell membrane receptor for the hepatocyte growth factor (HGF) ligand.
It was originally identified in chemical carcinogen-induced osteosarcoma, the other major
bone tumor besides Ewing sarcoma (Cooper et al. 1984). c-Met is initially expressed
in paraxial differentiating mesenchymal cells but is mainly found in epithelial cells after
organismal development. In contrast, HGF is specifically expressed in mesenchymal cells.
The c-Met receptor is a heterodimer of α and β chains linked by a disulfide bond, and both
chains are cleaved from a common precursor by proteolytic processing.
The c-Met protein contains five domains: the Sema domain, PSI domain, IPT domain,
Kinase domain, and a multifunctional docking site; the first three domains are extracellular,
and the other two are intracellular (Organ et al. 2011). Similarly, HGF, the ligand for c-
Met, also consists of α and β chains, which are also bound through a disulfide bond. The α
chain contains an N-terminal hairpin loop and four Kringle domains, while the β chain is
homologous to blood-clotting serine proteases but without proteolytic active sites (Raghav
et al. 2012). c-Met is involved in several cellular activities by activating a wide range of
related signaling pathways: in development, c-Met is well known for enhancing cell
mitogenesis and morphogenesis, and for supporting the survival and proliferation of both
hepatocytes and placental trophoblast cells. In embryogenesis, c-Met is very important for
the migration of skeletal muscle progenitor cells. In fact, c-Met is responsible for the cell-
scattering phenotype (Zhu et al. 1994), a close normal analogue of tumor cell invasion and
metastasis. Not surprisingly, in cancer, c-Met mainly functions in tumor proliferation,
survival, invasion, and angiogenesis (Appleman et al. 2011). Moreover, c-Met protein
expression is widespread in human organs, including kidney, liver, muscle, bone marrow,
prostate, and pancreas; thus, dysregulated MET expression in cancer is an important
determinant of tumor cell behavior across many human malignancies, paralleling the wide
range of normal functions in non-malignant tissues (Organ et al. 2011).
Numerous studies undertaken over the past three decades have shown that c-Met protein is
involved in a large number of biological and molecular events through HGF/MET signaling
pathways. Its ligand HGF binds to the α-β chains of c-Met to induce homodimerization
which then causes phosphorylation of the two tyrosine residues (Y1234 and Y1235) in the
kinase domain of c-Met. Subsequently, the phosphorylated kinase domain induces the
phosphorylation of tyrosine residues (Y1349 and Y1356) in the multifunctional activation
sites of c-Met (Organ et al. 2011). These two tyrosines, Y1349 and Y1356, form a tandem
motif, which can be recognized by an SH2 domain. After those two tyrosine residues are
phosphorylated, the multifunctional activation sites of c-Met recruit down-stream signaling
effectors, such as phospholipase Cγ (PLCγ), Src homology domain-containing 5’ inositol
phosphatase (SHIP-2), proto-oncogene tyrosine-protein kinase Src (c-SRC), growth factor
receptor-bound protein 2 (GRB2), phosphoinositide 3-kinase (PI3K), signal transducer and
activator of transcription 3 (STAT3), Src homology-2-containing (SHC), and GRB2-
associated-binding protein 1 (GAB1). Among those effectors, the adaptor proteins GRB2
and GAB1 cooperatively interact with c-Met to activate a wide range of down-stream
responses, as GRB2 and GAB1 can provide binding sites for additional adaptor proteins
that are involved in the RAS pathway, the PI3K/AKT pathway, the JAK/STAT pathway,
the β-Catenin pathway, and the Notch pathway. Moreover, other important proteins are
found in the down-stream network of c-Met, such as mitogen activated protein kinase
(MAPK), which is activated by the rat sarcoma viral oncogene homolog (RAS) which is
itself regulated by c-Met through the adaptor protein GRB2. Active MAPK can translocate
to the nucleus and cooperate with corresponding transcription factors to activate a large
number of down-stream genes. Another example is PI3K which is activated by c-Met
through binding to GAB1; PI3K can phosphorylate AKT to regulate cellular apoptosis. On
the other hand, the activation of STAT3 is also a down-stream response to c-Met, and based
on recent studies, activating STAT3 dramatically reduces apoptosis (Organ et al. 2011).
As c-Met is associated with a large number of signaling pathways, it unsurprisingly plays
critical roles in tumor growth and cancer invasion. Several groups of investigators have
documented that c-Met is an important proto-oncogene in tumor malignant transformation
(Raghav et al. 2012, Appleman et al. 2011). c-Met was first characterized as a proto-oncogene
in the 1980s, when the tyrosine kinase domain of c-Met was found to be fused to an
upstream translocated promoter region (TPR) in an osteosarcoma cell line (Soman et al.
1990). This chromosomal rearrangement was also found in gastric carcinoma and in nearby
normal mucosa, and corresponding studies showed that the TPR-MET fusion protein
induces the development of epithelial-derived tumors in animal models. Besides the TPR-MET
fusion in gastric carcinoma, many corresponding studies suggested that the MET gene
is constantly expressed and activated in a wide range of tumors. These findings
indicated that amplification of the MET gene might occur during tumor progression.
Moreover, non-amplification-based expression of the MET gene can also be enhanced in
non-small cell lung carcinomas by EGFR tyrosine kinase inhibitors (Organ et al. 2011). As
discussed above, c-Met protein affects many signaling pathways and is critical for cell
proliferation, invasion, survival, and angiogenesis. Irregular over-expression of the MET
gene can potentiate tumor development and progression, and therefore suppression of the
MET gene in cancer is a rational target for tumor treatment (Appleman et al. 2011). Recent
advances have shown that ATP-competitive or allosteric inhibitors of c-Met can
significantly repress tumor growth and invasion. However, some reports also documented
that the response to MET suppression is not permanent, and tumors could acquire resistance
to c-Met inhibitors quickly. Although c-Met protein regulates a large number of signaling
pathways, tumors can bypass c-Met by directly activating downstream effectors such as
PI3K and those in the MAPK cascade. Therefore, multiple steps blocking several parallel
signaling pathways of c-Met may be necessary to overcome tumor resistance to c-Met
inhibitors (Appleman et al. 2011).
In this study, most Ewing sarcomas displayed high expression levels of sEWVLNC together with
inversely correlated, suppressed MET expression. In vitro, this inhibition reduced Ewing
sarcoma invasion; similarly, sEWVLNC overexpression significantly delayed tumor
growth and metastasis in vivo. Impressively, survival analysis of Ewing sarcoma patients
further indicated that the inhibitory function of sEWVLNC on Ewing sarcoma progression
is most likely linked to suppression of the MET gene. To elucidate a potential mechanism
for the observed suppression of MET expression by sEWVLNC, RNA FISH for the
sEWVLNC was employed in parallel with IHC (immunohistochemistry) for the c-Jun
protein. The results showed that sEWVLNC co-localized with c-Jun, which binds with c-
Fos to form the heterodimer AP-1 complex (a major transcriptional activator of the MET
gene) (Seol et al. 2000), and this interaction might result in repression of MET expression.
5.2 Materials and methods
Cell Invasion and Visual Image
The Ewing sarcoma cell lines CHLA-10 and A4573 as well as the RMS cell line Rh-28
were passaged twice and brought to approximately 80% confluence before assessing cell
invasion using the fluorimetric QCM 24-well / 96-well ECMatrix Cell Invasion Assay (Cat.
# ECM 554, Cat. # ECM 555, EMD Millipore). Cells were cultured in FBS-free DMEM
medium for 24 hours. Cells were detached using 0.5% trypsin, and 85% cell viability was
a pre-condition for commencing the cell invasion assay. For the 24-well assay, 0.5 × 10^6
cells were plated into the interior chamber of transwell plates in chemoattractant-free
medium, while for the 96-well cell invasion assay 0.5 × 10^5 cells were used. DMEM
medium with 10% FBS (as a chemoattractant) was added into the exterior chamber of the
plate. Therefore, there is a concentration difference in the chemoattractant between the
interior chamber and the exterior chamber of the plate; cells can only invade from the
interior to the exterior through the ECM membrane.
Cells were incubated for 48 hours at 37°C with 5% CO2 in a humidified incubator. Next,
the chemoattractant-free medium in the interior chamber was removed, and the interior
chamber was incubated in pre-warmed cell detachment solution for 30 minutes. During the
detachment, the chamber was tilted back and forth several times to assist in completely
dislodging cells from the ECM membrane. Lysis Buffer / Dye Solution was prepared
according to the manufacturer’s protocol. The ratio of CyQuant GR Dye to 4X Lysis Buffer
was 1:75. Finally, 75 ul of this Lysis Buffer / Dye Solution was added into 225 ul cell
detachment solution with cells (the ratio of Lysis Buffer / Dye Solution to detachment
solution was 1:3), and the mixture was incubated for 15 minutes at room temperature.
Fluorescent signals were detected by microplate reader using 480 nm excitation and 520
nm emission, according to the manufacturer’s protocol. In contrast, the interior chamber
containing cells for visual imaging was washed with DPBS instead of being incubated in
cell detachment solution. The cells were then stained with 0.1% Crystal violet solution. In
this cell invasion test, two negative controls were included: first, a sample without cells;
second, cells incubated in the chamber without chemoattractant.
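The dilution stated for the lysis step can be checked with a short Python calculation. The volumes (75 ul of 4X Lysis Buffer / Dye Solution added to 225 ul of detachment solution) are taken from the protocol above; the helper name `final_strength` is hypothetical, and the sketch assumes that volumes are additive on mixing.

```python
from fractions import Fraction

def final_strength(stock_strength, stock_volume_ul, diluent_volume_ul):
    """Strength after mixing a stock with a diluent (C1 * V1 = C2 * V2).
    Assumes the two volumes simply add."""
    total = stock_volume_ul + diluent_volume_ul
    return Fraction(stock_strength) * Fraction(stock_volume_ul, total)

lysis_dye_ul = 75     # 4X Lysis Buffer / Dye Solution added per well
detachment_ul = 225   # cell detachment solution (with cells) already present

# Volume ratio of Lysis Buffer / Dye Solution to detachment solution.
volume_ratio = Fraction(lysis_dye_ul, detachment_ul)

# Working strength of the lysis buffer in the final 300 ul mixture, in units of "X".
buffer_strength = final_strength(4, lysis_dye_ul, detachment_ul)
```

Under these assumptions the volume ratio reduces to 1:3, as stated in the protocol, and the 4X buffer ends up at 1X in the final mixture.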
Animal Work and Bioluminescence Imaging
Luminescent signals are generated in bioluminescence assays when D-luciferin, administered
to tumor-bearing mice, is oxidized by luciferase in tumor cells that have been
transfected with luciferase expression vectors. By measuring signal strength in
an intact animal using the Xenogen imaging device, tumor growth and metastasis can be
quantitated in mice. CHLA-10 cells carrying the pCCL-MNDU3-LUC lentivirus luciferase
expression vector were a gift from Dr. Kang Hyung-Gyoo. The sEWVLNC overexpression
vector was generated from the pEF6/V5-His-TOPO® expression vector using the
pEF6/V5-His TOPO® TA Expression Kit (Invitrogen). Derived from the same colony of
CHLA-10 cells with luciferase, three constructs of CHLA-10 cells were designed: CHLA-
10 cells with luciferase plasmid plus empty pEF6/V5-His-TOPO vector (CHLA-10
luci_control), CHLA-10 cells with luciferase plasmid plus sEWVLNC pEF6/V5-His-
TOPO vector copy 1 (CHLA-10 luci_sEWVLNC #1), and CHLA-10 cells with luciferase
plasmid plus sEWVLNC pEF6/V5-His-TOPO vector copy 2 (CHLA-10 luci_sEWVLNC
#2). The sEWVLNC pEF6/V5-His-TOPO vector copy 1 and copy 2 are the same
sEWVLNC expression vectors from different plasmid colonies. Before cell injection, the
expression of luciferase in the three CHLA-10 models was assessed and calibrated using a
Xenogen imaging system by the animal imaging core at Children's Hospital Los Angeles.
CHLA-10 cells (5 × 10^5 cells per mouse) of the three constructs were injected into 8-week-old
female NOD SCID mice (The Jackson Laboratory) bred in the animal imaging core at
Children's Hospital Los Angeles. Thirty mice (10 mice per construct) were used to study
the functions of sEWVLNC in Ewing sarcoma experimental metastasis and growth
following tail-vein injection, while another 30 mice were used to assess the functions of
sEWVLNC in Ewing sarcoma spontaneous metastasis and growth following subcutaneous
injection. The tail-vein-injection model and the subcutaneous-injection model were
each performed twice, with 15 mice (5 per construct) used each time. A total of 60 mice were used in
this study (2 models × 2 repeats × 15 mice per model per repeat).
Bioluminescence imaging was performed at weekly or more frequent time points in order
to measure tumor growth and progression for both the tail-vein-injection and subcutaneous-
injection experiments. For the tail-vein-injection model, Ewing sarcoma proliferation and
invasion were measured at day 0, day 7, day 10, and day 14; for the subcutaneous-injection
model, Ewing sarcoma proliferation and invasion were measured at day 0, day 4, day 7,
day 14, and day 21. Luminescent photon emission by viable tumor cells in each animal
was quantitated using Living Image 4.4 software (Caliper Life Sciences), according to the
manufacturer’s protocol. sEWVLNC targets dozens if not hundreds of genes directly or
indirectly as demonstrated by the statistical analysis of Ewing sarcomas and cell lines
reported above. Consequently, overexpression of sEWVLNC resulted in enhanced
expression of the luciferase plasmid itself, necessitating a background correction for signal
derived from the plasmid alone. Specifically, CHLA-10 cells with both the luciferase plasmid
and the sEWVLNC expression vector displayed overall enhanced fluorescent emission
compared to cells with the luciferase plasmid alone. Data at each time point were
normalized accordingly.
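One simple way to apply such a correction is sketched below in Python. The exact normalization procedure is not specified in the text, so this is only an assumed scheme: the signal of each sEWVLNC construct is divided by the ratio of its pre-injection signal to that of the control for an equal number of cells. The function names and the numerical values are hypothetical.

```python
def luciferase_gain(pre_injection_lnc, pre_injection_control):
    """Ratio of pre-injection signal (sEWVLNC cells / control cells)
    measured on equal numbers of cells."""
    return pre_injection_lnc / pre_injection_control

def normalize(raw_signals, gain):
    """Divide each in vivo signal by the construct-specific gain so that
    signals from sEWVLNC and control tumors are comparable."""
    return [s / gain for s in raw_signals]

# Hypothetical pre-injection readings for equal cell numbers (photons/s).
gain = luciferase_gain(2.0e6, 1.0e6)

# Hypothetical in vivo readings for one sEWVLNC mouse at successive time points.
raw = [4.0e6, 8.0e6, 1.6e7]
corrected = normalize(raw, gain)
```

With these illustrative numbers the gain is 2, so every raw sEWVLNC reading is halved before being compared with the control group.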
RNA FISH and IHC (Immunohistochemistry)
CHLA-9 cells were cultured in DMEM medium with 20% FBS overnight and had at least
85% viability when plated on a glass slide. CHLA-9 cells were fixed with 3.7%
formaldehyde buffer for 10 minutes, followed by three washes with cold DPBS. Next,
cells were permeabilized with 0.1% Triton X-100 at room temperature for 5 minutes.
Subsequently, cells were washed again with wash buffer A solution (2 ml Stellaris RNA
FISH 5X Wash Buffer A (Cat. # SMF-WA1-60, Biosearch Technologies, Inc., Petaluma,
CA), 1 ml deionized formamide, 7 ml nuclease-free water). After three washes, cells were
incubated in Hybridization Buffer (Cat. # SMF-HB1-10, Biosearch Technologies, Inc.,
Petaluma, CA) containing 100 ul deionized formamide, with sEWVLNC specific RNA probes and
c-Jun specific primary antibody, at 37°C overnight (at least 6 hours).
After hybridization, 50 ul of wash buffer A solution was added to cells with a Goat anti-
Rabbit IgG (H+L) secondary antibody Alexa Fluor® 647 conjugate (Cat. #A-21245, Life
Tech.) at room temperature for one hour. After another three washes with wash buffer A
solution, cells were incubated in FluoroPure™ grade DAPI (Cat. #D21490, Life Tech.) at
room temperature for 10 minutes. Finally, cells were washed three times with wash buffer B solution
(Cat. # SMF-WB1-20, Biosearch Technologies, Inc., Petaluma, CA, with 88 ml of nuclease-free
water), and ProLong® Gold Antifade Mountant (Cat. # P36930, Life Tech.)
was added on the microscope slide to fix, stabilize, and protect cells from fluorescence
fading. For combined RNA FISH and immunohistochemistry, the primary antibody and
FISH probe were as follows: a primary rabbit polyclonal IgG antibody for c-Jun (H-79,
Cat. # sc-1694, Santa Cruz, CA), and Custom Stellaris®FISH Probes with CAL Fluor®
Red 610 Dye for EW-vlncRNA / sEWVLNC utilizing Stellaris®RNA FISH Probe Designer
4.1 (Biosearch Technologies, Inc., Petaluma, CA).
FISH Probe Designer: www.biosearchtech.com/stellarisdesigner/.
Survival analysis
Ewing sarcoma HuEx CEL files and survival data from the previously published clinical trials
INT-0154 and AEWS0031 are publicly available in the Gene Expression Omnibus
(GSE63157). Thirty-three patients were male, and 19 patients were female. In this cohort, 13 deaths
and 17 cancer relapse events occurred. For survival analysis, the 52 patients were divided into
two groups according to sEWVLNC expression level, using the mean value of
sEWVLNC expression in the 178 EWS cases as the threshold. Twenty-seven of the 52 patients expressed
sEWVLNC above the threshold and 25 patients below the threshold. Based on the mean
value of MET expression observed in the 178 EWS cases, the 52 patients were divided into
two groups: 39 patients expressed the MET gene above the threshold and 13 patients below.
In addition, based on the median value of MET expression in the 178 EWS cases, the 52
patients were divided into two groups: 27 of them expressed the MET gene above the
threshold and 25 patients below. Based on patient outcome, overall survival was calculated;
similarly, based on cancer relapse, event-free survival was also determined (Therneau et al.
2000). Survival analyses were performed using the package ‘survival’ for the R language
and environment; a cut-off p-value of 0.05 was used.
GEO GSE63157: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63157
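The two steps of this analysis, splitting patients at an expression threshold and estimating survival in each group, can be sketched in Python as follows. This is not the code used in this study, which relied on the R package ‘survival’; the function names, patient identifiers, and follow-up values are hypothetical, and the comparison of the two curves (for example by a log-rank test) is deliberately left out.

```python
def split_by_threshold(expression, threshold):
    """Return (high, low) lists of patient IDs relative to a fixed threshold,
    e.g. the mean or median expression in a larger reference cohort."""
    high = [pid for pid, value in expression.items() if value > threshold]
    low = [pid for pid, value in expression.items() if value <= threshold]
    return high, low

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) at each
    distinct event time. events[i] is 1 for an event, 0 for censoring."""
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical expression values and a hypothetical cohort-derived threshold.
expression = {"p1": 7.2, "p2": 5.1, "p3": 6.0, "p4": 8.4}
high, low = split_by_threshold(expression, threshold=6.0)

# Hypothetical follow-up times (months) and event indicators for one group.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

Passing the threshold in as an argument mirrors the design described above, where the cut-off is derived from the 178 EWS cases and then applied to the 52 patients with survival data.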
5.3 Results
5.3.1 Ewing sarcoma invasion in vitro delayed by sEWVLNC
Even though the statistical model detailed above suggested negative regulatory effects of
sEWVLNC on the MET gene and the HGF/MET signaling pathway, there was no direct
biologic evidence to support this role of sEWVLNC in Ewing sarcoma. To evaluate the
association between sEWVLNC and tumor metastasis, a tumor cell migration assay in vitro
was employed using an extracellular matrix cell invasion assay (Egeblad et al. 2002).
Before the cell invasion assay, the inhibitory effect of sEWVLNC on the MET gene had been
confirmed by qPCRs; the results had indicated that sEWVLNC repressed MET expression
by approximately 60% in CHLA-10 cells and around 30% in A4573 Ewing sarcoma cells
(Figure 4.6). However, the expression of the MET gene was not significantly changed in
Rh-28 cells by sEWVLNC. In the cell invasion assay, all CHLA-10, A4573, and Rh-28
cells were starved for 24 hours to induce tumor cells to migrate across the extracellular
matrix (ECM) membrane, from chemoattractant-low medium to chemoattractant-high
medium (Figure 5.1). The results of the cell invasion assay demonstrated that over-expression
of sEWVLNC inhibited Ewing sarcoma cell migration (Figure 5.2A); tumor
cell invasion was reduced about 50% for CHLA-10 cells (p-value < 0.005) and around 25%
for A4573 (p-value < 0.05). In Rh-28 cells, sEWVLNC did not affect cell migration,
probably due to the significant difference in genomic background between Ewing sarcoma
and RMSs. To visualize cell migration, CHLA-10 and A4573 cells were stained with
crystal violet. Cells expressing sEWVLNC were less numerous than controls based on
overall staining intensity (Figure 5.2B).
Figure 5.1 Depiction of the cell invasion assay. Before the cell invasion assay, tumor
cells were starved for 24 hours to induce tumor cells to migrate across extracellular matrix
membrane (ECM) from chemoattractant-low medium to chemoattractant-high medium.
Figure 5.2 Tumor cell motility in vitro reduced by sEWVLNC. (A) sEWVLNC
overexpression resulted in reduction of tumor cell invasion by approximately 50% in
CHLA-10 cells (p-value < 0.005), and around 25% in A4573 cells (p-value < 0.05). (B) In
the upper photographs, tumor cells that had crossed the ECM membrane (into the lower
chamber) were stained with crystal violet. sEWVLNC over-expression inhibited the
progression of both Ewing sarcoma cell lines, as the staining intensity of tumor cells with
sEWVLNC was distinctly less than control (cells were in the center of wells and stained
with crystal violet). Similarly, the lower photographs display higher magnification
microscopic images of tumor cells seen in the corresponding wells in the upper
photographs.
Although the results of the cell invasion assay suggested that sEWVLNC suppressed Ewing
tumor cell migration in vitro, there was no mechanistic evidence to support that this
suppression was based on the inhibitory effect of sEWVLNC on MET expression. To
examine that possibility, c-Met was reduced by DsiRNA or 0.5 uM Tivantinib (a small
molecule inhibitor of c-Met). The results indicated that reducing c-Met by over 75%
resulted in approximately a 90% reduction in Ewing sarcoma cell invasion (Figure 5.3).
Together, these findings implied that sEWVLNC can potently inhibit Ewing sarcoma cell
growth and migration by down-regulating MET expression.
Figure 5.3 Effect of inhibition of c-Met on tumor cell motility. (A) c-Met was reduced
either by DsiRNA or 0.5 uM Tivantinib. (B) qPCR measurement of MET expression after
DsiRNA treatment, showing that c-Met was dramatically decreased by DsiRNA inhibition.
5.3.2 Ewing sarcoma growth and progression in vivo suppressed by
sEWVLNC
In addition to the cell invasion test in vitro, the biologic effects of sEWVLNC on Ewing
sarcoma proliferation and metastasis in vivo were examined in a mouse model. Both a
luciferase plasmid and a sEWVLNC expression vector were transfected into CHLA-10 cells
by double transfection. These modified tumor cells were then injected into SCID mice, as
described previously. Normal mouse cells do not contain a luciferase plasmid; only the
injected tumor cells express luciferase which generates fluorescent signals when activated
by addition of D-luciferin. Therefore, tumor growth and progression in the mouse model
could be quantitatively assessed by bioluminescence imaging (BLI) and quantification of
fluorescent signals (Figure 5.4) (Gildea et al. 2000, Kim et al. 2011, Lu et al. 2011).
Injected mice were divided into the following three groups based on CHLA-10 cells
transfected with different plasmids: CHLA-10 cells with luciferase plus control, CHLA-10
with luciferase plus sEWVLNC #1, and CHLA-10 with luciferase plus sEWVLNC #2.
Figure 5.4 Effect of sEWVLNC on Ewing sarcoma growth and progression in vivo.
Tumor cells containing empty vector (luciferase only) or sEWVLNC overexpression vector
#1 or #2 were subcutaneously injected into the shoulder region of immunodeficient SCID
mice. Fluorescent signals of tumors could be detected as early as day 7 after tumor cell
injection (data not shown). The bioluminescent images shown for controls (NC, negative
control) and experimental samples (sEWVLNC) were recorded on day 14.
Before cell injection, the expression efficiency of the luciferase plasmid was assessed by
BLI; surprisingly, the results indicated that sEWVLNC significantly enhanced luciferase
expression (p-value < 0.05). For an equivalent number of tumor cells, the bioluminescent
signal of cells with sEWVLNC was twice that of control. Consequently, all calculations
were normalized to account for this difference. After 7 days, Ewing sarcoma growth and
invasion could be quantified according to signal intensity by BLI. The BLI results showed
that in the spontaneous metastasis model involving subcutaneous injection, Ewing sarcoma
tumors of mice treated with sEWVLNC grew much more slowly than empty vector controls
(Figure 5.5A); tumors of mice that received empty vector proliferated twice as much after
14 days as tumors from mice receiving sEWVLNC (p-value < 0.0005). Furthermore, in the
tail-vein experimental metastasis model, tumors from mice treated with empty vector
invaded more quickly than tumors of mice treated with sEWVLNC (Figure 5.5B), and
tumors from control mice grew twice as much as tumors in mice treated with sEWVLNC
after 10 days (p-value < 0.0005). To conclude, the therapeutic effect of sEWVLNC on
Ewing sarcoma invasion was supported in vivo in both mouse models.
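The normalization step mentioned above can be illustrated with a minimal sketch. It assumes that each raw BLI reading is simply divided by the per-cell luciferase efficiency of that cell line relative to control (a ratio of 2 for sEWVLNC cells, as reported above). The function name and the signal values are hypothetical and are not data from this study; the exact normalization procedure is not described in the text.

```python
def normalize_bli(raw_signal, efficiency_ratio):
    """Scale a raw BLI signal by the per-cell luciferase efficiency relative to control.

    efficiency_ratio is the signal per cell divided by the control signal per cell
    (1.0 for control cells by definition). This is an assumed, simplified scheme.
    """
    if efficiency_ratio <= 0:
        raise ValueError("efficiency_ratio must be positive")
    return raw_signal / efficiency_ratio


# Hypothetical raw readings in arbitrary units (illustration only).
control_raw = 8.0e6   # empty-vector tumor
sewvlnc_raw = 8.0e6   # sEWVLNC tumor with the same raw signal

control_norm = normalize_bli(control_raw, 1.0)
sewvlnc_norm = normalize_bli(sewvlnc_raw, 2.0)  # sEWVLNC cells emit about 2x signal per cell

# Equal raw signals correspond to a two-fold difference in estimated tumor burden.
fold_difference = control_norm / sewvlnc_norm
print(fold_difference)
```

Without such a correction, the doubled luciferase output of sEWVLNC cells would mask part of the growth difference between the two groups.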
Figure 5.5 Tumor growth delayed by sEWVLNC. (A) In a subcutaneous injection model,
Ewing sarcomas of mice treated with control vector (containing luciferase only)
proliferated twice as much as tumors of mice treated with the sEWVLNC overexpression
vector after 14 days (p-value < 0.0005). (B) In an experimental metastasis model
involving tail-vein injection, Ewing sarcomas of mice treated with control vector spread
twice as much as tumors from mice treated with the sEWVLNC overexpression vector after
10 days (p-value < 0.0005).
5.3.3 sEWVLNC is a potential variable in Ewing sarcoma prognosis
As both the in vitro tumor invasion assay and the in vivo BLI data indicated an inhibitory
effect of sEWVLNC on Ewing sarcoma growth and progression, the correlation between
sEWVLNC expression and the survival rate of Ewing sarcoma patients was a major
focus of this study. To investigate this correlation in more depth, published and publicly
available data from 52 Ewing sarcoma patient biopsy samples were analyzed. A total of 13
deaths occurred in this group of patients, and cancer relapse occurred in 17 patients. Based
on the mean value of sEWVLNC expression in 178 EWS cases, these 52 Ewing sarcoma
patients were divided into two groups: a sEWVLNC-high group (27 patients) and a
sEWVLNC-low group (25 patients). In overall survival analysis based on death events, the
survival rate of Ewing sarcoma patients with high sEWVLNC expression was significantly
higher than that of patients with low sEWVLNC expression (p-value = 0.00731; Figure
5.6A). In event-free survival analysis based on cancer relapse, the probability of relapse
in Ewing sarcoma patients expressing a high level of sEWVLNC was significantly lower
than that of patients expressing a low level of sEWVLNC (p-value = 0.0133; Figure 5.6B).
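The survival comparison described above can be sketched in code. This section does not name the estimator used, so a Kaplan-Meier estimate is assumed here purely for illustration. The function name and the toy follow-up data are hypothetical and do not come from the patient cohort.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (death or relapse) was observed, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            observed += pairs[i][1]
            leaving += 1
            i += 1
        if observed:
            # Multiply by the fraction of at-risk patients who survive time t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= leaving
    return curve


# Toy data (hypothetical): follow-up in months, 1 = event, 0 = censored.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(t, round(s, 3))
```

In practice one curve would be estimated per expression group (high versus low), and the curves would then be compared with a significance test such as the log-rank test, which is not shown here.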
Next, a potential correlation between MET gene expression and patient survival was
examined. When patients in this cohort were stratified according to their level of MET
expression, the results were consistent with the sEWVLNC-based survival analysis (Figure
5.7). When the 52 patients were divided into two groups (27 patients in the MET-low
group and 25 patients in the MET-high group) according to the median value of MET
expression in 178 EWS cases, the event-free survival rate of patients with low MET
expression was significantly higher than that of patients with high MET expression
(p-value = 0.0195; Figure 5.7A). In addition, as in the preceding survival analysis based
on sEWVLNC expression, the 52 patients were also divided according to the mean value
of MET expression in 178 EWS cases, which produced unbalanced groups (39 patients in
the MET-low group and 13 patients in the MET-high group); this analysis suggested a
marginally significant difference between the two groups (p-value = 0.0807; Figure 5.7B).
Therefore, sEWVLNC displayed characteristics of a potential biomarker in Ewing sarcoma
prognosis. Moreover, these survival analyses demonstrated a positive association between
sEWVLNC expression and enhanced survival of Ewing sarcoma patients, which would most
likely occur through downstream suppression of the MET gene.
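The difference between the median-based and mean-based groupings can be illustrated with a short sketch. The cutoff is computed on a reference set and then applied to a smaller cohort, mirroring the 178-case reference and 52-patient cohort described above. All expression values here are hypothetical, and assigning values equal to the cutoff to the low group is an assumption, since the text does not specify tie handling.

```python
from statistics import mean, median


def split_by_cutoff(values, cutoff):
    """Return (low, high) lists; values strictly above the cutoff count as high."""
    low = [v for v in values if v <= cutoff]
    high = [v for v in values if v > cutoff]
    return low, high


# Hypothetical right-skewed expression values (arbitrary units).
reference = [1, 1, 2, 2, 3, 3, 4, 20]   # stands in for the larger reference set
cohort = [1, 2, 2, 3, 4, 20]            # stands in for the patients with outcome data

mean_low, mean_high = split_by_cutoff(cohort, mean(reference))
median_low, median_high = split_by_cutoff(cohort, median(reference))

print(len(mean_low), len(mean_high))       # mean cutoff: unbalanced groups
print(len(median_low), len(median_high))   # median cutoff: balanced groups
```

When a few samples have very high expression, the mean is pulled above the median, so a mean cutoff places fewer patients in the high group. This is one plausible reason for the 39 versus 13 split obtained with the mean of MET expression.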
Figure 5.6 Association between expression level of sEWVLNC and survival rate of
Ewing sarcoma patients. (A) Overall survival by sEWVLNC (COG data): based on
number of patient deaths, overall survival analysis showed that the survival rate of Ewing
sarcoma patients expressing a high level of sEWVLNC (green) was distinctly higher than
that of patients expressing a low level of sEWVLNC (blue) (p-value = 0.00731).
(B) Event-free survival by sEWVLNC (COG data): based on cancer relapse, event-free
survival analysis indicated that the probability of cancer relapse in Ewing sarcoma patients
expressing a high level of sEWVLNC (green) was significantly lower than that in patients
expressing a low level of sEWVLNC (blue) (p-value = 0.0133).
Figure 5.7 Survival analysis based on MET expression. (A) Event-free survival by
c-Met (COG data): using the median value of MET expression in 178 EWS cases as the
cutoff, 52 patients were divided into two groups: 27 patients in the MET-low expression
group (green) and 25 patients in the MET-high expression group (blue). The event-free
survival rate of Ewing sarcoma patients with low MET expression was significantly higher
than that of the patients with high MET expression (p = 0.0195). (B) Using the mean
value of MET expression in 178 EWS cases as the cutoff, the same 52 patients were divided
into unbalanced groups: 39 patients in the MET-low group (green) and 13 patients in the
MET-high group (blue). The event-free survival analysis suggested a marginally significant
difference between the two groups (p = 0.0807).
5.3.4 MET expression is down-regulated by sEWVLNC potentially by
binding to transcription factor c-Jun
The studies discussed herein addressed the functional roles of sEWVLNC in Ewing sarcoma,
and both the statistical model and the molecular experiments supported the hypothesis that
MET expression is suppressed by sEWVLNC. However, these data alone do not provide a
mechanistic explanation. Based on a comprehensive analysis of the sEWVLNC
downstream gene network and of the gene co-expression correlation in the statistical model
above, 29 genes were identified as being affected by sEWVLNC. Moreover, the unexpected
observation that sEWVLNC increased the expression efficiency of the luciferase plasmid
supported the assumption that sEWVLNC functions in gene transcription (Figure 5.8).
Therefore, it is reasonable to assume that sEWVLNC might regulate the MET gene
transcriptionally. The transcription factors HIF1, ETS1, and AP-1 are known to activate
the MET promoter, and AP-1 is a heterodimer of c-Jun and c-Fos; therefore,
RNA FISH and IHC were performed to detect a potential interaction between sEWVLNC
and AP-1. As shown in Figure 5.9, most sEWVLNC was localized in the nucleus, and
some sEWVLNC co-localized with c-Jun, indicating a potential interaction between
c-Jun and sEWVLNC. This result suggests that sEWVLNC might suppress
MET expression by inhibiting AP-1 binding to the MET promoter.
Figure 5.8 Transcription efficiency affected by sEWVLNC. (A) The double transfection
strategy: a luciferase plasmid co-expressed with the sEWVLNC expression vector. (B) Cells
transfected with sEWVLNC expressed twice as much luciferase as control cells without
sEWVLNC (p-value < 0.05).
Figure 5.9 RNA FISH of sEWVLNC and c-Jun. Based on RNA FISH, sEWVLNC (red)
is localized in the nucleus, and some sEWVLNC co-localizes with c-Jun (blue).
5.4 Discussion
Although the statistical model showed that dozens of signaling pathways were correlated
with sEWVLNC expression, the HGF/MET pathway ranked first in signaling pathway
analysis of a sEWVLNC-overexpressing cell line. Furthermore, in a comparison of
sEWVLNC-rich and sEWVLNC-poor cell lines, sEWVLNC-rich CHLA-9 cells expressed a low
level of MET, whereas sEWVLNC-poor CHLA-10 cells, derived from the same patient after
relapse, expressed a high level of MET. Together, these findings suggest that sEWVLNC
expression levels correlate inversely with MET expression, and support the view that MET,
a key mediator of tumor aggression in Ewing sarcoma, is regulated by sEWVLNC.
Unlike most ncRNAs, sEWVLNC affects the expression efficiency of a co-transfected
expression vector, measured here as enhanced luciferase expression. The animal models
presented here demonstrated that tumors with sEWVLNC spread much more slowly than
control tumors in mice. Results from the in vitro cell invasion assay complemented the
animal work and suggested that sEWVLNC can inhibit Ewing sarcoma motility. Moreover,
molecular experiments provided evidence that sEWVLNC suppresses MET expression. The
proto-oncogene MET is well known to promote tumor growth and progression, and since
sEWVLNC is herein shown to down-regulate MET expression, it is reasonable to speculate
that sEWVLNC inhibits the development of Ewing sarcoma at least in part via suppression
of MET expression. In fact, this assumption is supported by the experiments using a
MET-specific DsiRNA and a c-Met inhibitor. Furthermore, the survival analysis of 52 Ewing
sarcoma patients showed a positive association between sEWVLNC expression and a higher
survival rate in this patient cohort. In addition, survival analysis of patients based on
MET expression mirrored, in the opposite direction, the corresponding analysis of survival
versus sEWVLNC expression. Taken together, these findings strongly suggest that sEWVLNC
plays an important clinical role in Ewing sarcoma by suppression of the MET gene, and
this ncRNA might therefore be a potential marker of Ewing sarcoma prognosis and a
rational therapeutic target.
Beyond the putative functions of sEWVLNC in Ewing sarcoma, identification of the
mechanism through which sEWVLNC regulates down-stream genes represents an ongoing
challenge. In this study, the results of RNA FISH and IHC suggested that sEWVLNC
might repress the MET gene through a potential interaction with c-Jun, which binds to
c-Fos to form the heterodimer AP-1 that activates the promoter of the MET gene. However,
the precise mechanism of this inhibition is poorly understood. As sEWVLNC is a newly
described ncRNA, these questions will need to be addressed in future studies.
Chapter 6: Conclusions and Perspectives
Although vlncRNAs were formally defined only several years ago, studies of this type of
ncRNA have been conducted for over a decade. The earliest research in this area focused
on vlncRNAs with antisense inhibitory roles. Kcnq1ot1 and Air, two classic examples, have
been shown to perform their functions in their full-length forms. Kcnq1ot1 is located in
the intronic space between exon 10 and exon 12 of the Kcnq1 gene, but is transcribed from
the opposite strand, and is thus a classic antisense transcript. The main function of
Kcnq1ot1 is as an antisense inhibitor of the Kcnq1 gene; Kcnq1ot1 mediates lineage-specific
silencing through chromatin-level regulation, and it is present only in its full-length form
(Pandey et al. 2008). The other classic vlncRNA, Air, is located in the intronic space
between exon 2 and exon 3 of the Igf2r gene, and is also transcribed from the negative
strand. Similar to Kcnq1ot1, the main function of Air is to inhibit the Slc22a3, Slc22a2,
and Igf2r genes. However, the functional effects of Air are not based solely on the
antisense inhibition by which Air suppresses Igf2r expression. Recent studies have shown
that Air suppresses Slc22a3 expression by interacting with the H3K9 histone
methyltransferase G9a to induce methylation at the Slc22a3 promoter (Seidl et al. 2006).
Although these studies highlighted the fact that the intact vlncRNA was necessary to
perform these functions, more recent research has demonstrated that most vlncRNAs have
spliced variants, even though the functions of the spliced forms are still unknown.
Very long intergenic non-coding RNAs (vlincRNAs) were the second subclass of vlncRNAs
to be described (Kapranov et al. 2012). HELLP, which is about 205 kb in size and was
the first reported vlincRNA, is known to be involved in cell cycle regulation and to reduce
extravillous trophoblast invasion. HELLP is located in the intergenic space between
PMCH and IGF1, and is transcribed from the positive strand. Unlike Kcnq1ot1 and Air,
identification of HELLP functions was achieved by statistical analysis (van Dijk et al.
2012). Another vlncRNA, VAD, is more than 200 kb in size and is located at the DDAH1
locus between BCL10 and ZNHIT6 but on the opposite strand. VAD plays a functional role
in modulating chromatin; its main function is to relieve repression mediated by the histone
variant H2A.Z at the INK4 locus (Lazorthes et al. 2015). All of these studies
underline that vlncRNAs perform their functions in their native, unspliced form. Even if
splice variants of Air and VAD exist, no studies have unveiled any function of those
isoforms. In contrast, this study focused on the splice variants of a vlncRNA. Spliced
variants with poly-A tails, usually less than 10 kb in size, are much more stable than
unspliced vlncRNAs (over 50 kb in size). The stability of the spliced variants makes them
more likely to perform long-term functions in cells. In this study, an actinomycin D assay
supported this assumption: the half-life of unspliced EW-vlncRNA was much shorter than
that of sEWVLNC, a spliced, polyadenylated variant of EW-vlncRNA.
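A half-life comparison of this kind can be sketched as follows. The sketch assumes that, once transcription is blocked by actinomycin D, the remaining RNA decays with first-order kinetics, so that the logarithm of the relative level falls linearly with time. The function name and the time course are hypothetical, and the fitting procedure actually used in this study is not stated here.

```python
import math


def estimate_half_life(times_h, relative_levels):
    """Estimate RNA half-life (hours) from a decay time course.

    Fits ln(level) = intercept + slope * time by ordinary least squares and
    returns ln(2) / -slope. Assumes first-order decay (an assumption).
    """
    logs = [math.log(v) for v in relative_levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("levels do not decrease over time")
    return math.log(2) / -slope


# Hypothetical time course: level relative to t = 0 after transcription block.
hours = [0, 2, 4, 8]
levels = [1.0, 0.5, 0.25, 0.0625]
half_life = estimate_half_life(hours, levels)
print(round(half_life, 3))
```

Applying the same fit to the unspliced and spliced transcripts would give two half-lives that can be compared directly.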
Unlike vlincRNAs or antisense vlncRNAs, EW-vlncRNA shares an overlapping DNA
region with the CCDC171 gene. Although GENCODE has reported that EW-vlncRNA
(termed ENST00000486641 in GENCODE) is a non-coding processed transcript, studies
of this kind of vlncRNA have not been published to date. Recently, a new classification
of ncRNAs has emerged that describes processed transcripts as sense ncRNAs, but little
information about this kind of ncRNA is currently available (St-Laurent et al. 2015). In
this study, EW-vlncRNA, a sense ncRNA, is shown to have a spliced variant, sEWVLNC,
that is potentially involved in multiple signaling pathways. This study also shows that
EW-vlncRNA is not an extension of CCDC171 transcription, since the hypermethylation
observed at the CCDC171 promoter suppressed CCDC171 expression but did not reduce
EW-vlncRNA expression. In addition, the knockdown assay showed that ASO knockdown
suppressed EW-vlncRNA expression without significantly inhibiting the CCDC171 gene.
Meanwhile, HuEx data and RNA-Seq data indicated that the expression levels of
EW-vlncRNA and the CCDC171 gene differed in magnitude. These findings indicated that
EW-vlncRNA is not transcriptional background noise from the CCDC171 gene. Unlike
traditional genes, EW-vlncRNA appears to lack a conventional promoter, consistent with
previous studies of vlincRNAs in general (St-Laurent et al. 2013). It is currently unknown
which transcription factors promote EW-vlncRNA expression and how the transcription
occurs. Identifying this transcriptional mechanism will be very important for further
studies of vlncRNAs, as it might contrast with the prevailing paradigm of transcription and
splicing in which DNA is transcribed to RNA followed by RNA splicing. Besides RNA
splicing, the mechanism through which some vlncRNAs, such as Air, effectively resist
degradation by RNase will also be important to investigate. Previous studies have shown
that the half-life of the unspliced form of Air is longer than that of Air spliced variants.
This is the reverse of EW-vlncRNA, which is less stable than its spliced product. Thus,
identifying the corresponding mechanisms will be important for understanding the
essential rules of RNA degradation.
Based on a comparative analysis of sEWVLNC overexpression and knockdown,
dozens of signaling pathway candidates were identified in this study. Further investigation
using gene co-expression correlation and WGCNA analyses showed that the HGF/MET
signaling pathway is one of the main down-stream targets of sEWVLNC. qPCR results
supported the hypothesis that sEWVLNC effectively inhibits the HGF/MET signaling
pathway by suppressing MET expression. The cell invasion test in vitro and tumor
metastasis analysis in vivo confirmed that a high level of sEWVLNC dramatically inhibited
Ewing sarcoma growth and progression. Although the data showed that this inhibitory
effect of sEWVLNC was not effective in other types of tumors such as rhabdomyosarcoma,
sEWVLNC overexpression reduced Ewing sarcoma invasion by at least 25%
in vitro, and inhibited tumor progression by approximately 50% in vivo. The MET
knockdown assays indicated that sEWVLNC primarily reduces tumor metastasis via
down-regulation of the MET gene. Survival analysis of a Ewing sarcoma patient cohort was
consistent with these experimental results, and suggested that high sEWVLNC expression
is associated with a higher patient survival rate and a lower relapse rate. This study also
analyzed the mechanism of MET gene down-regulation by sEWVLNC; the results suggest
that sEWVLNC might interact with c-Jun, preventing the AP-1 complex from promoting
MET expression. However, the mechanism explaining how sEWVLNC inactivates AP-1 is
presently unknown; it might happen through competitive inhibition, yet there is currently
no direct evidence to support this assumption. Moreover, the statistical model predicts that
sEWVLNC regulates approximately 29 down-stream gene candidates. Identification of this
large number of candidates is consistent with the possibility that sEWVLNC is involved in
regulation of the transcriptional process by binding to gene promoters, but this speculation
is as yet unproven. Similarly, it is still unclear how sEWVLNC up-regulates downstream
genes such as luciferase; it is likely that sEWVLNC interacts directly with a second
transcription factor (other than AP-1) to promote gene expression.
Although vlncRNAs have been studied for several years, the functional roles of most
vlncRNAs are far from understood. This study contributes evidence to the accumulating
body of knowledge supporting the hypothesis that vlncRNAs play multiple functional roles
in development and cancer. However, unlike previous studies of vlncRNAs that indicated
the unspliced form is a prerequisite for vlncRNAs to perform at least some of their
functions, the findings in the present study document that spliced variants of EW-vlncRNA
are also responsible for its observed functional roles. In conclusion, this study identifies
the first tumor-specific vlncRNA, characterizes its processing to splice variants, and
documents that one of its spliced isoforms suppresses MET-mediated tumor progression and
metastasis.
Bibliography
Aartsma-Rus A, van Vliet L, van Ommen G, et al. 2009. Guidelines for Antisense
Oligonucleotide Design and Insight Into Splice-modulating Mechanisms. Molecular
Therapy 17: 548-553.
Affymetrix ENCODE Transcriptome Project, Cold Spring Harbor Laboratory ENCODE
Transcriptome Project. 2009. Post-transcriptional processing generates a diversity of 5'-
modified long and short RNAs. Nature 457: 1028-1032.
Appleman LJ. 2011. MET Signaling Pathway: A Rational Target for Cancer Therapy.
Journal of Clinical Oncology 29: 4837-4838.
Bennasser Y, Voinnet O, Benkirane M, et al. 2011. Competition for XPO5 binding
between Dicer mRNA, pre-miRNA and viral RNA regulates human Dicer levels. Nature
Structural & Molecular Biology 18: 323-327.
Clark C, Palta P, Coffey AJ, et al. 2012. A comparison of the whole genome approach of
MeDIP-seq to the targeted approach of the Infinium HumanMethylation450 BeadChip®
for methylome profiling. PLoS One 7: e50233.
Cooper CS, Park M, Croce CM, et al. 1984. Molecular cloning of a new transforming
gene from a chemically transformed human cell line. Nature 311: 29-33.
Cunningham F, Trevanion SJ, Flicek P, et al. 2015. Ensembl 2015. Nucleic Acids
Research 43: 662-669.
de Hoon M, Shin JW, Carninci P. 2015. Paradigm shifts in genomics through the
FANTOM projects. Mammalian Genome 26: 391-402.
Deaton AM, Bird A. 2011. CpG islands and the regulation of transcription. Genes &
Development 25: 1010-1022.
Derrien T, Johnson R, Bussotti G, Harrow J, Guigo R, et al. 2012. The GENCODE v7
catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and
expression. Genome Research 22: 1775-1789.
Dudoit S, Yang YH, Speed TP, et al. 2002. Statistical methods for identifying
differentially expressed genes in replicated cDNA microarray experiments. Statistica
Sinica 12: 111-139.
Dunham I, Shimizu N, Burrill W, et al. 1999. The DNA sequence of human
chromosome 22. Nature 402: 489-495.
Egeblad M, Werb Z. 2002. New functions for the matrix metalloproteinases in cancer
progression. Nature Reviews Cancer 2: 161-174.
Ewing J. 1921. Diffuse endothelioma of bone. Proceedings of the New York Pathological
Society 21: 17-24.
Fatica A, Bozzoni I. 2014. Long non-coding RNAs: new players in cell differentiation
and development. Nature Reviews Genetics 15: 7-21.
Finn RD, Bateman A, Punta M, et al. 2014. Pfam: the protein families database. Nucleic
Acids Research 42: 222-230.
Geisler S, Coller J. 2013. RNA in unexpected places: long non-coding RNA functions in
diverse cellular contexts. Nature Reviews Molecular Cell Biology 14: 699-712.
Gildea JJ, Gulding KM, Theodorescu D, et al. 2000. Transmembrane motility assay of
transiently transfected cells by fluorescent cell counting and luciferase measurement.
Biotechniques 29: 81-86.
Harrow J, Guigo R, Hubbard TJ. 2012. GENCODE: the reference human genome
annotation for The ENCODE Project. Genome Research 22: 1760-1774.
Holley R, Apgar J, Zamir A, et al. 1965. Structure of a ribonucleic acid. Science 147:
1462-1465.
Holoch D, Moazed D. 2015. RNA-mediated epigenetic regulation of gene expression.
Nature Reviews Genetics 16: 71-84.
Hu Z, Chang YC, Delisi C, et al. 2013. VisANT 4.0: Integrative network platform to
connect genes, drugs, diseases and therapies. Nucleic Acids Research: 225-231.
Hung T, Wang Y, Chang HY, et al. 2011. Extensive and coordinated transcription of
noncoding RNAs within cell-cycle promoters. Nature Genetics 43: 621-629.
Ideue T, Yokoi T, Hirose T, et al. 2009. Efficient oligonucleotide-mediated degradation
of nuclear noncoding RNAs in mammalian cultured cells. RNA 15: 1578-1587.
Im Y, Kim H, Kim S, et al. 2000. EWS-FLI1, EWS-ERG, and EWS-ETV1 oncoproteins
of Ewing tumor family all suppress transcription of transforming growth factor beta type
II receptor gene. Cancer Research 60: 1536-1540.
Jones M, Rhodenizer D, Hegde M, et al. 2012. DDOST mutations identified by whole-
exome sequencing are implicated in congenital disorders of glycosylation. American
Journal of Human Genetics 90: 363-368.
Kan Z, Zheng H, Mao M, et al. 2013. Whole-genome sequencing identifies recurrent
mutations in hepatocellular carcinoma. Genome Research 23: 1422-1433.
Kapranov P, St Laurent G. 2012. Dark matter RNA: existence, function, and controversy.
Frontiers in Genetics 3: 60.
Kapranov P, St Laurent G, Triche TJ, et al. 2011. The majority of total nuclear-encoded
non-ribosomal RNA in a human cell is 'dark matter' un-annotated RNA. BMC Biology 8:
149.
Kim J, Sun L, Mochly-Rosen D, et al. 2011. Sustained inhibition of PKCα reduces
intravasation and lung seeding during mammary tumor metastasis in an in vivo mouse
model. Oncogene 30: 323-333.
Koerner MV, Pauler FM, Barlow DP, et al. 2009. The function of non-coding RNAs in
genomic imprinting. Development 136: 1771-1783.
Kolde R. 2015. pheatmap: Pretty Heatmaps. R package version 1.0.2.
Kretz M, Chang HY, Khavari PA, et al. 2012. Control of somatic tissue differentiation by
the long non-coding RNA TINCR. Nature 493: 231-235.
Lander E, Linton L, Birren B, et al. 2001. Initial sequencing and analysis of the human
genome. Nature 409: 860-921.
Langer W, Sohler F, Sommer A, et al. 2010. Exon array analysis using re-defined probe
sets results in reliable identification of alternatively spliced genes in non-small cell lung
cancer. BMC Genomics 11: 676.
Langfelder P, Horvath S. 2008. WGCNA: an R package for weighted correlation network
analysis. BMC Bioinformatics 9: 559.
Langfelder P, Luo R, Horvath S, et al. 2011. Is my network module preserved and
reproducible? PLoS Computational Biology 7.
Lazorthes S, Trouche D, Nicolas E, et al. 2015. A vlncRNA participates in senescence
maintenance by relieving H2AZ-mediated repression at the INK4 locus. Nature
Communications 6.
Leclerc GJ, Leclerc GM, Barredo JC. 2002. Real-time RT-PCR analysis of mRNA decay:
half-life of Beta-actin mRNA in human leukemia CCRF-CEM and Nalm-6 cell lines.
Cancer Cell International 2: 1-5.
Lee JT. 2012. Epigenetic regulation by long noncoding RNAs. Science 338: 1435-1439.
Lemon J. 2006. Plotrix: a package in the red light district of R. R-News 6: 8-12.
Lu X, Massagué J, Kang Y, et al. 2011. VCAM-1 promotes osteolytic expansion of
indolent bone micrometastasis of breast cancer by engaging α4β1-positive osteoclast
progenitors. Cancer Cell 20: 701-714.
Marioni J, Mason C, Gilad Y, et al. 2008. RNA-seq: an assessment of technical
reproducibility and comparison with gene expression arrays. Genome Research 18: 1509-
1517.
Mi S, Roberts RJ. 1992. How M.MspI and M.HpaII decide which base to methylate.
Nucleic Acids Research 20: 4811-4816.
Miller JA, Cai C, Horvath S, et al. 2011. Strategies for aggregating gene expression data:
the collapseRows R function. BMC Bioinformatics 12: 322.
Mortazavi A, Schaeffer L, Wold B, et al. 2008. Mapping and quantifying mammalian
transcriptomes by RNA-Seq. Nature Methods 5: 621-628.
Nagano T, Fraser P. 2011. No-nonsense functions for long noncoding RNAs. Cell 145:
178-181.
Neuwirth E. 2014. RColorBrewer: ColorBrewer Palettes. R package version 1.1-2.
Oberst A, Green D. 2011. It cuts both ways: reconciling the dual roles of caspase-8 in cell
death and survival. Nature Reviews Molecular Cell Biology 12: 757-763.
Organ S, Tsao M. 2011. An overview of the c-MET signaling pathway. Therapeutic
Advances in Medical Oncology 3: 7-19.
Ozsolak F, Milos PM. 2011. RNA sequencing: advances, challenges and opportunities.
Nature Reviews Genetics 12: 87-89.
Pandey RR, Nagano T, Kanduri C, et al. 2008. Kcnq1ot1 antisense noncoding RNA
mediates lineage-specific transcriptional silencing through chromatin-level regulation.
Molecular Cell 32: 232-246.
Pelechano V, Steinmetz LM. 2013. Gene regulation by antisense transcription. Nature
Reviews Genetics 14: 880-893.
Ponting CP, Oliver PL, Reik W. 2009. Evolution and functions of long noncoding RNAs.
Cell 136: 629-641.
Quinn JJ, Akhtar A, Chang HY, et al. 2014. Revealing long noncoding RNA architecture
and functions using domain-specific chromatin isolation by RNA purification. Nature
Biotechnology 32: 933-940.
Qureshi IA, Mehler MF. 2011. Non-coding RNA networks underlying cognitive
disorders across the lifespan. Trends in Molecular Medicine 17: 337-346.
R Development Core Team. 2008. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing ISBN 3-900051-07-0.
Raghav KP, Gonzalez-Angulo AM, Blumenschein GR,Jr. 2012. Role of HGF/MET axis
in resistance of lung cancer to contemporary management. Translational Lung Cancer
Research 1: 179-193.
Rinn JL, Chang HY. 2012. Genome Regulation by Long Noncoding RNAs. Annual
Review of Biochemistry 81: 145-166.
Rinn JL, Segal E, Chang HY, et al. 2007. Functional demarcation of active and silent
chromatin domains in human HOX loci by noncoding RNAs. Cell 129: 1311-1323.
Roadmap Epigenomics Consortium, Wang T, Kellis M, et al. 2015. Integrative analysis
of 111 reference human epigenomes. Nature 518: 317-330.
Satpathy AT, Chang HY. 2015. Long noncoding RNA in hematopoiesis and immunity.
Immunity 42: 792-804.
Scherer LJ, Rossi JJ. 2003. Approaches for the sequence-specific knockdown of mRNA.
Nature Biotechnology 21: 1457-1465.
Seidl C, Stricker S, Barlow D. 2006. The imprinted Air ncRNA is an atypical RNAPII
transcript that evades splicing and escapes nuclear export. EMBO Journal 25: 3565-3575.
Seol DW, Chen Q, Zarnegar R. 2000. Transcriptional activation of the hepatocyte growth
factor receptor (c-met) gene by its ligand (hepatocyte growth factor) is mediated through
AP-1. Oncogene 19: 1132-1137.
Soman NR, Wogan GN, Rhim JS. 1990. TPR-MET oncogenic rearrangement: detection
by polymerase chain reaction amplification of the transcript and expression in human
tumor cell lines. Proc. Natl. Acad. Sci. 87: 738-742.
St Laurent G, Triche TJ, Kapranov P, et al. 2013. VlincRNAs controlled by retroviral
elements are a hallmark of pluripotency and cancer. Genome Biology 14: R73.
St Laurent G, Wahlestedt C, Kapranov P. 2015. The landscape of long noncoding RNA
classification. Trends in Genetics 31: 239-251.
Steven G, Juli T, Elizabeth S, et al. 2010. Flow cytometric detection of ewing sarcoma
cells in peripheral blood and bone marrow. Pediatric Blood Cancer 54: 13-18.
Storey JD. 2002. A direct approach to false discovery rates. Journal of the Royal
Statistical Society: Series B 64: 479-498.
Storey JD, Taylor JE, Siegmund D. 2004. Strong control, conservative point estimation,
and simultaneous conservative consistency of false discovery rates: A unified approach.
Journal of the Royal Statistical Society: Series B 66: 187-205.
Su Z, Yang Z, Yu Q, et al. 2015. Apoptosis, autophagy, necroptosis, and cancer
metastasis. Molecular Cancer 14: 48.
The ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements
in the human genome. Nature 489: 57-74.
Therneau TM, Grambsch PM. 2000. Modeling survival data: Extending the Cox model.
Springer, New York.
Turc-Carel C, Aurias A, Berger M, et al. 1988. Chromosomes in Ewing's sarcoma. I. An
evaluation of 85 cases of remarkable consistency of t(11;22)(q24;q12). Cancer Genet
Cytogenet 32: 229-238.
van Dijk M, Thulluru HK, Oudejans CB, et al. 2012. HELLP babies link a novel lncRNA
to the trophoblast cell cycle. Journal of Clinical Investigation 122: 4003-4011.
Wang L, Zhu W, Levy D. 2006. Nuclear and cytoplasmic mRNA quantification by
SYBR green based real-time RT-PCR. Methods 39: 356-362.
Warnes GR, Bolker B, Venables B. 2015. gplots: Various R Programming Tools for
Plotting Data. R package version 2.16.0.
Wilusz JE, Freier SM, Spector DL. 2008. 3' end processing of a long nuclear-retained
noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell 135: 919-932.
Xu XM, Yoo MH, Hatfield DL, et al. 2009. Simultaneous knockdown of the expression
of two genes using multiple shRNAs and subsequent knock-in of their expression. Nature
Protocols 4: 1338-1348.
Zhang J, Hu S, Triche TJ, et al. 2004. Selective usage of D-type cyclins by Ewing’s
tumors and rhabdomyosarcomas. Cancer Research 64: 6024-6034.
Zhang K, Shi ZM, Hong W, et al. 2014. The ways of action of long non-coding RNAs in
cytoplasm and nucleus. Gene 547: 1-9.
Zhu H, Naujokas MA, Park M. 1994. Receptor chimeras indicate that the met tyrosine
kinase mediates the motility and morphogenic responses of hepatocyte growth/scatter
factor. Cell Growth Differ 5: 359-366.
Abstract
The Ewing sarcoma family of tumors (ESFTs) is the second most common bone and soft tissue tumor in children, occurring most commonly in adolescence. A chromosomal translocation, t(11;22), generating the EWSR1-FLI1 fusion gene is well known as a molecular pathologic marker of Ewing sarcoma. Recent work has identified several specific gene targets of the chimeric protein EWS-FLI1, all possessing multiple GGAA repeats in their promoter region, but until now, no non-coding RNA targets have been found.
Over 60% of all known human genes are now documented to be non-coding RNA transcripts, and many of these have been shown to have functions that were not previously appreciated. In the past, studies of non-coding RNA focused mainly on non-coding RNAs in small size
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Genomic risk factors associated with Ewing Sarcoma susceptibility
Genetics and the environment: evaluating the role of noncoding RNA in autism spectrum disorder
The role of LGR5 in the pathogenesis of Ewing sarcoma: a marker of aggressive disease and a contributor to the malignant phenotype
Harnessing the power of stem cell self-renewal pathways in cancer: dissecting the role of BMI-1 in Ewing’s sarcoma initiation and maintenance
The role of miRNA and its regulation in pulmonary hypertension in sickle cell disease
Integrated genomic & epigenomic analyses of glioblastoma multiforme: Methods development and application
The role of PAX8 in epithelial ovarian carcinoma
Identification of novel epigenetic biomarkers and microRNAs for cancer therapeutics
LINC00261 induces a G2/M cell cycle arrest and activation of the DNA damage response in lung adenocarcinoma
The role of microRNAs in cancer
Functional characterization of a prostate cancer risk region
Lymphatic cell environment promotes sustained KSHV lytic replication and viral maintenance
Placenta growth factor-miRNAs-lncRNAs axis in the regulation of ET-1 gene involved in pulmonary hypertension in sickle cell disease
Establishing the non-coding RNA LINC00261 as a tumor suppressor in lung adenocarcinoma
SUMOylation regulates RNA polymerase III -- dependent transcripton via MAF1
Effects of chromatin regulators during carcinogenesis
The role of glucose-regulated proteins in endometrial and pancreatic cancers
Functional characterization of colon cancer risk-associated enhancers: connecting risk loci to risk genes
An essential role of argininosuccinate synthase 1 in Kaposi’s sarcoma-associated herpesvirus-induced cellular transformation
Virus customization of host protein machinery for efficient propagation
Asset Metadata
Creator
Liu, Yang (author)
Core Title
Functional role of a Ewing-sarcoma-specific vlncRNA in tumor growth and progression
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Genetics, Molecular and Cellular Biology
Publication Date
02/23/2016
Defense Date
12/07/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Ewing sarcoma,ncRNA,OAI-PMH Harvest,sEWVLNC
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Hinton, David R. (committee chair), Asgharzadeh, Shahab (committee member), Triche, Timothy J. (committee member)
Creator Email
liu26@usc.edu,mao405628@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-214039
Unique identifier
UC11276890
Identifier
etd-LiuYang-4153.pdf (filename),usctheses-c40-214039 (legacy record id)
Legacy Identifier
etd-LiuYang-4153.pdf
Dmrecord
214039
Document Type
Dissertation
Rights
Liu, Yang
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA