Analysis of SNP differential expression and allele-specific expression in gestational trophoblastic disease using RNA-seq data
Analysis of SNP differential expression and allele-specific expression in
gestational trophoblastic disease using RNA-seq data
by
Qili Fei
A Thesis Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
MASTER OF SCIENCE
(APPLIED BIOSTATISTICS AND EPIDEMIOLOGY)
December 2016
Table of Contents
List of Figures
List of Tables
Abstract
Chapter 1: Introduction
Chapter 2: Materials and Methods
2.1 Material Generation and Variant Calling
2.2 Statistical Analysis
Chapter 3: Results
3.1 Identification of SNP Loci and Measurement of SNP Expression
3.2 GTD-specific SNPs show difference in expression of the two alleles
3.3 Identification of significant GTD-specific SNP groups
3.4 The read counts and ASE ratios of GTD-specific SNP groups along the chromosome
Chapter 4: Discussion
References
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to Dr. Joshua Millstein for his invaluable
inspiration and guidance throughout my thesis experience. I am thankful to Dr. Melissa Wilson
and Dr. Wendy Mack for their advice on my thesis. I am also thankful
to Meng Li, Yibu Chen, and Kefan You for their help with bioinformatics tools, Jie Liu for help with
data analysis, and Xi Chen for help with English grammar. I would like to thank my parents for
their encouragement and support.
List of Figures
Fig. 1: Flow Chart for RNA-seq Data Process and Variants Calling
Fig. 2: The distribution and expression of mean-centered log2 read counts of SNPs on chromosome 1-22
Fig. 3: The distribution and expression of mean-centered log2 read counts of SNPs on chromosome X
Fig. 4: The distribution and expression of mean-centered log2 read counts of SNPs on chromosome Y
Fig. 5: The distribution and expression of mean-centered log2 read counts of SNPs on chromosome M
Fig. 6: SNP ASE Ratio distribution on Chromosome 1-22 (QUAL>60)
Fig. 7: SNP ASE Ratio distribution on Chromosome 1-22 (QUAL>500)
Fig. 8: SNP ASE Ratio distribution on Chromosome 1-22 (QUAL>1000)
Fig. 9: Estimated FDR and 95% CI for a series of significance thresholds applied to 462 SNP groups of difference between normal villi and mole samples
Fig. 10: Estimated FDR and 95% CI for a series of significance thresholds applied to 461 SNP groups of difference between choriocarcinoma and mole samples
Fig. 11: Mean-centered log2 read counts (upper figure) and ASE Ratio (lower figure) of significant group on Chromosome 9
Fig. 12: Mean-centered log2 read counts (upper figure) and ASE Ratio (lower figure) of significant groups on Chromosome 10
Fig. 13: Mean-centered log2 read counts (upper figure) and ASE Ratio (lower figure) of significant groups on Chromosome 18
List of Tables
Table 1: The chromosome positions of significant GTD-specific SNP groups (Mole VS villi)
Abstract
Objectives: The purpose of this study was to introduce and illustrate a method for analyzing SNP
expression and allele-specific gene expression in gestational trophoblastic disease (GTD) at the
chromosome level using RNA-seq data.
Methods: RNA sequencing (RNA-seq) and GATK variant calling were performed on five mole samples
and one choriocarcinoma sample, each from a different patient, and on one normal villi sample
from a healthy patient as the control. A permutation-based FDR method was used to define
significant GTD-specific SNP expression regions, separately for mole versus villi samples and for
mole versus choriocarcinoma samples. An estimated allele-specific expression (ASE) ratio was used
to detect whether expression differed between the two alleles in GTD samples.
Results: We identified five significant GTD-specific regions in the comparison between mole and
normal villi samples: three on chromosome 10, one on chromosome 9, and one on chromosome
18. Within these regions, the ADAM12 and BCL-2 genes have been related to molar pregnancy. The
SNP ASE ratios of the GTD samples showed a wider spread than those of the normal sample, which
clustered around 0.5, indicating that SNPs in GTD samples were expressed differently from the two
alleles.
Conclusion: The permutation testing approach is a useful tool for detecting significant SNP
regions in RNA-seq data for GTD. The ASE ratio plots provide insights for
further ASE analysis.
CHAPTER 1: INTRODUCTION
Gestational trophoblastic disease (GTD) is a group of rare conditions involving abnormal growth
of cells inside the uterus. GTD starts in the cells that would normally develop into the
placenta during pregnancy [1]. Hydatidiform mole and choriocarcinoma are the two main types of
GTD. There are two types of hydatidiform mole: complete hydatidiform mole (CHM) and
partial hydatidiform mole (PHM). The incidence of molar pregnancy in Southeast Asia is 7 to
10 times higher than in Europe or North America. In the United States, for example, the reported
incidence of molar pregnancy is 1 in 1500 pregnancies, while the incidence is 1 in 125 in Taiwan [2].
GTD-associated gene expression changes and mutations have been reported. Hydatidiform mole is
usually associated with mutations in the NLRP7 gene at chromosome 19q13.3–13.4 [3]. P57 gene
expression is associated with androgenetic complete mole [4]. H19 and IGF2 gene expression
are related to choriocarcinoma. The IGF2 gene is expressed from the paternally derived allele,
whereas the H19 gene is expressed from the maternally derived allele [5].
With the development of next-generation sequencing (NGS) technologies, RNA-seq has become
widely used in many research areas. RNA-seq provides abundant information on gene expression
and characterization and on coding sequence variation, and also gives potential insights into
post-transcriptional processes such as RNA editing [6]. RNA-seq is used to detect allele-specific
expression (ASE) and to identify disease-related single nucleotide polymorphisms (SNPs) [7]. A
number of studies have reported the feasibility of SNP detection using RNA-seq. SNPs can be used
to identify genomic divergence between species, to elucidate the processes of speciation and
evolution, and to associate genomic variations with phenotypic traits [8]. Recently, methods of
allele-specific expression analysis were introduced [9][10] using data from whole genome
sequencing and RNA-seq. These methods are becoming useful tools for determining the parental
origin of disease-related gene mutations.
Several studies have analyzed GTD using RNA-seq. To better understand the global impact of
SNPs at the chromosome level, we introduced a method to detect significant regions based on SNP
expression along the chromosome using RNA-seq data from GTD patients. We sought to
determine whether expression in some regions was significantly related to GTD. In addition, we
assessed whether allele-specific expression could be detected from RNA-seq data alone using the
SNP-related allele-specific ratio.
CHAPTER 2: MATERIALS AND METHODS
2.1 Material Generation and Variant Calling
Five mole samples and one choriocarcinoma sample were collected from different GTD patients,
and one villi sample was collected from a healthy patient. All data were generated using the
standard Illumina library preparation and sequencing protocols provided by the manufacturer.
The flow chart for data processing and variant calling is shown in Fig. 1. The Partek® Flow®
RNA-seq alignment protocol was used; a basic tutorial is available for download [11]. All
sequencing reads were mapped to the human genome (build hg38) with STAR (2.3.0.1).
The alignment BAM files were further processed with the GATK package (version 3.4) [12] for
realignment and recalibration. The Picard package was used to identify and remove duplicate
reads. Variants were detected using the HaplotypeCaller tool of the GATK package. We filtered the
SNPs from the variant call format (VCF) files produced by the GATK protocol and completed
annotation using Partek® Flow® with the dbSNP annotation database to identify known SNPs in
human genes.
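As an illustration of the SNP filtering step, the following is a minimal sketch in Python (the thesis analysis itself was run in Partek Flow, GATK and R, so this is not the code that was used). It assumes plain-text, tab-separated VCF records and treats a record as a SNP when REF and every ALT allele are single bases. The function names and the example records are hypothetical.

```python
def is_snp(ref, alt):
    """True when REF and every ALT allele are single A/C/G/T bases."""
    bases = {"A", "C", "G", "T"}
    return ref in bases and all(a in bases for a in alt.split(","))


def filter_snps(vcf_lines):
    """Yield (chrom, pos, ref, alt, qual) for the SNP records in VCF text lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        if is_snp(ref, alt):
            # QUAL may be "." (missing) in a VCF file
            yield chrom, int(pos), ref, alt, (None if qual == "." else float(qual))


# Made-up records for illustration only
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t812.5\t.\t.",
    "chr1\t2000\t.\tAT\tA\t95.0\t.\t.",  # deletion, not a SNP
]
snps = list(filter_snps(example))
```

Only the first example record is kept, because the second one is a deletion rather than a single-base substitution.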
Fig. 1: Flow Chart for RNA-seq Data Process and Variants Calling
[Flow chart boxes, in extracted order: FASTQ files (1 villi, 5 mole, 1 choriocarcinoma); Partek with STAR; BAM files; mapping and marking duplicates; Split'N'Trim; variant calling (GATK with default command lines); variant filtering; annotation of the VCF files. Panel label: Variants Analysis Steps.]
2.2 Statistical Analysis
We used R 3.2.0 to produce the plots and analyze the data.
For the SNP expression plots, we used the total read counts of SNPs and took the log2 of the
read counts because of their large disparity. The mean expression point of each
chromosome was calculated as the sum of the SNP log2 read counts divided by the number of SNPs
on that chromosome. We then centered each SNP log2 read count on the mean expression point of
its chromosome.
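The centering step can be sketched as follows. This is a minimal Python illustration under the assumption that each SNP is given as a (chromosome, position, read count) tuple with a positive read count; the function name and the toy values are hypothetical, and the thesis analysis itself was done in R.

```python
import math
from collections import defaultdict


def mean_centered_log2(snps):
    """snps: list of (chrom, pos, read_count) -> list of (chrom, pos, centered log2 count)."""
    by_chrom = defaultdict(list)
    for chrom, pos, count in snps:
        by_chrom[chrom].append((pos, math.log2(count)))  # counts must be > 0
    centered = []
    for chrom, rows in by_chrom.items():
        chrom_mean = sum(v for _, v in rows) / len(rows)  # mean expression point
        centered.extend((chrom, pos, v - chrom_mean) for pos, v in rows)
    return centered


# Toy read counts: log2 values are 3 and 5 on chr1 (mean 4) and 4 on chr2 (mean 4)
toy = [("chr1", 100, 8), ("chr1", 200, 32), ("chr2", 50, 16)]
centered = mean_centered_log2(toy)
```

Centering is done per chromosome, so a value of 0 means the SNP sits exactly at the mean log2 read count of its own chromosome.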
Individual SNP groups on each chromosome were defined based on the SNP base-pair positions
in the normal villi sample. The first 5% of SNPs from the beginning of each chromosome were
defined as the first SNP group, and so on, so that 20 SNP groups in total were defined on each
chromosome. For GTD samples, the SNPs on each chromosome were assigned to groups
according to the group positions of the normal villi sample. For the mitochondrial chromosome
and the Y chromosome, we defined only 2 groups because these chromosomes contained
only a few SNPs. Partitions without SNPs were excluded. A total of
462 SNP groups for villi compared with mole samples and 461 SNP groups for choriocarcinoma
compared with mole samples were included in the study. The mean of the centered SNP log2 read
counts was calculated for every group. One-sample t-tests were used to test
whether the mean read count of each SNP group in the mole samples differed significantly from
that in normal villi, with the normal villi value as the null hypothesis. For the permutation
t-tests, the group mean of each mole sample was used in turn as the null hypothesis for a
one-sample t-test of the other five samples, giving five permutation t-tests. These permutation
t-tests were used to account for false-positive p-values in the subsequent FDR method.
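The grouping scheme can be sketched as follows in Python. This is one possible reading of the description, not the original R code: group boundaries are taken as the position of the last villi SNP in each successive 5% slice, and SNPs from other samples are assigned by comparing their position with those boundaries. The function names and the toy positions are hypothetical.

```python
import bisect


def group_boundaries(villi_positions, n_groups=20):
    """Upper position bound of each group, each holding about 1/n_groups of the villi SNPs."""
    pos = sorted(villi_positions)
    n = len(pos)
    bounds = []
    for g in range(1, n_groups + 1):
        last = max(1, round(g * n / n_groups))  # number of SNPs up to the end of group g
        bounds.append(pos[last - 1])
    return bounds


def assign_group(position, bounds):
    """1-based group of a SNP position; positions past the last bound join the last group."""
    idx = bisect.bisect_left(bounds, position)
    return min(idx, len(bounds) - 1) + 1


villi = list(range(10, 410, 10))   # 40 toy SNP positions: 10, 20, ..., 400
bounds = group_boundaries(villi)   # 20 groups of 2 SNPs each
```

With very few SNPs on a chromosome the boundaries repeat and some groups stay empty, which corresponds to the partitions without SNPs that were excluded.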
In the comparison of normal villi and mole samples, the p-value obtained with normal villi as
the null hypothesis was taken as the true p-value, and the other five p-values were taken as
permutation p-values. Similarly, in the comparison of choriocarcinoma and mole samples, the
p-value obtained with choriocarcinoma as the null hypothesis was taken as the true p-value, and
the other five p-values were taken as permutation p-values. A post-hoc strategy was used to
identify the significance threshold as the one with the minimum upper confidence interval (CI)
bound of the estimated FDR.
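The true and permutation p-values for one SNP group can be sketched as follows in Python. This is an interpretation rather than the original R code: it assumes that each permutation test uses one mole sample as the reference value and tests the remaining five samples (four moles plus the reference sample) against it. Because every test has exactly five observations, the sketch uses the closed-form CDF of Student's t distribution with 4 degrees of freedom instead of a statistics library. All names and values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_cdf_df4(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = t * t / (1.0 + t * t / 4.0)
    return 0.5 + 0.375 * (t / math.sqrt(1.0 + t * t / 4.0)) * (1.0 - u / 12.0)


def one_sample_p(values, mu0):
    """Two-sided one-sample t-test p-value for exactly five values (df = 4)."""
    if len(values) != 5:
        raise ValueError("this sketch only handles five observations")
    se = stdev(values) / math.sqrt(len(values))  # assumes the values are not all equal
    t = (mean(values) - mu0) / se
    return max(0.0, 2.0 * (1.0 - t_cdf_df4(abs(t))))


def true_and_permuted_p(reference, moles):
    """reference: group mean in villi (or choriocarcinoma); moles: five mole group means."""
    true_p = one_sample_p(moles, reference)
    permuted = []
    for i, m in enumerate(moles):
        others = moles[:i] + moles[i + 1:] + [reference]  # the remaining five samples
        permuted.append(one_sample_p(others, m))
    return true_p, permuted


# Toy group means: the reference sample is far from all five moles
true_p, permuted = true_and_permuted_p(2.0, [-0.2, 0.1, 0.0, -0.1, 0.2])
```

In this toy case the true p-value is very small while the permutation p-values are not, which is the pattern expected for a group that separates the reference sample from the moles.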
To test for significant GTD-specific regions, the permutation-based FDR method [13] was
introduced into the data analysis. The false discovery rate (FDR) is a widely used
statistical criterion in NGS analysis and provides greater power than more conservative
multiple-testing corrections. Permutation testing
approaches are especially important in genomic studies. They can be used to compute the
sampling distribution of any test statistic under the strong null hypothesis that a set of genetic
variants has absolutely no effect on the outcome [14].
However, in many small-sample situations, only a few permutations are available.
In our study, for example, we had only 6 samples for each comparison (1 normal villi compared
to 5 mole samples, or 1 choriocarcinoma compared to 5 mole samples), which limits the number of
permutations and introduces dependencies among them. We therefore used a permutation-based
tail-area FDR estimator that involves a simple function of the counts of observed and permuted
test outcomes. The CI estimator for the FDR also provides guidance as to whether the number of
groups is sufficient.
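The idea of estimating a tail-area FDR from counts of observed and permuted positives can be sketched as follows in Python. This is a deliberately simplified ratio (average number of permuted positives divided by the number of observed positives) and is not the exact estimator of [13], which also yields confidence intervals; those are not reproduced here. The function name and the p-values are hypothetical.

```python
def estimated_fdr(true_p, permuted_p_sets, threshold):
    """Simplified tail-area FDR estimate at one p-value threshold.

    true_p: p-values from the real comparison, one per SNP group.
    permuted_p_sets: list of p-value lists, one list per permutation.
    """
    observed = sum(p <= threshold for p in true_p)
    if observed == 0:
        return None  # undefined when no group is called significant
    permuted_mean = sum(
        sum(p <= threshold for p in perm) for perm in permuted_p_sets
    ) / len(permuted_p_sets)
    return min(1.0, permuted_mean / observed)


# Toy p-values for four SNP groups and two permutations
true_p = [0.001, 0.002, 0.2, 0.5]
perms = [[0.001, 0.3, 0.4, 0.6], [0.2, 0.3, 0.7, 0.9]]
fdr = estimated_fdr(true_p, perms, 0.01)  # 2 observed, 0.5 permuted on average
```

Evaluating this function over a grid of thresholds gives the kind of FDR curve from which a significance threshold can then be chosen.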
To analyze whether GTD showed differential SNP expression between the two alleles, ASE ratio
plots were used. First, the variant call format (VCF) files from the seven samples were combined
using bcftools [15]. The reference read counts and alternative read counts of SNPs from the VCF
files were used to compute ASE ratios according to the formula below. The ASE ratio of each
SNP was plotted along the chromosome for each sample separately. Different thresholds on SNP
quality (QUAL), the Phred-scaled quality score for the assertion made in the alternative bases,
were applied to examine the distribution of the ASE ratios of the disease-related SNPs
throughout the chromosomes.
$$\text{ASE ratio} = \frac{\text{ALT counts}}{\text{REF counts} + \text{ALT counts}}$$
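The ASE ratio and the QUAL thresholding can be sketched as follows in Python. This is an illustrative sketch rather than the original R code; it assumes each SNP is already available as a (QUAL, REF count, ALT count) tuple, and the function names and toy records are hypothetical.

```python
def ase_ratio(ref_count, alt_count):
    """ALT / (REF + ALT); None when the SNP has no reads at all."""
    total = ref_count + alt_count
    if total == 0:
        return None
    return alt_count / total


def ase_ratios(records, min_qual=60.0):
    """ASE ratios of SNPs whose QUAL exceeds min_qual.

    records: iterable of (qual, ref_count, alt_count).
    """
    ratios = []
    for qual, ref, alt in records:
        if qual > min_qual:
            ratio = ase_ratio(ref, alt)
            if ratio is not None:
                ratios.append(ratio)
    return ratios


# Toy records: only the last two pass a QUAL threshold of 500
toy = [(50.0, 5, 5), (700.0, 10, 30), (1200.0, 0, 8)]
high_quality = ase_ratios(toy, min_qual=500.0)
```

A ratio near 0.5 means both alleles are represented about equally in the reads, while values near 0 or 1.0 mean that almost all reads carry one allele.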
CHAPTER 3: RESULTS
3.1 Identification of SNP Loci and Measurement of SNP Expression
SNPs with QUAL greater than 60 were retained to ensure higher-confidence calls. We
plotted the mean-centered log2 read counts of the remaining SNPs against their chromosome
positions (Fig. 2 to Fig. 5). On chromosome Y, both the number of SNPs and their expression were
lower in the normal villi sample and Mole M6 than in the other samples (Fig. 4).
Fig. 2: The distribution and expression of mean-centered log2 read counts of SNPs on
chromosomes 1-22. The mean-centered log2 SNP read counts are plotted along the autosomal
chromosomes. Blue and red colors are used to distinguish separate chromosomes.
Fig. 3: The distribution and expression of mean-centered log2 read counts of SNPs on
chromosome X.
Fig. 4: The distribution and expression of mean-centered log2 read counts of SNPs on
chromosome Y.
Fig. 5: The distribution and expression of mean-centered log2 read counts of SNPs on
chromosome M.
3.2 GTD-specific SNPs show difference in expression of the two alleles
ASE ratios were generated from the alternative and reference read counts of SNPs in
the VCF files, as described in Methods. With a QUAL threshold (the Phred-scaled quality
score for the assertion made in ALT) of greater than 60, the ASE ratios were spread between
0 and 1.0 (Fig. 6). When the quality threshold was increased to 500 and 1000, the ASE ratios in
the normal villi sample became centered around 0.5 (Fig. 7 and Fig. 8). In contrast, the ASE
ratios in the GTD samples remained evenly spread between 0 and 1.0 (Fig. 6 to Fig. 8). Among all
the samples, Mole M11 and M17 contained ASE ratios more polarized toward 0 or 1.0 at a quality
threshold of 1000 (Fig. 8).
Fig. 6: SNP ASE Ratio distribution on Chromosome 1-22 (QUAL>60). The estimated allele-specific
ratios of SNPs are plotted along the autosomal chromosomes. Blue and red colors
distinguish separate chromosomes.
Fig. 7: SNP ASE Ratio distribution on Chromosome 1-22 (QUAL>500). The estimated allele-specific
ratios of SNPs are plotted along the autosomal chromosomes. Blue and red colors
distinguish separate chromosomes.
Fig. 8: SNP ASE Ratio distribution on Chromosome 1-22 (QUAL>1000). The estimated allele-specific
ratios of SNPs are plotted along the autosomal chromosomes. Blue and red colors
distinguish separate chromosomes.
[Figure panels, one per sample: Normal Villi, Choriocarcinoma, Mole M6, Mole M11, Mole M17, Mole M21, Mole M22. x-axis: chromosome 1-22; y-axis: ASE ratio (allele-specific ratio), 0.0 to 1.0.]
3.3 Identification of significant GTD-specific SNP groups
To identify GTD-related SNP regions, each chromosome was partitioned into 20 SNP
groups as described in the Methods section. The FDR was estimated
from the SNP group expression profiles, based on six permutations, over a range of potential
p-value significance thresholds. Each permuted dataset was created by randomly selecting the
group expression mean from one mole sample as the null hypothesis. In the comparison of
normal villi and mole samples, we defined a natural threshold at p < 0.00002, where the lowest
FDR was 0.040 (95% CI: 0.005, 0.400) (Fig. 9). Five SNP groups were identified as GTD-specific
regions from the FDR plot: 3 groups on chromosome 10, 1 group on chromosome 18, and 1 group on
chromosome 9 (Table 1).
Fig. 9: Estimated FDR and 95% CI for a series of significance thresholds applied to 462
SNP groups for the difference between normal villi and mole samples. A final set of “significant”
SNP groups was identified using a threshold, shown as a vertical blue dashed line, that
corresponded to the minimum FDR and minimum upper confidence limit. Numbers in the field
denote counts of positive tests at the specified p-value significance threshold.
Chromosome   Group   Start position   End position   Chromosome band   P-value
9            14      110426292        111692264      q31.3             1.39E-05
10           2       3467108          7580479        p15.2-p14         1.74E-05
10           5       28052854         33311959       p11.22-p12.1      1.58E-05
10           20      126280255        133427415      q26.2-q26.3       6.53E-06
18           18      63293128         73159591       q21.33-q22.3      1.28E-06
Table 1: The chromosome positions of significant GTD-specific SNP groups (Mole VS villi)
A similar approach was used to identify SNP groups that differentiated choriocarcinoma from
mole samples. No significant upper confidence limit was found in the plot, and therefore no
significant group was identified to distinguish between mole and choriocarcinoma (Fig. 10).
Fig. 10: Estimated FDR and 95% CI for a series of significance thresholds applied to 461
SNP groups for the difference between choriocarcinoma and mole samples.
3.4 The read counts and ASE ratios of GTD-specific SNP groups along the chromosome
To explore potential differences in ASE between normal villi and mole samples, we
highlighted the significant SNP regions on the ASE plots and the mean-centered expression plots
(Fig. 11 to Fig. 13). In the significant group on chromosome 9, SNP read counts were clearly
high in normal villi but low in the mole samples (Fig. 11). Similarly, higher SNP read counts
in the normal villi sample were found at the left edge of the third significant
group on chromosome 10 (Fig. 12). The significant SNP group on
chromosome 18 covers a broader region of the chromosome but shows no obvious difference
between mole and normal samples (Fig. 13).
Fig. 11: Mean-centered log2 read counts (upper figure) and ASE Ratio (lower figure) of
significant group on Chromosome 9
[Figure panels, one per sample: Mole M6, Mole M11, Mole M17, Mole M21, Mole M22, Normal Villi. x-axis: chromosome 9 position, 0 to 1.4e+08; y-axis: ASE ratio (allele-specific ratio).]
Fig. 12: Mean-centered log2 read counts (upper figure) and ASE Ratio (lower figure) of
significant groups on Chromosome 10
Fig. 13: Mean-centered log2 read counts (upper figure) and ASE Ratio (lower figure) of
significant groups on Chromosome 18
CHAPTER 4: DISCUSSION
Our study provides a method for examining estimated allele-specific expression and
differential expression of SNP read counts along the chromosome in GTD, which could give an
important direction for future studies using larger samples.
The ASE ratios of SNPs in the GTD samples were spread between 0 and 1.0, while those in the
normal sample were distributed more tightly around 0.5, which is in accordance with a
previous study [9]. These results suggest that GTD samples show different expression between
the two homologous chromosomes. Mole M11 and M17 contained ASE ratios more polarized toward 0 or
1.0 at a quality threshold of 1000. This result may indicate that these samples are complete
hydatidiform moles, because a complete hydatidiform mole is free of maternal DNA. With
this consideration, parentally derived GTD-specific SNPs could be identified in further studies.
In the SNP expression plots, both the number of SNPs and their expression on chromosome Y were
lower in the normal villi sample and the Mole M6 sample. This lower SNP expression on
chromosome Y may indicate the sex of the sample.
With the permutation-based FDR method, five GTD-specific regions differed
significantly between normal villi and mole samples. These significant GTD-specific
regions contain genes identified in previous studies as relevant to
hydatidiform moles. ADAM12 on 10q26.2 is related to adverse pregnancy outcomes [16]. The
significant region on chromosome 9 contains chromosome 9 open reading frame 84 (C9orf84), which
is involved in pathways of meiotic crossover formation evident in most eukaryotes. SNPs of
moles in this region showed lower expression than those of normal villi [17]. The BCL-2 gene on
18q21.33 is related to the development of hydatidiform moles [18]. However, the region
containing NLRP7, the gene most commonly associated with moles, was not significantly
different between normal villi and mole samples. This may be due to the limited number of
samples and the different subtypes of hydatidiform moles.
In the comparison of choriocarcinoma and mole samples, we found no significant regions. One
reason may be the similar development of the placenta in both conditions. In addition, the number
of SNPs in the choriocarcinoma sample was much smaller than in the other samples. Hence, more
samples need to be collected for the comparison between choriocarcinoma and mole samples. Also,
choriocarcinoma, itself a form of GTD, may be more similar to moles than to normal villi.
In a previous study, the permutation-based FDR approach was applied to the analysis of the
association between gene expression and disease [13]. In our study, we used this approach to
identify significant GTD-specific regions. Only five SNP groups were significant in the
comparison between mole and normal villi samples, and some genes in these regions are associated
with the development of GTD. This supports the use of the method for identifying GTD-specific
regions using RNA-seq data.
Our study has limitations. The different types of hydatidiform moles, the quality control
applied to SNPs, and the group-cutting strategy of the permutation-based FDR approach may all
affect the results. The GTD-specific SNPs within the significant regions also remain to be
identified in further studies.
REFERENCES
[1] Seckl MJ, Sebire NJ, Berkowitz RS. Gestational trophoblastic disease. The Lancet.
2010;376:717-729.
[2] Garner EIO, Goldstein DP, Feltmate CM, Berkowitz RS. Gestational
Trophoblastic Disease. Clinical Obstetrics and Gynecology. 2007;50:112-122.
[3] Sanchez-Delgado M, Martin-Trujillo A, Tayama C, et al. Absence of Maternal Methylation
in Biparental Hydatidiform Moles from Women with NLRP7 Maternal-Effect Mutations Reveals
Widespread Placenta-Specific Imprinting: e1005644. PLoS Genetics. 2015;11.
[4] Sasaki S, Sasaki Y, Kunimura T, Sekizawa A, Kojima Y, Iino K. Clinical Usefulness of
Immunohistochemical Staining of p57 kip2 for the Differential Diagnosis of Complete
Mole. BioMed Research International. 2015;2015:905648.
[5] Wake N, Arima T, Matsuda T. Involvement of IGF2 and H19 imprinting in choriocarcinoma
development. International journal of gynaecology and obstetrics: the official organ of the
International Federation of Gynaecology and Obstetrics. 1998;60 Suppl 1:S1.
[6] Bahn J.H., Lee J.H., Li G., Greer C., Peng G., Xiao X. Accurate identification of A-to-I RNA
editing in human by transcriptome sequencing. Genome Res. 2012;22:142–150.
[7] http://genome.cshlp.org/content/14/5/908.full
[8] Kumar S, Banks TW, Cloutier S. SNP Discovery through Next-Generation Sequencing and
Its Applications. International Journal of Plant Genomics. 2012;2012:831460.
[9] Nariai N, Kojima K, Mimori T, Kawai Y, Nagasaki M. A Bayesian approach for estimating
allele-specific expression from RNA-Seq data with diploid genomes. BMC Genomics. 2016;17
Suppl 1:2.
[10] Skelly DA, Johansson M, Madeoy J, Wakefield J, Akey JM. A powerful and flexible
statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq
data. Genome research. 2011;21:1728-1737.
[11] http://www.partek.com/html/PartekFlow/RNASeq_Tutorial.pdf
[12] GATK best practices workflow for SNP and indel calling on RNA-seq data.
http://gatkforums.broadinstitute.org/wdl/discussion/3891/calling-variants-in-rnaseq
[13] Millstein J, Volfson D. Computationally efficient permutation-based confidence interval
estimation for tail-area FDR. Frontiers in Genetics. 2013;4:179.
[14] Yekutieli D, Benjamini Y. Resampling-based false discovery rate controlling multiple test
procedures for correlated test statistics. Journal of Statistical Planning and Inference.
1999;82:171-196.
[15] bcftools — utilities for variant calling and manipulating VCFs and BCFs.
https://samtools.github.io/bcftools/bcftools.html
[16] Yang J, Wu J, Guo F, et al. Maternal Serum Disintegrin and Metalloprotease Protein-12 in
Early Pregnancy as a Potential Marker of Adverse Pregnancy Outcomes: e97284. PLoS One.
2014;9.
[17] Macaisne N, Vignard J, Mercier R. SHOC1 and PTD form an XPF-ERCC1-like complex
that is required for formation of class I crossovers. Journal of Cell Science. 2011;124:2687.
[18] Candelier J, Frappart L, Yadaden T, et al. Altered p16 and Bcl-2 Expression Reflects
Pathologic Development in Hydatidiform Moles and Choriocarcinoma. Pathology & Oncology
Research. 2013;19:217-227.
Asset Metadata
Creator: Fei, Qili (author)
Core Title: Analysis of SNP differential expression and allele-specific expression in gestational trophoblastic disease using RNA-seq data
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Applied Biostatistics and Epidemiology
Publication Date: 09/28/2016
Defense Date: 09/26/2016
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: allele-specific expression, gestational trophoblastic disease, OAI-PMH Harvest, SNP differential expression
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Millstein, Joshua (committee chair), Mack, Wendy Jean (committee member), Wilson, Melissa (committee member)
Creator Email: qfei@usc.edu,qili.fei831@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-307675
Unique identifier: UC11281050
Identifier: etd-FeiQili-4826.pdf (filename), usctheses-c40-307675 (legacy record id)
Legacy Identifier: etd-FeiQili-4826.pdf
Dmrecord: 307675
Document Type: Thesis
Rights: Fei, Qili
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA