STATISTICAL MODELING OF SEQUENCE AND GENE EXPRESSION
DATA TO INFER GENE REGULATORY NETWORKS
by
Xiting Yan
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTATIONAL BIOLOGY)
August 2009
Copyright 2009 Xiting Yan
Object Description
| Title | Statistical modeling of sequence and gene expression data to infer gene regulatory networks |
| Author | Yan, Xiting |
| Author email | xitingya@usc.edu; yanxiting@gmail.com |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Molecular & Computational Biology |
| School | College of Letters, Arts and Sciences |
| Date defended/completed | 2009-05-13 |
| Date submitted | 2009 |
| Restricted until | Unrestricted |
| Date published | 2009-08-07 |
| Advisor (committee chair) | Sun, Fengzhu |
| Advisor (committee member) | Chen, Ting; Schumitzky, Alan; Nuzhdin, Sergey; Waterman, Michael |
| Abstract | Inferring gene regulatory networks has long been an important and challenging step toward understanding the mechanisms behind different biological processes and behaviors of organisms. Advances in biological experimental techniques have produced a deluge of data to help accomplish this task, including microarray data, ChIP-chip data, and sequencing data. Based on these techniques, biologists have designed different experiments to investigate regulatory networks from different aspects. However, because of random effects in the production of these data, statistical and probabilistic models are needed to extract reliable regulatory relationships from them. Moreover, integrating different sources of data is critical for obtaining complete and accurate information about the biological mechanisms. In this dissertation, I try to infer gene transcription regulatory networks by first detecting the downstream, or regulated, genes in the network, which tend to be differentially expressed under different conditions, such as different tissues or treatment versus control experiments. Two methods are developed to detect differentially expressed transcripts or genes from sequencing data and from gene expression microarray data. The first method estimates the expression levels of all annotated mRNAs in available databases using mRNA sequencing data and identifies differentially expressed mRNAs based on these estimates. This approach, however, treats the mRNAs independently, which contradicts the fact that interactions between genes are commonly observed. The second method was therefore developed to detect sets of genes that are enriched in differentially expressed genes or, more generally, in phenotype-associated genes. The gene sets are predefined so that the genes in each set interact with each other in a certain way. Because knowledge of the interactions between mRNAs is limited, this method can currently be applied only to gene expression data. Applications of both methods to simulated and real data sets show robust and accurate predictions. Once the downstream genes are detected, identifying the genes that regulate their expression is critical for inferring the regulatory networks. Thus, we propose another analysis method, which uses an EM algorithm to predict the gene regulatory network from gene expression data, sequence data, and allele-specific expression data in a certain number of genotypes. Preliminary simulation studies suggest the minimum number of genotypes needed to achieve satisfactory detection accuracy. Because no real data set is available, this method has so far been applied only to simulated data sets. |
| Keyword | sequence; gene expression; regulatory networks; data analysis; statistical modeling; inference |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m2542 |
| Rights | Yan, Xiting |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Yan-2928 |
| Archival file | uscthesesreloadpub_Volume56/etd-Yan-2928.pdf |

