Too many needles in this haystack: Algorithms for the Analysis of Next
Generation Sequence Data.
by
Mourad (Tade) Souaiaia
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTATIONAL BIOLOGY AND BIOINFORMATICS)
December 2012
Copyright 2012 Mourad (Tade) Souaiaia
Object Description
| Title | Too many needles in this haystack: algorithms for the analysis of next generation sequence data |
| Author | Souaiaia, Mourad (Tade) |
| Author email | tade.souaiaia@gmail.com |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Computational Biology and Bioinformatics |
| School | College of Letters, Arts And Sciences |
| Date defended/completed | 2012-08-02 |
| Date submitted | 2012-09-01 |
| Date approved | 2012-09-01 |
| Restricted until | 2012-09-01 |
| Date published | 2012-09-01 |
| Advisor (committee chair) | Chen, Ting |
| Advisor (committee member) | Sun, Fengzhu; Knowles, James |
| Abstract | The development of second-generation sequencing (SGS) technology has provided scientists with a myriad of opportunities as well as new challenges. SGS machines are capable of sequencing billions of short reads at a fraction of the cost and time of older technology. Often, the study of sequence data begins with the alignment of billions of short DNA reads to the 3 billion base pair human reference genome, a daunting computational task, especially if the error rate between the reads and reference is high. For this reason, PerM was developed to use periodic spaced seeds to efficiently and accurately provide highly sensitive ungapped alignment for Illumina and SOLiD reads. Inexact alignments are often the most interesting biologically, because mismatches between the read and reference are often the result of genetic variation. To detect variation and distinguish it from machine errors, we developed ComB, which iteratively applies Bayesian statistics to color or base alignments to determine mutation probability. This allowed us to study a host of biological phenomena which result in rare nucleotide differences, including single nucleotide polymorphisms (SNPs), RNA editing, and allele-specific expression. DNA methylation of cytosine residues also produces single-base mismatches when DNA is treated with sodium bisulfite, which changes all unmethylated cytosine residues to thymine. To accurately estimate methylation rates from sodium bisulfite treated DNA, we developed FadE, an algorithm which uses Newton-Raphson optimization to estimate the methylation rate at every cytosine residue in the genome. Finally, we have applied all our statistical tools to study human mRNA editing, and have shown that RNA editing in human brain tissue occurs at a much lower rate than previously thought. |
| Keyword | algorithms; genome sequencing; genetics; computational biology; gene expression; methylation; SNP calling; sequence alignment |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m |
| Rights | Souaiaia, Mourad (Tade) |
| Access conditions | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
| Repository name | University of Southern California Digital Library |
| Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
| Repository email | cisadmin@usc.edu |
| Archival file | uscthesesreloadpub_Volume4/etd-SouaiaiaMo-1180.pdf |

