DBSSC: Density-Based Searchspace-limited Subspace Clustering by Jongeun Jun A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) Dec 2013 Copyright 2013 Jongeun Jun
Object Description
Title | DBSSC: density-based searchspace-limited subspace clustering |
Author | Jun, Jongeun |
Author email | jongeunj@usc.edu;jongeunj@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2013-10-21 |
Date submitted | 2013-11-18 |
Date approved | 2013-11-18 |
Restricted until | 2013-11-18 |
Date published | 2013-11-18 |
Advisor (committee chair) | McLeod, Dennis |
Advisor (committee member) | Shahabi, Cyrus; O’Leary, Daniel E. |
Abstract | We propose a mining framework that supports the identification of useful knowledge based on data clustering. With recent advances in microarray technologies, the expression levels of thousands of genes can be measured simultaneously. The availability of this large volume of microarray data leads us to focus on mining gene expression datasets. We apply a density-based approach to identify clusters in full-dimensional microarray datasets and obtain meaningful results. In general, microarray technologies provide multi-dimensional data. In particular, given that genes are often co-expressed under subsets of experimental conditions, we present a novel subspace clustering algorithm. In contrast to previous approaches, our method is based on the observation that the number of subspace clusters is related to the number of maximal subspace clusters to which any gene pair can belong. By discretizing gene expression profiles, we transform the similarity between two genes into a sequence of symbols that represents the maximal subspace cluster for the gene pair. This domain transformation (from genes into gene-gene relations) allows us to make the number of possible subspace clusters dependent on the number of genes. Based on the symbolic representations of genes, we present an efficient subspace clustering algorithm that scales with the number of dimensions. In addition, the running time can be drastically reduced by using an inverted index and pruning non-interesting subspaces. Furthermore, by incorporating the density-based approach into the above searchspace-limited subspace clustering, we develop a fast subspace clustering algorithm that finds important subspace clusters. Experimental results indicate that the proposed method efficiently identifies co-expressed gene subspace clusters for the yeast cell cycle datasets. |
Keyword | data mining; big data analysis; bioinformatics; subspace clustering; gene expression analysis; density-based clustering |
Language | English |
Format (imt) | application/pdf |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Jun, Jongeun |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-JunJongeun-2157.pdf |
Archival file | uscthesesreloadpub_Volume8/etd-JunJongeun-2157.pdf |