SUBSET SELECTION ALGORITHMS FOR PREDICTION
by
Abhimanyu Das
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2011
Copyright 2011 Abhimanyu Das
Object Description
| Title | Subset selection algorithms for prediction |
| Author | Das, Abhimanyu |
| Author email | abhimand@usc.edu;abhi.das@gmail.com |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Computer Science |
| School | Viterbi School of Engineering |
| Date defended/completed | 2011-05-10 |
| Date submitted | 2011-07-29 |
| Date approved | 2011-07-29 |
| Restricted until | 2011-07-29 |
| Date published | 2011-07-29 |
| Advisor (committee chair) | Kempe, David |
| Advisor (committee member) | Teng, Shanghua; Sha, Fei; James, Gareth |
| Abstract | In this dissertation, we study the subset selection problem for prediction. It deals with choosing the “best” or “most informative” k-subset from a large set of n > k observable variables, to predict the value of a function or another variable of interest that is related to the observable variables. Natural applications of this problem abound in areas as diverse as medicine, social sciences, economics, numerical analysis, signal processing and sensor networks. There are various mathematical formulations for this problem, depending on the characterization of the best subset and of the dependencies between variables. We study two versions: the first version is a stochastic framework for subset selection of random variables using linear regression, and the second is an adversarial framework for estimating aggregate statistics of a function in the presence of metric-space-induced spatial constraints. ❧ The goal of this dissertation is to perform an algorithmic analysis of the subset selection problems, characterize natural conditions which make these problems tractable, and explore polynomial-time algorithms with guaranteed optimal or near-optimal solutions. For the stochastic subset selection problem, we explore two broad approaches for designing efficient approximation algorithms. The first approach uses a graph-theoretic framework to characterize the covariance structure of the problem instance, and design efficient algorithms for several classes of covariance graphs. The second approach uses an algebraic framework based on spectral and submodular analysis, to identify conditions under which greedy algorithms can obtain good performance guarantees. ❧ For adversarial subset selection, we provide efficient deterministic and randomized sampling strategies and corresponding prediction functions to approximate some commonly used aggregate statistics. 
For the deterministic setting, we show an interesting connection with common clustering problems, and obtain constant factor approximation algorithms for predicting the average and maximum statistics. For the randomized setting, we obtain a polynomial-time approximation scheme for the problem of finding the optimal randomized algorithm for choosing a single sample to predict the average statistic. We also solve the interesting special case of estimating the integral of a univariate Lipschitz-continuous function over the [0, 1] interval using one sample, and design an optimal randomized algorithm in this setting. ❧ For several of our subset selection algorithms, we also experimentally validate our theoretical analysis on several real-world data sets. |
| Keyword | approximation algorithms; machine learning; regression; feature selection; sparse approximation; compressed sensing; submodularity |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m |
| Rights | Das, Abhimanyu |
| Access conditions | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
| Repository name | University of Southern California Digital Library |
| Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
| Repository email | cisadmin@usc.edu |
| Archival file | uscthesesreloadpub_Volume71/etd-DasAbhiman-201.pdf |
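The abstract mentions greedy algorithms for the stochastic (linear-regression) version of subset selection. As a rough illustration only, the sketch below shows generic forward greedy selection that maximizes the squared multiple correlation R² of a k-subset. It is not taken from the dissertation: the function names (`solve`, `r_squared`, `greedy_select`), the assumption that all variables are normalized to unit variance, and the small covariance matrix `C` and covariance vector `b` are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def solve(A: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Solve A x = y by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular (illustrative helper only).
    """
    n = len(y)
    # Augmented matrix [A | y], copied so the inputs are not modified.
    M = [list(A[i]) + [y[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(C: Sequence[Sequence[float]], b: Sequence[float],
              S: Sequence[int]) -> float:
    """R^2 of predicting the target from the variables in S.

    With unit-variance variables, R^2(S) = b_S^T C_S^{-1} b_S, where C holds
    covariances among observables and b their covariances with the target.
    """
    if not S:
        return 0.0
    C_S = [[C[i][j] for j in S] for i in S]
    b_S = [b[i] for i in S]
    w = solve(C_S, b_S)  # optimal regression coefficients for subset S
    return sum(wi * bi for wi, bi in zip(w, b_S))


def greedy_select(C: Sequence[Sequence[float]], b: Sequence[float],
                  k: int) -> List[int]:
    """Forward greedy: repeatedly add the variable that most increases R^2."""
    selected: List[int] = []
    for _ in range(k):
        candidates = [i for i in range(len(b)) if i not in selected]
        best = max(candidates, key=lambda i: r_squared(C, b, selected + [i]))
        selected.append(best)
    return selected


# Hypothetical instance: variables 0 and 1 are highly correlated with each
# other, variable 2 is uncorrelated with both.
C = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [0.6, 0.55, 0.5]
chosen = greedy_select(C, b, 2)
score = r_squared(C, b, chosen)
```

In this made-up instance the second-best single predictor is nearly redundant with the best one, so a greedy step that re-evaluates R² on the enlarged subset prefers the uncorrelated variable instead. The cost is one small linear solve per candidate per round, which is fine for a sketch but not tuned for large n.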

