Page 1 |
Save page Remove page | Previous | 1 of 187 | Next |
|
small (250x250 max)
medium (500x500 max)
Large (1000x1000 max)
Extra Large
large ( > 500x500)
Full Resolution
All (PDF)
|
This page
All
|
DECIPHERING NATURAL LANGUAGE by Sujith Ravi A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Ful llment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) May 2011 Copyright 2011 Sujith Ravi
Object Description
Title | Deciphering natural language |
Author | Ravi, Sujith |
Author email | sravi@usc.edu; sujith_ravi@yahoo.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2011-03-21 |
Date submitted | 2011 |
Restricted until | Unrestricted |
Date published | 2011-03-28 |
Advisor (committee chair) | Knight, Kevin |
Advisor (committee member) |
Marcu, Daniel Chiang, David Teng, Shang-Hua Narayanan, Shrikanth S. |
Abstract | Most state-of-the-art techniques used in natural language processing (NLP) are supervised and require labeled training data. For example, statistical language translation requires huge amounts of bilingual data for training translation systems. But such data does not exist for all language pairs and domains. Using human annotation to create new bilingual resources is not a scalable solution. This raises a key research challenge: How can we circumvent the problem of limited labeled resources for NLP applications? Interestingly, cryptanalysts and archaeologists have tackled similar challenges in solving "decipherment problems".; This thesis work aims to bring together techniques from classical cryptography, NLP and machine learning. We introduce a novel approach called "natural language decipherment" that can solve natural language problems without labeled (parallel) data. A wide variety of NLP problems can be formulated as decipherment tasks---for example, in statistical language translation one can view the foreign-language text as a cipher for English. Instead of relying on parallel training data, decipherment uses knowledge of the target language (e.g., English) and large quantities of readily available monolingual source (cipher) data to induce bilingual connections between the source and target languages. Using decipherment techniques, we make headway in attacking a hierarchy of problems ranging from letter substitution decipherment to sequence labeling problems (such as part-of-speech tagging) to language translation. Along the way, we make several key contributions -- novel unsupervised algorithms that search for minimized models during decipherment and achieve state-of-the-art results on a number of important natural language tasks. Unlike conventional approaches, these decipherment methods can be easily extended to multiple domains and languages (especially resource-poor languages), thereby helping to spread the impact and benefits of NLP research. |
Keyword | natural language processing; machine learning; computational decipherment; artificial intelligence; statistics |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m3705 |
Contributing entity | University of Southern California |
Rights | Ravi, Sujith |
Repository name | Libraries, University of Southern California |
Repository address | Los Angeles, California |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-Ravi-4461 |
Archival file | uscthesesreloadpub_Volume51/etd-Ravi-4461.pdf |
Description
Title | Page 1 |
Contributing entity | University of Southern California |
Repository email | cisadmin@lib.usc.edu |
Full text | DECIPHERING NATURAL LANGUAGE by Sujith Ravi A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Ful llment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) May 2011 Copyright 2011 Sujith Ravi |