TREE-ADJOINING MACHINE TRANSLATION
by
Steve DeNeefe

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2011

Copyright 2011 Steve DeNeefe
Object Description
Title | Tree-adjoining machine translation |
Author | DeNeefe, Steve |
Author email | deneefe@usc.edu; sdeneefe@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2011-10-11 |
Date submitted | 2011-10-27 |
Date approved | 2011-10-27 |
Restricted until | 2011-10-27 |
Date published | 2011-10-27 |
Advisor (committee chair) | Knight, Kevin |
Advisor (committee member) | Marcu, Daniel; Chiang, David; Schaal, Stefan; Narayanan, Shrikanth |
Abstract | Machine Translation (MT) is the task of translating a document from a source language (e.g., Chinese) into a target language (e.g., English) via computer. State-of-the-art statistical approaches to MT use large collections of human-translated documents as training material, gathering statistics on the patterns of correspondence between languages according to the features specified by the translation model. Using this bilingual translation model in conjunction with a target language model, created by gathering statistics from a large monolingual corpus, a new document in the source language can be automatically translated into its target-language equivalent with surprising accuracy. ❧ Much MT research focuses on the types of patterns and features to include in a translation model. Recent statistical MT models have used syntax trees to enforce grammaticality, but the currently popular tree substitution models only memorize sequences of words or constituents, specifying exactly what phrases to use and exactly what trees are grammatical; this does not generalize well. Adding the operation of tree-adjoining provides the freedom to splice additional information into an existing grammatical tree. An adjoining translation model allows general, linguistically motivated translation patterns to be learned without the clutter of endless variations of optional material. The appropriate modifiers, such as adjectives, adverbs, and prepositional phrases, can be grafted into these core patterns as needed to translate details. We show that the increased generalization power provided by adjoining, when used carefully, improves MT quality without becoming computationally intractable. ❧ In this thesis, we describe challenges encountered by both word-sequence-based and syntax-tree-based MT systems today, and present an in-depth, quantitative comparison of both models. Then we describe a novel model for statistical MT that addresses these challenges using a synchronous tree-adjoining grammar. We introduce a method of converting these grammars to a weakly equivalent tree transducer for decoding. Next, we present a method for learning the rules and associated probabilities of this grammar from aligned tree/string training data, and empirically analyze important characteristics of the resulting model, considering and evaluating many variations. Finally, our results show that adjoining delivers a consistent improvement over a baseline statistical syntax-based MT model on both medium- and large-scale MT tasks using several language pairs. |
Keyword | machine translation; statistical machine translation; tree-adjoining grammar; formal grammar; translation models; computational linguistics; syntax-based machine translation |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | DeNeefe, Steve |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Archival file | uscthesesreloadpub_Volume71/etd-DeNeefeSte-365.pdf |