SEMANTICALLY-ENRICHED PARSING FOR
NATURAL LANGUAGE UNDERSTANDING
by
Stephen Tratz
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2011
Copyright 2011 Stephen Tratz
Object Description
| Title | Semantically-enriched parsing for natural language understanding |
| Author | Tratz, Stephen Charles |
| Author email | tratz@usc.edu;stevatra@gmail.com |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Computer Science |
| School | Viterbi School of Engineering |
| Date defended/completed | 2011-08-25 |
| Date submitted | 2011-10-08 |
| Date approved | 2011-10-09 |
| Restricted until | 2011-10-09 |
| Date published | 2011-10-09 |
| Advisor (committee chair) | Hovy, Eduard |
| Advisor (committee member) | Hobbs, Jerry; Chiang, David; Rosenbloom, Paul; O'Leary, Daniel |
| Abstract | This thesis details three contributions to the advancement of semantically-enriched parsing for English sentences: inventories of semantic relations covering three semantically ambiguous linguistic phenomena, large datasets annotated according to the inventories, and, finally, a suite of tools for semantically-enriched parsing built using the datasets. For the purposes of this thesis, semantically-enriched parsing is defined as the reconstruction of the underlying grammatical structure of text along with shallow semantic annotation of semantically-ambiguous structures. Ultimately, semantically-enriched parsing is one of the most critical steps in natural language understanding: the initial step in which the text is read by the machine into a knowledge representation for further processing and reasoning. ❧ The first contribution of this thesis is to advance the theoretical foundations for the interpretation of three ambiguous linguistic phenomena in English that have significant overlap in terms of the relations expressed: noun compounds, possessive constructions, and prepositions. For these, I define inventories of relations based upon extensive annotation by myself, previous work by others, and inter-annotator agreement studies. In the case of prepositions, the relations are created by refining an existing resource whereas the other two are created from scratch. In addition to mappings to prior work, mappings are provided across the different inventories in order to create a unified set of relations. ❧ Second, I produce large datasets annotated according to the aforementioned sense inventories. Such data is vital for training most automatic tools and also provides exemplars for the theory embodied in the inventories. Some of these datasets are created from scratch, including a collection of over 17,500 noun compounds and a collection of over 21,900 possessive construction examples. In the case of prepositions, an existing resource including over 24,000 annotated examples is refined. ❧ The final contribution is a suite of tools that can construct semantically-enriched parse trees. The suite is designed to work in a sequential, pipeline-like fashion and can be thought of as consisting of two subsections. The first part reconstructs the grammatical structure of the text using a dependency parser that extends the non-directional easy-first algorithm developed by Goldberg and Elhadad in order to support non-projective trees and is trained using my improved dependency tree conversion of the Penn Treebank. Second are semantic annotation modules that add shallow semantic annotation for noun compounds, preposition senses, possessives, and verbal arguments. Combined, these tools produce semantically-enriched parse trees that include both grammatical structure and shallow semantics. The core parser itself achieves state-of-the-art accuracy and can process over \parsespeed sentences per second, which is substantially faster than most of the accurate parsers available today. ❧ In conclusion, this thesis work provides significant contributions to computational linguistics, both in terms of theory and resources. It advances our understanding of the relations expressed by three semantically-ambiguous linguistic phenomena, creates large annotated datasets useful for machine learning, and produces a fast, accurate, and informative system for semantically-enriched parsing. |
| Keyword | computational linguistics; parsing; semantics; noun compounds; prepositions; possessives; easy-first |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m |
| Rights | Tratz, Stephen Charles |
| Access conditions | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
| Repository name | University of Southern California Digital Library |
| Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
| Repository email | cisadmin@usc.edu |
| Archival file | uscthesesreloadpub_Volume6/etd-TratzSteph-323.pdf |