Page 1 |
Save page Remove page | Previous | 1 of 83 | Next |
|
small (250x250 max)
medium (500x500 max)
Large (1000x1000 max)
Extra Large
large ( > 500x500)
Full Resolution
All (PDF)
|
This page
All
|
1 A STATISTICAL ONTOLOGY-BASED APPROACH TO RANKING FOR MULTI-WORD SEARCH by Jinwoo Kim ______________________________________________________________________ A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) May 2013 Copyright 2013 Jinwoo Kim
Object Description
Title | A statistical ontology-based approach to ranking for multi-word search |
Author | Kim, Jinwoo |
Author email | jinwook@usc.edu;amadeust@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2013-01-22 |
Date submitted | 2013-04-26 |
Date approved | 2013-04-29 |
Restricted until | 2013-04-29 |
Date published | 2013-04-29 |
Advisor (committee chair) | McLeod, Dennis |
Advisor (committee member) |
Nakano, Aiichiro Pryor, Larry Medvidovic , Nenad Kuo, C.-C. Jay |
Abstract | Keyword search is a prominent data retrieval method for the Web, largely because the simple and efficient nature of keyword processing allows a large amount of information to be searched with fast response. However, keyword search approaches do not formally capture the clear meaning of a keyword query and fail to address the semantic relationships between keywords. As a result, the accuracy (precision and recall rate) is often unsatisfactory, and the ranking algorithms fail to properly reflect the semantic relevance of keywords. Our research particularly focuses on increasing the accuracy of search results for multi-word search. We propose a statistical ontology-based semantic ranking algorithm based on sentence units, and a new type of query interface including wildcards. First, we allocate higher-ranking scores to keywords located in the same sentence compared with keywords located in separate sentences. While existing statistical search algorithms such as N-gram only consider sequences of adjacent keywords, our approach is able to calculate sequences of non-adjacent keywords as well as adjacent keywords. Second, we propose a slightly different type of query interface, which considers a wildcard as an independent unit of a search query to reflect what users are actually seeking by way of the function of query prediction based on not query data but actual Web data. Unlike current information retrieval approaches such as proximity, statistical language modeling, query prediction and query answering, our statistical ontology-based model synthesizes proximity concept and statistical approaches into a form of ontology. This ontology helps to improve web information retrieval accuracy. ❧ We validated our methodology with a suite of experiments using the Text Retrieval Conference document collection. We focused on two-word queries in our experiments - as two-word queries are quite common. After applying our statistical ontology-based algorithm to the Nutch search engine, we compared the results with results of the original Nutch search and Google Desktop Search. The result demonstrates that our methodology has improved accuracy quite significantly. |
Keyword | ontology; information retrieval; semantic web; search engine |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Kim, Jinwoo |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-KimJinwoo-1614.pdf |
Archival file | uscthesesreloadpub_Volume3/etd-KimJinwoo-1614-0.pdf |
Description
Title | Page 1 |
Contributing entity | University of Southern California |
Repository email | cisadmin@lib.usc.edu |
Full text | 1 A STATISTICAL ONTOLOGY-BASED APPROACH TO RANKING FOR MULTI-WORD SEARCH by Jinwoo Kim ______________________________________________________________________ A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) May 2013 Copyright 2013 Jinwoo Kim |