AUTOMATIC GENERATION OF THE LOGICAL STRUCTURE OF WEB PAGES
by
Yanbo Ru
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2007
Copyright 2007 Yanbo Ru
Object Description
| Title | Automatic generation of the logical structure of web pages |
| Author | Ru, Yanbo |
| Author email | yru@usc.edu |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Computer Science |
| School | Viterbi School of Engineering |
| Date defended/completed | 2006-12-05 |
| Date submitted | 2007 |
| Restricted until | Unrestricted |
| Date published | 2007-02-14 |
| Advisor (committee chair) | Horowitz, Ellis; Szekely, Pedro |
| Advisor (committee member) | Zimmermann, Roger; Gupta, Sandeep K. |
| Abstract | The World Wide Web has become one of the most important information resources today. Web pages are designed for visual browsing by human users. However, information on the web is also accessed by computer applications and specialized browsing devices. These applications and devices need to retrieve, filter, process, index, and present information on the web in sophisticated ways, not simply render web pages as a conventional browser does. To facilitate information filtering and processing by these applications and devices, it is desirable to incorporate semantic knowledge into the processing of web pages. It is therefore necessary to determine the logical structure of web pages, i.e., the information regions on a page and their interrelationships. This thesis develops a formal model for representing the logical structure of web pages and presents algorithms for automatically generating the logical structure tree. Instead of using rule-based classification approaches, it identifies logical objects by analyzing structural and semantic features and by checking the lifetime of objects on the web page, and it determines the functionality of logical objects using neural networks. To evaluate the logical structure model, this thesis uses it to categorize search engine results. By classifying the search results into categories such as articles, product lists, portal pages, and homepages of commercial and non-commercial websites, application programs can gain a better understanding of the search results. This thesis also evaluates the logical structure model and algorithms by applying them to the ranking of search engine results. Analyzing a web page's logical structure gives a more comprehensive understanding of the structure of the page and of the importance of each information region. Instead of taking the entire web page as the atomic information unit and giving all hyperlinks and keyword occurrences on the page the same weight, we can use individual logical objects as atomic information units and give each unit a different weight according to its functionality and importance. This yields better indexing and more precise ranking of web pages. |
| Keyword | logical structure; web page; machine learning; neural network |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Type | texts |
| Legacy record ID | usctheses-m246 |
| Rights | Ru, Yanbo |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Ru-20070214 |
| Archival file | uscthesesreloadpub_Volume26/etd-Ru-20070214.pdf |

