STATISTICAL APPROACHES FOR INFERRING CATEGORY KNOWLEDGE
FROM SOCIAL ANNOTATION
by
Anon Plangprasopchok
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2010
Copyright 2010 Anon Plangprasopchok
Object Description
| Title | Statistical approaches for inferring category knowledge from social annotation |
| Author | Plangprasopchok, Anon |
| Author email | plangpra@usc.edu; anon.plangprasopchok@gmail.com |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Computer Science |
| School | Viterbi School of Engineering |
| Date defended/completed | 2010-08-16 |
| Date submitted | 2010 |
| Restricted until | Unrestricted |
| Date published | 2010-09-03 |
| Advisor (committee chair) | Lerman, Kristina |
| Advisor (committee member) | Arbib, Michael A.; Knoblock, Craig A.; O'Leary, Daniel E. |
| Abstract | Social annotation captures the collective knowledge of thousands of users and can potentially be used to enhance an array of applications, including Web search, information personalization and recommendation, and even the synthesis of categorical knowledge. To make the best use of social annotation (annotation generated by many users), we need methods that deal effectively with the challenges of data sparseness and noise, and that take into account inconsistency in vocabulary, interests, and level of expertise among individual users. In this thesis, I study computational approaches to learning and integrating category knowledge, in terms of topics, concepts, and the hierarchical relations between them, from two popular forms of social annotation: tags and personal hierarchies. Learning category knowledge from tags created by many distinct users to describe objects is challenging, since tags reflect not only the objects' categories but also users' interests in the tagged objects. To address this challenge, I propose a probabilistic model that takes into account variation in interest among users to infer a more accurate topic model of the tagged objects. I explore its performance in detail on a synthetic data set and compare it to Latent Dirichlet Allocation (LDA), a popular document modeling algorithm. I show that in domains with high tag ambiguity, variations among users can actually help discriminate between tag senses, leading to better topics. My approach is therefore well suited to making sense of social annotation, since this domain is characterized both by a high degree of noise and ambiguity and by a highly diverse user population with varied interests. Additionally, I extend the model to automatically adjust its key parameters as suggested by the data. This capability helps overcome one of the main difficulties of applying the original model to the data: namely, having to specify the right number of common topics and interests. Structured social annotation, such as personal hierarchies, helps users organize their content. Although individual structures (broader/narrower relations between concepts) are already explicitly specified by users, learning their common complex structure in a specific form, such as a tree, is a difficult task. This is because individual users usually specify these structures in many different ways, so the structures do not conform to one another. As the second main contribution of the thesis, I study the folksonomy learning problem, i.e., learning a common hierarchy from many small personal hierarchies. I first propose a simple yet efficient clustering-based method that incrementally weaves individual hierarchies into a deeper, more complete folksonomy, from its root down to its leaves. Inconsistencies are removed as the common hierarchy grows. Alternatively, I frame folksonomy learning as a generic structure learning problem: learning complex structures from many smaller ones. I develop a novel probabilistic approach based on distributed inference. Thanks to structural constraints integrated into the inference procedure, the method avoids structural inconsistencies, as all individual structures are combined simultaneously. All proposed approaches are evaluated on real-world data sets, and the experimental results demonstrate their advantages in many aspects. As social annotation becomes more widely available, these approaches are a promising means of mining knowledge from it that can prove useful in many applications. |
| Keyword | clustering; data mining; folksonomy; machine learning; social annotation; social information processing |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m3419 |
| Rights | Plangprasopchok, Anon |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Plangprasopchok-4066 |
| Archival file | uscthesesreloadpub_Volume26/etd-Plangprasopchok-4066.pdf |