SPAM E-MAIL FILTERING VIA GLOBAL AND USER-LEVEL DYNAMIC ONTOLOGIES by Seongwook Youn A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) December 2009 Copyright 2009 Seongwook Youn
Object Description
Title | Spam e-mail filtering via global and user-level dynamic ontologies |
Author | Youn, Seongwook |
Author email | syoun@usc.edu; fortisisimo@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2008-12-03 |
Date submitted | 2009 |
Restricted until | Unrestricted |
Date published | 2009-11-21 |
Advisor (committee chair) | McLeod, Dennis |
Advisor (committee member) | Horowitz, Ellis; Nakano, Aiichiro; Neumann, Ulrich; Pryor, Lawrence |
Abstract | E-mail is an important communication method between people on the Internet. However, the constant increase in e-mail misuse and abuse has produced a huge volume of spam e-mail in recent years. Because spammers continually look for ways to evade existing filters, new filters must be developed to catch spam. In my research to date, e-mail data was classified using four different classifiers: a Neural Network, an SVM classifier, a Naive Bayesian classifier, and a C4.5 Decision Tree (J48) classifier. Experiments were performed with different data sizes and different feature sizes. A feature set is a set of words that characterizes the domain dataset. The final classification result should be '1' if the e-mail is actually spam; otherwise, it should be '0'. This paper shows that a simple C4.5 Decision Tree classifier, which builds a binary tree, is efficient for datasets that can be viewed as a binary tree. We present a new approach to filtering spam e-mail using semantic information represented in ontologies. Ontologies allow for machine-understandable semantics of data [99]. Traditional keyword-based filters rely on manually constructed pattern-matching rules, but spam e-mail varies from user to user and also changes over time. Hence, an adaptive learning filtering technique is deployed in our system. An experimental system was designed and implemented under the hypothesis that this method would outperform existing techniques; experimental results showed that the proposed ontology-based approach does improve spam filtering accuracy significantly. We also add an image e-mail handling capability that uses OCR to extract information from e-mail in which the text is embedded in an image. Additionally, we improve the spam filter by using a personalized ontology for the spam decision on gray e-mail.
In the proposed SPONGY (SPam ONtoloGY) system, two levels of ontology spam filters were implemented: a first-level global ontology filter and a second-level user-customized ontology filter. The global ontology filter filtered about 91% of spam, which is comparable with other methods. The user-customized ontology filter was created based on the specific user's background as well as the filtering mechanism used in creating the global ontology filter. With the user-customized ontology filter, we measured the performance improvement in terms of the precision, recall, and accuracy of classification. A set of experiments showed that better classification performance (about 95%) can be achieved using the user-customized ontology filter, which is adaptive and scalable. The main contributions of the paper are 1) to introduce an ontology-based multi-level filtering technique that uses both a global ontology and an individual filter for each user to increase spam filtering accuracy, and 2) to create a spam filter in the form of an ontology that is user-customized, scalable, and modularized, so that it can be embedded within other systems for better performance. |
Keyword | e-mail; ontology; spam filtering; text classification |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m2754 |
Contributing entity | University of Southern California |
Rights | Youn, Seongwook |
Repository name | Libraries, University of Southern California |
Repository address | Los Angeles, California |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-Youn-3379 |
Archival file | uscthesesreloadpub_Volume26/etd-Youn-3379.pdf |
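The two-level filtering and the precision, recall, and accuracy evaluation described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `two_level_classify`, `toy_global_filter`, and `toy_user_filter` are hypothetical, the toy filters are plain keyword checks standing in for the ontology filters (they are not the SPONGY implementation), the sample messages are invented, and passing undecided messages from the global filter to the user filter is only one plausible way to combine the two levels. The label convention ('1' for spam, '0' otherwise) follows the abstract.

```python
from typing import Callable, Dict, List, Optional

SPAM, LEGITIMATE = 1, 0

# Hypothetical filter type: returns 1 (spam), 0 (legitimate),
# or None when the filter cannot decide.
Filter = Callable[[str], Optional[int]]


def toy_global_filter(message: str) -> Optional[int]:
    """Keyword stand-in for a global filter (assumption, not an ontology)."""
    text = message.lower()
    if "viagra" in text:
        return SPAM
    if "meeting" in text:
        return LEGITIMATE
    return None  # undecided: treat as gray e-mail


def toy_user_filter(message: str) -> Optional[int]:
    """Keyword stand-in for one user's customized filter (assumption)."""
    if "newsletter" in message.lower():
        return SPAM  # this hypothetical user regards newsletters as spam
    return None


def two_level_classify(message: str, global_filter: Filter,
                       user_filter: Filter) -> int:
    """Ask the global filter first; fall back to the user-level filter."""
    verdict = global_filter(message)
    if verdict is None:
        verdict = user_filter(message)
    # Assumed default: deliver the message if neither level decides.
    return LEGITIMATE if verdict is None else verdict


def evaluate(predicted: List[int], actual: List[int]) -> Dict[str, float]:
    """Compute precision, recall, and accuracy with spam as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == SPAM and a == SPAM)
    fp = sum(1 for p, a in pairs if p == SPAM and a == LEGITIMATE)
    tn = sum(1 for p, a in pairs if p == LEGITIMATE and a == LEGITIMATE)
    fn = sum(1 for p, a in pairs if p == LEGITIMATE and a == SPAM)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Invented sample data: (message, true label).
samples = [
    ("cheap viagra now", SPAM),
    ("project meeting at noon", LEGITIMATE),
    ("weekly newsletter deals", SPAM),
    ("hello from an old friend", LEGITIMATE),
    ("limited offer just for you", SPAM),
]

predicted = [two_level_classify(m, toy_global_filter, toy_user_filter)
             for m, _ in samples]
actual = [label for _, label in samples]
metrics = evaluate(predicted, actual)
print(predicted, metrics)
```

In this toy run the third message is left undecided by the global stand-in and caught by the user-level stand-in, while the last message slips past both levels and counts as a false negative, which lowers recall but not precision.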