An adaptive temperament-based information filtering method for user-customized selection and presentation of online communication
AN ADAPTIVE TEMPERAMENT-BASED INFORMATION FILTERING METHOD FOR USER CUSTOMIZED SELECTION AND PRESENTATION OF ON-LINE COMMUNICATION

by Cha-Hwa Lin

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

December 2002

Copyright 2002 Cha-Hwa Lin

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3093784. Copyright 2002 by Lin, Cha-Hwa. All rights reserved.

UMI Microform 3093784. Copyright 2003 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-1695

This dissertation, written by Cha-Hwa Lin under the direction of her dissertation committee, and approved by all its members, has been presented to and accepted by the Director of Graduate and Professional Programs, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY. Date: December 2002

Dedication

To the people who nurture, support, and encourage my spiritual and emotional growth and constantly inspire me to keep seeking the delight of learning new technology, especially my family, Chih-Sheng, Min-Hsuan, Min-Heng, and Min-Hung.
Acknowledgements

I wish to express my sincere and deep appreciation to the many great teachers and wonderful friends I met during my study at USC, especially the following people.

First of all, I gratefully acknowledge my advisor, Professor Dennis McLeod, for his guidance, encouragement, and support. His knowledge, dedication, and humor make his lectures on database systems and database systems interoperability very interesting, creative, and inspiring. The lectures on human temperaments opened for me the fascinating cross-disciplinary field of computer science, cognitive science, and social science for machine learning and intelligent systems that I have plunged into. I would like to thank him with my most sincere and deepest gratitude, not only for his understanding, kindness, and patience in giving me the freedom to explore the research with creative (and sometimes naive) thinking, but also for his insights and timely advice guiding me towards an appropriate research direction. Without his help, I would not have made it this far.

I am grateful for the inspiration and wisdom of the committee members: Professor Kevin Knight for his valuable comments, Professor Larry Pryor for his precious suggestions and motivating encouragement, Professor Daniel O’Leary for his informative remarks, and Professor Cyrus Shahabi for his helpful feedback. They have contributed to the quality as well as the technical depth of this dissertation.

Professor Shankar Rajamoney showed me the remarkable power of artificial intelligence and trained me in it. The Theory W, "make everyone a winner", that I learned from Professor Barry Boehm in software engineering deeply impressed me and has been taken as a motto in my life. Professor Kai Hwang taught me the principles and techniques of advanced computer architecture, particularly in parallelism and scalability. I thank them from the bottom of my heart.
I would like to thank the staff in the Computer Science Department and the Integrated Media Systems Center for their assistance, especially Amy Yung for her advice, Kusum Shori for her editing, and Lisette Garcia-Miller for her editing and efforts. A number of my former and present colleagues in the Computer Science Department and the Integrated Media Systems Center, including Jonghyun Kahng, Goksel Aslan, Wen-Hsiang Kevin Liao, Latifur Khan, Yan Wang, Matthew Sheby, Hyun Woong Shin, Seokkyung Chung, Kiyoung Yang, and Yun-An Anne Chen, also deserve acknowledgement for their positive suggestions and discussions. I have been fortunate to work with these friendly and intelligent people. I am also grateful to the many USC students and friends who participated in the user-studies testing of the experiments and gave feedback and encouragement.

My dear friend Karen Dang has my hearty appreciation for helping me and sharing my distress, laughter, dreams, and hopes. My sincere thanks are due to I-Chun Julia Huang, Feng-Hsia Kao, Han-Chou Lee, Yonghua Zhang, Dinunzio Jim, Yu-Chi Lee, Nico Lin, Chun-Cheng Thomas Lin, Meng-Shan Lin, Yu-Chung Lin, Yu-Heng Pei, Kuo-Luen Wang, I-Cheng Charles Hsu, Shih-Liang Ou, Hung-An Hu, Sister Delphina, Sister Theodora, Lerena, Yi-Ping Na, Jeeyon Jung, Xiaoli Yu, Karen Greene, Xia Wang, Kyung Lee, Champa Soyza, and Agatha Poblele for their help, inspiration, and friendship.

I feel a deep sense of gratitude to my family. Their spiritual support, encouragement, and understanding have made all the difference. I am grateful to the happy memory of my father. I am deeply indebted to my mother for her devotion to the family. I thank my sisters, Kwan-Hwa and Ming-Hwa, and brother, Wei-Chung, for being there for me when I needed them. I would like to further thank my mother-in-law for her understanding and support.
My special thanks go to my sister-in-law, Fui-Fang, for her kindness, understanding, and patience in helping me and listening to my ramblings, musings, anxieties, and joys.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1 Introduction
Chapter 2 Related Research
  2.1 Customized Access Systems
    2.1.1 Web-Specific Systems
    2.1.2 General-Purpose Systems
  2.2 Human Temperaments
  2.3 Vector Space Model
Chapter 3 Temperament-Based Information Filtering Method
  3.1 Overview of the Temperament-Based Filtering Model
    3.1.1 Temperament-Segmented Information Space
    3.1.2 Functional Architecture
  3.2 Learning the Temperament Concept
  3.3 Segmentation Criteria
    3.3.1 Segmentation Function
    3.3.2 Preset Threshold
    3.3.3 An Illustrative Example
    3.3.4 Bias Elimination
    3.3.5 Ordering the Segments
  3.4 Temperament Weights of Information Units
    3.4.1 Temperament Weighting Function
    3.4.2 An Illustrative Example
    3.4.3 Estimating Centroid Temperament Weights of Clusters
    3.4.4 The Learning Algorithm
  3.5 Inference in the Classifier
    3.5.1 The Popularity Similarity Measure
    3.5.2 Heuristic Classification Function
    3.5.3 The Classifier Algorithm
  3.6 Inference in the Filtering Process
    3.6.1 Heuristic Selection Rules
    3.6.2 The Filtering Algorithm
  3.7 Feedback Refinement
    3.7.1 Relevance Feedback Method
    3.7.2 The Refinement Algorithm
Chapter 4 Experimental Prototype Implementations
  4.1 Overview of the Experimental Prototype System
    4.1.1 User Simulation
    4.1.2 Evaluation Metrics
    4.1.3 Experimental Procedure
    4.1.4 Experimental Setup
      4.1.4.1 General Considerations
      4.1.4.2 Threshold in Popularity for Segmentation
      4.1.4.3 Threshold in Cosine Similarity Measure for Clustering
  4.2 Experiment #1 on a Document Collection of 2,000 Web Pages
    4.2.1 User Interfaces, Modeling, and Profiles
    4.2.2 Experimental Results
      4.2.2.1 Case 1: For Users with Unknown Temperament and Interest
      4.2.2.2 Case 2: Given the User Temperament
      4.2.2.3 Case 3: Given the User Interest Key Terms
      4.2.2.4 Case 4: Given the User Temperament and Interest Key Terms
  4.3 Experiment #2 on an Art Image Collection of 2,000 Pictures
    4.3.1 User Interfaces, Modeling, and Profiles
    4.3.2 Experimental Results
  4.4 Experiment #3 on an Art Image Collection of 100 Pictures
    4.4.1 User Interfaces, Modeling, and Profiles
    4.4.2 Experimental Results
  4.5 Experiment #4 on Representation Styles
    4.5.1 Questionnaire Design and Modeling
    4.5.2 User Interfaces, Modeling, and Profiles
    4.5.3 Experiment #4.1 for Static Information Space
      4.5.3.1 Experimental Results
    4.5.4 Experiment #4.2 for Dynamic Information Space
      4.5.4.1 Experimental Results
Chapter 5 Examination of the Popularity Similarity Measures
  5.1 General Considerations
  5.2 Experimental Results
    5.2.1 For a Collection of 2,000 Documents
    5.2.2 For a Collection of 2,000 Art Images
    5.2.3 For a Collection of 100 Art Images
  5.3 Summary
Chapter 6 Conclusions
References
Appendix A User Interfaces of Experiment #1 on a Document Collection of 2,000 Web Pages
Appendix B User Interfaces of Experiment #2 on an Art Image Collection of 2,000 Pictures
Appendix C User Interfaces of Experiment #3 on an Art Image Collection of 100 Pictures
Appendix D User Interfaces of Experiment #4 on Representation Styles
List of Tables

1.1 Key terms relevant to the temperament-based information filtering method
2.1 Keirsey’s four temperaments
2.2 The percentage distributions of the temperaments in the United States
3.1 Sample statistical results
3.2 Sample statistical results (scale of # of User Responses: 100)
3.3 Clustering procedure
3.4 Temperament-based learning algorithm
3.5 Temperament-based classifier algorithm
3.6 Heuristic selection rules
3.7 Temperament-based filtering algorithm
3.8 Relevance feedback algorithm
4.1 Simulation algorithm
4.2 Experimental procedure
5.1 Comparison of “popularity similarity” measures by accuracy of recommendation (2,000 documents, 20 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.
5.2 Comparison of “popularity similarity” measures by accuracy of recommendation (2,000 art images, 20 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.
5.3 Comparison of “popularity similarity” measures by accuracy of recommendation (100 art images, 4 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.

List of Figures

2.1 Categorization of related research
2.2 Jung's four sets of preferences
2.3 Sample array of document vectors
3.1 A temperament-segmented feature of the real-world information space
3.2 The architecture of the temperament-based information filtering model
3.3 User's perspective at training phase
3.4 User's perspective at inference phase
3.5 The learning process
3.6 The construction of a temperament-segmented sample information space
3.7 Segments of a temperament-based sample information space
3.8 The classification process
3.9 The filtering process
3.10 The relevance feedback process
4.1 Overview of the prototype system architecture
4.2 Accuracy of recommendation for the temperament-based filtering method by searching segment Si for users with unknown temperament and interest (2,000 documents, θ = 0.05 to 0.55, 20 tests). (a) Accuracy average (%). (b) Accuracy graph of the top four items recommended.
4.3 Accuracy of recommendation for users with unknown temperament and interest (2,000 documents, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.4 Accuracy of recommendation when given the user temperament (2,000 documents, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.5 Accuracy of recommendation when given the user interest key terms (2,000 documents, θ = 0.10, λ = 0.07, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.6 Accuracy of recommendation when given the user temperament and the user interest key terms (2,000 documents, θ = 0.10, λ = 0.07, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.7 Accuracy of recommendation for users with unknown temperament and interest (2,000 art images, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.8 Accuracy of recommendation when given the user temperament (2,000 art images, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.9 Accuracy of recommendation when given the user interest key terms (2,000 art images, θ = 0.10, λ = 0.15, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.10 Accuracy of recommendation when given the user temperament and the user interest key terms (2,000 art images, θ = 0.10, λ = 0.15, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.11 Accuracy of recommendation for users with unknown temperament and interest (100 art images, θ = 0.10, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.12 Accuracy of recommendation when given the user temperament (100 art images, θ = 0.10, upper quartile segments, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.13 Accuracy of recommendation when given the user interest key terms (100 art images, θ = 0.10, λ = 0.04, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.14 Accuracy of recommendation when given the user temperament and the user interest key terms (100 art images, θ = 0.10, λ = 0.04, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.15 Representation styles at the abstract level
4.16 Representation styles at the detail level
4.17 Accuracy of recommendation for users with unknown temperament and interest for static information space (4 representation styles, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.18 Accuracy of recommendation when given the user temperament for static information space (4 representation styles at the abstract level, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.19 Accuracy of recommendation when given the user temperament for static information space (4 representation styles at the detail level, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.20 Accuracy of recommendation for users with unknown temperament and interest for dynamic information space (4 representation styles, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.21 Accuracy of recommendation when given the user temperament for dynamic information space (4 representation styles at the abstract level, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.22 Accuracy of recommendation when given the user temperament for dynamic information space (4 representation styles at the detail level, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

Abstract

The rapid proliferation of on-line digital information is changing the ways of our everyday life. New techniques are needed to assist people in managing this potentially overwhelming and diverse digital information space. The most popular approach is content-based filtering which draws on the vector space model of conventional textual information retrieval. However, such keyword-based systems usually require the user to know what they want exactly and are not suitable for serendipitous search. In addition, multimedia information units cannot be fully featured by key terms or filtered on quality or style.
Although social filtering addresses some of the limitations of content-based filtering by clustering information according to user evaluations or opinions, no predefined concept classes are used to describe or simplify the meaning of the clusters. Moreover, no existing technique has taken human temperaments into account in the classification or filtering process. Human temperaments have been recognized as a predominant factor in the patterns of human behavior. We hypothesized that the accuracy of an information recommendation system can be significantly improved by employing user temperament for filtering and customization. To test the hypothesis, a temperament-based information filtering model was proposed to characterize the information space by taking human factors, particularly human temperament, into consideration, which may categorize the information space into conceptually coherent classes and improve recommendation. A prototype system was developed and demonstrated in three application domains - a document collection, an art image collection, and representation styles - by utilizing a combination of simulation and user-studies testing. The notion of human temperaments was explored and learned for the representation and segmentation of an information space. Furthermore, the learned temperament concept was employed for the interpretation and measurement of relevance for classification and recommendation of the information units. The results of our experiments indicate that the accuracy of recommendation using temperament-based filtering is generally higher than that of content-based filtering. The system effectiveness is improved by heuristically searching only the portion of the structurally classified space that matches user temperament.
The quality of specific search as well as serendipitous search is enhanced by providing optimal predictions that are pertinent not only to user interests but also to user temperament.

Chapter 1 Introduction

The rapid proliferation of on-line digital information is changing the ways of our everyday life. New techniques are needed to assist people in managing this potentially overwhelming and diverse digital information space. Information customization is thus increasingly viewed as a critical component of any information access system. There are a number of active efforts under way to develop adaptive recommendation services for customization. Information filtering is one of the common techniques tackling the tasks of user-customized information selection and delivery [16, 17, 18, 68, 69]. In information filtering, streams of information are produced and presented to the user after filtering the source data by user preferences [13, 31, 57]. However, existing information filtering techniques (e.g., content-based filtering and social filtering) have limitations in their ability to satisfy the user. Moreover, identifying concepts in a knowledge base and constructing internal representations for partitioning the diverse concepts into appropriate categories remain challenging issues that have not drawn much attention in environment-dependent learning [50, 56, 60]. Furthermore, no existing technique has taken human temperaments into consideration in the classification (or concept learning) process of a recommendation service.

The most popular approach to information filtering is content-based filtering [3, 4, 12]. Such keyword-based document/information retrieval systems (e.g., Web search engines) draw on the vector space model of conventional textual information retrieval [51, 52].
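As a rough illustration of the vector space model referred to here, the following minimal sketch compares a query with two documents by the cosine of the angle between their term vectors. It is not taken from the dissertation: the helper names `term_vector` and `cosine_similarity`, the whitespace tokenization, the raw term counts used as vector components, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (term -> count).

    Assumption: lowercase whitespace tokenization stands in for real indexing.
    """
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Return the cosine of the angle between two sparse term vectors.

    Returns 0.0 when either vector is empty, since the angle is undefined.
    """
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical query and documents, chosen only to show the contrast.
query = term_vector("impressionist landscape painting")
doc_a = term_vector("an impressionist landscape painting by monet")
doc_b = term_vector("database systems interoperability lecture notes")

sim_a = cosine_similarity(query, doc_a)  # shares three terms with the query
sim_b = cosine_similarity(query, doc_b)  # shares no terms with the query
```

A document that shares terms with the query scores higher than one that shares none, which is the sense in which such measures depend only on term features rather than on any surrounding concept or context.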
In the vector space model, user queries and document contents are represented by term vectors, which span an n-dimensional space. The similarity between any two vectors is measured by the cosine of the angle between them: the smaller the angle (or the higher the cosine value), the higher the similarity. The similarity measures are also used to group vectors into clusters to facilitate search. Such similarity measures depend only on the term features of the vectors without considering any concept or context (i.e., the global properties or the “environment” surrounding the items). The obtained classes may have no simple conceptual description and may be difficult to interpret [37]. Content-based filtering encounters problems such as the following: information units such as image, audio, video, art, or physical items cannot be fully characterized by the attached captions or key terms, and items cannot be filtered based on quality, style, or point of view. In addition, the system usually requires users to know their interest exactly in order to request relevant information and is not suitable for serendipitous search [55]. In contrast, social filtering recommends information by sharing the evaluations or opinions of other users, clustering on similarities between user interest (rating) profiles that are found by statistical analysis, particularly the correlation coefficient. Although social filtering [55] addresses some of the limitations of content-based filtering by considering the conceptual dimension of user opinion, no predefined concept classes or simple conceptual representations are used to interpret or simplify the meaning of the clusters. Moreover, no content analysis is used to represent the textual features of an information unit.

Human temperaments have been identified as a predominant factor in the activity patterns of human behavior and preferences [6, 19, 26, 28, 41].
In addition, neuroscience research indicates that temperament is an innate property of the brain [48, 59]. Some studies have confirmed the relevance of temperament to public tastes in perceiving or interpreting information in general, such as visual art, music, and literature [15, 27, 30]. The inherent inter-related patterns between user temperaments and user interests may lead to a better understanding of user personality and improve the service of an information system. The potential for employing human temperament as an effective information filtering technique is strong.

We hypothesized that the accuracy of an information recommendation system would be significantly improved by employing user temperament for filtering and customization. Consequently, the purpose of this study was to characterize the information space by taking human factors into consideration, which might categorize the information space into meaningful classes and improve recommendation service. In particular, we propose an intelligent multiagent approach to incorporate human temperaments into the filtering process of the information recommendation service.

Our approach was to devise a new filtering mechanism called the temperament-based filtering method [34], which addresses segmentation, learning, classification, and filtering techniques based on Keirsey's temperament theory [28, 29], set theory [25], probability theory [62], the distributions of temperaments [20], and statistical reasoning [10, 38, 47, 49]. A novel segmentation framework was introduced to estimate and partition an information space into temperament-based segments by observing the temperament and interest distributions of the sample users. These segments were further clustered by intra-segment similarities.
In addition, a temperament weighting function was introduced to reflect the importance of an information unit in terms of temperaments. A learning agent was designed to automatically learn the temperament segmentation concept represented by temperament-based segments and temperament-weighted centroids. An agent is a program, such as a software robot on the Web, that perceives and acts in an environment [22, 49]. Moreover, a “popularity similarity” measure for classifying new information units was proposed and examined. The learned knowledge, integrated with heuristic selection rules, was then applied by classification and filtering agents to infer optimal target segment and cluster values of information units. Each user had a user profile, which maintained user temperament and preferences. User profiles were updated by a monitor agent. Furthermore, heuristic selection rules were integrated with the filtering process to facilitate information recommendation when dealing with user requests to the database and user feedback for relevance refinement. An alphabetized list of the key terms relevant to the temperament-based information filtering method, each with a brief definition, is provided in Table 1.1.

In this study, a prototype system was developed, implemented, and experimentally tested to evaluate the proposed temperament-based filtering approach in comparison with the content-based filtering method. An experimental testing procedure, combining simulation and user-studies testing, was conducted to demonstrate the improved effectiveness. The prototype information recommendation system was applied to and demonstrated in three application domains - a document collection [35], an art image collection, and representation styles.
Besides profiting from the similar patterns of human preferences and behaviors that are dominated by human temperaments, the proposed temperament-based filtering approach takes advantage of content-based filtering in its vector representations and content analysis. Empowered with knowledge of human temperaments, the temperament-based filtering method could significantly increase the accuracy of the recommendation service by providing information units that are pertinent not only to user interests but also to user temperament. In addition, the system effectiveness could be improved by heuristically searching only the part of the structurally classified space that matches the user temperament. The performance improvement reflected in the results of this study would demonstrate the feasibility of incorporating human factors into an information recommendation process. The remainder of this dissertation is organized as follows. In Chapter 2, we review related research, introduce temperament terminology, and describe the background of the vector space model. Chapter 3 presents the temperament-based information filtering model for customized selection and delivery. We describe the design of the architecture and the collaborative agents of the system and provide the mechanisms for segmentation, learning, classification, filtering, and feedback refinement. Chapter 4 starts with the description and methodology of an experimental prototype system, followed by evaluation reports of the temperament-based filtering method in comparison with the content-based filtering method in three application domains. In Chapter 5, we examine the "popularity similarity" measures. Finally, Chapter 6 discusses the experimental results and concludes with anticipated contributions and future research directions.
Term: Definition
Agent: A program, such as a software robot on the Web, that perceives and acts in an environment.
Centroid: A pseudo-vector that represents the average term frequency characteristics of the information units in a cluster. It is viewed as the center of the cluster and is usually defined as the mathematical average of the information vectors in the cluster.
Centroid temperament weight: An estimate that reflects the average temperament-adjusted popularity of the centroid of a cluster within the user population. It is defined as the average temperament weight of the information units in the cluster.
Classification: The process of deciding the appropriate cluster, category, or class for a given information unit.
Cluster: A group of similar information units.
Clustering: The process of grouping similar or related items into common categories or classes. In the vector space model, if the similarity measure between any two information vectors is greater than some threshold, the two vectors are grouped into the same cluster. Usually, an information vector is compared with the centroids of clusters.
Content-based filtering: The process of filtering by extracting term features from the text of information units and the user query to determine the relevance between them based on the vector space model. Also called cognitive filtering.
Filtering: The process of selecting and recommending information units to the user based on heuristic selection rules and user profiles.
Information filtering: The study of systems for indexing, clustering, learning, classifying, selecting, and recommending streams of data to the user by filtering the information units against user profiles. The information units could be in any data format, not only text.
Information retrieval: The study of systems for indexing, clustering, searching, and recalling data, particularly textual data, for the user based on a user query.
Information space: A group of information units that a user wishes to get information from.
Information unit: A piece of information that a user may be interested in. This could be a data item or an object in the form of text, graph, table, image, audio clip, video clip, etc.
Information vector: An information unit represented by a vector of term weights.
Inverse document frequency: Abbreviated as IDF, a measure of how rare a particular term is across all of the documents in a collection. It is usually defined as log(collection size / number of documents containing the term), so common words have a low IDF and words unique to a document have a high IDF.
Keirsey's temperament theory: David Keirsey classified the sixteen personality types of the MBTI system into four temperaments: SJ (sensing and judging), SP (sensing and perceiving), NT (intuiting and thinking), and NF (intuiting and feeling).
MBTI: An abbreviation for the Myers-Briggs Type Indicator, a questionnaire that helps people identify their innate personality from sixteen different types.
Personality type: One of the sixteen types of the MBTI system, in which Katharine Briggs and Isabel Myers adopted the psychological type theory developed by Carl Jung and categorized it into sixteen personality types based on four pairs of opposite preferences. The personality type theory was later merged with Keirsey's temperament theory.
Popularity of an information unit: The proportion of the user population that is interested in an information unit.
Popularity similarity: The textual content similarity between an information vector and a centroid vector, adjusted by the centroid temperament weight of that centroid.
Relevance: A measure of how pertinent an information unit is to the user interests.
Relevance feedback: The process of refining the results of a retrieval using a given query. The user indicates which of the returned information units are most relevant to the query. The system typically tries to find terms common to that subset and adds them to the old query. It then returns more information units using the revised query.
Segmenting an information space: The process of organizing the information units in the information space into collections that are semantically coherent with Keirsey's four temperaments.
Similarity: The measure of how alike any two information vectors are, or how alike an information vector and a centroid vector are. In the vector space model, this is usually interpreted as how close their corresponding vector representations are to each other. A popular method is to compute the cosine of the angle between the vectors.
Social filtering: The process of filtering information units by the evaluations or opinions of other users, based on similarities between the user's rating history and the ratings of the other users as determined by statistical analysis, particularly the correlation coefficient. Also called collaborative filtering.
Stemming: The process of removing prefixes and suffixes from words in a document or query in the formation of terms for the internal representation of the system.
Stop list: A list of common words, such as prepositions or articles, that are poor discriminators. It also refers to words that have a high frequency across a collection. Since stop words appear in many documents and are thus not helpful for retrieval, these terms are usually removed from the internal representation of a document or query. Sometimes called a negative dictionary.
Temperament: The manner of human behaviors and preferences of a specific individual, such as extraverting, introverting, sensing, judging, perceiving, intuiting, thinking, and feeling characteristics.
Temperament weight of an information unit: The quantification of the influence of the relevant temperaments on the popularity measure of an information unit within the user population.
Term: A single word or concept that occurs in an information unit or query.
Term frequency: Abbreviated as TF, the number of times a particular term occurs in a given document or query. This count is used in weighting the parameters of a model.
Term weight: The evaluation of term importance for terms in an information unit or a query.
Term weighting: The strategy for evaluating the importance of each term that occurs in an information unit of an information space, or in a query, for purposes of content identification.
TF-IDF method: A popular weighting scheme in the vector space model, applied to the term features of each information vector as TF * IDF (term frequency * inverse document frequency). The weights are sometimes normalized to sum to 1, or by dividing by the square root of the sum of their squares.
User profile: A vector of terms that characterizes the information that the user would be interested in.
Vector space model: A representation of documents and queries in which they are converted into vectors. The features of these vectors are usually the terms in the document or query, after stemming and removing the stop words. The vectors are weighted to give emphasis to terms that exemplify meaning and are useful in retrieval. In information retrieval, the query vector is compared to each document vector. Those that are closest to the query are considered to be similar, and are returned.
Table 1.1 Key terms relevant to the temperament-based information filtering method.

Chapter 2 Related Research
In this chapter, we first review the related research on customized access systems. Then we introduce Keirsey's temperament theory and temperament terminology. Finally, we describe the background of the popular vector space model in conventional textual information retrieval.

2.1 Customized Access Systems
The fast growth of the computer user population and the massive increase in information accessible through computer networks have drawn much attention from the research community toward designing customized access systems. The main strands of related research can be categorized as Web-specific or general-purpose, as shown hierarchically in Figure 2.1.

2.1.1 Web-Specific Systems
Search Engine. Keyword- or dialog-based query facilities and pre-existing categories are used to access Web information, as in Yahoo [67] and WebCrawler [64].
Figure 2.1 Categorization of related research.
Browsing Guide. These systems advise the user which hyperlink to take on the current Web page by analyzing the information on the page being viewed, but they do not provide significant search for locating information. WebWatcher [3, 24] is a tour guide that helps the user during navigation and improves its advice by learning from the experiences of multiple users. Letizia [33] can recommend nearby pages based on the particular application domain. Letizia differs from WebWatcher in that it does not require the user to state a goal at the outset, instead trying to infer "goals" implicitly from the user's browsing behavior. Knowledge Support. More advanced knowledge-exploiting methods are applied in these systems, which can learn from past interactions with the user and dynamically provide adaptive assistants. Application-specific systems, such as the news sites MSNBC [39] and My Yahoo [40], allow users to set up custom profiles. News retrieval "agents" collect news stories and other data according to a user's preferences, and when the user logs into the Web site, he or she can browse through pages of customized news stories, stock quotes and other financial information, and sports scores and stories.
The strength of these Web sites is their connectivity to large multimedia databases, combining the content of multiple issues, publications, and even digital libraries into one place. The drawback is their primitive data retrieval and customization capabilities. Fab [4] uses best-first search to collect pages from the Web at regular time intervals according to search profiles, which are based on user profiles. The collected pages are submitted to a central repository. Recommended pages are selected only from this repository based on the user profile. Because the repository contains the pages that the agents believe will best match the interests of the current user population, the recommendation combines both content-based and social filtering approaches to predict the user's interests and recommend Web pages. The user evaluates the recommended pages, and the ratings are used as feedback to update the user and search profiles. Dynamic Profiler [66] uses content-based collaborative filtering techniques to create dynamic user profiles, form user communities, and make recommendations. The system does not require explicit user ratings. Instead, it analyzes user logs, fetches the documents accessed, and categorizes them based on a supervised clustering scheme. The vector space model is used to represent the documents. In addition to the stop words, all words with a gini index higher than a predefined value are also removed from contention. A set of seeds (pseudo-centroids), which are representative of the classes in the original taxonomy, is constructed. Each seed consists of a vector in which the number of words with a non-zero weight is restricted to a predefined maximum. Each user is described by a vector of document categories.
A projected clustering scheme is used to classify test documents and find user communities. Each user community is identified by a vector of the categories that its members are most interested in, together with a weight that is indicative of their level of interest. Recommendations are made based on the user profile and the peers in the same community. WBI [7, 63] provides a group of agents on a user's workstation that observe user actions and offer assistance by personalizing the Web. For each user, the system keeps track of a personal history record and stores the sequence of URLs visited by the user along with the text of each URL. The text is stored in the structure of Salton's vector space model. Content-based filtering is used to analyze the similarities between the pages or paths in the personal history of a user. In addition to highlighting interesting links as a browsing guide does, the system adds shortcut links to a URL for common paths. An agent is triggered regularly to contact global search services to discover new documents that might be interesting to the user, based on user profile clustering and keyword extraction. An editor agent watches each document the user browses and adds alerts about discovered URLs in the same subject areas to that Web page. WebMate [12] utilizes the vector space model and a positive training-example method to learn a user profile. A user has at most N domains of interest, and once there are N vectors in the profile, the TF-IDF vector of a new positive example is combined with the profile vector that has the greatest similarity to it. In the trigger pairs model, if a word S is significantly correlated with another word T, then (S, T) is considered a "trigger pair", with S being the trigger and T the triggered word.
The search and relevant document feedback are refined by keyword expansion based on the trigger pairs model, which differs from the traditional keyword expansion method in its ordered pair property. WAG [11] allows the user to query (instead of browse) the Web by constructing a personalized database, which is populated with both the data and the related locations from one or more Web sites according to the user's interests. User feedback is obtained by a dialogue-oriented interaction. A Description Logics (DL)-based knowledge representation system is used to model the conceptual schema of the information domain. The database is constructed from the Graph model with a visual syntax and graphical operations, and it is accessed by a visual query on the conceptual schema built from the site of interest. Personalized Web mediators are becoming more and more popular. Aurora [21] is a transcoding system providing a Web interface that targets and adapts content in existing Web pages to help users, particularly in the disabled community, obtain Web-based services. A transaction model describes the tasks that a user must accomplish in order to obtain specific services. Each step of a transaction corresponds to a segment of Web page data and describes its functional semantics. An XML-based framework is used to semantically transcode Web data in aggregations according to abstract user goals, and the WBI modules are extended to execute the processing cycle. For each service domain, a set of transaction schemas specified by XML Document Type Definitions (DTDs) is used to describe the abstract tasks by specifying the semantics and ordering of the transaction steps that lead to the service goal. Boll [8] proposes a flexible architecture in which modular services between providers and users are mediated.
With a common ontology, the service provider assembles a service description, sends it to the mediator, and registers with the mediator. The mediator collects descriptions of the services from different providers in a service registry. The user profile schema description is to be developed as a specific Composite Capabilities/Preference Profile (CC/PP) vocabulary [32, 45], which is described by a corresponding RDF schema [9]. A user sends the mediator a request along with a privacy policy, which describes how the mediator has to treat the user's personal information using the P3P preference exchange language standard of the World Wide Web Consortium (W3C). Based on the user information, the mediator searches the service registry for the most suitable provider services. The actual service is then carried out. The most targeted personalized content is retrieved to the user's Web portal and delivered to the user.

2.1.2 General-Purpose Systems
The general-purpose systems can be accessed through Email, Usenet, wire services, or other commercial providers, in addition to the Web. In these systems, sophisticated knowledge representation, machine learning, and searching techniques are presented to cope with the complexity and dynamism of the information environment. Adaptive Assistant. GroupLens [44] predicts a user rating by correlating that user's rating history with other users' ratings. The system helps people find netnews articles they will like. Ringo [55] compares four social filtering algorithms for "Word of Mouth" and concludes that the constrained Pearson r correlation coefficient outperforms mean-square difference, Pearson r, and Artist-Artist on their database.
Constrained Pearson r is a variant of the standard Pearson r correlation coefficient in which the coefficient increases only when both people have rated an artist positively or both have rated it negatively. The system provides personalized recommendations for music albums and artists based on taste similarities between the interest profile of the user and those of other users, as determined by statistical analysis. A user is presented with a list of artists and rates each artist with a score indicating how much he or she likes to listen to that artist. Both GroupLens and Ringo require user feedback in the form of explicit ratings and predict user interests by social filtering. HistView [61] allows users to specify their preferences relative to their personal history or the histories of other users using a graphical technique called control shadows. History-based specification of intent is displayed as bars with visual "shadows" in a histogram. The shadows are a second set of bars and active controls that enable users to express their intent in terms of the history data by stretching a bar to indicate "more of this" or shrinking a bar to indicate "less of that". ConText [14] is a video delivery system based on content annotation and context-driven concatenation. Using content-based filtering, the system provides non-linear access to an evolving documentary collection of media elements from wire services and other commercial news providers, in addition to the Internet. The user interacts only when dissatisfied with the current automated presentation. The drawbacks are that the user population is not considered in predicting the user interest, that the task of maintaining the description space grows as content is added, and that a more flexible and dynamic means of annotating the content space is needed.
Collagen [46] provides a generic framework that embodies collaborative discourse principles for maintaining and communicating the decisions made between the agent and the user. A user interacts with the agent by selecting from the user communication menu, which changes dynamically and is systematically generated from the discourse agenda. The discourse agenda is constantly updated by interpreting the internal state of the focus stack, history list, and recipe tree. Information Mediator. These knowledge representation systems provide access to and integration of multiple sources of information by resolving semantic conflicts and performing query optimization and query planning. SIMS [2] takes a domain-level query and dynamically selects the appropriate information sources after considering the availability of the various databases and all possible semantic optimizations, which it finds by firing all applicable rules and collecting candidate constraints in an inferred set. The system accepts queries in the form of a description of a class of objects about which information is desired. The interface enables the user to inspect the domain model as an aid to composing queries. Informia [5, 23] uses an object-oriented application framework with a set of communicating and cooperating components. The source selector component combines structural and content descriptions and can be customized with additional domain-specific rules and hints for particular applications based on the user's needs. The system supports non-Boolean relevance assessment and allows the combination of exact and approximate query match semantics. In assessing relevance, every retrieved document is associated with a source relevance score given by the source and an internal relevance score based on a vector-space similarity between the textual representations of the query and the document.
Informia differs from SIMS in supporting relevance assessment and allowing the combination of exact and approximate query match semantics. CAWOM [65] is a tool that generates wrappers for command-line oriented legacy systems from a high-level specification for interoperability. The wrappers are generated through the OMG CORBA [58] interoperability standard. The wrapper generator is implemented with the Java version of the ANTLR parser generator, and GJ is used for the parse-tree representation. A simple template-style macro language is built to facilitate programming the code generation. A definite-clause grammar (DCG) system built around PrologCafe, a Prolog-to-Java translator, is used. A wrapper is generated for Apache that enables certain administrative functions of the Apache server to be accessed programmatically through CORBA. The GDB-like debugger for Java (JDB) is also wrapped to allow CORBA-compliant clients to access JDB's services.
Figure 2.2 Jung's four sets of preferences (Extravert/Introvert, Sensing/iNtuition, Thinking/Feeling, Judging/Perceiving).

2.2 Human Temperaments
Temperament and psychological type represent two systems of classifying personalities, which later converged. Philosophers and psychoanalysts have devoted much effort to analyzing human temperaments and preferences since the ancient Greek era. Hippocrates (5th century BC) first observed four basic human behaviors, choleric, phlegmatic, melancholic, and sanguine, which he named temperaments. In the 1920s, Carl Jung [26] asserted that people are fundamentally different in their normal behavior and are characterized by their preferences (Figure 2.2). Thus people can be classified into "psychological types" based on their particular preferences.
Later, Katharine Briggs and her daughter Isabel Myers [41] adopted Jung's theory and designed the Myers-Briggs Type Indicator (MBTI), a questionnaire that helps people identify their innate personality from sixteen different types. In the widely used Myers-Briggs type system, a personality type is represented by a four-letter code of personal preferences (such as ESTJ). Each letter is chosen from a pair of opposite preferences. The four pairs of opposite preferences, which make up the sixteen different personality types, are: Extraverting (E) and Introverting (I), Sensing (S) and iNtuiting (N), Thinking (T) and Feeling (F), and Judging (J) and Perceiving (P). Within each pair of opposite preferences, a person leans toward one or the other in most cases. Taking one preference from every pair forms the four-letter code that represents a person's type. David Keirsey [28, 29] criticized Hippocrates' four temperaments as misleading and derived his own theory of categorizing temperaments. He correlated his temperament theory with the MBTI system and classified the sixteen personality types into four temperaments: SJ, SP, NT, and NF (Table 2.1).

Temperament  Preferences          Behavior  Style             Myers-Briggs Types
SJ           Sensing, Judging     Guardian  Duty Seeker       ESTJ, ESFJ, ISTJ, ISFJ
SP           Sensing, Perceiving  Artisan   Action Seeker     ESTP, ESFP, ISTP, ISFP
NT           iNtuiting, Thinking  Rational  Knowledge Seeker  ENTJ, ENTP, INTJ, INTP
NF           iNtuiting, Feeling   Idealist  Ideal Seeker      ENFJ, ENFP, INFJ, INFP
Table 2.1 Keirsey's four temperaments.

Temperament (%)  MBTI (%)
SJ  46.7         ESTJ  9.9   ESFJ  9.6   ISTJ  15.6   ISFJ  11.5
SP  21.4         ESTP  4.8   ESFP  5.7   ISTP   6.4   ISFP   4.5
NT  16.1         ENTJ  2.8   ENTP  4.7   INTJ   3.5   INTP   5.2
NF  15.8         ENFJ  2.5   ENFP  6.3   INFJ   2.6   INFP   4.3
Table 2.2 The percentage distributions of the temperaments in the United States.
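The correspondence between the sixteen Myers-Briggs types and Keirsey's four temperaments can be sketched in a few lines of Python. This is only an illustration of the grouping in Table 2.1, not part of the prototype system; the function name `keirsey_temperament` and the dictionary `TYPES_BY_TEMPERAMENT` are hypothetical names introduced here. The rule assumed is the one the table implies: Sensing types are split by their Judging/Perceiving letter, and iNtuiting types by their Thinking/Feeling letter.

```python
from itertools import product


def keirsey_temperament(mbti_type: str) -> str:
    """Map a four-letter MBTI code (e.g. "ESTJ") to SJ, SP, NT, or NF.

    Hypothetical helper: it encodes the grouping shown in Table 2.1.
    """
    code = mbti_type.strip().upper()
    valid = len(code) == 4 and all(
        letter in pair for letter, pair in zip(code, ("EI", "SN", "TF", "JP"))
    )
    if not valid:
        raise ValueError(f"not an MBTI type code: {mbti_type!r}")
    if code[1] == "S":
        # Sensing types are distinguished by Judging vs. Perceiving.
        return "S" + code[3]
    # iNtuiting types are distinguished by Thinking vs. Feeling.
    return "N" + code[2]


# Enumerate all sixteen types and group them by temperament.
ALL_TYPES = ["".join(letters) for letters in product("EI", "SN", "TF", "JP")]
TYPES_BY_TEMPERAMENT = {}
for mbti in ALL_TYPES:
    TYPES_BY_TEMPERAMENT.setdefault(keirsey_temperament(mbti), []).append(mbti)
```

Each temperament ends up with exactly four types, and the E/I letter never affects the result, which is why only two of the four letters appear in a temperament code.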
A Keirsey temperament consists of four related MBTI types, each of which contains the two letter codes of that temperament. A statistical report [20] on the percentage distributions of the temperaments in the United States showed that the largest group is the SJs (46.7%), sensing and judging (Table 2.2).

2.3 Vector Space Model
In this temperament-based filtering model, the information space is segmented by the intersection of Keirsey's four temperaments. The information units in the same segment are clustered by a content-based approach that adopts the conventional vector space model [51]. Both information units and user requests are represented as term vectors in a multidimensional space whose dimensions are the key words, phrases, or concepts extracted from the information units. A sample array of n documents and m terms is shown in Figure 2.3. In the array, cell $x_{ij}$ is the weight of term j assigned to document i.

$$
\begin{array}{c|cccc}
\text{Doc } 1 & x_{11} & x_{12} & \cdots & x_{1m} \\
\text{Doc } 2 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & & \vdots \\
\text{Doc } n & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{array}
$$
Figure 2.3 Sample array of document vectors.

Each term weight $W_i$ of a term or concept $m_i$ in the vector of an information unit $d$ is computed by the TF-IDF (Term Frequency and Inverse Document Frequency) method and is often defined as follows:

$$
W_i = TF_i \times IDF_i = TF_i \times \log\frac{n}{DF_i}
$$

where
$TF_i$ = the number of times term or concept $m_i$ appears in information unit $d$ (the term frequency),
$DF_i$ = the number of information units in the collection which contain $m_i$ (the document frequency),
$IDF_i$ = the inverse document frequency, and
$n$ = the number of information units in the collection.

In the vector space model, the similarity between any two vectors is measured by the cosine of the angle formed between them:

$$
\mathrm{cosine}(V_i, V_j) = \frac{V_i \cdot V_j}{|V_i|\,|V_j|} = \frac{\sum_{k=1}^{m} Term_{ik} \cdot Term_{jk}}{\sqrt{\sum_{k=1}^{m} Term_{ik}^{2}}\;\sqrt{\sum_{k=1}^{m} Term_{jk}^{2}}}
$$

where
$Term_{xy}$ = the weight of term y assigned to document x, and
$m$ = the number of terms in the collection.

The smaller the angle (or the higher the cosine value), the higher the similarity. The similarity measures are also used to group vectors into clusters to facilitate search.

In the proposed temperament-based filtering method, the format of an information unit in the information space is not restricted to text as in traditional information retrieval. It can be a media object, such as a document, Web page, image, or audio/video clip. In addition to the vector representation, an information unit is associated with eight counters. These counters accumulate the statistical data for the two possible evaluation values (like or dislike) of the four temperaments (SJ, SP, NT, and NF). For example, when a user is an SJ and rates an information unit $d_1$ as "like", the SJ-and-like counter of $d_1$ is incremented by 1. In such a structure, information vectors maintain content features for similarity measures, and the counters reflect the global opinions and temperament concepts. The advantages of content-based and social filtering are combined, and the temperament concept can be developed and learned.

Chapter 3 Temperament-Based Information Filtering Method
The temperament-based information filtering method provides an intelligent multiagent approach to incorporate human temperaments into the filtering process of an information recommendation service. This approach is to devise a new filtering mechanism, which addresses segmentation, learning, classification, and filtering techniques based on Keirsey's temperament theory, set theory, probability theory, the distributions of temperaments, and statistical reasoning.
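Because the rest of this chapter builds on the representation introduced in Section 2.3, a minimal Python sketch of that representation may help make it concrete. It assumes the common form of the weighting, TF times log(n / DF), and the cosine measure between sparse term vectors. The names `tf_idf_vectors`, `cosine`, and `InformationUnit`, as well as the toy token lists, are hypothetical and are not taken from the prototype system. The eight like/dislike counters per information unit are modeled as a dictionary keyed by (temperament, rating).

```python
import math
from collections import Counter
from dataclasses import dataclass, field

TEMPERAMENTS = ("SJ", "SP", "NT", "NF")


def tf_idf_vectors(token_lists):
    """Return one sparse {term: weight} vector per document.

    Assumed weighting: W = TF * log(n / DF), with n the collection size.
    """
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term once per document
    return [
        {term: tf * math.log(n / df[term]) for term, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def _new_counters():
    # Eight counters: four temperaments times two evaluation values.
    return {(t, r): 0 for t in TEMPERAMENTS for r in ("like", "dislike")}


@dataclass
class InformationUnit:
    vector: dict
    counters: dict = field(default_factory=_new_counters)

    def rate(self, temperament, liked):
        """Record one user's evaluation under that user's temperament."""
        self.counters[(temperament, "like" if liked else "dislike")] += 1


# Toy collection of three tokenized documents (illustrative data only).
docs = [["art", "music", "art"], ["music", "stock"], ["stock", "news"]]
vectors = tf_idf_vectors(docs)
d1 = InformationUnit(vectors[0])
d1.rate("SJ", liked=True)  # an SJ user rates d1 as "like"
```

In this toy collection the first and third documents share no terms, so their cosine is zero, while the first and second share only "music" and therefore have a small positive similarity. A term that occurs in every document receives a weight of zero under this weighting, which is the intended effect of the IDF factor.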
By presenting information units that are consistent with user interests as well as user temperament, the accuracy of the recommendation service may be improved.

In this chapter, we describe the design and basic principles of the temperament-based filtering model. The concept of a temperament-segmented information space is introduced and the functional architecture is outlined in Section 3.1. In Section 3.2 the segmentation function is formulated and the segmentation criteria are discussed. Section 3.3 provides the learning mechanism and the algorithm for learning the temperament concept. In Section 3.4, a “popularity similarity” measure and a classification function are proposed. The classification mechanism and algorithm are presented to infer the optimal location for classifying new information units. Section 3.5 offers heuristic selection rules as well as the filtering mechanism and algorithm for various user profile situations. In Section 3.6, relevance feedback is discussed.

Figure 3.1 A temperament-segmented feature of the real-world information space. (Panels: people categorized by temperaments; sets of information units liked by temperament-categorized people; temperament-segmented information space.)

3.1 Overview of the Temperament-Based Filtering Model

3.1.1 Temperament-Segmented Information Space

From the temperament point of view, an information space in the real world can be specified as having a temperament-segmented structure by identifying the interdependence of human temperaments and preferences (Figure 3.1). When the information units in the information space are viewed by people categorized by temperament and sorted by viewer preference, they form four sets of information units. Each set contains the information units liked by people of a particular temperament type. These four sets form the four concepts of an information space.
However, these sets are not mutually disjoint. The intersections of the four sets partition the information space into 16 temperament-based segments.

Figure 3.2 The architecture of the temperament-based information filtering model. (Learning phase: sample users, temperament sorter questionnaire, training data set, Analysis Agent with counters, Learning Agent, temperament-based weighted centroids. Inference phase: Monitor Agent, user profile adaptation, Classification Agent, temperament-based information filtering, temperament-segmented information space, information repository, recommended information units.)

3.1.2 Functional Architecture

Consider an information repository that presents information to users spanning a variety of preferences. The interactions between a user and the system are supported by intelligent user interfaces at a Web site. From the system's perspective, the architecture of the temperament-based information filtering model is shown in Figure 3.2. In this model, the multiagent system incorporates Keirsey’s temperament concept into the recommendation process. The model has two key phases: learning and inference.

In the learning phase, a novel segmentation framework is introduced to estimate and partition an information space into temperament-based segments by observing the temperament and interest distributions of the sample users. At the beginning, sample users are sorted by answering Keirsey’s temperament sorter, a questionnaire that analyzes user temperament, at http://www.keirsey.com.
These sample users then rate the information units in the training data set. The resulting triplets (information unit id, user temperament, user rating) are sent to the Analysis Agent, where the data are analyzed and accumulated in the associated counters. The Learning Agent is designed to learn the temperament segmentation concepts automatically. It retrieves the statistical results from the Analysis Agent and learns the temperament concept, which is represented by a set of temperament-based segments and the centroids of the clusters in those segments.

In the inference phase, the Classification Agent categorizes new information units in the information repository (e.g., the Web) into a temperament-segmented information space by applying the temperament concept learned in the learning phase. When a user issues a request, the Monitor Agent intercepts the request and updates the corresponding user profile. A user profile keeps track of the temperament and interests of a particular user. Furthermore, heuristic selection rules are integrated with the filtering process to facilitate information recommendation when dealing with user requests to the database and with user feedback for relevance refinement.

From the user's perspective, the interaction activities are simple. In the training (system learning) phase (Figure 3.3), sample users train the system by providing their temperament types and rating the information units presented by the system. In the inference phase (Figure 3.4), users ask the system for information units of interest by providing interest term keys. For a first-time user, the temperament type is also required to complete the request. Users then rate these information units to evaluate the recommendation service and to acquire a refined information list. In both phases, users can find their temperament types by following a link on the user interface to the page of Keirsey's temperament sorter and responding to the questionnaire.

Figure 3.3 User's perspective at training phase. (Components: Web browser with HTML forms and Java applets; Web server; DBMS server; Keirsey’s temperament sorter; evaluation form; Analysis Agent; training data set.)

Figure 3.4 User’s perspective at inference phase. (Components: Web browser with request/report forms; Web server (HTTP); DBMS server; Keirsey’s temperament sorter; information repository comprising the Web and databases.)

Figure 3.5 The learning process. (Components: sample users, temperament sorter, training examples, Analysis Agent with counters, Learning Agent with learning algorithm; strategies: temperament theory, set theory, probability theory, segmentation criteria, temperament weighting functions, clustering measure; output: sample information space segmented by temperaments and temperament-weighted centroids.)

3.2 Learning the Temperament Concept

The learning agent learns to estimate and partition an information space by observing the temperament and interest distributions of the simulated users.
The knowledge of the temperament concept is represented by a set of temperament-based segments together with temperament-weighted centroids. The classification and filtering agents then apply the learned knowledge to infer the optimal target segment and cluster of a new information unit. A diagram of the interactions between the components in the learning process is shown in Figure 3.5. The analysis agent analyzes the data collected from the sample survey to estimate the interest distributions of the sample users based on user temperaments and user ratings of the training examples. The learning agent then employs the statistical results to develop the temperament concept using the strategies of the learning mechanism, which include temperament theory, set theory, probability theory, segmentation criteria, temperament weighting functions, and a clustering measure.

There are two main design issues involved in learning the temperament concept. The first is how to segment the information space in terms of human temperaments. The second is how to estimate the temperament weights of the information units in the segments, so that the centroid temperament weights of clusters can be estimated for the “popularity similarity” measure of the classification function in the inference phase. These problems are solved by employing segmentation criteria and temperament-weighting functions in addition to conventional clustering techniques.

3.3 Segmentation Criteria

The proposed approach for segmenting the information space by human temperaments is formulated with Keirsey’s temperament theory, set theory, probability theory, and statistical reasoning. In the temperament-based filtering model, the information space is partitioned by the intersection of Keirsey’s four temperaments.
David Keirsey [28, 29] asserted that people can be classified into four temperaments: SJ (sensing and judging), SP (sensing and perceiving), NT (intuiting and thinking), and NF (intuiting and feeling). A statistical report [20] on the percentage distributions of the temperaments in the United States showed that most people are SJs (46.7%), compared with SPs (21.4%), NTs (16.1%), and NFs (15.8%).

Figure 3.6 The construction of a temperament-segmented sample information space. (Stages: sample information space; sample users categorized by temperaments; sets of training examples liked by temperament-categorized sample users; temperament-segmented sample information space. Constraints: segmentation criteria comprising segmentation functions, threshold, bias elimination, and segment ordering.)

To partition a sample information space into temperament-based segments, a simple random sampling is conducted, and user ratings on the training data set are collected to measure the temperament and interest distributions of the sample users and to categorize the training examples into temperament-based sets (Figure 3.6). The temperament-segmented sample information space is then constructed from these temperament-categorized training examples by set intersection operations under the constraints of the segmentation criteria.

A temperament-segmented sample information space is used to estimate the temperament-based segmentation of a target information space in the inference phase. The segmentation criteria control the construction of the segments of the information space. A segmentation function is applied to form the temperament-based segments. A preset threshold is used to maintain a confidence level in the learned knowledge.
The bias data are eliminated and the segments are ordered to improve search.

3.3.1 Segmentation Function

In the information space, Keirsey’s four temperaments form the four events SJ, SP, NT, and NF. For convenience, the same terms are used interchangeably in this paper to indicate either temperaments or events. Each information unit has two possible target values, like and dislike. An event t is defined as the set of all the information units evaluated “like” by users with temperament t. An information unit evaluated as “like” or “dislike” by a user with known temperament will fall into one of the 16 segments produced by all the possible intersections of the four temperaments. For simplicity, consider only the positive examples, that is, information units evaluated as “like” by users. The segmentation function is introduced to model the segments of the information space partitioned by the intersection of the four temperaments and may be defined as

$$S_n = \bigcap_{t \in T_n} D_t \;-\; \bigcup_{t' \in (T - T_n)} D_{t'}$$

where
$n \in \{1, 2, \ldots, 2^{|T|}\} = \{1, 2, \ldots, 16\}$,
T = {SJ, SP, NT, NF},
|T| = size of T,
$T_n$ = {t | t is a temperament value in segment $S_n$},
$D_t$ = the set of information units evaluated as “like” by users with temperament t, and
$D_{t'}$ = the set of information units evaluated as “like” by users with temperament t'.

A sample information space segmented by temperaments is shown in Figure 3.7. In this example, the segments and the relevant temperaments are

$S_1 = D_{SJ} \cap D_{SP} \cap D_{NT} \cap D_{NF}$, $T_1$ = {SJ, SP, NT, NF}
$S_2 = D_{SJ} \cap D_{SP} \cap D_{NT} - D_{NF}$, $T_2$ = {SJ, SP, NT}
$S_3 = D_{SJ} \cap D_{SP} \cap D_{NF} - D_{NT}$, $T_3$ = {SJ, SP, NF}
...
$S_{14} = D_{NT} - D_{SJ} - D_{SP} - D_{NF}$, $T_{14}$ = {NT}
$S_{15} = D_{NF} - D_{SJ} - D_{SP} - D_{NT}$, $T_{15}$ = {NF}
$S_{16} = D_{\emptyset}$, $T_{16} = \emptyset$

where $D_{\emptyset}$ = the set of documents not evaluated as “like” by any user.

Figure 3.7 Segments of a temperament-based sample information space. (Legend: $D_{SJ}$, $D_{SP}$, $D_{NT}$, $D_{NF}$; the remainder of the information space is $S_{16}$.)

Thus, an information unit evaluated “like” by all temperament types of users will be in segment $S_1$. Similarly, an information unit evaluated “like” by all temperament types of users except NF will be in segment $S_2$, and so forth.

3.3.2 Preset Threshold

The popularity of an information unit k within a user population of a particular temperament type t is defined as the conditional probability of interested users given that temperament type, $P(like_k \mid t)$. A greater popularity indicates that the information unit has a higher probability of being liked among that user population. To maintain a minimum confidence level in the learned knowledge, an information unit is learned to be in one of the segments other than $S_{16}$ only when sufficient evidence of popularity has been observed to pass a preset threshold $\theta$; otherwise, the information unit is retained in $S_{16}$. An information unit in $S_{16}$ is considered not evaluated “like” by anybody.

3.3.3 An Illustrative Example

In learning the temperament concept to segment the information space, an information unit k of the training data set is not classified into $D_t$ until $P(like_k \mid t)$ exceeds the predefined threshold $\theta$. Table 3.1 shows an example of the statistical results obtained for 5 information units in the training data set rated by 1,000 simulated users, where “yes” indicates “like” and “no” indicates “dislike”. The popularity of $d_3$ is zero for all the temperament types; thus $d_3$ is classified into $S_{16} = D_{\emptyset}$, in which an information unit is not evaluated as “like” by any user.
On the other hand, three popularity values of $d_9$ are nonzero:

$P(like_{d9} \mid SJ) = 97/(97+370) \approx 0.21$
$P(like_{d9} \mid SP) = 14/(14+200) \approx 0.07$
$P(like_{d9} \mid NF) = 0/(0+158) = 0$
$P(like_{d9} \mid NT) = 51/(51+110) \approx 0.32$

Assume that the threshold $\theta = 0.10$. Then $d_9$ is considered liked by SJs and NTs but not by SPs or NFs, because only $P(like_{d9} \mid SJ)$ and $P(like_{d9} \mid NT)$ exceed $\theta$. Hence, $d_9$ is an element of both $D_{SJ}$ and $D_{NT}$ but not of $D_{SP}$ or $D_{NF}$, and is classified into $S_7$, where $S_7 = D_{SJ} \cap D_{NT} - D_{SP} - D_{NF}$ by definition. The knowledge of the segment concept is represented by a set of ordered pairs (item id, segment #), such as ($d_3$, $S_{16}$) and ($d_9$, $S_7$).

Information   SJ          SP          NT          NF
Unit          Yes   No    Yes   No    Yes   No    Yes   No
d3              0  467      0  214      0  161      0  158
d9             97  370     14  200     51  110      0  158
d15            66  401     50  164     92   69     57  101
d23            11  456     57  157     46  115      0  158
d36            24  443     59  155     95   66     68   90

Table 3.1 Sample statistical results.

3.3.4 Bias Elimination

To eliminate bias data and false alarms in the sample survey, the training data are randomly selected from the sample survey according to the percentage distributions of the temperaments in the United States. Thus, the percentage distributions of the temperaments in the resulting training data set are the same as those in the United States. This process can be repeated several times and the results averaged for statistical significance.

3.3.5 Ordering the Segments

In order to search systematically, segments are sorted in descending order of their accumulated percentage distributions of temperaments, $\sum_{t \in T_n} P(t)$, with the index value starting from 1. Thus, a segment with a lower index will contain information units having a higher accumulated probability (or popularity).
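The thresholding of Section 3.3.3 and the ordering rule of Section 3.3.5 can be illustrated with a minimal sketch. The function and variable names below are illustrative assumptions, not part of the described system; the counts are copied from Table 3.1 and the shares from [20]. The sketch only derives the temperament set of each unit and the rank order of the temperament sets; it does not assign the segment numbers used in the text.

```python
from itertools import combinations

# Temperament shares in the United States, as quoted from [20].
SHARES = {"SJ": 0.467, "SP": 0.214, "NT": 0.161, "NF": 0.158}

# (yes, no) rating counts per temperament, copied from Table 3.1.
TABLE_3_1 = {
    "d3":  {"SJ": (0, 467),  "SP": (0, 214),  "NT": (0, 161),  "NF": (0, 158)},
    "d9":  {"SJ": (97, 370), "SP": (14, 200), "NT": (51, 110), "NF": (0, 158)},
    "d15": {"SJ": (66, 401), "SP": (50, 164), "NT": (92, 69),  "NF": (57, 101)},
    "d23": {"SJ": (11, 456), "SP": (57, 157), "NT": (46, 115), "NF": (0, 158)},
    "d36": {"SJ": (24, 443), "SP": (59, 155), "NT": (95, 66),  "NF": (68, 90)},
}


def popularity(yes, no):
    """Observed P(like | t) = yes / (yes + no); 0.0 when nobody rated."""
    total = yes + no
    return yes / total if total else 0.0


def liked_by(counts, theta=0.10):
    """Temperaments whose popularity strictly exceeds the threshold theta."""
    return frozenset(
        t for t, (yes, no) in counts.items() if popularity(yes, no) > theta
    )


def ordered_temperament_sets(shares):
    """All 2^|T| temperament subsets, by accumulated share, descending."""
    temps = list(shares)
    subsets = [
        frozenset(c)
        for r in range(len(temps) + 1)
        for c in combinations(temps, r)
    ]
    return sorted(subsets, key=lambda s: sum(shares[t] for t in s), reverse=True)


# Temperament set T_n of the segment each training unit falls into.
segments = {unit: liked_by(counts) for unit, counts in TABLE_3_1.items()}
ranking = ordered_temperament_sets(SHARES)
```

With $\theta = 0.10$, the sketch yields {SJ, NT} for d9 and the empty set for d3, in agreement with the example above.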
When a search starts from $S_1$ (or from the segment with the smallest segment number in a partial space) and proceeds down to the tail of the ordered list while filtering information units for a user, the search always examines the segment that is currently the maximally probable estimate for the recommendation.

3.4 Temperament Weights of Information Units

3.4.1 Temperament Weighting Function

An information unit in a segment with a higher prior probability is likely to have a larger interested population. However, consider a segment that comprises more temperament types than other segments, especially when the information units in these segments have the same prior probability. The information units in the segment comprising more temperament types exhibit more human diversity in the interested population and thus may carry heavier weights. To account for this observation, a temperament weight is introduced to quantify and estimate the relative influence of the various temperaments on the popularity measure of an information unit. The temperament weight of an information unit $d_j$ in segment $S_n$ can be set to the fraction of relevant temperaments in $S_n$ multiplied by the prior probability $P(like_{nj})$:

$$w_j = \frac{|T_n|}{|T|}\, P(like_{nj}) \qquad (3.1)$$

where n, $T_n$, T, and |T| are as defined previously,
$|T_n|$ = size of $T_n$, and
$P(like_{nj})$ = the prior probability that an information unit $d_j$ in segment $S_n$ is evaluated as “like”.
Since temperaments are mutually exclusive with $\sum_{t \in T} P(t) = 1$, by the theorem of total probability, equation (3.1) can be rewritten as

$$w_j = \frac{|T_n|}{|T|} \sum_{t \in T_n} P(like_{nj} \mid t)\, P(t) \qquad (3.2)$$

where
$P(like_{nj} \mid t)$ = the conditional probability that an information unit $d_j$ in segment $S_n$ is evaluated as “like”, given user temperament t, and
$P(t)$ = the percentage distribution of temperament t in the United States [20].

Although the observed conditional probability $P(like_{nj} \mid t)$ provides a good estimate when the sample size is sufficiently large, it can be a poor underestimate when the sample size is relatively small. This bias of the conditional probability can be overcome by using the m-estimate [38], defined as

$$\frac{k_t + m\,p}{k + m}$$

where
k = the total number of sample users who have temperament t and rated information unit $d_j$,
$k_t$ = the number out of k for which information unit $d_j$ is rated as “like”,
p = the prior probability to be estimated, and
m = a constant called the equivalent sample size, which augments the k actual observations by an additional m virtual samples distributed according to p.

When the value of $P(like_{nj} \mid t)$ is computed by the m-estimate method with uniform priors, $p = 1/|T|$, and with m equal to the number of temperament types in the segmented information space, $m = |T|$, the estimate for $P(like_{nj} \mid t)$ becomes

$$\frac{k_t + 1}{k + |T|} \qquad (3.3)$$

Notice that when the sample size is sufficiently large, the m-estimate of $P(like_{nj} \mid t)$ is asymptotically close to

$$\frac{k_t}{k} \qquad (3.4)$$

3.4.2 An Illustrative Example

The following simple example illustrates the temperament weight estimate of the information units in the segmented information space. Consider the statistical results shown in Table 3.2, in which the number of user responses is expressed in units of 100 responses.
Information unit d1 is rated “Yes” (like) by 1900 SJs, 1000 SPs, and 1800 NFs; thus d1 is in segment $S_3$ and $T_3$ = {SJ, SP, NF} by the definition of $S_3$. Similarly, d2 is in $S_{14}$ and $T_{14}$ = {NT}, d3 is in $S_{16}$ and $T_{16} = \emptyset$, d4 is in $S_7$ and $T_7$ = {SJ, NT}, d5 is in $S_1$ and $T_1$ = {SJ, SP, NT, NF}, and d6 is also in $S_1$. Applying the statistical values in Table 3.2 to equations (3.2) and (3.4), the temperament weights of the information units are computed as follows:

$w_1$ = (3/4)(19/20 × 0.467 + 10/12 × 0.214 + 18/20 × 0.158) = 0.5731
$w_2$ = (1/4)(10/20 × 0.161) = 0.0201
$w_3$ = 0
$w_4$ = (2/4)(10/10 × 0.467 + 40/40 × 0.161) = 0.314
$w_5$ = (4/4)(10/10 × 0.467 + 20/30 × 0.214 + 10/20 × 0.161 + 10/20 × 0.158) = 0.7692
$w_6$ = (4/4)(50/50 × 0.467 + 30/30 × 0.214 + 10/10 × 0.161 + 10/10 × 0.158) = 1

Information   # of User    SJ         SP         NT         NF
Unit          Responses    Yes  No    Yes  No    Yes  No    Yes  No
d1                 62       19   1     10   2      0  10     18   2
d2                 45        0   5      0  10     10  10      0  10
d3                 40        0  20      0   5      0   5      0  10
d4                 90       10   0      0  20     40   0      0  20
d5                 80       10   0     20  10     10  10     10  10
d6                100       50   0     30   0     10   0     10   0

Table 3.2 Sample statistical results (scale of # of User Responses: 100).

3.4.3 Estimating Centroid Temperament Weights of Clusters

To reduce the number of comparisons when searching within a segment, the information units in a segment are grouped into clusters by the conventional cosine similarity measure. The cosine similarity of any two vectors $V_i$ and $V_j$ is denoted simply as $Sim(V_i, V_j) = (V_i \cdot V_j) / (|V_i| \times |V_j|)$. The basic idea of clustering is to compare an item with the centroid vector of a cluster and to group it into the cluster if the similarity measure is greater than a fixed threshold $\lambda$. The centroid vector of a cluster is defined as the mathematical average of the information vectors in the cluster.
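The weights of Section 3.4.2 can be checked numerically. The following is a minimal sketch under stated assumptions: the names are illustrative, the counts are copied from Table 3.2, and a temperament is assumed to belong to $T_n$ when its observed like-ratio exceeds a threshold of 0.10 (borrowed from Section 3.3.3, since the example does not state one). The `m_estimate` helper shows the general m-estimate with the defaults that lead to equation (3.3).

```python
# Temperament shares in the United States, as quoted from [20].
SHARES = {"SJ": 0.467, "SP": 0.214, "NT": 0.161, "NF": 0.158}

# (yes, no) counts per temperament, copied from Table 3.2 (units of 100).
TABLE_3_2 = {
    "d1": {"SJ": (19, 1), "SP": (10, 2),  "NT": (0, 10),  "NF": (18, 2)},
    "d2": {"SJ": (0, 5),  "SP": (0, 10),  "NT": (10, 10), "NF": (0, 10)},
    "d3": {"SJ": (0, 20), "SP": (0, 5),   "NT": (0, 5),   "NF": (0, 10)},
    "d4": {"SJ": (10, 0), "SP": (0, 20),  "NT": (40, 0),  "NF": (0, 20)},
    "d5": {"SJ": (10, 0), "SP": (20, 10), "NT": (10, 10), "NF": (10, 10)},
    "d6": {"SJ": (50, 0), "SP": (30, 0),  "NT": (10, 0),  "NF": (10, 0)},
}


def m_estimate(k_t, k, m=4, p=0.25):
    """General m-estimate (k_t + m*p) / (k + m); defaults give equation (3.3)."""
    return (k_t + m * p) / (k + m)


def temperament_weight(counts, shares, theta=0.10):
    """Equation (3.2), with P(like | t) taken as k_t / k per equation (3.4)."""
    ratios = {t: yes / (yes + no) for t, (yes, no) in counts.items()}
    relevant = [t for t, r in ratios.items() if r > theta]  # assumed T_n rule
    fraction = len(relevant) / len(shares)                  # |T_n| / |T|
    return fraction * sum(ratios[t] * shares[t] for t in relevant)


weights = {
    unit: round(temperament_weight(counts, SHARES), 4)
    for unit, counts in TABLE_3_2.items()
}
```

The rounded values agree with $w_1$ through $w_6$ as listed in Section 3.4.2. Replacing the ratio `yes / (yes + no)` with `m_estimate(yes, yes + no)` would apply the small-sample correction of equation (3.3) instead.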
The clustering procedure for temperament-based filtering is as follows:

1. For each segment in the information space
   1.1. An item is randomly picked as the centroid vector of the first cluster
   1.2. For each of the remaining items in the segment
   1.3. For each cluster
        1.3.1. If the similarity measure of the item and the centroid vector is greater than a predefined threshold $\lambda$, the item is grouped into that cluster and a new centroid is generated; otherwise the item becomes the centroid vector of a new cluster

Table 3.3 Clustering procedure.

The centroid vector of a cluster is defined as the mathematical average of the information unit vectors in the cluster. The centroid temperament weight $e_i$ of a cluster $c_i$ may be defined in the same manner, as the mathematical average of the temperament weights of the information units in the cluster, and is thus given by

$$e_i = \frac{1}{m} \sum_{j=1}^{m} w_j$$

where
m = size of cluster $c_i$, and
$w_j$ = the temperament weight of an information unit $d_j$ in cluster $c_i$.

Thus, a centroid temperament weight is an estimate of the popularity within the user population of the information units in a cluster. Together with the cosine similarity measure, the learned centroid temperament weights provide a set of probabilities and numerical measures that the later classification process uses to predict into which cluster of which segment a new information unit is likely to fall.

3.4.4 The Learning Algorithm

The following algorithm learns the concept of the temperament-segmented information space and the centroid temperament weights of the clusters in that space.
Input: P() is an array storing the four percentage distributions in the US for the four temperaments in T = {SJ, SP, NT, NF}, D is a set of rated training data vectors, and Analydata is an array storing the statistical results of the eight counters for each training datum from a survey.

Process:
1. Segment the information space by temperaments
   for each subset T(n) of T
      $S(n) = \bigcap_{t \in T(n)} D_t - \bigcup_{t' \in (T - T(n))} D_{t'}$
2. Rank the segments by probability values
   for each segment S(n)
      $S'(n) = \sum_{t \in T(n)} P(t)$
   sort (S(), T(), S'()) into descending order of S'()
3. Estimate temperament weights of information units in the segments
   for each segment S(n)
      for each information unit $d_j$ in segment S(n)
         $w_j = \frac{|T(n)|}{|T|} \sum_{t \in T(n)} P(like_{nj} \mid t)\, P(t)$
4. For each segment S(n), cluster the information units by cosine similarity measures
5. Compute the centroid temperament weights of clusters
   for each segment S(n)
      for each cluster $c_i$
         $e_i = \frac{1}{m} \sum_{j=1}^{m} w_j$, where m = size of cluster $c_i$

Output: A list of information space segments with the associated temperament values (S(), T()), and lists of temperament-weighted information units and centroids.

Table 3.4 Temperament-based learning algorithm.

Figure 3.8 The classification process. (Components: Learning Agent; temperament-segmented information space and temperament-weighted centroids; Classification Agent with classifier algorithm; classification mechanism strategies: popularity similarity and heuristic function; inputs: information units from the information repository (the Web, databases); outputs: updated temperament-segmented information space and updated list of cluster indicators.)

3.5 Inference in the Classifier

The classification process applies the learned temperament concept to infer the optimal pair of segment and cluster in the temperament-based information space for a new information unit in the information repository.
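Steps 4 and 5 of the learning algorithm in Table 3.4 (clustering within a segment and averaging temperament weights) can be sketched as follows. This is one possible reading of the procedure in Table 3.3, not a definitive implementation: it assumes each item joins the most similar existing cluster whose centroid similarity exceeds the threshold, it takes the first item in the given order as the first centroid (the text picks one at random), and all names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity (u . v) / (|u| |v|); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_segment(items, threshold):
    """Single-pass clustering of (vector, temperament_weight) pairs.

    Returns a list of clusters, each with its member vectors, centroid,
    and centroid temperament weight e (mean of the member weights).
    """
    clusters = []
    for vec, weight in items:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:  # must strictly exceed the threshold
                best, best_sim = cluster, sim
        if best is None:
            # No cluster is similar enough: the item seeds a new cluster.
            clusters.append(
                {"vectors": [vec], "weights": [weight], "centroid": list(vec)}
            )
        else:
            best["vectors"].append(vec)
            best["weights"].append(weight)
            best["centroid"] = mean_vector(best["vectors"])
    for cluster in clusters:
        cluster["e"] = sum(cluster["weights"]) / len(cluster["weights"])
    return clusters


# Hypothetical two-term vectors with made-up temperament weights.
demo = cluster_segment(
    [([1.0, 0.0], 0.4), ([0.9, 0.1], 0.6), ([0.0, 1.0], 0.2)], threshold=0.8
)
```

In the hypothetical run above, the first two vectors share a cluster and the third seeds a new one. Because the result of a single-pass procedure depends on the order in which items are visited, a random starting item can produce different clusters from run to run.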
A traditional classifier algorithm classifies a new information unit into a cluster based only on a content similarity measure, as in the content-based filtering method. In contrast, the temperament-based classifier incorporates the temperament concept into the similarity measure (Figure 3.8). To estimate the target segment and cluster of an information unit in the information repository, the classification agent employs the learned knowledge of the temperament-segmented information space and the temperament-weighted centroids through the strategies of the classification mechanism, which include the “popularity similarity” measure and the heuristic classification function.

The assumption underlying the classifier algorithm is that a new information unit whose content features are similar to those of a set of information units widely liked by a particular group of people is probably liked by that group of people. Also, since a higher centroid temperament weight indicates that the information units in the same cluster have a greater popularity among the user population, the centroid temperament weight is considered an important factor when formulating the similarity measure used to classify a new information unit.

3.5.1 The Popularity Similarity Measure

The “popularity similarity” measure estimates the combined importance of popularity and content similarity between a new (target) information unit vector $V_k$ and the centroid vector $V_c$ of a temperament-classified cluster c. Thus the measure relies not only on the traditional cosine similarity $Sim(V_c, V_k)$, but also on the temperament weight $e_c$ of $V_c$. This leads to the following definition of the “popularity similarity” between $V_c$ and $V_k$:

$$PopSim(V_c, V_k) = e_c + Sim(V_c, V_k)$$

where
$V_c$ = the centroid vector of a temperament-classified cluster,
$V_k$ = a new information unit vector,
$e_c$ = the temperament weight of $V_c$, and
$Sim(V_c, V_k)$ = the traditional cosine similarity measure.

The “popularity similarity” measure could be expressed in other ways, such as by using the product instead of the sum of the two factors. These two alternatives, in addition to the cosine measure, are examined in more detail in Chapter 5.

3.5.2 Heuristic Classification Function

The proposed approach to classifying a new information unit into a temperament-based information space is to assign the optimal pair of segment and cluster based on the “popularity similarity” measure. The heuristic classification function returns the maximally probable estimate of the target segment and cluster, $(S_{target}, C_{target})$, for which the “popularity similarity” $PopSim(V_c, V_k)$ is maximum:

$$(S_{target}, C_{target}) = \arg\max_{s \in S,\; c \in C_s} PopSim(V_c, V_k)$$

where
S = {s | s is a segment of a temperament-based information space},
$C_s$ = {c | c is a cluster of segment s},
$V_c$ = the centroid vector of a temperament-classified cluster,
$V_k$ = a new information unit vector, and
$PopSim(V_c, V_k)$ = the “popularity similarity” between $V_c$ and $V_k$.

The location of the new information unit can be adapted according to its probability distribution of popularity, obtained by observing user feedback for that unit. The classification agent also maintains a set of cluster indicators to be used in the filtering process for heuristic selection. A cluster indicator is the nearest neighbor of the centroid vector of that cluster. Whereas a centroid is an artificial vector, a cluster indicator is a real information unit.
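The effect of adding the centroid temperament weight to the cosine similarity can be seen in a small sketch. The names and the two-cluster space below are hypothetical and serve only to illustrate the measure of Section 3.5.1 and the arg max of Section 3.5.2; the `combine` switch covers the product variant mentioned in the text.

```python
import math


def cosine(u, v):
    """Cosine similarity (u . v) / (|u| |v|); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pop_sim(centroid, e_c, v_k, combine="sum"):
    """PopSim(V_c, V_k) = e_c + Sim(V_c, V_k), or the product variant."""
    sim = cosine(centroid, v_k)
    return e_c + sim if combine == "sum" else e_c * sim


def classify(space, v_k, combine="sum"):
    """Return the (segment, cluster) key that maximizes PopSim.

    space maps (segment, cluster) -> (centroid vector, centroid weight e_c).
    """
    return max(
        space,
        key=lambda sc: pop_sim(space[sc][0], space[sc][1], v_k, combine),
    )


# Hypothetical space: one popular cluster and one closer in content.
space = {
    ("S1", "c1"): ([1.0, 0.0], 0.77),
    ("S7", "c1"): ([0.8, 0.6], 0.31),
}
v_new = [0.6, 0.8]

by_content = max(space, key=lambda sc: cosine(space[sc][0], v_new))
by_popsim = classify(space, v_new)
```

In this made-up case, cosine similarity alone would place the new unit in the cluster of segment S7, whereas the popularity similarity places it in the more popular cluster of segment S1, which is the kind of shift the measure is meant to produce.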
3.5.3 The Classifier Algorithm

The following classifier algorithm finds the optimal pair of segment and cluster for classifying a new information unit, based on the proposed “popularity similarity” measure and heuristic classification function.

Input: A is the segmented information space, where A = {(s, c) | s ∈ S and c ∈ $C_s$, S is a set of segments, $C_s$ is the set of clusters in segment s}, V is a set of centroid vectors, e is a set of centroid temperament weights, I is a set of cluster indicators, M is a set of indicator similarity values, and $V_k$ is the vector of a new information unit k.

Process:
for each new information unit k do
1. Find the maximally probable estimate of the segment and cluster pair
      $(S_{target}, C_{target}) = \arg\max_{s \in S,\, c \in C_s} PopSim(V_c, V_k) = \arg\max_{s \in S,\, c \in C_s} \left(e_c + Sim(V_c, V_k)\right)$
2. Update the centroid vector of cluster $C_{target}$ in $S_{target}$
3. Update the indicator of cluster $C_{target}$ in $S_{target}$ heuristically
      c = $C_{target}$
      if $Sim(V_c, V_k) > M_c$
         $I_c$ = object id of k
         $M_c = Sim(V_c, V_k)$

Output: A classified information unit k with the maximally probable estimate of the segment and cluster pair $(S_k, C_k) = (S_{target}, C_{target})$, and an updated set of cluster indicators.

Table 3.5 Temperament-based classifier algorithm.

Figure 3.9 The filtering process. (Components: user request consisting of user temperament and/or interest key terms; Monitor Agent with request interception, temperament record, and key terms history; user profile adaptation; filtering module with filtering algorithm; filtering mechanism strategies: constraint satisfaction, heuristic selection, similarity measure, statistical reasoning; temperament-segmented information space; information units returned to the user.)
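Steps 2 and 3 of Table 3.5 can be sketched as a single update applied to the target cluster once it has been chosen. The sketch below rests on assumptions that the table does not settle: the centroid is updated as a running mean, the indicator test uses the updated centroid (following the step order in the table), and the `Cluster` record and `absorb` function are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    centroid: list        # V_c, the centroid vector
    size: int             # number of information units in the cluster
    e: float              # centroid temperament weight e_c
    indicator_id: str     # I_c, id of the current cluster indicator
    indicator_sim: float  # M_c, similarity recorded for the indicator


def cosine(u, v):
    """Cosine similarity (u . v) / (|u| |v|); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def absorb(cluster, unit_id, v_k):
    """Add unit k to its target cluster and refresh centroid and indicator."""
    n = cluster.size
    # Step 2: running mean of the member vectors, including the new one.
    cluster.centroid = [(c * n + x) / (n + 1) for c, x in zip(cluster.centroid, v_k)]
    cluster.size = n + 1
    # Step 3: the new unit becomes the indicator if it is closer to the
    # centroid than the similarity recorded for the current indicator.
    sim = cosine(cluster.centroid, v_k)
    if sim > cluster.indicator_sim:
        cluster.indicator_id = unit_id
        cluster.indicator_sim = sim
    return cluster


# Hypothetical cluster with one member and a recorded indicator similarity.
target = Cluster(centroid=[1.0, 0.0], size=1, e=0.5,
                 indicator_id="d1", indicator_sim=0.9)
absorb(target, "d2", [1.0, 0.2])
```

Note that the recorded value $M_c$ was measured against an earlier centroid, so under this reading the indicator is only an approximation of the true nearest neighbor, which is consistent with the table calling the update heuristic.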
3.6 Inference in the Filtering Process

The task of the filtering process is to apply the learned temperament concept to recommend relevant information from the temperament-segmented information space to the user. The assumption underlying the filtering process is that if a set of information units is widely liked by people of a particular temperament type, then a user having that temperament type would probably like the information units in that set. The strategy of temperament-based filtering differs from content-based filtering in finding the optimal information units to match not only the user interest key terms, but also the user temperament (Figure 3.9). The user temperament constraint is used to restrict the search space, in which the centroid temperament weights are added to the similarity measures for new information classification to guarantee conceptually meaningful solutions. Furthermore, heuristic selection rules are integrated with the filtering process to facilitate information recommendation when dealing with user requests to the database. If a user does not give enough description of the user temperament or interest key terms, the system utilizes the user profile to improve the recommendation. A user profile is a record of user behaviors, which include the temperament, interest key terms, and feedback of a particular user.

3.6.1 Heuristic Selection Rules

Knowledge of the temperament concept, in addition to the user interest key terms, is informative for improving the search process. The following heuristic selection rules based on the temperament concept are employed by the filtering agent to find information units for recommendation.

• For a user with unknown temperament and interest, the agent returns the information units in segment $S_1$, whose information units are considered interesting by people of any temperament.
• When given the user temperament, the agent searches only the partial space whose segments contain that user temperament.
• When given the user interest key terms, the agent selects items from the optimal target cluster by applying the same heuristic function used in the classification process.
• When given the user temperament and the user interest key terms, the agent searches only the partial space whose segments contain that user temperament, and selects items from the optimal target cluster by applying the same heuristic function used in the classification process.
• In quick search mode, the agent searches only the upper (higher popularity) quartile segments in the information space.

Table 3.6 Heuristic selection rules.

In quick search mode, the agent may heuristically collect and return to the user only the cluster indicators of the upper (higher popularity) quartile segments. A cluster indicator is the nearest neighbor of a centroid and is maintained by the classifier algorithm. These selection strategies may eliminate redundant similarity measures and simplify the filtering process.

3.6.2 The Filtering Algorithm

The following algorithm recommends to the user a ranked list of information units that best match the user temperament and interest key terms. If a user does not give enough description of the user temperament or interest key terms, the system utilizes the user profile. A user profile is a record of user behaviors, which include the temperament, interest key terms, and feedback of a particular user.

Input: Request is a pair of (user temperament $u$, user interest key terms $V_k$); the segmented information space $A = \{(s, c) \mid s \in S \text{ and } c \in C_s\}$, where $S$ is a set of segments and $C_s$ is the set of clusters in segment $s$; $V$ is a set of centroid vectors; $I$ is a set of cluster indicators; and $U$ is the user profile.
Process:
Update user profile $U$

Case 1: Given no user specification
1. If user temperament is in the user profile
      go to Case 2
      return
2. Else no user temperament is found
   2.1. Display historical list of interest key terms in the user profile
   2.2. If the user selects one of the interest key terms
           go to Case 3, Statement 2
           return
3. For the segment $s_1$ in the information space
   3.1. Collect the information units in segment $s_1$
   3.2. Retrieve the temperament weight of each information unit collected
4. Rank the information list by temperament weight

Case 2: Given user temperament $u$ only
1. Display historical list of interest key terms in the user profile
2. If the user selects one of the interest key terms
      go to Case 4
      return
3. While within the partial information space where the segments contain $u$
   3.1. If search mode is “quick”
           for each segment $s$ of the upper quartile segments
        3.1.1. If only indicators are needed
                  collect the indicators of the upper quartile segments
                  go to Statement 3.4
        3.1.2. Else go to Statement 3.3
   3.2. Else search mode is “advanced”
           for each segment $s$
   3.3. Collect the information units in segment $s$
   3.4. Retrieve the temperament weight of each information unit collected
4. Rank the information list by temperament weight

Case 3: Given user interest key terms $V_k$ only
1. If user temperament is in the user profile
      go to Case 4
      return
2. For each segment $s$ in the information space
   2.1. Find the maximally probable estimate of the cluster that matches the interest:
        $(s, c_{target}) = \arg\max_{s \in S,\, c \in C_s} \mathrm{Sim}(V_c, V_k)$
   2.2. Compute the similarity between the interest vector $V_k$ and each information unit in cluster $c_{target}$, and collect the information unit
3. Rank the collected information list by descending (temperament weight * similarity)

Case 4: Given user temperament $u$ and interest key terms $V_k$
1.
While within the partial information space where the segments contain $u$, find the maximally probable estimate of the segment and cluster pair:
   $(S_{target}, C_{target}) = \arg\max_{s \in S,\, c \in C_s} \mathrm{Sim}(V_c, V_k)$
2. Compute the similarity between the interest vector $V_k$ and each information unit in cluster $C_{target}$ of segment $S_{target}$, and collect the information unit
3. Rank the information list by descending (temperament weight * similarity)

Output: Deliver the ranked information list to the user.

Table 3.7 Temperament-based filtering algorithm.

Figure 3.10 The relevance feedback process. [Diagram not reproduced. Labels shown: user profile (temperament record, key terms history); adaptation; temperament-segmented information space; monitor agent; filtering module (interception, filtering algorithm); relevance feedback method; flows labeled temperament, key terms, modified key terms, feedback, information units, and refined information units.]

3.7 Feedback Refinement

The task of feedback refinement is to adapt the recommendation service to user responses in order to provide a satisfactory service (Figure 3.10). For this, the Ide Dec-Hi method is adopted and modified.

3.7.1 Relevance Feedback Method

Although the conventional relevance feedback techniques, Standard Rocchio, Ide Regular, and Ide Dec-Hi, are considered to have similar retrieval quality [1], the Ide Dec-Hi method is computationally very efficient and outperforms the Ide Regular method and the Standard Rocchio method in some cases [53]. The Ide Dec-Hi method adds all relevant document (information unit) term weights directly to the query (interest key) terms, but subtracts only the terms of the top-most non-relevant document (information unit).
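The Ide Dec-Hi update just described can be written as a short Python sketch. The sparse dictionary vectors and the function name `ide_dec_hi` are illustrative assumptions, and the sketch leaves negative term weights in place rather than clipping them, which is one of several possible conventions.

```python
def ide_dec_hi(query, relevant, non_relevant_ranked):
    """Ide Dec-Hi query reformulation on sparse term-weight dicts.

    query:               original query vector
    relevant:            list of relevant document vectors (all are added)
    non_relevant_ranked: non-relevant document vectors in retrieval order;
                         only the first (top-ranked) one is subtracted
    """
    new_query = dict(query)
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] = new_query.get(term, 0.0) + weight
    if non_relevant_ranked:
        for term, weight in non_relevant_ranked[0].items():
            new_query[term] = new_query.get(term, 0.0) - weight
    return new_query
```

Because the update is a plain sum with no normalizing coefficients, it needs only one pass over the judged documents, which is consistent with the efficiency argument made for the method.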
The basic idea of feedback refinement is to move the requested interest key terms toward the documents rated “like” and away from the documents rated “dislike”. However, Aalbersberg [1] pointed out that negative relevance feedback can be omitted without significantly detracting from the retrieval quality. Hence, the modified Ide Dec-Hi method considers only the relevant information units, and the modified query (interest key terms) vector is reformulated by

$$V_k' = V_k + \sum_{V_r^k \in D_r^k} V_r^k$$

where
$V_k$ = the original interest (key terms, query) vector,
$D_r^k$ = the set of relevant information units (documents) retrieved for interest vector $V_k$, and
$V_r^k$ = a relevant information unit (document) in $D_r^k$.

The agent utilizes the modified Ide Dec-Hi method to refine the search results when given feedback by the user.

3.7.2 The Refinement Algorithm

The following algorithm utilizes the modified Ide Dec-Hi method to refine the search results when the user gives feedback. A ranked information list relevant to the feedback is collected and recommended to the user.

Input: Request is a pair of (user temperament $u$, original interest key terms $V_k$); $D_r^k$ is a set of relevant information units retrieved, $\{V_r^k\}$, for interest vector $V_k$; and $U$ is the user profile.

Process:
1. Compute the modified interest vector of $V_k$:
   $V_k' = V_k + \sum_{V_r^k \in D_r^k} V_r^k$
2. Update user profile $U$
3. Call the filtering algorithm (procedure) by passing the new Request pair of (user temperament $u$, modified interest vector $V_k'$)

Output: A refined and ranked information list to the user.

Table 3.8 Relevance feedback algorithm.
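A minimal Python sketch of the refinement loop in Table 3.8 follows. The function `refine` applies the positive-only update; `rank_units` is a hypothetical stand-in for the filtering procedure that orders units by temperament weight times cosine similarity, as in the final ranking step of the filtering algorithm. The data layout and all names are assumptions for illustration, and the profile update of step 2 is omitted.

```python
import math


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def refine(query, relevant_units):
    """Modified Ide Dec-Hi: add every relevant unit vector, subtract nothing."""
    new_query = dict(query)
    for unit in relevant_units:
        for term, weight in unit.items():
            new_query[term] = new_query.get(term, 0.0) + weight
    return new_query


def rank_units(query, units):
    """units: list of (unit_id, vector, temperament_weight) tuples.

    Returns unit ids ordered by descending temperament weight * similarity.
    """
    scored = [(tw * cosine(query, vec), uid) for uid, vec, tw in units]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored]
```

A refinement round then amounts to `rank_units(refine(query, liked_vectors), units)`: units similar to what the user marked as liked move up the list.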
Chapter 4 Experimental Prototype Implementations

In this research, a prototype system was developed, implemented, and experimentally tested to demonstrate the potential of the proposed temperament-based filtering method to automatically search information units for the user, in comparison with the content-based filtering method. After presenting the design and methodology of the experimental prototype system, the following sections report four experiments conducted on three application domains: a document collection, art image collections, and representation styles. The experimental results show that the proposed temperament-based filtering method is generally better than the content-based filtering method.

4.1 Overview of the Experimental Prototype System

The functional architecture of the prototype system is shown in Figure 4.1.

Figure 4.1 Overview of the prototype system architecture. [Diagram not reproduced. Labels shown: learning phase and inference phase; the Web, Web pages data set, training data set, test data set; temperament-based learning, information segments, temperament-weighted centroids; classification and filtering agents; user simulation, interfaces, simulated sample users, sample user profiles; flows labeled information units, request, feedback, temperament, and interests.]

The implementation of the system has two key phases: learning and inference. In the learning phase, user-studies testing is conducted on a Web site. On the one hand, the user interfaces serve as the front end for the end-user to enter ratings (two-level scale), requests (temperament, interest key terms), and feedback (change of ratings) to the system. On the other hand, the interfaces allow the system to monitor and intercept user inputs, and to deliver and present the recommended information units to the user. The user interfaces are implemented in HTML and CGI scripts. Each user has a user profile.
A user profile records the user temperament and keeps track of the interests and feedback of a particular user. A simulation model is constructed to generate simulated user profiles based on the user profiles obtained from the sample users [36, 54]. The learning agent learns the temperament concept to segment the information space formed by the training data set after observing the temperament and interest distributions of the simulated users. The information units in a segment are further grouped into clusters by intra-segment similarities, and the centroid temperament weights of the clusters are estimated. The learned temperament concept, consisting of the temperament-based segments and temperament-weighted centroids, is then applied by the classification and filtering agents to classify and recommend information units in the test data set to the simulated users in the inference phase.

At the beginning of a learning session, the system prompts the user to enter his or her temperament if the user is a first-time user; otherwise, the system retrieves the user's temperament type from the user profile. The system also provides an optional link to the Web page of the Keirsey Temperament Sorter, in case the user needs it to find out his or her temperament type. On receiving the user temperament, the system automatically presents a list of training examples to that user. The user rates these training examples. If the user is not familiar with a training example or does not have a strong opinion, the user is asked not to rate that item. These ratings, along with the user temperament, form a sample user profile. After the sample users complete the learning sessions, a group of simulated users is generated based on the sample user profiles.
The learning agent segments the training data set into a temperament-based sample information space according to the interrelationship between the ratings and temperaments of the simulated users. This temperament-segmented sample information space serves as a model to classify new information units in the inference phase. Furthermore, the learning agent estimates the temperament weights of the information units and centroids in the temperament segments from the statistical results obtained from the simulated user profiles.

In the inference phase, the classifier classifies the information units in the test data set into the formed temperament-segmented information space based on the optimal “popularity similarity” measures between the information units in the test data set and the centroids of the learned temperament-segmented sample information space. A ranked list of information units selected by the filtering module from the temperament-segmented information space is recommended to the simulated user based on the status of the simulated user profile.

4.1.1 User Simulation

To better measure the effectiveness of the temperament-based filtering mechanism, a large set of experiments and a huge amount of data could potentially be required. We have therefore conducted a series of experiments and chosen to study the behavior of the mechanism with the help of user simulation based on the sample user profiles recorded in the sample survey. The users who participated in the experiments were adults living in the United States, mostly USC students. To enhance the quality of the behavioral analogy, a simulated user profile of a specific temperament type is built from a set of three randomly selected source user profiles of that temperament type.
Moreover, to eliminate bias and false alarms of the source user profiles in the sample survey, the percentage distributions of the temperaments in the simulated user population are set to the same values as those in the United States. Thus, the simulated user profiles are randomly selected from the source user profiles according to the percentage distributions of the temperaments in the United States described in the previous section. This guideline leads to the following simulation algorithm.

1. For each temperament type do
2. For each simulated user profile of a specific temperament type do
   2.1. Form a set of items based on three randomly selected sample user profiles of that temperament type
   2.2. Randomly pick items from that base set until the desired number of items is collected
3. Repeat step 2 until the number of simulated users generated for that type reaches the value at which the percentage distribution of the temperaments in the simulated user population is the same as that in the United States.

Table 4.1 Simulation algorithm.

A simulated user query vector was constructed by adding all the vectors of the information units in a simulated user profile. Assuming $n$ information units in a simulated user profile, a simulated user query vector $V_k$ is defined as

$$V_k = \sum_{j=1}^{n} V_j$$

where $V_j$ is the vector of the $j$-th information unit in the profile.

4.1.2 Evaluation Metrics

One of the main purposes of this work is to study the relative efficiencies of the filtering methods so that proper choices can be made in various practical contexts. We have decided to use percentage accuracy [12] as the performance measure in the experiments. The percentage accuracy is the fraction of positive user feedback in the top n items recommended by the system.
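The simulation step, the summed query vector, and the percentage accuracy measure can be sketched together in Python. This is a hedged illustration: profiles are assumed to be sets of item identifiers, the function names are hypothetical, and dividing by the number of items actually returned (rather than by n) when fewer than n items are recommended is an assumption.

```python
import random


def simulate_profile(source_profiles, n_items, rng):
    """Build one simulated profile from three randomly chosen source profiles.

    source_profiles: list of sets of item ids, all of one temperament type.
    """
    chosen = rng.sample(source_profiles, 3)
    base = sorted(set().union(*chosen))
    return set(rng.sample(base, min(n_items, len(base))))


def query_vector(unit_vectors):
    """Sum the sparse term-weight vectors of all units in a profile."""
    query = {}
    for vec in unit_vectors:
        for term, weight in vec.items():
            query[term] = query.get(term, 0.0) + weight
    return query


def percentage_accuracy(recommended, liked, n):
    """Percentage of the top-n recommended items that the user liked.

    Returns 0.0 when nothing was recommended.
    """
    top = recommended[:n]
    if not top:
        return 0.0
    return 100.0 * sum(1 for item in top if item in liked) / len(top)
```

Passing an explicit `random.Random` instance as `rng` keeps a simulation run reproducible, which matters when results are averaged over several training and test splits.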
Each filtering method was applied to the training data set and evaluated according to how frequently it recommended, in the separate test data set, the information units taken by the user, particularly for the top 10 items recommended to the user.

In order to obtain statistically significant estimates of filtering performance, the sample data are separated into a training data set (for the training/learning phase implementation) and a test data set (for the inference phase implementation) in several possible ways. Each filtering method is then applied to each training data set and evaluated on the associated test data set. The results of these experiments are averaged. This procedure is run for both the temperament-based and the content-based filtering methods.

4.1.3 Experimental Procedure

The proposed temperament-based approach incorporates human temperaments into the filtering process of the information recommendation service. Besides exploiting the shared patterns of human preferences and behaviors that are dominated by human temperaments, this method takes advantage of content-based filtering in its vector representations and content analysis. For comparison purposes, the content-based filtering method is also evaluated. In the content-based approach, information units are recommended based on the content similarities between a user interest and the information units in the database. The similarity measures depend only on the term features of the document vectors, without considering any concept or context. To evaluate the approaches discussed above, the procedure implemented in the series of experiments is summarized as follows.

1) Construct vectors of the information units based on the vector space model.
   a) Parse the information units to identify the individual words.
   b) Delete the stop words in order to eliminate common words (poor discriminators).
   c) Truncate suffixes to obtain the original word stems (for example, effective to effect, systems to system) in order to reduce the size of files.
   d) Compute term weights as a function of term frequency and inverse document frequency to construct vectors of the information units.
2) Construct simulated user profile vectors from the real user profiles by the simulation algorithm.
3) Implement the temperament-based filtering method.
   a) Compute the popularity distributions of the information units based on user temperament and preferences.
   b) Segment the information space (training data set) by the segmentation function and a preset threshold.
   c) Compute the temperament weights of information units based on the popularity distributions.
   d) Cluster information units in a segment by the cosine similarity measure.
   e) Classify the test data set into the optimal (segment, cluster) location by the “popularity similarity” measure and the heuristic classification function.
   f) Sort the segments into descending order by their accumulated percentage distributions of temperaments.
   g) Recommend information units in the test data set to the simulated users under different given conditions by the heuristic selection rules.
   h) Compute the percentage accuracy of the method.
4) Implement the content-based filtering method.
   a) Cluster the information units in the information space (training data set) by the content similarity measure.
   b) Classify the test data set into the optimal cluster by the cosine similarity measure.
   c) Recommend information units in the test data set to the simulated users by the cosine similarity measure.
   d) Compute the percentage accuracy of the method.

Table 4.2 Experimental procedure.
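Step 1 of the procedure (parsing, stop-word deletion, suffix truncation, and TF-IDF weighting) can be sketched in Python as below. Several parts are simplifying assumptions rather than the prototype's implementation: the stop list is a tiny illustrative set, `strip_suffix` is a crude stand-in for a real stemmer, and the weight is taken to be raw term frequency times log(N / document frequency), one common TF-IDF variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (assumption); a real list would be much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def tokenize(text):
    """Lower-case the text, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def strip_suffix(word):
    """Crude suffix removal (assumption): not a full stemming algorithm."""
    for suffix in ("ive", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tfidf_vectors(texts):
    """Return one sparse {stem: weight} vector per input text."""
    docs = [Counter(strip_suffix(w) for w in tokenize(t)) for t in texts]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in doc.items()}
        for doc in docs
    ]
```

With this weighting, a stem that occurs in every document receives weight zero, so it contributes nothing to cosine similarity, which is the intended effect of the inverse document frequency factor.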
4.1.4 Experimental Setup

In order to explore how well the temperament-based filtering method can learn to automatically and adaptively recommend information units to the user, four experiments were conducted on three application domains: a document collection, two art image collections, and representation styles. The performance of the proposed temperament-based method was evaluated in comparison with the content-based filtering method.

The first experiment was conducted on a document domain with 2,000 Web pages of natural language text. The second and third experiments were conducted in the context of art images. In contrast to the long text of the documents, art images were annotated with very short terms consisting of only the titles and artist names. Two art image collections were used in this study: one with 2,000 art images and the other with 100 art images. Besides the size difference, the two collections vary in the way pictures were selected. For the collection of 2,000 images, more than one picture by an artist may be selected. Thus, the vector representations of the annotations for the art images may have more repeated terms. For the collection of 100 images, on the other hand, only one picture by an artist may be selected, and the vector representations of the annotations may have fewer repeated terms. The last experiment examined the relationship between user temperaments and user preferences in perceiving information with four representation styles at two semantic levels.

4.1.4.1 General Considerations

The proposed temperament-based filtering method focuses on capturing the interrelationship of human temperaments and user preferences hidden in the perceptual recognition process of user behavior when stimulated by the presentation of an information unit.
In the experiments, the features (e.g., syntax, color, texture, shape, and size) of an information unit, other than key terms, are not extracted or considered individually; rather, they are characterized as a property presented as a whole, in the way that a user perceives it.

4.1.4.2 Threshold in Popularity for Segmentation

Segmentation is a key step towards partitioning the sample information space into conceptually meaningful segments that correspond to human temperaments. An information unit is learned to be in one of the segments if the popularity, or prior probability, of that information unit for a given temperament type exceeds some threshold value $\theta$. The threshold values are examined in more detail in Section 4.2.2.1. For the analysis of the experiments, a base threshold $\theta = 0.10$ was used to represent the minimal evidence necessary to form the segments by the temperament filtering method.

4.1.4.3 Threshold in Cosine Similarity Measure for Clustering

To reduce the number of comparisons, information units in the same segment are grouped into clusters if the cosine similarity measure is greater than some threshold A. A higher cosine coefficient indicates a greater similarity, whereas a lower threshold produces fewer clusters. The main consideration in clustering is to keep the number of items in a cluster neither too large nor too small, without losing the representative characteristics of the cluster. To obtain an average of about 10 units in each of the generated clusters in content-based filtering, A was set to 0.07, 0.15, and 0.04 for experiments #1, #2, and #3, respectively. Whenever the user interest key terms were concerned, the same A value was applied to clustering in temperament-based filtering for comparison purposes. Experiment #4, however, exploits the representation styles without employing the cosine similarity measure, so no A value is applied.
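The two thresholds can be illustrated with a short Python sketch. Both parts rest on assumptions that go beyond the text above: a segment is modeled as the set of temperament types whose popularity for a unit exceeds the popularity threshold, and clustering is done in a single pass in which a unit joins the first cluster whose seed vector is similar enough. The names `segment_of` and `threshold_clusters` are hypothetical.

```python
import math

TEMPERAMENTS = ("SJ", "SP", "NT", "NF")


def segment_of(popularity, theta=0.10):
    """Temperament types whose popularity for one unit exceeds theta.

    popularity maps temperament -> P(like | temperament). Modeling a segment
    as this set is an assumption; an empty set means the unit fits no segment.
    """
    return frozenset(t for t in TEMPERAMENTS if popularity.get(t, 0.0) > theta)


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def threshold_clusters(units, sim_threshold):
    """Single-pass clustering (assumption) of (unit_id, vector) pairs."""
    clusters = []  # list of (seed vector, member ids)
    for unit_id, vec in units:
        for seed, members in clusters:
            if cosine(seed, vec) > sim_threshold:
                members.append(unit_id)
                break
        else:
            clusters.append((vec, [unit_id]))
    return [members for _, members in clusters]
```

Lowering `sim_threshold` lets more units join an existing cluster, so fewer and larger clusters result, matching the trade-off described for the clustering threshold.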
4.2 Experiment #1 on a Document Collection of 2,000 Web Pages

To populate the database, 2,000 hyperlinks to the Web pages of the referenced articles were collected from the following 12 news Web sites that contain articles about high technology: cnet (www.cnet.com), Computer Reseller News (www.crn.com), DM Direct Newsletter (www.dmreview.com/editorial/dmdirect), Information Week Online (www.iweek.com), Internet Week (www.internetwk.com), Internet World (www.internetworld.com), Network World (www.networkworld.com/news), PC Week (www.zdnet.com/pcweek), The Standard (www.thestandard.com), TECHNews (www.acm.org/technews/articles), UP Side (www.upside.com), and ZDNet News (www.zdnet.com/intweek).

The actual contents of the referenced articles were obtained by following the hyperlinks and stored in the database. Each article is an information unit. From the sample collection of 2,000 articles in the database, a fixed dictionary of 16,474 terms (word stems) was obtained after common words were deleted and the Porter algorithm [17, 43] was applied to remove suffixes. The common words ignored were from a standard stop list of 566 words. The TF-IDF formula was then applied to compute term weights. To avoid over-fitting and to reduce memory and communication load, only the 100 highest-weighted terms were kept in the document and user profile vector representations. This criterion is applied in all the experiments of this study. Recent experiments have shown that using too many words leads to a decrease in performance when classifying Web pages using supervised learning methods; the optimum is between 30 and 100 [4, 42].

The produced document vectors formed the sample data set. A document vector is a set of term weights, $D_j = (t_{1j}, t_{2j}, \ldots, t_{nj})$, where $t_{kj}$ indicates the weight of term $k$ in document $j$, assuming $n$ terms in the document.
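Keeping only the highest-weighted terms of a vector is a small operation; a minimal Python sketch, assuming sparse dictionary vectors and a hypothetical function name, is:

```python
import heapq


def keep_top_terms(vector, k=100):
    """Keep the k highest-weighted terms of a sparse term-weight vector."""
    top = heapq.nlargest(k, vector.items(), key=lambda item: item[1])
    return dict(top)
```

For example, `keep_top_terms({"a": 0.9, "b": 0.1, "c": 0.5}, k=2)` keeps the terms `a` and `c`. Applying the same cut to document and user profile vectors keeps the two kinds of vector comparable in size.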
The sample data were split into 20 data sets; in each of the 20 possible ways, 19 sets formed the training data set and 1 set formed the test data set. Each filtering method was then applied to each training data set and evaluated on the associated test data set. The results of these 20 tests were averaged. In a pilot test, the sample data were split into 40 groups. The results with 40 groups were similar to those with 20 groups, but the cost in storage space and CPU time doubled.

4.2.1 User Interfaces, Modeling, and Profiles

Several graphical user interfaces were constructed on a Web site (Appendix A), http://www.usc.edu/dept/cs/tfiltering/infoworld, for the user to access the document information repository. To evaluate the experimental prototype, user-studies testing on the document domain was conducted between February 9, 2000 and February 16, 2000. The most difficult tasks encountered were finding users with the desired temperament types, especially NT and NF, and finding enough users willing to test the system. Among the 28 adult users who participated in the experiment, there were 18 SJs, 4 SPs, 3 NTs, and 3 NFs.

After registering at the Web site of the experiment, a user was provided with a search tool and a list of indexes to the hyperlinks of the 2,000 document titles in the database. The user could search the database either by author name or by article title. In addition, a user could browse the article titles directly. Each article title was linked to the original article at a Web site. The users were encouraged to select at least 200 articles of interest by clicking the corresponding checkboxes presented next to the information units. The monitor agent monitored and recorded the preferences of each user over time as the user selected more articles. These records constituted the source user profiles learned.
A user profile kept track of the index, title, and author of each article that was interesting to a particular user. At any point of a search session or any later revisit session, the user could choose to select more articles of interest or remove unwanted articles from the user profile. The user profile was designed to accommodate this evolving structure.

Based on the source user profiles, a group of 1,000 simulated user profiles was generated by the simulation algorithm. Taking into account the statistical results on the percentage distributions of the temperaments in the United States, the simulated users consist of 467 SJs, 214 SPs, 161 NTs, and 158 NFs. In an attempt to faithfully reflect the properties of the sample data, a simulated user had 200 articles of interest in the simulated profile, because that was the number of articles a real user was encouraged to select. The query vector representations of the simulated user profiles were then constructed.

4.2.2 Experimental Results

4.2.2.1 Case 1: For Users with Unknown Temperament and Interest

To help the user find interesting information units in a serendipitous search, temperament-based filtering is viable and productive by offering the information units in segment $S_1$ to the user. This filtering strategy is based on the assumption that $S_1$ contains all those items evaluated as “liked” by everyone in the user population, which would probably be liked by the new user as well. In contrast, content-based filtering fails to predict any user behavior, as the method requires users to state exactly what they want.

The recommendation process consists of fixing a threshold $\theta$ in popularity $P(like_i \mid t)$ to achieve a certain confidence level for the segmentation. To investigate
the features of the temperament-based filtering method, a series of popularity thresholds $\theta$ ranging from 0.05 to 0.55 was employed for segmentation, and the recommendation results varied accordingly (Figure 4.2a). The first column is the top n items recommended from the test set by the filtering agent. The second column is the percentage accuracy, that is, how many of the information units in the top n returned by the agent are interesting, when $\theta = 0.05$. The third column is the percentage accuracy in the top n returned when $\theta = 0.10$, and so on. From $\theta = 0.05$ to 0.25, the percentage accuracy fluctuated slightly between 29.03% and 41.53%. As a higher threshold was applied, the accuracy was likely to increase before the over-fitting effect appeared at $\theta = 0.30$. Thereafter, the accuracy fluctuated and gradually decayed. The over-fitting effect eventually expelled all the items from segment $S_1$, and the accuracy dropped dramatically to zero at $\theta = 0.55$. A symbol in the table indicates that no item was recommended; the percentage value was then not available and was considered to be zero.

The popularity threshold indicates the minimum confidence level at which the system believes in the learned knowledge. The percentage accuracy of the top four items recommended from segment $S_1$ by the temperament-based filtering method when different popularity thresholds were applied is plotted in Figure 4.2b. The results indicate that the average accuracy of the top four items recommended is quite stable under the various thresholds applied. Considering the effect of the initial bias typical in a simulation, and to keep the size of a segment from being too
A comparison of the accuracy of the two methods, using θ = 0.10 for temperament-based filtering, is shown in Figure 4.3. Temperament-based filtering obtains an average accuracy of 40.63% in predicting the top 1 item the user would be interested in, 36.37% in predicting both of the top 2 items, and so on. The gradual downward progression indicates that the accuracy decreases as more items must be predicted. Compared with content-based filtering, which cannot operate in this situation, the temperament-based filtering method is robust and suggestive.

4.2.2.2 Case 2: Given the User Temperament

Instead of navigating blindly, the user is assisted by a ranked list of information units that the temperament-based method matches to the temperament of the user. The filtering agent recommends the classified testing information units by searching the partial space, that is, the segments that contain that user temperament. The performance accuracy is estimated for each of the four human temperaments: SJ, SP, NT, and NF. Content-based filtering, however, requires a user query before it can proceed, so it produces no recommendation. To investigate the heuristic search rule employed by the temperament-based method, the filtering agent made two searches: one tracked only the upper quartile segments in the partial space, and the other exhausted all 8 segments in the partial space. The results showed that the performance of the two searches is almost the same. However, the heuristic rule of searching only the upper quartile segments would reduce the operation cost significantly. As illustrated in Figure 4.4, temperament-based filtering outperforms content-based filtering when given the user temperament.
For users of SP temperament, temperament-based filtering even achieved an average accuracy of 51.64% in predicting the top 1 item the user would be interested in. Interestingly, the percentage accuracy is generally higher when the user temperament is provided than when no user profile is involved. For the top 10 items recommended to SP or NT users, the accuracy lies between 38.55% and 51.64%, much better than in case 1, where it lies between 29.63% and 40.63%. Although the accuracy for SJ users is not as promising, the accuracy for NF users in predicting the top 6 items falls between 32.61% and 43.89%, an improvement over case 1.

4.2.2.3 Case 3: Given the User Interest Key Terms

To evaluate the performance, a simulated user profile that served as the user interest key terms was transformed into a term vector representation and compared with the centroid vectors of the clusters in the information space, for both the temperament-based and the content-based methods. The information units in the cluster of highest similarity are presented to the user. To avoid weighing temperament too heavily, the cosine similarity measure is used in both methods for the relevance search. As illustrated in Figure 4.5, the temperament-based filtering method has better overall performance: for the top 10 items recommended, its accuracy ranges between 24.61% and 54.35%, and it produces average accuracy improvements of about 82% over the content-based filtering method, whose accuracy ranges between 9.68% and 34.95%. It is interesting to note that the performance of content-based filtering in this case is somewhat worse than that of temperament-based filtering when given only the user temperament in case 1.
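The matching step in case 3 can be sketched as follows: a profile term vector is compared with each cluster centroid by cosine similarity, and the most similar cluster is selected. This is a minimal sketch, assuming sparse vectors stored as dictionaries; the function names, cluster ids, and term weights are hypothetical and do not come from the prototype.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors (dict term -> weight)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_u * norm_v)

def best_cluster(query, centroids):
    """Return the id of the cluster whose centroid is most similar to the query."""
    return max(centroids, key=lambda cid: cosine_similarity(query, centroids[cid]))

# Hypothetical term weights, invented for this example.
centroids = {
    "c1": {"filter": 0.9, "agent": 0.4},
    "c2": {"image": 0.8, "artist": 0.6},
}
query = {"agent": 1.0, "filter": 0.5}
print(best_cluster(query, centroids))
```

Because cosine similarity normalizes by vector length, a long profile and a short one are compared on the direction of their term weights rather than on their size.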
4.2.2.4 Case 4: Given the User Temperament and Interest Key Terms

In this case, the strategies applied in cases 2 and 3 are combined. The filtering agent of the temperament-based method searches only the upper quartile segments in the partial space that contains that user temperament. The cosine similarity measure is used in both methods for the relevance search. In the resulting percentage accuracy of the two methods (Figure 4.6), the accuracy of temperament-based filtering for SJs, SPs, and NTs in predicting all the top 10 items exceeds that of content-based filtering, whereas the accuracy for NFs is about the same as that of content-based filtering. This suggests that when information filtering is adapted to user temperament as well as interest key terms, more than 85% of the user population would be better satisfied with the recommendations provided by the temperament-based method than with those provided by the content-based method.

Top n    Popularity threshold θ
items    .05    .10    .15    .20    .25    .30    .35    .40    .45    .50    .55
 1       40.63  40.63  40.33  41.04  41.53  40.55  40.07  39.60  39.23  39.57  -
 2       36.37  36.37  36.43  37.04  37.30  37.12  36.85  35.88  35.35  36.22  -
 3       34.60  34.76  34.86  35.41  35.51  35.43  35.01  34.32  34.02  34.70  -
 4       33.99  34.00  34.03  34.55  34.13  34.21  33.81  33.40  33.37  33.34  -
 5       32.93  33.00  32.77  33.24  32.93  33.04  32.77  32.67  32.46  32.15  -
 6       31.96  32.06  31.94  32.26  32.08  32.02  31.95  31.76  31.56  30.96  -
 7       31.23  31.26  31.23  31.45  31.39  31.02  31.14  30.81  30.62  29.82  -
 8       30.68  30.74  30.40  30.65  30.67  30.21  30.41  30.15  29.71  28.77  -
 9       30.13  30.22  29.72  29.98  29.98  29.45  29.76  29.42  28.87  27.76  -
10       29.52  29.63  29.03  29.40  29.30  28.87  29.19  28.72  28.09  26.80  -
(a)

(b) [Accuracy graph. x-axis: Popularity Threshold. Series: Top 1 items, Top 2 items, Top 3 items, Top 4 items.]

Figure 4.2 Accuracy of recommendation for the temperament-based filtering method by searching segment S1 for users with unknown temperament and interest (2,000 documents, θ = 0.05 to 0.55, 20 tests). (a) Accuracy average (%). (b) Accuracy graph of the top four items recommended.

Top n    Filtering method
items    Temperament  Content
 1       40.63        0
 2       36.37        0
 3       34.76        0
 4       34.00        0
 5       33.00        0
 6       32.06        0
 7       31.26        0
 8       30.74        0
 9       30.22        0
10       29.63        0
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering, Content-based filtering.]

Figure 4.3 Accuracy of recommendation for users with unknown temperament and interest (2,000 documents, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

Top n    Temperament-based             Content-
items    SJ     SP     NT     NF       based
 1       33.84  51.64  42.48  43.89    0
 2       28.84  43.64  41.30  43.75    0
 3       27.30  42.24  40.48  40.85    0
 4       26.66  43.38  40.75  36.12    0
 5       26.15  41.59  39.76  34.70    0
 6       25.80  40.33  38.67  32.61    0
 7       25.07  39.84  38.35  30.73    0
 8       24.98  39.09  38.94  28.09    0
 9       24.74  39.21  39.06  25.59    0
10       24.60  38.55  38.66  23.65    0
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering for SJs, for SPs, for NTs, and for NFs; Content-based filtering.]

Figure 4.4 Accuracy of recommendation when given the user temperament (2,000 documents, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

Top n    Filtering method
items    Temperament  Content  Improvement
 1       54.35        34.95     56%
 2       44.71        31.20     43%
 3       41.05        31.85     29%
 4       37.07        27.72     34%
 5       33.35        27.21     23%
 6       32.29        16.14    100%
 7       29.86        13.83    116%
 8       27.15        12.11    124%
 9       26.14        10.76    143%
10       24.61         9.68    154%
Average                         82%
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering, Content-based filtering.]

Figure 4.5 Accuracy of recommendation when given the user interest key terms (2,000 documents, θ = 0.10, λ = 0.07, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

Top n    Temperament-based             Content-
items    SJ     SP     NT     NF       based
 1       47.73  53.14  55.79  54.94    34.95
 2       42.47  44.83  55.36  36.38    31.20
 3       38.64  41.45  50.75  32.06    31.85
 4       36.12  37.70  51.84  23.71    27.72
 5       33.87  33.90  38.86  14.52    27.21
 6       32.06  31.63  27.75  10.32    16.14
 7       30.10  28.33  23.79   8.85    13.83
 8       27.90  26.02  20.81   7.74    12.11
 9       26.77  24.14  18.50   6.88    10.76
10       24.69  24.33  16.65   6.19     9.68
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering for SJs, for SPs, for NTs, and for NFs; Content-based filtering.]

Figure 4.6 Accuracy of recommendation when given the user temperament and the user interest key terms (2,000 documents, θ = 0.10, λ = 0.07, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
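The Improvement column of Figure 4.5(a) is consistent with a relative gain of the temperament-based accuracy over the content-based accuracy, averaged over the ten rows. The sketch below checks that reading against the tabulated values. The formula is inferred from the numbers rather than stated in the text, and the function name is hypothetical.

```python
# Accuracy averages (%) for top 1..10 items, copied from Figure 4.5(a).
temperament = [54.35, 44.71, 41.05, 37.07, 33.35, 32.29, 29.86, 27.15, 26.14, 24.61]
content = [34.95, 31.20, 31.85, 27.72, 27.21, 16.14, 13.83, 12.11, 10.76, 9.68]

def improvement_percent(new, base):
    """Relative gain of `new` over `base`, in percent (assumed definition)."""
    return (new - base) / base * 100.0

gains = [improvement_percent(t, c) for t, c in zip(temperament, content)]
rounded = [round(g) for g in gains]
average = sum(gains) / len(gains)

print(rounded)
print(round(average))
```

With this definition the rounded per-row gains match the tabulated column, and their mean rounds to the reported 82%.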
4.3 Experiment #2 on an Art Image Collection of 2,000 Pictures

To populate the database, 2,000 art images by 206 artists were collected from the www.art.com Web site. An art image is an information unit. For each art image, the artist name and the art image title were obtained, stored in the database, and used as the context representation. Because a context representation contains few terms, a fixed dictionary of 2,169 terms (word stems) was obtained after common words were deleted and suffixes were removed with the Porter algorithm [17, 43] from the sample collection of 2,000 art images in the database. The common words ignored came from a standard stop list of 566 words. The TF-IDF formula was then applied to compute term weights. The resulting information unit vectors formed the sample data set. The sample data were further split into 20 data sets; in each of the 20 possible ways, 19 sets served as the training data and 1 set as the test data. Each filtering method was then applied to each training data set and evaluated on the associated test data set. The results of these 20 tests were averaged.

4.3.1 User Interfaces, Modeling, and Profiles

Several graphical user interfaces were constructed on a Web site (Appendix B), http://www.use.edu/dept/cs/tfiltering/artworld, for the user to access the art image information repository. To evaluate the experimental prototype, user-study testing on the collection of 2,000 art images was conducted between February 9, 2000 and February 16, 2000. As in the first experiment, the most difficult tasks were finding users of the desired temperament types, especially NT and NF, and finding enough users willing to test the system. Among the 32 adult users who participated in the experiment, there were 20 SJs, 5 SPs, 4 NTs, and 3 NFs.
After registering at the Web site of the experiment, a user was provided with a search tool and a list of hyperlinks to the 206 artists for the 2,000 art images in the database. The user could search the database by either the artist name or the image title. In addition, a user could browse the art images of a particular artist directly by clicking on that artist's name. The users were encouraged to pick at least 200 art images of interest by clicking the corresponding checkboxes presented next to the information units. The monitor agent monitored and recorded the preferences of each user over time as the user picked more images. These records constituted the source user profiles learned. A user profile kept track of the index, title, and author of each art image that was interesting to a particular user. At any point in a search session or any later revisit session, the user could pick up more images of interest or remove unwanted images from the user profile. The user profile was designed to accommodate this evolving structure. Based on the source user profiles, a group of 1,000 simulated user profiles was generated by the simulation algorithm. Taking into account the statistical results on the percentage distribution of the temperaments in the United States, the simulated users consisted of 467 SJs, 214 SPs, 161 NTs, and 158 NFs. To reflect the properties of the sample data faithfully, each simulated user had 200 art images of interest in the simulated profile, since that was the number of art images a real user was encouraged to select. The query vector representations of the simulated user profiles were then constructed.
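The temperament mix of the simulated population can be reproduced with a short allocation sketch. The shares below are back-computed from the reported counts (467, 214, 161, and 158 out of 1,000) and are an assumption made for illustration; the simulation algorithm itself is not shown in this section, and the function name is hypothetical.

```python
def allocate_users(total, shares):
    """Split `total` simulated users across temperaments by share.

    Uses largest-remainder rounding so the counts always sum to `total`.
    """
    raw = {t: total * s for t, s in shares.items()}
    counts = {t: int(x) for t, x in raw.items()}
    leftover = total - sum(counts.values())
    # Hand any remaining users to the largest fractional parts.
    by_fraction = sorted(raw, key=lambda t: raw[t] - counts[t], reverse=True)
    for t in by_fraction[:leftover]:
        counts[t] += 1
    return counts

# Assumed shares, derived from the counts reported in the text.
shares = {"SJ": 0.467, "SP": 0.214, "NT": 0.161, "NF": 0.158}
print(allocate_users(1000, shares))
```

Largest-remainder rounding is used here only so that the four counts add up exactly to the requested population size.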
4.3.2 Experimental Results

As expected, the temperament-based filtering method outperforms the powerless content-based filtering method when no user information is provided, with accuracy ranging between 28.86% and 39.54% (Figure 4.7), and when given the user temperament, with accuracy ranging between 25.64% and 56.36% (Figure 4.8). Figure 4.9 shows that the two methods are comparable when given the user interest key terms, although the temperament-based filtering method, ranging between 13.89% and 38.04%, has slightly lower average accuracy than the content-based filtering method, ranging between 14.72% and 40.50%. However, when given both the user temperament and the user interest key terms, the temperament-based filtering method, ranging between 11.22% and 34.86%, has a somewhat lower average accuracy than the content-based filtering method, ranging between 14.72% and 40.50%, as shown in Figure 4.10.

Top n    Filtering method
items    Temperament  Content
 1       39.54        0
 2       37.08        0
 3       35.49        0
 4       34.18        0
 5       33.20        0
 6       32.11        0
 7       31.25        0
 8       30.38        0
 9       29.60        0
10       28.86        0
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering, Content-based filtering.]

Figure 4.7 Accuracy of recommendation for users with unknown temperament and interest (2,000 art images, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
Top n    Temperament-based             Content-
items    SJ     SP     NT     NF       based
 1       33.17  47.92  32.30  56.36    0
 2       31.32  45.67  32.19  49.45    0
 3       29.96  41.50  33.89  43.80    0
 4       28.95  38.85  35.13  38.46    0
 5       28.77  40.18  32.28  36.57    0
 6       27.63  39.08  32.58  34.85    0
 7       27.45  37.95  32.53  32.36    0
 8       26.79  37.63  32.82  29.64    0
 9       26.27  37.00  32.19  28.77    0
10       25.64  35.33  32.12  27.99    0
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering for SJs, for SPs, for NTs, and for NFs; Content-based filtering.]

Figure 4.8 Accuracy of recommendation when given the user temperament (2,000 art images, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

Top n    Filtering method
items    Temperament  Content
 1       38.04        40.50
 2       30.57        33.35
 3       25.26        28.11
 4       21.33        24.80
 5       18.72        22.03
 6       17.81        19.84
 7       17.05        18.29
 8       15.90        16.70
 9       15.61        16.89
10       13.89        14.72
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering, Content-based filtering.]

Figure 4.9 Accuracy of recommendation when given the user interest key terms (2,000 art images, θ = 0.10, λ = 0.15, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
Top n    Temperament-based             Content-
items    SJ     SP     NT     NF       based
 1       33.32  34.35  34.86  20.73    40.50
 2       27.19  28.04  24.82  24.24    33.35
 3       22.34  23.46  20.38  16.98    28.11
 4       21.66  19.91  20.59  14.99    24.80
 5       20.29  15.46  17.97  11.80    22.03
 6       19.86  14.35  18.84   8.22    19.84
 7       18.40  15.39  16.66  11.59    18.29
 8       18.71  14.01  14.76  12.47    16.70
 9       19.54  13.39  13.01  11.53    16.89
10       14.79  12.62  12.59  11.22    14.72
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering for SJs, for SPs, for NTs, and for NFs; Content-based filtering.]

Figure 4.10 Accuracy of recommendation when given the user temperament and the user interest key terms (2,000 art images, θ = 0.10, λ = 0.15, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

4.4 Experiment #3 on an Art Image Collection of 100 Pictures

The data collection and organization in this experiment are similar to those in experiment #2, which used a larger image collection. However, to maximize the number of art images with different feature representations, this smaller database contained 100 art images by 100 artists, randomly collected from the www.art.com Web site. Only one picture was selected for each of the 100 artists. Because the database is small and a context representation contains few terms, a fixed dictionary of 378 terms (word stems) was obtained after common words were deleted and suffixes were removed with the Porter algorithm [17, 43]. The TF-IDF formula was then applied to compute term weights. The resulting information unit vectors formed the sample data set.
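The term-weighting step can be sketched as follows. The text does not give the exact TF-IDF variant, so the sketch assumes one common form, raw term frequency multiplied by log(N / document frequency). The function name and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    documents: list of token lists, already stop-word filtered and stemmed.
    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_freq = Counter(doc)
        vectors.append(
            {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in term_freq.items()}
        )
    return vectors

# Hypothetical stemmed artist and title tokens, invented for this example.
docs = [["monet", "water", "lili"], ["monet", "bridg"], ["picasso", "guitar"]]
for vector in tf_idf_vectors(docs):
    print(vector)
```

Under this weighting, a stem that appears in many information units (here "monet") receives a lower weight than one that appears in a single unit.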
The sample data were further split into 4 data sets; in each of the 4 possible ways, 3 sets served as the training data and 1 set as the test data. Each filtering method was then applied to each training data set and evaluated on the associated test data set. The results of these 4 tests were averaged.

4.4.1 User Interfaces, Modeling, and Profiles

The design and operation of the user interfaces are similar to those of experiment #2, which used a collection of 2,000 art images. Several graphical user interfaces were constructed on a Web site (Appendix C), http://www.use.edu/dept/cs/tfiltering/artcenter, for the user to access the art image information repository. To evaluate the experimental prototype, user-study testing on the collection of 100 art images was conducted between October 3, 2000 and October 19, 2000. Among the 125 adult users who participated in the experiment, there were 66 SJs, 21 SPs, 16 NTs, and 22 NFs.

For each user login session, the monitor agent monitored and recorded the preferences of each user over time as the user picked more images. These records constituted the source user profiles learned. Based on the source user profiles, a group of 1,000 simulated user profiles was generated by the simulation algorithm. Taking into account the statistical results on the percentage distribution of the temperaments in the United States, the simulated users consisted of 467 SJs, 214 SPs, 161 NTs, and 158 NFs. Each simulated user had 20 art images of interest in the simulated profile. The query vector representations of the simulated user profiles were then constructed.
4.4.2 Experimental Results

The temperament-based filtering method has significantly better average accuracy for the top 10 items the user would be interested in across all the cases. While content-based filtering failed to make any recommendation, temperament-based filtering obtained a plausible average accuracy of 60.33% when no user information was provided (Figure 4.11), and between 53.01% and 69.98% when given the user temperament (Figure 4.12), in predicting the top 1 item the user would be interested in. Moreover, when given the user interest key terms, the temperament-based filtering method obtained an average accuracy varying from 43.32% to 89.73% in predicting the top 10 items the user would be interested in and produced average accuracy improvements of about 86% over the content-based filtering method, as shown in Figure 4.13. Figure 4.14 shows that, when given the user temperament and the user interest key terms, the temperament-based filtering method maintains better performance for all types of user temperament, ranging between 39.49% and 91.74%, than the content-based filtering method, ranging between 21.32% and 68.34%.

Top n    Filtering method
items    Temperament  Content
 1       60.33        0
 2       51.01        0
 3       45.63        0
 4       42.28        0
 5       39.59        0
 6       37.43        0
 7       35.85        0
 8       34.09        0
 9       32.54        0
10       31.12        0
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering, Content-based filtering.]

Figure 4.11 Accuracy of recommendation for users with unknown temperament and interest (100 art images, θ = 0.10, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
Top n    Temperament-based             Content-
items    SJ     SP     NT     NF       based
 1       59.69  69.98  56.52  53.01    0
 2       50.86  59.93  44.95  50.55    0
 3       47.23  54.40  40.22  43.99    0
 4       44.50  47.40  37.15  40.90    0
 5       41.71  42.78  35.99  40.06    0
 6       39.76  39.45  34.58  38.29    0
 7       37.96  37.52  33.72  36.14    0
 8       36.30  34.77  31.83  35.13    0
 9       34.42  33.85  30.85  34.09    0
10       32.96  32.04  29.57  32.17    0
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering for SJs, for SPs, for NTs, and for NFs; Content-based filtering.]

Figure 4.12 Accuracy of recommendation when given the user temperament (100 art images, θ = 0.10, upper quartile segments, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

Top n    Filtering method
items    Temperament  Content  Improvement
 1       89.73        68.34     31%
 2       84.16        56.21     50%
 3       81.32        49.88     63%
 4       73.99        44.89     65%
 5       68.04        39.12     74%
 6       72.84        34.24    113%
 7       65.51        30.08    118%
 8       58.32        26.55    120%
 9       52.41        23.66    121%
10       43.32        21.32    103%
Average                         86%
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering, Content-based filtering.]

Figure 4.13 Accuracy of recommendation when given the user interest key terms (100 art images, θ = 0.10, λ = 0.04, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
Top n    Temperament-based             Content-
items    SJ     SP     NT     NF       based
 1       89.13  91.74  83.62  80.36    68.34
 2       84.52  85.84  78.60  73.77    56.21
 3       79.94  79.51  75.94  62.99    49.88
 4       73.59  71.49  69.23  62.96    44.89
 5       67.71  64.38  63.63  56.94    39.12
 6       73.87  69.83  78.09  70.58    34.24
 7       66.97  61.33  70.88  67.66    30.08
 8       59.90  55.98  63.50  59.56    26.55
 9       53.92  50.25  57.62  52.94    23.66
10       43.68  41.45  42.57  39.49    21.32
(a)

(b) [Accuracy graph. x-axis: Top n items recommended. Series: Temperament-based filtering for SJs, for SPs, for NTs, and for NFs; Content-based filtering.]

Figure 4.14 Accuracy of recommendation when given the user temperament and the user interest key terms (100 art images, θ = 0.10, λ = 0.04, 4 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

4.5 Experiment #4 on Representation Styles

This experiment is designed to explore the relationship between user temperaments and user preferences in perceiving information of various representation styles. Its aim is to demonstrate that content alone is not sufficient and that utilizing user temperaments is beneficial for adaptive personalized assistance in on-line presentation and selection. Pieces of information presented to the user are randomly labeled with duplicated serial numbers or letters, such as (a) or (b). No additional annotations are provided for the representation styles, so searching by key terms is not meaningful and content-based filtering is not applicable for recommendation. In contrast, the temperament-based filtering method is powerful for either a static or a dynamic information space. In an information space such as a questionnaire, the questions formulated are usually fixed rather than changing in nature.
Thus, a target information space may be categorized as either static or dynamic. Therefore, two experiments were conducted to evaluate the temperament-based filtering method on both aspects.

4.5.1 Questionnaire Design and Modeling

A questionnaire was designed to capture how user temperaments may influence user understanding and selection of information presented in different representation styles. The data on user temperaments and interest distributions of the sample population, obtained from the sample survey, are used to make inferences about the user preferences of the population by the proposed temperament-based filtering method.

In the questionnaire, the information to be communicated is represented at two semantic levels: abstract and detail. The semantic level of a representation is determined by both the contents of its constituents and their configuration in granularity. At the abstract level, the information to be communicated is, for example, that Carson Palmer is a student at USC; at the detail level, it is Carson Palmer's profile as a football player. At each level, there are four representation styles: a graph, text, a concept object data model, and a relational database model (Figures 4.15 and 4.16). The users may select their favorite representation from among many alternatives. However, to present questions to the user in a simple form and to obtain an indicative user disposition, only two alternatives labeled with letters were provided in each question, such as

    Let us assume that we are trying to communicate an information fact, for example, that Carson Palmer is a student at USC. You will be presented several questions, and asked which of the two alternatives is more effective for YOU in receiving the information fact.

    Which is more effective for you (a) or (b)

and the associated representation styles could be accessed by following the hyperlinks on the menu of the same page. The 2-combinations of the four representation styles make a set of six questions for each semantic level. A representation of information is an information unit. Two information units are presented to the user as two alternatives in a question. Thus, each semantic level consists of 12 information units. The labels of the representation styles in a question serve simply to distinguish one alternative from the other and are not suitable for use as search key terms. No conventional vector representations of the information units are built; instead, the distributions of the user preferences based on temperaments are obtained and stored in the database for analysis.

(a) Graph I. [Image not reproduced.]
(b) Text I. "Carson Palmer is a student at the University of Southern California."
(c) Concept object data model I. [Diagram: a Student object for Carson Palmer.]
(d) Relational database model I. [Table "USC Students": a single column of student names that includes Carson Palmer.]

Figure 4.15 Representation styles at the abstract level.

(a) Graph II. [Image not reproduced.]
(b) Text II. "USC sophomore quarterback Carson Palmer was born on 12/27/79. The 6-foot-5, 220-pound Palmer with 1V experience comes from Laguna Niguel after graduating from Santa Margarita High School."
(c) Concept object data model II. [Diagram: a Student object for Carson Palmer with birthday 12/27/1979, class RS Sophomore, hometown Laguna Niguel, high school Santa Margarita, and position Quarterback.]
(d) Relational database model II. [Table "USC Football Roster" with columns Name, Pos, Hgt, Wgt, Birthday, Cl, Exp, Hometown, and High School.]

Figure 4.16 Representation styles at the detail level.

4.5.2 User Interfaces, Modeling, and Profiles

Several graphical user interfaces were constructed on a Web site (Appendix D), http://www.use.edu/dept/cs/tfiltering/football, for the user to access a main menu and a questionnaire with several questions about some specific information. To evaluate the experimental prototype, user-study testing on the representation styles was conducted between October 3, 2000 and October 19, 2000. Among the 121 adult users who participated in the experiment, there were 68 SJs, 18 SPs, 15 NTs, and 20 NFs.
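The questionnaire size stated in Section 4.5.1 follows directly from pairing the four representation styles. The short sketch below enumerates the pairs; the variable names are hypothetical, and the order in which the prototype presented the questions is not specified in the text.

```python
from itertools import combinations

STYLES = ("graph", "text", "concept object data model", "relational database model")

# Each unordered pair of styles becomes one question with alternatives (a) and (b).
questions = list(combinations(STYLES, 2))

# Every question shows two information units, one per alternative.
information_units = [style for pair in questions for style in pair]

print(len(questions))
print(len(information_units))
for first, second in questions:
    print(f"(a) {first}  vs.  (b) {second}")
```

Choosing 2 of 4 styles gives 6 questions, and two alternatives per question give the 12 information units per semantic level.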
For each user login session, the user is presented with a questionnaire of six questions at the abstract level and six more at the detail level, and is asked which of the two alternatives is more effective for the user in receiving and understanding the presented information. The user may change answers as needed. The monitor agent monitored and recorded the preferences of each user over time as the user answered more questions. These records constituted the source user profiles learned.

4.5.3 Experiment #4.1 for Static Information Space

If a target information space is static to some extent, no new information classification is necessary. Recommendations are offered to a user according to the popularity distributions of the information units and the preference distributions of the current user population in the temperament-segmented information space.

Based on the source user profiles, a group of 2,000 simulated user profiles was generated by the simulation algorithm. Taking into account the statistical results on the percentage distribution of the temperaments in the United States, the simulated users consisted of 940 SJs, 420 SPs, 320 NTs, and 320 NFs. The vector representations of the simulated user profiles were then constructed to form the sample data set. The sample data were further split into 20 data sets; in each of the 20 possible ways, 19 sets served as the training data and 1 set as the test data. The temperament-based filtering method was then applied to each training data set to develop the static segmented information space and was evaluated on the associated test data set. The results of these 20 tests were averaged for the abstract level and the detail level, respectively.
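The evaluation protocol described above, in which each of the 20 data sets serves once as the test set and the results are averaged, can be sketched as follows. The way the samples are dealt into folds and the `evaluate` callback are assumptions made for illustration; the text does not say how the prototype partitioned the data.

```python
def k_fold_splits(samples, k):
    """Yield (training, test) lists, using each of the k folds once as the test set."""
    folds = [samples[i::k] for i in range(k)]  # assumed interleaved partition
    for held_out in range(k):
        training = [s for i, fold in enumerate(folds) if i != held_out for s in fold]
        yield training, folds[held_out]

def average_accuracy(samples, k, evaluate):
    """Average evaluate(training, test) over the k train/test splits."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(samples, k)]
    return sum(scores) / len(scores)

# Hypothetical stand-in for a filtering method: it only reports the test-set share.
profiles = list(range(2000))
share = average_accuracy(profiles, 20, lambda train, test: len(test) / len(profiles))
print(share)
```

With 2,000 simulated profiles and 20 folds, each test set holds 100 profiles and each training set the remaining 1,900.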
4.5.3.1 Experimental Results

In this experiment, the system presents a fixed questionnaire of representation styles to the user. Neither a user query nor a key terms search is available for the user to select favorites, so keyword-dependent content-based filtering is not applicable. In contrast, temperament-based filtering performs well. For users with unknown temperament and interest, the average accuracy reaches 82.80% at the abstract level and 94.50% at the detail level in predicting the top 1 item (a representation style) the user would prefer for some question (Figure 4.17). In predicting all of the top six items that the user would prefer, the average accuracy is 64.58% at the abstract level and 75.36% at the detail level (Figure 4.17). Furthermore, when the user temperament is given, the average accuracy remains high, ranging between 60.99% and 88.13% at the abstract level (Figure 4.18) and between 72.19% and 100.00% at the detail level (Figure 4.19) in predicting the representation styles the user would prefer in the questionnaire.

               Temperament-based filtering
Top n items    Abstract level    Detail level
1              82.80             94.50
2              74.33             86.75
3              70.75             84.03
4              68.56             82.61
5              66.76             80.31
6              64.58             75.36
(a)

[Line graph: accuracy (%) versus top n items recommended; series: temperament-based filtering at the abstract level and at the detail level.]
(b)

Figure 4.17 Accuracy of recommendation for users with unknown temperament and interest for static information space (4 representation styles, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
               Temperament-based filtering, abstract level
Top n items    SJ       SP       NT       NF
1              86.06    70.48    84.06    88.13
2              78.94    68.93    65.94    76.25
3              70.71    72.86    66.98    71.88
4              66.06    75.48    65.78    69.61
5              64.34    75.81    64.25    64.50
6              63.23    69.01    60.99    66.35
(a)

[Line graph: accuracy (%) versus top n items recommended; series: temperament-based filtering for SJs, SPs, NTs, and NFs.]
(b)

Figure 4.18 Accuracy of recommendation when given the user temperament for static information space (4 representation styles at the abstract level, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

               Temperament-based filtering, detail level
Top n items    SJ       SP       NT       NF
1              92.77    100.00   94.06    92.81
2              85.00    97.14    82.50    82.50
3              82.16    95.79    78.33    79.79
4              80.69    93.33    79.38    77.42
5              77.60    90.29    77.75    77.75
6              73.65    83.29    73.13    72.19
(a)

[Line graph: accuracy (%) versus top n items recommended; series: temperament-based filtering for SJs, SPs, NTs, and NFs.]
(b)

Figure 4.19 Accuracy of recommendation when given the user temperament for static information space (4 representation styles at the detail level, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
4.5.4 Experiment #4.2 for Dynamic Information Space

On the other hand, if a target information space is dynamic in nature, new information units are classified into the temperament-segmented information space, and recommendations are made from the adapted information space. Suppose that key terms are not available, so that the cosine similarity measure is not applicable for classification. Classifying a new information unit into a temperament-based information space then reduces to assigning the optimal pair of segment and cluster, $(S_{target}, C_{target})$, for which the centroid temperament weight $e_c$ is maximum, as follows:

\[
(S_{target}, C_{target})
  = \arg\max_{s \in S,\; c \in C_s} PopSim(V_c, V_k)
  = \arg\max_{s \in S,\; c \in C_s} \bigl(e_c + Sim(V_c, V_k)\bigr)
  = \arg\max_{s \in S,\; c \in C_s} e_c
\]

The location of a new information unit is then dynamically adjusted on the fly by the distributions of the user preferences in the user population.

Based on the source user profiles, a group of 1,000 simulated user profiles was generated by the simulation algorithm. Taking into account the statistical results on the percentage distribution of the temperaments in the United States, the simulated users consist of 467 SJs, 214 SPs, 161 NTs, and 158 NFs. The vector representations of the simulated user profiles were then constructed. For the evaluation, the twelve information units at a semantic level were divided into two sets, one for training and the other for testing, in 20 possible ways. The temperament-based filtering method was applied to each training data set to form the sample temperament-segmented information space. The information units in the associated test data set were then classified into the sample space and filtered to the simulated user. The results of these 20 tests were averaged. The procedure was applied to the abstract level and the detail level experiments, respectively.
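The reduced classification rule above, in which the target is simply the cluster with the largest centroid temperament weight, might look like the following minimal sketch. The nested-dictionary layout, the function name `classify_without_key_terms`, and the example weights are all hypothetical and are not taken from the study.

```python
def classify_without_key_terms(centroid_weights):
    """Pick the (segment, cluster) pair whose centroid weight e_c is largest.

    centroid_weights maps each segment name to a dict of
    {cluster name: centroid temperament weight e_c}.
    Ties go to the pair encountered first (an arbitrary choice made here).
    """
    best = None
    for segment, clusters in centroid_weights.items():
        for cluster, weight in clusters.items():
            if best is None or weight > best[0]:
                best = (weight, segment, cluster)
    if best is None:
        raise ValueError("no clusters to choose from")
    return best[1], best[2]


# Hypothetical weights for two segments with two clusters each.
example_weights = {
    "SJ": {"c1": 0.40, "c2": 0.25},
    "SP": {"c1": 0.10, "c2": 0.55},
}
target = classify_without_key_terms(example_weights)  # ("SP", "c2")
```

Because the similarity term contributes the same value to every cluster when no key terms exist, only the weights decide the outcome, which is why the sketch never needs the unit's vector.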
4.5.4.1 Experimental Results

In this experiment, the system presents the user with a questionnaire that is subject to change. However, neither a user query nor a key terms search is performed for the user to select favorites, so keyword-dependent content-based filtering is not applicable. In contrast, temperament-based filtering makes good predictions. For users with unknown temperament and interest, the average accuracy reaches 72.84% at the abstract level and 88.88% at the detail level in predicting the top 1 item (a representation style) the user would prefer for some question (Figure 4.20). In predicting all of the top six items that the user would prefer, the average accuracy is 48.57% at the abstract level and 52.09% at the detail level (Figure 4.20). Moreover, when the user temperament is given, the average accuracy ranges between 48.5% and 77.79% at the abstract level (Figure 4.21) and between 51.8% and 98.08% at the detail level (Figure 4.22) in predicting the representation styles the user would prefer in the questionnaire.

               Temperament-based filtering
Top n items    Abstract level    Detail level
1              72.84             88.88
2              67.42             83.41
3              62.58             75.43
4              57.63             67.86
5              53.23             59.96
6              48.57             52.09
(a)

[Line graph: accuracy (%) versus top n items recommended; series: temperament-based filtering at the abstract level and at the detail level.]
(b)

Figure 4.20 Accuracy of recommendation for users with unknown temperament and interest for dynamic information space (4 representation styles, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.
               Temperament-based filtering, abstract level
Top n items    SJ       SP       NT       NF
1              77.79    73.13    67.02    72.78
2              68.28    75.54    64.84    65.6
3              63.48    69.83    61.78    60.72
4              59.54    62.02    57.14    56.46
5              56.94    53.6     52       53.4
6              52.61    48.99    50.4     48.5
(a)

[Line graph: accuracy (%) versus top n items recommended; series: temperament-based filtering for SJs, SPs, NTs, and NFs.]
(b)

Figure 4.21 Accuracy of recommendation when given the user temperament for dynamic information space (4 representation styles at the abstract level, θ = 0.10, upper quartile segments, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

               Temperament-based filtering, detail level
Top n items    SJ       SP       NT       NF
1              87.21    98.08    82.4     81.91
2              81.79    94.23    77.41    77
3              75.44    83.54    72.83    72.98
4              68.92    73.67    66.41    68.02
5              61.76    62.89    59.6     60.3
6              53.61    52.88    51.8     52.8
(a)

[Line graph: accuracy (%) versus top n items recommended; series: temperament-based filtering for SJs, SPs, NTs, and NFs.]
(b)

Figure 4.22 Accuracy of recommendation when given the user temperament for dynamic information space (4 representation styles at the detail level, θ = 0.10, 20 tests): temperament-based vs. content-based. (a) Accuracy average (%). (b) Accuracy graph.

Chapter Notes

It is interesting to note that, when using the temperament-based filtering method, a system with a term-centric collection performs better when a higher threshold for the cosine similarity measure is applied.
On the other hand, a system with an image-centric collection performs better when a lower threshold for the cosine similarity measure is applied. Because the conventional cosine similarity measure is based only on key terms and does not consider other features of the information unit, it causes the overfitting effect to appear at an earlier stage for a non-term-centric collection, compared with using the content-based filtering method.

Chapter 5 Examination of the Popularity Similarity Measures

Similarity measurement is a technique used in clustering analysis for document classification. It assigns items to automatically created groups when the degree of association between items and groups exhibits sufficiently large similarity or sufficiently short distance. In Chapter 3, a modified similarity measure called the “popularity similarity” measure is introduced with the goal of improving the effectiveness of information recommendation. In Chapter 4, the “popularity similarity” measure is employed in the experiments to classify a new information unit into an optimal location of the information space. In this chapter, two possible expressions of the “popularity similarity” measure are examined in more detail. The experimental environments of the three experiments discussed here are the same as those of Experiments #1 to #3 described in Chapter 4. Of the two possible expressions of the “popularity similarity” measure under consideration and the conventional cosine similarity measure, the experimental results on the three collected data sets support the rationale of the proposed addition equation.

5.1 General Considerations

In the temperament-based filtering method, the classification process consists of two main tasks.
The first is to obtain, for each cluster, a centroid temperament weight that reflects the potential importance of temperament influence. The second is to measure the context similarity between a new information unit and the centroids of the clusters, which detects the degree to which the terms in their context representations are similar. A new information unit is then classified into the optimal target segment and cluster location $(S_{target}, C_{target})$ where the combined effect of the two factors has its maximum value.

These considerations lead to a “popularity similarity” measure, which may be computed as a composite function that takes into account the centroid temperament weight $e_c$ of a centroid vector $V_c$ as well as the conventional cosine similarity $Sim(V_c, V_k)$ between a new information unit vector $V_k$ and $V_c$. The two possible equations for computing the “popularity similarity” are

\[
PopSim(V_c, V_k) = e_c + Sim(V_c, V_k) \tag{5.1}
\]

and

\[
PopSim(V_c, V_k) = e_c \, Sim(V_c, V_k) \tag{5.2}
\]

5.2 Experimental Results

For comparison purposes, the two possible “popularity similarity” equations (Equations 5.1 and 5.2), in addition to the conventional cosine similarity measure, were experimentally tested. Based on the three collections under study, the experiments tested four user conditions: (a) users with unknown temperament and interest, (b) given the user temperament, (c) given the user interest key terms, and (d) given the user temperament and the user interest key terms. Three experiments were conducted: one averaged over 20 tests for a collection of 2,000 documents, another averaged over 20 tests for a collection of 2,000 art images, and a third averaged over 4 tests for a collection of 100 art images.
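To make Equations 5.1 and 5.2 concrete, the sketch below computes both forms next to the plain cosine measure and picks the best cluster under a chosen measure. It is an illustration only: the function names, the `(e_c, V_c)` tuple layout, and the example centroids are hypothetical, and treating a zero vector as having cosine similarity 0 is an assumption made here for the case where a unit has no key terms.

```python
import math


def cosine_similarity(u, v):
    """Conventional cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pop_sim_add(e_c, v_c, v_k):
    """Equation 5.1: centroid temperament weight plus cosine similarity."""
    return e_c + cosine_similarity(v_c, v_k)


def pop_sim_mul(e_c, v_c, v_k):
    """Equation 5.2: centroid temperament weight times cosine similarity."""
    return e_c * cosine_similarity(v_c, v_k)


def best_cluster(centroids, v_k, measure):
    """Return the cluster name maximizing measure(e_c, V_c, V_k).

    centroids maps a cluster name to a pair (e_c, V_c).
    """
    return max(centroids, key=lambda name: measure(*centroids[name], v_k))


# Hypothetical centroids: (e_c, V_c) per cluster.
example_centroids = {"A": (0.9, [1.0, 0.0]), "B": (0.2, [0.0, 1.0])}
no_key_terms = [0.0, 0.0]  # a unit with no key terms in common with any centroid

choice_add = best_cluster(example_centroids, no_key_terms, pop_sim_add)
scores_mul = [pop_sim_mul(e, v, no_key_terms) for e, v in example_centroids.values()]
```

In this toy case the additive form still ranks cluster A first through its weight, while the multiplicative form scores every cluster 0 and cannot separate them. That behaviour may help in reading the comparison of the three measures, although the sketch is not a reproduction of the experiments.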
The average accuracy figures for the top 10 items recommended by the system, comparing the three measures on the three data collections, are shown in Tables 5.1 to 5.3 and discussed in the following sections. The results show that the addition equation (Equation 5.1) appears to be a generally better approach for classification than either the multiplication equation (Equation 5.2) or the cosine similarity measure.

5.2.1 For a Collection of 2,000 Documents

When neither the user temperament nor the user interest was provided (Table 5.1a), the addition equation produced somewhat better average results, ranging between 30% and 41%, than either the multiplication equation, ranging from 19% to 39%, or the cosine measure, ranging from 19% to 37%. When the user temperament is given, the results of the three approaches are similar for the four user temperaments (SJ, SP, NT, and NF), as shown in Table 5.1b. Table 5.1c shows that, when the user interest key terms are given, the addition equation obtained a significant improvement, achieving an average accuracy of 54% in predicting the top 1 item the user would be interested in, compared with 35% for the multiplication equation. In this case, the cosine measure had a moderate figure of 47%. Table 5.1d shows that the addition equation outperforms both the multiplication equation and the cosine measure in most cases when the user temperament and the user interest key terms are given.

5.2.2 For a Collection of 2,000 Art Images

When neither user temperament nor user interest was provided (Table 5.2a), the addition equation maintained a performance, ranging between 29% and 40%, comparable with that obtained on the document data set.
In contrast, the performance of the other two approaches deteriorated, ranging between 13% and 29% for the multiplication equation and between 5% and 21% for the cosine measure. Table 5.2b shows that the addition equation is generally slightly better than the other two approaches when the user temperament is given. The three approaches are comparable, with similar average accuracy figures, when the user interest key terms are given or when both the user temperament and the user interest key terms are given (Tables 5.2c and d).

5.2.3 For a Collection of 100 Art Images

It is interesting to note in Table 5.3a that, when no user information was provided, the average accuracy using the addition equation, ranging between 31.12% and 60.33%, is significantly better than that of the multiplication equation or the cosine measure, for which no recommendation was made. Table 5.3b shows that, with some minor exceptions, the addition equation is generally better than the other two measures when the user temperament is given. Surprisingly high average accuracy, ranging between 80% and 90%, was obtained using the addition equation in predicting the top 3 items the user would be interested in when the user interest key terms are given or when both the user temperament and the user interest key terms are given (Tables 5.3c and d). Moreover, the addition equation made significant performance improvements over the other approaches, as shown in these two tables.

5.3 Summary

In examining the “popularity similarity” measures, the addition equation provides generally better performance in the experiments on the three collections. Moreover, the addition equation is much more comprehensive than either the multiplication equation or the conventional cosine similarity measure.
As shown in Table 5.3a, the addition equation obtains plausible predictions, whereas neither the multiplication equation nor the cosine measure makes any prediction. The experiments indicate that, with the multiplication equation, a high centroid temperament weight could overstate the importance of the temperament factor, whereas the cosine measure, which ignores the temperament factor, might understate it. In either case, new information units would be placed into inappropriate clusters in the classification process, which would impair the results.

               Accuracy of Recommendation, %
Top n items    e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
1              41                     39                   37
2              36                     35                   31
3              35                     32                   29
4              34                     30                   24
5              33                     28                   25
6              32                     26                   22
7              31                     24                   24
8              31                     22                   26
9              30                     20                   23
10             30                     19                   19
(a)

               Accuracy of Recommendation, %
               e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
Top n items    SJ  SP  NT  NF         SJ  SP  NT  NF       SJ  SP  NT  NF
1              34  52  42  44         34  52  42  39       33  52  45  36
2              29  44  41  44         30  44  40  40       29  45  42  40
3              27  42  40  41         28  42  39  38       27  42  42  35
4              27  43  41  36         26  43  40  36       26  42  41  34
5              26  42  40  35         26  41  40  33       25  40  41  32
6              26  40  39  33         25  40  39  31       25  39  40  29
7              25  40  38  31         24  39  40  28       24  37  39  28
8              25  39  39  28         24  39  39  28       24  36  39  29
9              25  39  39  26         24  38  38  28       23  35  37  29
10             25  39  39  24         23  37  38  27       22  34  36  29
(b)

Table 5.1 Comparison of “popularity similarity” measures by accuracy of recommendation (2,000 documents, 20 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.
               Accuracy of Recommendation, %
Top n items    e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
1              54                     35                   47
2              45                     31                   44
3              41                     31                   46
4              37                     30                   39
5              33                     27                   32
6              32                     23                   26
7              30                     23                   22
8              27                     19                   20
9              26                     18                   17
10             25                     16                   16
(c)

               Accuracy of Recommendation, %
               e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
Top n items    SJ  SP  NT  NF         SJ  SP  NT  NF       SJ  SP  NT  NF
1              48  53  56  55         41  43  54  45       43  45  46  46
2              42  45  55  36         34  43  52  33       40  47  43  29
3              39  41  51  32         30  39  49  26       46  49  52  24
4              36  38  52  24         29  37  47  23       36  37  39  19
5              34  34  39  15         29  34  44  20       29  29  31  15
6              32  32  28  10         25  30  42  18       24  25  26  13
7              30  28  24   9         24  30  40  17       21  21  22  11
8              28  26  21   8         19  24  32  16       18  18  20  10
9              27  24  19   7         17  19  26  14       16  16  17   8
10             25  24  17   6         16  17  24  13       14  15  16   8
(d)

Table 5.1 (continued) Comparison of “popularity similarity” measures by accuracy of recommendation (2,000 documents, 20 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.
               Accuracy of Recommendation, %
Top n items    e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
1              40                     29                   21
2              37                     25                   18
3              35                     23                   12
4              34                     20                   12
5              33                     17                    9
6              32                     16                    8
7              31                     14                    7
8              30                     13                    6
9              30                     13                    5
10             29                     13                    5
(a)

               Accuracy of Recommendation, %
               e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
Top n items    SJ  SP  NT  NF         SJ  SP  NT  NF       SJ  SP  NT  NF
1              33  48  32  56         31  52  24  43       29  49  21  31
2              31  46  32  49         31  45  27  34       25  42  21  26
3              30  41  34  44         29  39  28  29       23  37  25  21
4              29  39  35  38         27  38  25  27       21  35  24  20
5              29  40  32  37         25  37  24  24       20  35  25  19
6              28  39  33  35         23  38  22  22       18  34  28  16
7              27  38  33  32         22  37  20  20       19  33  27  15
8              27  38  33  30         21  36  19  19       18  33  26  15
9              26  37  32  29         19  35  18  18       21  32  27  15
10             26  35  32  28         18  33  15  18       16  29  16  13
(b)

Table 5.2 Comparison of “popularity similarity” measures by accuracy of recommendation (2,000 art images, 20 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.

               Accuracy of Recommendation, %
Top n items    e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
1              38                     37                   36
2              31                     34                   30
3              25                     30                   25
4              21                     27                   22
5              19                     24                   19
6              18                     23                   17
7              17                     21                   16
8              16                     20                   15
9              16                     18                   15
10             14                     17                   14
(c)

               Accuracy of Recommendation, %
               e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
Top n items    SJ  SP  NT  NF         SJ  SP  NT  NF       SJ  SP  NT  NF
1              33  34  35  21         30  46  27  30       32  35  27  25
2              27  28  25  24         26  39  20  27       28  26  22  22
3              22  23  20  17         25  34  20  26       23  22  21  20
4              22  20  21  15         24  30  23  33       20  19  18  19
5              20  15  18  12         21  27  24  35       19  16  17  16
6              20  14  19   8         21  26  33  35       17   9  16  14
7              18  15  17  12         21  23  35  38       16   8  16  14
8              19  14  15  12         20  21  31  36       16   6  21  11
9              20  13  13  12         19  20  28  40       17   1  19   9
10             15  13  13  11         15  17  14  18       15   1  13   8
(d)

Table 5.2 (continued) Comparison of “popularity similarity” measures by accuracy of recommendation (2,000 art images, 20 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.

               Accuracy of Recommendation, %
Top n items    e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
1              60                     0                    0
2              51                     0                    0
3              46                     0                    0
4              42                     0                    0
5              40                     0                    0
6              37                     0                    0
7              36                     0                    0
8              34                     0                    0
9              33                     0                    0
10             31                     0                    0
(a)

               Accuracy of Recommendation, %
               e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
Top n items    SJ  SP  NT  NF         SJ  SP  NT  NF       SJ  SP  NT  NF
1              60  70  57  53         49  56  41  53       46  55  48  47
2              51  60  45  51         42  61  32  51       38  44  39  41
3              47  54  40  44         50  53  28  42       49  35  47  39
4              44  47  37  41         44  45  39  41       45  33  44  36
5              42  43  36  40         41  42  37  40       42  29  40  33
6              40  39  35  38         37  38  36  37       41  27  38  31
7              38  38  34  36         35  35  35  36       40  28  38  29
8              36  35  32  35         33  34  34  33       38  30  35  28
9              34  34  31  34         38  32  33  32       37  30  34  25
10             33  32  30  32         29  28  26  31       36  24  28  29
(b)

Table 5.3 Comparison of “popularity similarity” measures by accuracy of recommendation (100 art images, 4 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.

               Accuracy of Recommendation, %
Top n items    e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
1              90                     65                   54
2              84                     57                   45
3              81                     47                   38
4              74                     45                   31
5              68                     38                   25
6              73                     37                   21
7              66                     32                   18
8              58                     28                   15
9              52                     25                   13
10             43                     22                   14
(c)

               Accuracy of Recommendation, %
               e_c + Sim(V_c, V_k)    e_c Sim(V_c, V_k)    Sim(V_c, V_k)
Top n items    SJ  SP  NT  NF         SJ  SP  NT  NF       SJ  SP  NT  NF
1              89  92  84  80         64  51  53  53       43  48  54  45
2              85  86  79  74         65  56  53  43       48  34  52  41
3              80  80  76  63         60  50  38  41       40  29  43  41
4              74  71  69  63         66  53  40  33       46  30  47  35
5              68  64  64  57         55  48  33  30       37  33  39  29
6              74  70  78  71         50  45  33  29       32  28  33  24
7              67  61  71  68         44  38  32  25       27  24  29  22
8              60  56  64  60         38  36  28  22       24  17  25  19
9              54  50  58  53         34  32  25  19       21  15  22  17
10             44  41  43  39         29  26  23  17       13  16  20  17
(d)

Table 5.3 (continued) Comparison of “popularity similarity” measures by accuracy of recommendation (100 art images, 4 tests): addition equation, multiplication equation, and cosine similarity measure. (a) Users with unknown temperament and interest. (b) Given the user temperament. (c) Given the user interest key terms. (d) Given the user temperament and the user interest key terms.

Chapter 6 Conclusions

The accuracy of an information recommendation system may be significantly improved by employing user temperament for filtering and customization. This hypothesis was tested with the temperament-based filtering model. The results from a combination of simulation and user-studies testing in three application domains indicated that the temperament-based filtering method outperforms the content-based filtering method in most cases.
The information space was categorized in terms of human temperaments, so that the distance between the user's ranking and the system's ranking of the same documents may be decreased.

A well-defined “popularity similarity” measure is crucial for an appropriate classification of new information units. Two possible expressions, in addition to the cosine similarity measure, were examined. The rationale of the proposed “popularity similarity” measure was confirmed by the experiments. The segmentation and classification mechanism noticeably enhanced the quality of specific search as well as serendipitous search by providing better-conformed information units, as the adaptive optimal predictions adjusted to reflect the dominant roles of both key terms and temperaments in the recommendation process. In addition, the system effectiveness was improved by heuristically searching the part of the structurally classified space that matches the user temperament.

In contrast, content-based filtering, which is based only on key terms similarity, is limited in its ability to satisfy the user because it does not recognize user personalities. The problem with content-based filtering was that it did not use known categories or predetermined concepts when constructing clusters. The resulting clusters may therefore be meaningless and difficult to describe, and may seriously impair performance.

We expect this study to provide a new general approach to customized information selection and delivery by incorporating human factors, particularly human temperament, into the adaptive information recommendation process.
This temperament-based filtering method presents an effective framework for analyzing the inherent interrelated patterns between user temperaments and user interests, for modeling the internal representations that partition the diverse information space into meaningful segments, and for constructing user profiles. A mechanism for inferring the classification of new information units into the temperament-based information space is also proposed. This classification is built upon known concepts of human factors to simplify system configuration and implementation. In addition, the technique for optimizing the exploration of information units by considering the critical dimension of human temperament provides a basis for further exploiting predefined concepts of other human factors or environment properties in the services of computing systems. The accuracy of the information system may be substantially increased, the management of the conceptual cohesiveness between the disordered information space presented by a computing system and the user's recognition of the real world may be enhanced, and the goal of better satisfying the user may be achieved.

While the experimental results are positive, the filtering algorithms focus on temperament and do not address other human factors, such as gender, age, education level, experience with the system, user demographics, and different cultural features. These problems will be the subject of future research.
Exploiting and Learning Human Temperaments for Customized Information Recommendation. In P r o c e e d i n g s o f t h e 6 t h I A S T E D I n t e r n a t i o n a l C o n f e r e n c e o n I n t e r n e t a n d M u l t i m e d i a S y s t e m s a n d A p p l i c a t i o n s , Kauai, Hawaii, August 2002. [36] McLeod, Dennis and Antonio Si. The Design and Experimental Evaluation of an Information Discovery Mechanism for Networks of Autonomous Database Systems. In P r o c e e d i n g s o f t h e I E E E I n t e r n a t i o n a l C o n f e r e n c e o n D a t a E n g i n e e r i n g , pages 15-24, Taipei, Taiwan, March 1995. [37] Michalski, Ryszard S., Jaime G. Carbonell, and Tom M. Mitchell. M a c h i n e L e a r n i n g - A n A r t i f i c i a l I n t e l l i g e n c e A p p r o a c h . Morgan Kaufmann, San Francisco, CA, 1983. [38] Mitchell, Tom M. M a c h i n e L e a r n i n g . McGraw-Hill, Boston, MA, 1997. [39] MSNBC (http://www.msnbc.com). 126 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [40] My Yahoo (http://my.yahoo.com). [41] Myers, Isabel. M a n u a l : T h e M y e r s - B r i g g s T y p e I n d i c a t o r . Consulting Psychologists Press, Palo Alto, CA, 1962. [42] Pazzani, M., J. Muramatsu, and L. Billsus. Syskill & Webert: Identifying Interesting Web Sites. In P r o c e e d i n g s o f t h e 1 3 t h A A A I N a t i o n a l C o n f e r e n c e o n A r t i f i c i a l I n t e l l i g e n c e , Portland, OR, August 1996. [43] Porter, Martin. An Algorithm for Suffix Stripping. P r o g r a m 14(3): 130-137, 1980. [44] Resnick, Paul, Neophytos Iacovou, Mitesh Sushak, Peter Bergstrom, and John Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In P r o c e e d i n g s o f t h e A C M C o n f e r e n c e o n C o m p u t e r S u p p o r t e d C o o p e r a t i v e W o r k ( C S C W ) , Chapel Hill, NC, October 1994. [45] Reynolds, F., J. Hjel, S. Dawkins, and S. Singhal. 
Composite Capability/Preference Profiles (CC/PP): A user side framework for content negotiation. W 3 C N o t e 2 7 J u l y 1 9 9 9 , World Wide Web Consortium (W3C), http://www.w3 .org/TR/NOTE-CCPP/, July 1999. [46] Rich, Charles and Candace L. Sidner. Collagen: When Agents Collaborate with People. In P r o c e e d i n g s o f t h e F i r s t A C M A G E N T S I n t e r n a t i o n a l C o n f e r e n c e o n A u t o n o m o u s A g e n t s , Marina Del Rey, CA, February 1997. [47] Rich, Elaine and Kevin Knight. A r t i f i c i a l I n t e l l i g e n c e , 2nd edition. McGraw- Hill, New York, NY, 1991. [48] Rusalov, V. M. T h e B i o l o g i c a l B a s i s o f I n d i v i d u a l P s y c h o l o g i c a l D i f f e r e n c e s . Nauka, Moscow, 1979. [49] Russel, Stuart J. and Norvigg Peter. A r t i f i c i a l I n t e l l i g e n c e A M o d e m A p p r o a c h . Prentice Hall, Upper Saddle River, NJ, 1995. [50] Salmon, Steven A., Bradley C. Love, and Woo-Kyoung Ahn. Feature Centrality and Conceptual Coherence. C o g n i t i v e S c i e n c e , 22(2): 189-228, 1998. [51] Salton, Gerard and Michael J. McGill. I n t r o d u c t i o n t o M o d e m I n f o r m a t i o n R e t r i e v a l . McGraw-Hill, New York, NY, 1983. 127 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [52] Salton, Gerard. A u t o m a t i c T e x t P r o c e s s i n g . Addison-Wesley, Reading, MA, 1989. [53] Salton, Gerard and Chris Buckley. Improving Retrieval Performance by Relevance Feedback. J o u r n a l o f t h e A m e r i c a n S o c i e t y f o r I n f o r m a t i o n S c i e n c e , 41(4):288-297,1990. [54] Scheaffer, Richard L., William Mendenhall III, and Lyman Ott. E l e m e n t a r y S u r v e y S a m p l i n g , 5th edition. Duxbury Press, Boston, MA, 1996. [55] Shardanand, Upendra and Pattie Maes. Social Information Filtering: Algorithms for Automating 'Word of Mouth'. 
In P r o c e e d i n g s o f t h e A C M C H I '9 5 C o n f e r e n c e o n H u m a n F a c t o r s i n C o m p u t i n g S y s t e m s , pages 210-217, Denver, CO, 1995. [56] Shen, Wei-Min. A u t o n o m o u s L e a r n i n g f r o m t h e e n v i r o n m e n t . Computer Science Press, New York, NY, 1994. [57] Sheth, Beerud and Pattie Maes. Evolving Agents For Personalized Information Filtering. In P r o c e e d i n g s o f t h e 9 t h I E E E C o n f e r e n c e o n A r t i f i c i a l I n t e l l i g e n c e f o r A p p l i c a t i o n s , Orlando, FL, 1993. [58] Siegel, Jon. C O R B A - 3 F u n d a m e n t a l s a n d P r o g r a m m i n g , 2nd edition. John Wiley and Sons, Hoboken, NJ, 2000. [59] Simonov, P. V. and P. M. Ershov. T e m p e r a m e n t , C h a r a c t e r , a n d P e r s o n a l i t y : B i o b e h a v i o r a l C o n c e p t s i n S c i e n c e , A r t , a n d S o c i a l P s y c h o l o g y . Gordon and Breach Science Publishers, Philadelphia, PA, 1991. [60] Tenenbaum, Joshua B. Rules and Similarity in Concept Learning. A d v a n c e s i n N e u r a l I n f o r m a t i o n P r o c e s s i n g S y s t e m s , 12:59-65. S. A. Solla, T.K. Leen, K.- R. Muller, editers. MIT Press, Cambridge, MA, 2000. [61] Terveen, Loren, Jessica McMackin, Brian Amento, and Will Hill. Specifying Preferences Based on User History. In P r o c e e d i n g s o f t h e A C M C H I 2 0 0 2 C o n f e r e n c e o n H u m a n F a c t o r s i n C o m p u t i n g S y s t e m s , pages 315-322, Minneapolis, MN, April 2002. 128 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [62] Walpole, Ronald E., Raymond H. Myers, and Sharon L. Myers. P r o b a b i l i t y a n d S t a t i s t i c s f o r E n g i n e e r s a n d S c i e n t i s t s , 6th edition. Prentice Hall, Upper Saddle River, NJ, 1998. [63] WBI (http://www.almaden.ibm.com/cs/wbi/). [64] WebCrawler (http://webcrawler.com). 
[65] Wohlstadter, Eric, Stoney Jackson, and Premkumar Devanbu. Generating Wrappers for Command Line Programs: The Cal-Aggie Wrap-O-Matic Project. In Proceedings of the 23rd International Conference on Software Engineering, pages 243-252, Toronto, ON, Canada, May 2001.
[66] Wu, Kun-Lung, Charu C. Aggarwal, and Philip S. Yu. Personalization with Dynamic Profiler. In Proceedings of the 3rd IEEE International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS 2001), pages 12-20, San Jose, CA, June 2001.
[67] Yahoo (http://www.yahoo.com).
[68] Yan, Tak W. and Hector Garcia-Molina. SIFT - A Tool for Wide-Area Information Dissemination. In Proceedings of USENIX Winter 1995 Technical Conference, New Orleans, LA, January 1995.
[69] Yochum, Julian A. A High-Speed Text Scanning Algorithm Utilizing Least Frequent Trigraphs. In Proceedings of the IEEE International Symposium on New Directions in Computing, pages 114-121, Trondheim, Norway, 1985.

Appendix A User Interfaces of Experiment #1 on a Document Collection of 2,000 Web Pages
Figure A.1 Home page.
Figure A.2 Introduction to the Web site.
Figure A.3 User registration.
Figure A.4 User authentication.
Figure A.5 Password checkup.
Figure A.6 Guest login.
Figure A.7 Search Tool.
Figure A.8 Selection of the interested articles.
Figure A.9 An original article linked.
Figure A.10 Remove of the unwanted articles.
Figure A.11 Partial user favorites at a login session.
Figure A.12 Partial user favorites from a user profile.

Appendix B User Interfaces of Experiment #2 on an Art Image Collection of 2,000 Pictures
Figure B.1 Home page.
Figure B.2 Introduction to the Web site.
Figure B.3 User registration.
Figure B.4 User authentication.
Figure B.5 Password checkup.
Figure B.6 Guest login.
Figure B.7 Search Tool.
Figure B.8 Selection of the interested pictures.
Figure B.9 Remove of the unwanted pictures.
Figure B.10 Partial user favorites at a login session.
Figure B.11 Partial user favorites from a user profile.

Appendix C User Interfaces of Experiment #3 on an Art Image Collection of 100 Pictures
Figure C.1 Home page.
Reproduced with permission of the copyright owner.
Further reproduction prohibited without permission. fk ? 1 0 i t f e f r P & H iin T 0 * J H * v ’Vi - J j 3 si'urr i u * « rtw J A*.I J The tfiidy cenductsdan this web * is dedaned to assess fl» ■ inter-raistionship betivoeti Ksa Id c^ d ^ M rti snd uest interests. The user rosy sstea Itoprif* artfaegds from tile dafatrase and kbep t o ia a personal Mt g tfn y .it user m ay login a* anew ;,eci • ■ r.jj:iis;fa ,'v.i t'jja-.s: .- ■ : SK«:ra;.shs:SiSc‘-K '!> ' » ■ Ovrir.i« liis i prgPh w.tifI- s it .fiei.s ..sir tircpe-Vrrl 5‘-J : k e^ hack of u$»r interests over tim e. TOi type ofussrtwS serve a:- - : vsiaet..: rejoice •«: .esesi'on yis vm » . guest -,s a > ;tj^ provide W fem attD rs to tee research by ptwiddig petsossj tempemwHtandwBectingSwfsvoriteartsmages. However,e» guest profB e only keeps track of the currant session and no user history cat he ajspfed fcr iongtem modeling Tiis A rt Cents' s/iter: w l;i om m *-e aisi to ;h j«s: rys y ,m ti temperament code ham ate {§J - Sensing, judging: S P - Sensing. Pstresitfinaj if^iffiog: IW nfciR gc sm d M F - IN tB sb n g , if the user is a first thus user: otherwise die system w ill retrieve the Imeeioueni type Itu rri Wr urai p-otue Keanvthfc tha cystem ■ » :! provide optional ft**, such as Temperament it the footnote and IrienrTyVni-Tempereneni onft'fmseepagc.lc.A'. .* , ■ > ■ _ . ' sr-s-r,.- ■ - ■ ■ '■ ■ v..- 1■ .anon amQjo=Son,-aii..io-heimra oerplt i to tentSVifwir innate personality temperament*. rtfeg out 8w qutei-onnv; fe'.h nt.k ng on tee Scce OuesKiwe butte" 3 1 the end <dBtegu*sto(wa»t|8i» user m ay Sod out personal temperament is G uarto $J; Artisan: SP: Ideate N T ; er tadani-. V -lA ei s ueh io;t In. #u system wi' oranfl (hr use, r . , search tool to retrieve the art BragesSlt the database. ; pinieisV sssniipw ejnM ii! I l l l l l l l l l l l l l l l l l l l i l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l P l l l l l l l l l l l l l l l l i l i l l l l l l l l l l l ^ ! 
eitfftiTqmiwiHJteB* Trs r=s.;kf om this study be usm tc t-.'h ie L yste-rfi :c ! is r :'i lono-pr 0 1 hums tif.tiperament. :< hir.f’ -lay imb. cet T h e iveii*. or the iiiv cm otteee by the esnptdnp systems and acnifve tee g c tte b e a tfE W ith * Jter S e v t i S e u w B ifw g m rt * * * Figure C.2 Introduction to the Web site. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ■ ■ ■ ■ ■ i'$ • * : ■ ) ? ■ * ■ • H J ) _ $ dl'fa'li ^ j F 4 w » & « . ^ H t R ^ r y V'V k--; V i\ = ■ . ; .- ‘ .r. . !i ■ ■ . ■ • ■ : - :,-,niV.-.-r:.c#- ! ; - < ■ -. ■ ,- ‘ ‘r- S8IIlliliB|lI|8lliS|illS^Bi^^B!^Bi^BBl > ' ■ • ! s-n-m- i : -, - i srr ti . uji:-- T esi ' 1 i">. it f w m ii \ 1 hi;i! !ij ■ . i i V i " • HI illtotmr 1 e 'ini iJ ‘ " ' * V 0 ;j c^n iflyi hm^:<r yiK as 3 g n e it I N;.« x<-<: i M '-.;.-- ........ l i ■ : j .im a t - 1 i Figure C.3 User registration. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. c c ft s P g v w tte * -fo o ts AilSiess 3 rtt;W fV ifew w tm its:! ’ if *yixjf'i!j»3i y n a taBwoiil erai (w? 1 ’ }ftiis + jew first firoc sa # r w efe ils, cirl -jy * Vouc3 1 abc browse ibr site segued. c^dUpse r .-. I . ' . - : . ' ... ;■ f ■ -, ■ .ssi-i : -,. i . , - . - i , i i '.■!■’;■.■ m >■.. i i ■:•.■«-. tiflg ip is p p !»8W Figure C.4 User authentication. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. I I Fife f r a yiw > JV oflfci. lo o fc -ie b • i- S a d w ,~ J 3 '■ j K V v 'i a t c t ^K ^iy N * i,£ ) & > ' * A&lxs* «v. Und j ' flcil'p- i!! irt V iH iT tb i'f W i J fllBiB|Bti^BBI^(hlS8tBllHlIlilli§illSSPllllll(Mil^illl®IiiiliPiBi ||i ||: ^ ; ; i ; |: ; i ; |: ; i ; l ; ; ; ||l ; i ; i ; l i : ; i ; : ; i ; ^ ||: ; i ; i ^ I l l i l l l l l l l i y ^ l l l l l l g ^ l l l l l l l l l l l J ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ^ ; ; ; ; ; ; ; ; ; ; : ; : ! ; ; ; : i.’ i . 
• • N;.w i v rs "\ ;, ■ . * * 4-* < # . ■ - » . . -.: «; r I ■ si*- * Figure C.5 Password checkup. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. i * * j i 4*** .jiw ^ vj T s r v ^^^tfit4'ci«sipj/iw>l V a n ! j® ll^B|liiillBilllM lHlliB^BIIBi(liSliilil!SBii|IIH lBliIliiPIII IstBpiBifHptf^BH^^HBiiillfeiflHiipiiiiflBHiiiiKHisiiv |i!i^^HiNl^BiMt!BHipiM^wiii^^^Mt^^Bfiill|jtiil!sipi^^BiiiiBiiMiBiiisii .... !V :.: ! ^i.: ? i.-i.'.i. I '.;• ,- J : !-»il.'-c'- I • • > ■ ! ■ - ! . — ii: si !.:„ ■ > :', Figure C.6 Guest login. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. J W r t f r y v . - ,.■ » i ■ ’= ■ ■ ? . ■ ■V iittei ■ i i ■ i.n- !}:v;. -.. b > \r-i-t ...... i - , V O ’ ■ i. diji* I m t e f i i k '! ;f c = r-.S'J4! .. r.feiTiM c i 1 v * « S ’ j !J l!S ? fcsfai.-fc ..... ■ « ;..!» { • A - B i M msm .V iM jfjJ M M I S I i i l l Msm i-jsr")!, * < < a . S - T f B S i'-fji- k 0,jtU« iglilijjipji | | p | | | | | g | | E a l s s * Jrfr'S llS lllllllllllllllllll :sa=.v:bi am ....... t '/ e i . &.W!3 l i |||i ^ M p l ! ! ! ! l ! l ! ! ! l i » V 4 g t 'i c « I.. .. lllll|lllP llH ^ ^ ^ M B H H H llgl I g^gj in a B M I l S B i j j l l l l l l l l l l l l l l l l l l l l i i M i i i i i i i i i i i i i i i i i i i Figure C.7 Search Tool. S s^ u u i ! r o ir . '} tj V : CBffiSi ; A/; in S f'r- - j”S lllllJ^^IIIII dSBriHam L -w 'L a * y v - l i i W l l l l l l l B I H H ginvfl! ■ ■ ■ ■ ■ ■ * tlHNM Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. m * lidil Vjf:* ^av’ O r^ s Tfcois fieip <-& *<; ■ » » J l -1 j$ & •£»«■ •* .■jjvvosrta* ? ..* $ * i i ^ A j & h d $ 4 ^ a f l ^ i 'A r ^ v y ^ i w t f 3 S v - W / W ^ c s b citssr• i ^ n A s U u a • « « , 7 v ^s » a * fa tS 'if i? 'v & v j r r . A h S-*:i'.‘: '-:*' < ‘liV - C Ktk o*l ife ^bertJboas e& to y o u is rentes. 
{ Ir ft o n S'-t Hdfrroff bu& in i& yjfti sH'-iti iiv e r t e s m llllllllllllliiiilllllllllllli A J b r e s t G h d i Agj&a'd, < V i ita fcrfr ^ A tt t" i3 0 H e r r e i V - s Robert A A & 1 3 j p s t L'-gffyhz I’ l i e r C v s » a n d B i J f c r i w s e t A b b - i * M a r e d c h 3 t" u k * fl®ks-Rifius S tk tf , i e s o n S^r,ac & Pitetuh * 0 . s A *J AS Keafr it*.Csnrtn>«i A 'dow* tVj&r .ferros! 1 - 3 »A H W o tS iV r^liK.yy.h t3<*qu<4 lean M cM i l . - n 1 1 - - > 1 * . ' } . * Internet Figure C.8 Selection of the interested pictures. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Bdif View Tools lie ie * J * * 3 *>.*« *b ijm w te $Hw * £ - i> - > ;* . J A i M ^ S & s c K a i .W ? t i'l / s d t S c # t I'-.-'ir 1 :v -. ( '• ) - ; .'■ ■ ■ • .i I ■ :« •■ '■ '. :•’ i» ■ ■ -./ I s .- ■ - » . * * ■ .^ ■ ■ f f ■ > . | -C l^ir or1 - *1»? S u b m it v^Hi /n rrn io v v ?T sy « Icrdt f r m vsmr c.dterftQr*s j Sand iroa1 vtmv ftstHay J the iterm \wjhe BMBIillMililB^^BBllMiMMj^^MMil 'i s K tsw ve H S.“:? *.-».:• { ■ ■ :. s -.-.. .; A bM . ■ ^Resa-Ais gjB..^....... ^ Keane Ail Rcaiy Jm * !hi u&n& Abrams, ftgpy fccnrrasi H ■ ^R erasw i i&orfby * >rist£uante B a s q t t d f t , J s a r r M i e h s I H P *"kerj£«s >fiUs halier? ‘ Swmh | | * . j./* -. Eaifry- K d x st .V^rter & R w d s Bonnard, Pfene. .. J L t w m Figure C.9 Remove of the unwanted pictures. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. - v " » $&«b~p •:'» ■ » ;Jr ^ Jufri/z^syf poru*i ■sc.^w <H^^&^’ ?-::>'csnT^-h^!53ls ?vs4g’ , V -L -' i-IIV O .i. ■- K i'tf Cal*?r* ■ I ■ 1 & *tffc>*Cr4 aid UteSSfcrOSt • A h V e t, f & r i v i d i l h B i o c & s {'•J 30 Hercu&s ■ 5" ifoh'tardw ?ssmr EUHsrdi Bmvt* Q o y i^ e i I aid for Vjj]a&s Chagall M art? B ^ a d o ? a Y o u n g 'f c o m e n C a^xt, AfrKfwart Beach C^rafahlb* Jfohn l l i i llll iiii ■-KNMI-H H anTfjfifK Cdw • ?. * . • * » . * ?'. 
■ U Scv.sr.t PhnJ Kif««llSr hf/Jgfa* l‘ T'j . * .. Figure C.10 Partial user favorites at a login session. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. $)? K dH Vfesv Tods Hefc i- fotfjc * J } fj i£ - ' f * .J f , A id fc s s &j f c t^ iis a u tp » w tsc i h / •c J w iW ia hahsi&jn f,i \ .* ■ > < : \f: S: J : . t \ . i’.’M * .. S$nO> 4 i Search ! ; Aifcsfsf-anienaM'wwT’y, W * M < sm r, Ofeafc C 130 Hsku& s Sarlfcy, Kcibof .Sunday Aifotico*\ l%$7 S M rtC v boa$ine S w jy N $ te v ar*liGRh» VyrsT;8 8 1 1 1 ?hs Madton Hurt jiafry, Dspnd^Vnrit B o y s i n a f c B S j n f l a j Hofcwr, W.ryftnw JSiysi Dove ^asw j. Fsbkj * Onst CanyifiA 'ff e * s . Oe&s Ile’hsn, Gbv&ni 39 @ D u » litSU d At ArMsrawii, 18 ?4 .T j Figure C.ll Partial user favorites from a user profile. 163 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Appendix D User Interfaces of Experiment #4 on Representation Styles - 6 BR < 4 : h ~ :n - si' r t n >. i s i ; ' . - ! ilM > 1 .1 I' .|K M " SI'.T S '- f • ' M ’h ■ ' s.- - lo 'l I h i .:! ■ '.’ ? ■ ■ ■ iTi10\- . i i V . H i T I U " . ! - . . 5 : - i * r . . . n - i l s i > • slsss? ::"PT i si1 .. . . . . is’ i! ■ ■ .sli l.- h s .irs .i'. t •,•••■ ..‘l- 1 " ' tlv S V I I - . l ! Ii'i S '. - - . , .-ill • I t ..I . K. V . « ' * 1**“ ; * . * I ,1* w ‘ ^.1* ,4! * ' *' I i - i . i .* 1 . . .1 Figure D.l Home page. ij. lio ..!U. I h U & y o u r f b $ t t i m e t o t5i e w * h a s x c l i c k ■ju c m a b s o b f t j x s e t i n s i l e as- a g i i e s i d tc f c PJ la' 164 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ft €« ‘f e n tSM i« «b «■ •h6--* j ! 5 4v > ! , f ‘ J > « i» -J * * j j'.tJ'J l ^ ^ ||^ ||h s t p ; / y w w w , i B c e d H ( ’ £ f e p y t s / t f ^ a f e g J f o o t b a l / s l K w t 3 W . 
h t i r f A h U IV.otfcr.: V.srlcl Tw a m ooneucted on this >nm site is dwjgnsO to n i t t i ft* MsT'NttfemNjl OciS»e*rtte*r»«pMaRieTis and usermteraste. Tre : '= ■ : ■ : mi> seise. f'sv-viif infj-’ istio,' j.-ftsciiin.i,..1 '* (rein i > £ itetitbasearikssepthem inapetssraifoasM comet A usenrssy fopr is s new useii 8 registered a w , or % guest A new or registered user wit be provide« a user profit*, which will reswd usertemperament add keep back of user interests over S im. This o' usiss «•> strvs 3s v»t ate'r -osnu'cs > r;= as:ri i'liiy -s Ague* aisy alx- oroide ■.TfonrcjScn!;. Sit MSsB'ch by piavkSng persons! temperament *t<! csSectingthefosorite art («aS*S. However. 8* gu«i profite oniy keeps track W8te current stsstsn sffo h® user history can be applied far fong-term modeling. ThisFoctSsaH Field systsmMpfOMptiheyssriioohocsthlsorhsr h -Ab-iPrH cow J Hit jS.i ■ ?t-::ihS ^ " 3 S* • Sws:'.£ Pwceivfrfjg N T -INtuing, T M nM ng; MMllwFM&tg)* r-:ifsr-safir«:nuii!yfei oS-s-uist the sysf.'isiKi- 1'*!!,.», js:iii ferpcrunu* >yps sren ire us*? prof, *. rA eanw hfc tr-r. s.stsin a* !' isroside optional (inks, such as Temperament si tie footnote and issilry V s;.',1 i'eripeismerr. crth* linked p,5, a T^BWBWlSSSE*! w ***** ♦teSBonmiri for helping peep'# , to identSyttieHmtats personaRytempetantepis. Filling out the doesranni re see cksirg on rc Score QucstoMnire Sutton at itsf end of 8» pu*sitteii!^». Si* user m ay find out pereena! j temperament is Guardian: SJ; Attteaiv 8P: Itteafef: N T; or RsSonai: N F. A fter i user fogs in, the system wB present the user a ssi-vh fco: -o r.-Stey* tr> m Tram st on ,t (he datecssi I foa«®ipi».?!l5,e|ssi|!es« l l l l l l l l l l ! 
l l l l l | l l l i i i i i i l l l l i l l l i l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l i i l l l l l l l l l l | l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l | l l l l l l : I ttsgftforlKsim tee ire rii./il 'rair this sl‘ ,i,ly y > i, i> s issd to ha ', the r.yiwn tciibin thi ■ : ■ : -.cepr c( hums" a.isi'imeiil w ch nay iup, eve tie onelity of the services rgfcreOfeytfte eamptfimg systems m tt achieve the ao a! to bed*- to:- u'er # o » tl! te a r a ,! l* i* e ti« M i e w e !l& E e e M lfe » s Figure D.2 Introduction to the Web site. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission frfr lidi! \jw-v F a v o u rs T o o k Help ■ I V\ V !n'i^:':“, „ ' ! ■ ■ ' . '■ ’; : ■ : * » ! ivsll .':0 P l e a s e >ij=n if): >'■.■. ' . ..■ * - ' •• ' ■ :,i i. ■ !-. ■ ;r ■ ■ • ■ ■ s! S li Ui l v v ■ - ill ■ f - I ] ' sMuaini,'. Tli:iil :> > ■ ‘ l’ J ( . i f i 'u ti i! , ! -viu: ' BBBllliljpliljlll^l J .i ■ - .; .if* * ’ " < •!-. p .•■ .; ■ ■ ■ •' -f }* * ■ * f‘ xt « r » d m f & y?K feMpaanmit *m Oat} G o! i * Y ©j car. E lse M oil - if Pie y:r as a R ik’s I ■ I - : Hn:-! |,H--IS : *;■ !■ .:■ ..-i: •,-■:: : .-I ! :--.!:< ■ !! I ■ I » !: Figure D.3 User registration. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Ali&t'tt 3 Q fcK p ://W K i» tte< htoil ‘ - '- M-.n-.r U ' ';o v.rV i1 .: :-:-’-'j: |9BiMPp(B^^^Bfl^B^^BII^^^^BB(Ml||§iiHKi|iiSl " " " " * - : “ — SBll^^^BBIBiltJiiiy|illikl^BBi^BlflBIHII 1 ■!yeiifoiiroiy!M paE«m,riid.rta'. ‘ 'i S% U jotr iist-tww r« ftr vrrti. sits, c sf- S t “ nv ' ■ ■ ■ .,* ? ■ • i.s. :- r- .« ■ - ■ : - . ■ , & tk«j * : : r < ■ ■ i -.. v : , ■ . : :i-.' s', r.-i i.--, i i : , = n,*.;: i ■ ■ ... • ■ > iiJWiiKiPtl Figure D.4 User authentication. Reproduced with permission of the copyright owner. 
Further reproduction prohibited without permission. F ile M i Y v m i% 'Jo ra :s l o o i s I h r - **8k)s * 1 <5 J .'a 4 A r a l .jK ivw aa • V* , J HF • M u F / v > w v a M a i l ........... \ ; i ; : : ; ; ; : ; ^ ^ ! ; ! ; ! ; ! ; ! : ! ^ ! ; : ; : ; ! ; ! ; ! ; ! ; : ; : ; : ; : ; : ; : ; : S : : : : ': ; : ; : : : ; : ; : - : ; : ; : i : ; : ^ ; : ; : i : i : ; : ; : ; : ; : ; : ; : ; : ; : ; : ; : ; : ; : ^ ■ * .% ...& ! : ! ..... ........... .... ':)'■». : l it - .* rv“: . * '" • k j-.* « ,s t1 .it ,...-■: « i * ■ ’ 1 ■ ’... 1 Figure D.5 Password checkup. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. K B t Eat Jfa sw FsranW i j'cols BHb - J 1 -.3 i t & u i i lf tw p m »j H? ’■ ' l»iiiiita*Sii«#lii»*^^^^^piiiyfefeWlM^™ v ■ •■ ■ , . . si - i .f l ! UtoVil Jlf.r.1 -..-■ . N- W L l l - i n i i - ■■.:« . - - f < i . : J : 'V n -;:, , ^= . : n : I - lir t. „ Figure D.6 Guest login. # lateral Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. •* 'r ,' J . ) JS J}*--*-'i ,^ H " u a « „J? Address ,T • P fc ssr -'tfr o K i a f r ^ ? ts ^ T S i» t -jr t fe v u l J r v ir .t < h « * w » y j f i n »#* a jtn n a n v e \- ir fr r r « a ts ; n pw i^ahnns 5 l l ^ j M f j ^ ^ J i ; f i ■ !:!■ ■; L b i b f i f f ; i f';'; i i : : ; :! : : : 1 1: : ! E ; : : i h i f : ': '; i f. ; : f j ;y :; ! : : ! i ; f ■ : i f i Z i i f ; ? i i f f ; I : i f : i 1 1 1 b b b b : : : | i b f f j i ' ■ : ; '; b j f b b i i b l b :' i : L f" If L . ; : f f ■ f ; ' b i f ■ f i f f ■ :b b f f : '. if b b . ■ i i . ■ ::: b ' ■ i : y ; . b : ; f i b ! : : f : " f f f b : f f , f b . 1 | f ' : . ; . | : | b I f f i b N L i | : b i i f b b b f :' l ' 1 1 ■;1 j j f l i l i f i f i f | ''C s w w P ilrn rrtfa a itric rt'rtU S r : ’ ] | . llff 1 1 1 1 ! SIS W HSKI g.:^a)^ S -- - ■ \ \ j i Oi s ^ r a ? 
feV tryfl^lcoOMWuittTfis ^ V tio n iglK toifr^tec*3nva& ,ifctf i ^ v i m Pafefter is 2 sSuder& at U&C. Y c -i^ U lfe \ -^uestiBn), a id a & n ) *fn&t. o f tfe i\*n aitw ia’ iiv?,;, s m ore «i!fectW3 fct Y 1AJ m Ksc&vvr# B » ffdsmnaTturt fa c t. ! , t Which & p m sffacliw fct you j 2. W f f l d ju m o s e d S s n w is w d : • ' & e!sm ai D a ftfc a s e M ojct i | 3. Wriif&fimiCTdfrcftw J& T K iSi M MlBifliiM llllllMlBllll^B^^M^^MlBlMBlBI^BIlBBMWIBl^B^MillB»BIBMBlily' I i l l I I I . . . . . . . I I . . . . I. . .. . .. . . I . . * . . . . . . I . . . . . 1 . 1. . . ... . . . . .. . I . I . S I * .... Figure D.7 Main menu and a questionnaire. 170 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 2.««S Csson Palmer ^rapfel C .H i'S '- K 5 C.:rTJif- :S i . S V L G '.:^ - V' *!{lC: Av of Scathe'*'. Cf.zifomla. .. i ■ - » ' 4 1 Figure D.8 Presentation of the representation styles at the abstract level. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. .. Carson Palmer (a) Graph I. * Carson Palmer is a student at the University of Southern California. (b) Text I. f T : W i; S tu d e n t ; Carson Palm er use Students ci> S u lta n A b d u l-M a lik Sham 3ud-D in A b d u l-S h a h ee d M arc e l l A llm ond_____________ K ev in A rb e t D oyal B u t le r Sunny B vrd C h r is C ash M a tt C a s s e l A aro n Graham A lex Holmes Chahwa L in Z eke M oreno B ren n a n Ochs C a rs o n P alm er K r is R ic h a rd M arkus S t e e l e Z ach W ilso n (c) Concept object data model I. (d) Relational database model I. Figure D.8.1 Representation styles at the abstract level. 172 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. w m a a s im s m e I S * i * t - o v j s f f c a < W . 
f H «**% , n:to:ijk6 £PM tx • jx «V-^isfvKvtj-fcrfLci-n ; - S.--A .~.m • m , ti Cj;i;> o u Palmer _ P B C f l i E ■ 'A i'ik lri' Ots^i a ■ OSC iCpioSiOre qucrterbGc c ierson Potmen sa.; 'corn on 12/27/73. Thi 6 ■ fco■ -5. ?20- pound Pi-lBii-- ivitn IV exper-ie.ic? cores fi'or Lagunt Nigu/’ d eft or graduGTed mom San'a Mer'gas'ifs f'ligh School. Figure D.9 Presentation of the representation styles at the detail level. 173 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3 Carson Palmer PHOflLt Class: High Schoor ^ Birthday: r ''’ ’ 2- .1 w i Q ~ 'f * 5 s - l l it:’.-- (a) Graph II. * USC sophomore quarterback Carson Palmer was bom o r» 12/27/79. The 6 -fo o t-5 r 220-pound Palmer with IV experience comes from Laguna Niguna after graduated from Santa Margarita High School. (b) Text II. Figure D.9.1 Representation styles at the detail level. 174 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. use ) I ISii ,-'V ^ ^ ^, I 'T - .M n e 5 tucleiit .cc?r' ~* - ■ '-•'•• - * •• ‘ ^ * . . ^ - w v \ y * V ‘ * V 3 f ~ t M ? ‘ f..e S S S f ls : J > :■ ' \ t j '- ■ - .:/ ^ S j ----- Uv— 5f;:t * MVrg.-rir*. , • ■ '' F o o t b a l h . ■vJPlaver.^-: A -t‘ * !S}- (c) Concept object data model II. use Football Roster Name Pos H gt W g t B irth d ay Cl Exp H om etow n H igh School Sultan Abdul DE 6-3 240 9726777 Sr 3V Arcadia Arcadia Sham sud Ab dul DE-DT 6-4 250 10/12/77 Sr 3V Los Angeles V erbutoDei M a call Alim ond WR 6-1 190 5/28/81 So IV Anaheim St. Paul Kevin A rbet S-.CB 5-11 175 3/26/81 So IV Stockton St. 
Marys Doyal Butler TE 6-3 245 2/4/80 Jr Jc Tucson, hZ Sabino Aaron Graham LB 6-1 225 6/12/81 So IV Bakersfield Bakersfield C hris H oward TB 5-11 180 2/2/82 Fr — Los Angeles Banning Z eke M oreno LB 6-3 245 10/10/78 Sr 3 V Chula Vista Castle Park Ifeanyi Qhalete K ris Richard S CB 6-2 MM 6 -0 225 S C 180 5/22/79 BSSI 10/28/78 Sr H a Jr 3 V H f ® t l P 7 ' - Markus Steele LB 6-3 220 7/24/79 Sr IV Long Beach Chanel Zach Wilson OG-OT 6-5 315 10/14/79 So IV Bellflower Mayfair (d) Relational database model II. Figure D.9.1 (continued) Representation styles at the detail level. 175 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ir iK M i S lk» Ks»ot*s 'isoi Hap ■ i 'B a d l r - - v jl :% ■ i . .^ i / A d d r e s s ‘ f- ) > te f a « a ? < s s s d 3 - . - ; » f c /. A d w s f r j i - f e * 5 iis fo s u jr - ji sit u s a v 3 ir J !!> 1 i* w s*4 ? < n s t o O T ! i> -.n in r « is siiifejflstkiite^ftii ■ v.aipK iti’ f < ’* » * » * P sfem s'» a stufefif a t USfl Y M s-.i-u ite i M sseteW i is'wai auetfcrsv sid askxi w h ic h (d S h e !*e atetm trw s is« o ’ fe'fe? to r T f O t; in isceivin a th e w fer m a ftm laft. . 1 V f lir t 6 t * i l K « r f t « 'i i '< K & B 'S V i J . | ' C oa^uB SO w tD aisbissM oasJl ! f KefeionaiUasbae Mattel | ! ! l l l l l l l l l l l l l l | l l i ! ! ! ! l ! ! l l ! l l ! l l i l l l ! ! ! l l ! ! l l l ! l ! ! ! ! ! ! ! ! l ! ! ! ! ! ! l ^ I '■ C o n cep tu al O bjvd D atabase Ms&fci \ i 4« W .|^ h i5 « v « e ^ ti^ d iv e f e r ^ o u MMll|BiBiBlllllliiM|BlMBBpi^^W^^^Ml^^W^MW^^^^B^MlBBKIlMBI '' Oorr:«pfa^ Uhijsct Uttdbae Mo&H 5. WfcttA v > nv»« e f e $ j *& far jw u r KeiaHonaj Database Model 3 J S . Wtech w fflflse pJfcctivR for >wi '' H elT^M aW teieiT Figure D.10 Questions at the abstract level. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. .. II ..................................... I...j ...I...I .....I.. ................. 
§ ....1 ...1 ......... 1 ................ 1 ..1 ....................1 ... 1... ! ........ j........... ......... .... }jti us a w w Hat»« w tr> M A tr» c c s m w lt sn Wforrwfcr. fast Sor w.amp® Vat Cat am ?8&raar,s jwoftte. Y V jti «It te pttsefed w*?n4 j issjittons, m i iskei whth <4the1 *6 atonafret is rear u®«stt*e t* 'lO'u in weewrat the uiterrftat fstl ■ 1 W tifch a mare effctwe fet »oti | ? , W h ic h ; » t o w * e flk iw . f c i y o u { t'bsxiituat'.itosdftatahaeM txieiJi H eatonai b W u feM o titi l! 3. vvtno) jj r a w effective f o r C oir^aiO bK JD ateteeM w fcrH 4, W tis t U rrare<3fctfi«r far yew ! Cotjccphtal S Shfxi I k b t m s Modri H i. Which i5TO® 5«ffecti«T jffl jO U • ' .Relaisrai Database Mcdei U j 6. Whwh ti m a r dlfcstivs' tor you * ■ Kefcfonal Database Model it Figure D.ll Questions at the detail level. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. S d 'J V g ’ - v P astes T o o ! * H a f e * AsUrJS? •£j&tp:fr'!i &ff}pM\MVK = $ * « ■ * .haI,* ^ ^ ]> * i bhT\* ta t fccxisi & ffWW: >--js*. * ;c o ,! >ai:*.i:'nui. ] ' - ; » l s v.l ir 'l» i Sign Oar- . Wmu | ; 1 to il 1 1 U i^f - T ftX tl j > C oa^tt^O & ^tU m b& asi^yi : - j.tatetfpa! *■/«•>«« fctafei i ! • feljM;:Sii.'SShw;ifcWl 3. f r r ip i i ~ O ta v tK i '» jk » tW n s * MoiA 1 * IP .... 1 ........I ... OM tcapnuJ iT & jert Dai*!** l/lai* t > . Cmvb \ < 3 rtpb ? * '■ : ■ • i ' '- l ., ■ ■ ...mm \ mvwjt ! ; ^ ^ ^ ^ l i ! ! l l l l l ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! l Pektt6r.il F*bU%z Ms W l g laieraei Figure D.12 Partial user favorites from a user profile. 178 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
Asset Metadata
Creator Lin, Cha-Hwa (author) 
Core Title An adaptive temperament-based information filtering method for user-customized selection and presentation of online communication 
Contributor Digitized by ProQuest (provenance) 
School Graduate School 
Degree Doctor of Philosophy 
Degree Program Computer Science 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag Computer Science,OAI-PMH Harvest 
Language English
Advisor McLeod, Dennis (committee chair), Knight, Kevin (committee member), O'Leary, Daniel (committee member), Pryor, Larry (committee member), Shahabi, Cyrus (committee member) 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-c16-253176 
Unique identifier UC11339307 
Identifier 3093784.pdf (filename), usctheses-c16-253176 (legacy record id) 
Legacy Identifier 3093784.pdf 
Dmrecord 253176 
Document Type Dissertation 
Rights Lin, Cha-Hwa 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au... 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA