MASSIVE USER ENABLED EVOLVING WEB SYSTEMS
by
Qingfeng Anna Li
________________________________________________________________________
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2008
Copyright 2008 Qingfeng Anna Li
DEDICATION
This dissertation is dedicated to my parents, Mrs. Zhen Zhong and Dr. Tingkai Li; my grandparents, Mrs. Chun Zhu and Mr. Shu He Zhong; and my advisor, Dr. Stephen Lu.
ACKNOWLEDGEMENTS
I would like to thank all of my friends, family, and supporters. My parents, who have always been there for me. My friends, who have always helped me through tough times. Dr. Stephen Lu, my advisor, whose patience and advice guided me throughout my research and studies. Dr. Barry Boehm and Dr. Yan Jin, my committee members, whose encouragement and help have been invaluable throughout this difficult process. I'd also like to thank Dr. Michael Zyda and Dr. Wei-min Shen for their guidance and support. Last but not least, I'd like to thank my best friends Le Wang and Rosa Chan for everything they've done for me.
TABLE OF CONTENTS
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Motivation and Research Problem Statement
1.2 Key Research Issues
1.3 Massive User Enabled Evolving Web Systems (MUEEWS)
1.4 Thesis Organization
Chapter 2: Research Background
2.1 User Generated Content
2.2 Collaborative Tagging
2.3 Software and Web Engineering
Chapter 3: The Fundamental Research Problems
3.1 Functional Complexity Analysis
3.2 Functional Complexity Analysis of MUEEWS
Chapter 4: Massive User Enabled Evolving Web System Approach
4.1 Theoretical Foundations and Analysis
4.1.1 The Logic of Collective Action (LCA)
4.1.2 LCA's Analysis of MUEEWS
4.1.3 Participatory Action Research (PAR)
4.1.4 PAR's Application in MUEEWS
4.2 Theoretical Framework
4.2.1 Current MUEEWS Framework
4.2.2 PAR-based MUEEWS Theoretical Framework
4.3 MUEEWS Theoretical Framework's Functional Complexity Analysis
4.3.1 Functional Complexity of MUEEWS Theoretical Framework
4.3.2 MUEEWS Functional Periodicity Determinants
4.3.2.1 User Usability Tolerance
4.3.2.2 Meta-Content Sufficiency
4.3.2.3 MUEEWS Operational Functional Periodicity
4.3.3 Detailed Analysis of User Usability Tolerance
4.3.4 Detailed Analysis of Meta-Content Sufficiency
4.3.4.1 Meta-Content Stability Statistical Analysis
4.4 Hypotheses of the Simulation and Test Environment
4.4.1 Hypotheses of MUEEWS Theoretical Framework
4.4.2 Hypotheses Analysis with MUEEWS Functional Complexity
4.4.3 Hypotheses Validation Methodology
4.5 Simulation & Test Environment
4.5.1 Simulation Environment
4.5.1.1 Simulation Environment without Theoretical Framework
4.5.1.1.1 Simulation Environment without Theoretical Framework Architecture
4.5.1.1.2 Simulation Environment without Theoretical Framework Interface
4.5.1.2 Simulation Environment with Theoretical Framework
4.5.1.2.1 Simulation Environment with Theoretical Framework Architecture
4.5.1.2.2 Simulation Environment with Theoretical Framework Interface
4.5.2 Test Environment
4.5.2.1 Test Environment Architecture
4.5.2.2 Test Environment Implementation
4.5.2.3 Test Factors and Response Variables
4.6 Test Scenarios & Validation
4.6.1 Test Scenarios and Test Goals
4.6.2 User Usability Tolerance Limit Analysis Test Scenario
4.6.2.1 Control or No-Tag Test Scenario
4.6.3 Hypotheses Validation Test Scenarios
4.6.3.1 Unorganized Tag Test Scenario
4.6.3.2 Organized Tag Test Scenario
4.6.4 Test Scenarios Definition
4.6.5 Statistical Validation for Test Data
4.7 Results & Validation
4.7.1 Test Hypotheses Definition
4.7.2 Test Results
4.7.2.1 Organized vs. Unorganized Simulation Environment Results for Tagging
4.7.2.2 Control or No-Tag Test Scenario Results
4.7.2.3 Organized vs. Unorganized Test Environment Results
4.7.3 Statistical Validation of Results
4.7.3.1 Organized vs. Unorganized Simulation Environment Results for Tagging
4.7.3.2 Organized vs. Unorganized Test Environment Results
4.7.4 Hypotheses Validation
4.7.4.1 User Usability Tolerance Validation
4.7.4.2 Meta-Content Sufficiency Validation
4.7.4.3 Functional Periodicity Validation
Chapter 5: Conclusion & Contribution
5.1 Conclusions
5.2 Contributions
Chapter 6: Future Work
Bibliography
LIST OF TABLES
Table 1 UGC Platform Survey
Table 2 UGC Research Approach Survey
Table 3 Collaborative Tagging Approaches
Table 4 Statistical Analysis of Test Time Average
Table 5 Statistical Analysis of Test Average Rating
Table 6 Statistical Analysis of Test Incompleteness
Table 7 Statistical Analysis for Unique Tag per Link
Table 8 T-test Analysis for Test Time
Table 9 T-test Analysis for Test Rating
Table 10 T-test Analysis for Test Incompleteness
LIST OF FIGURES
Figure 1 Research Approach Overview
Figure 2 Vizster depicting three intersecting social networks
Figure 3 FolksAnnotation Architecture
Figure 4 Tag Visualization Example
Figure 5 Software Engineering (Spiral Model)
Figure 6 Web Engineering (Cross-Functional)
Figure 7 Time-dependent Combinatorial Complexity
Figure 8 Time-dependent Periodic Complexity
Figure 9 Participatory Action Research (PAR)
Figure 10 Current MUEEWS Framework
Figure 11 PAR Based Theoretical Framework
Figure 12 MUEEWS Combinatorial Complexity for UUT
Figure 13 MUEEWS Periodic Complexity for UUT
Figure 14 MUEEWS Functional Periodicities
Figure 15 Infeasible MUEEWS' Functional Periodicity
Figure 16 Tag Stability Statistical Analysis
Figure 17 Portal Homepage
Figure 18 Simulation Environment Architecture
Figure 19 Popup with URL information
Figure 20 Simulation Application Tagging without Consensus Formation Tool
Figure 21 Simulation Environment with Consensus Formation Tool
Figure 22 Simulation Application Tagging with Consensus Formation Tool
Figure 23 Test Environment Architecture
Figure 24 Test Application Start Page
Figure 25 Test Application Search Results Page
Figure 26 Test Application Rating and Comments Page
Figure 27 Base Case Test Scenario
Figure 28 Unorganized Tagging Test Scenario
Figure 29 Organized Tagging Test Scenario
Figure 30 Number of Unique Tags per Link
Figure 31 Tag Stability without Consensus Formation Tool
Figure 32 Tag Stability without Consensus Formation Tool Top Tags
Figure 33 Tag Stability with Consensus Formation Tool
Figure 34 Tag Stability with Consensus Formation Tool Top Tags
Figure 35 Correctness vs. the Probability of Incompleteness
Figure 36 Time vs. the Probability of Incompleteness
Figure 37 Rating vs. the Probability of Incompleteness
Figure 38 Test Time Averages
Figure 39 Test Average Rating
Figure 40 Test Incompleteness
ABSTRACT
User-Generated Content (UGC) Websites[1], or "social media" Websites[2], such as YouTube[3] or Flickr[4], have gained tremendous success for their ability to use user-generated meta-content to enhance the user experience. However, current UGC developments resemble the ad-hoc Web developments of the 1990s and the software developments of the 1960s[5]. History has shown where these approaches can lead, and there are already failed UGC developments caused by these uninformed approaches. In this thesis, the research is geared toward finding a theoretical framework that targets the unique behavior of UGCs, and toward discovering means to remedy their problems and complexities.
The ad-hoc approaches to UGC development are caused by a lack of fundamental understanding of these systems' behaviors. This work will illustrate that these UGCs are Massive User Enabled Evolving Web Systems (MUEEWS), in which user-generated meta-content is used to evolve system functionalities. By making users a key stakeholder in enabling its functionalities, a MUEEWS creates an evolution that occurs during the usage phase of the software. Current Software and Web Engineering models do not explain the effects of this phenomenon[6], and current UGC research targets specific applications whose results are difficult to apply extensively. Since the evolution of the functionalities depends on the contributions of a very large group, MUEEWS contains a number of collective action issues that can create system failures.
This lack of fundamental understanding creates many complexities for MUEEWS in both the development and usage phases. Therefore, for the improvement of future MUEEWS developments, this work will provide a theoretical framework to guide their development and manage their complexities. The theoretical framework is evaluated using functional complexity analysis and validated through simulations and test scenarios. The findings illustrate that applying the theoretical framework can greatly reduce the probability of system failure in MUEEWS and can improve its meta-content and system functionality performance. This thesis focuses on the users' contributions and on how to leverage those contributions to improve system functionality. Since the current field of UGC is not focused on theoretical or system approaches to MUEEWS, this research can make various important contributions to the field of UGC.
CHAPTER 1: INTRODUCTION
1.1 Motivation and Research Problem Statement
User-Generated Content (UGC) refers to any type of media, from text to video, that is produced and influenced by a site's end-users rather than by traditional broadcasters[1]. The first wave of applications that would later be defined as UGCs came as blogging, podcasting, and wiki applications. These systems have since extended to Flickr[4], YouTube[3], MySpace[7], Revver, Friends Reunited, Facebook, Wikipedia[8], and many others, all of which are built on the concept that end-users not only generate, but also control, the content and makeup of these sites[1]. The term came into the mainstream in 2005 in Web publishing and new media content production circles[1]. Early examples of user-generated content applications existed, such as bulletin boards and groups on portals like AOL and Yahoo[1]. But this new wave of UGC sites has become a phenomenon, with YouTube being named TIME magazine's "Invention of the Year"[9]. Also known as "social media" applications[2], UGC applications have gained tremendous attention because of their ability to use rich, user-generated meta-content to enhance the user experience.
Although the UGC field has yet to invest heavily in reliable market surveys of development failure and success statistics, the .com failures of the past show what the ad hoc, hacker-type approaches that currently plague this industry can quickly lead to. Due to the technological ease of developing these applications, the frantic rush to be public (i.e., on the Web), and the competitive market schemes that are trying to push this phenomenon forward, these systems are being developed without systematic techniques, sound methodologies, and quality assurance. In November 2000, shortly following the collapse of the .com bubble, the Cutter Consortium[5] uncovered the top problem areas in large-scale Web application projects: failure to meet business needs (84%), project schedule delays (79%), budget overrun (63%), lack of required functionality (53%), and poor quality of deliverables (52%)[5]. The Cutter Consortium concluded that the ad hoc Web development of the 1990s resembled the software development practices of the 1960s[5].
The problems identified by the Cutter Consortium are already evident in UGC applications today. For example, there has been much work, whether from individual efforts, start-ups, or large companies, to develop photo mapping applications. The intent of these applications is to provide users with rich photo-enabled descriptions of locations, beyond the location names, contacts, and reviews in current mapping applications such as Yahoo Maps, Google Maps, and MSN Live. Attempts such as Mappr[10], Photogalaxy[11], and Panoramio[12] have achieved modest success in obtaining user contributions. However, they have not achieved what is necessary to successfully deliver the end-goal of their product, which is to provide a map of the world fully populated by user photos. Without a complete set of the required information, these sites cannot claim to provide an experience beyond what is offered by current mapping systems, such as Google Maps.
There is currently a very competitive development race, as there was with Web developments in the 1990s, to become the first to provide the next popular UGC application, whether by cloning the likes of MySpace, YouTube, or Flickr for specific audiences, such as children, or by delivering new UGC interactions, such as touch[13]. The need for both speed and the ability to drive user contributions has become a key concern for any UGC application developer. However, there are currently no rigorous methods and processes to guide these development efforts. Without a set of guidelines to follow, UGC developers are left to their own devices, which results in the ad-hoc development efforts seen today. Even popular applications, such as YouTube, have received user complaints regarding the lack of responses for certain search queries, among other problems. The research below will explore the root of these problems and propose a theoretical methodology to capture and remedy the ad-hoc development approaches of UGC applications to date.
1.2 Key Research Issues
The key issue with UGC applications such as YouTube is that the developers have very little control over what type of content and meta-content the users generate. This lack of control is directly related to the lack of understanding the developers have regarding the nature of these systems. Therefore, this research will focus on the development of theory-based methodologies and practical strategies to improve the fundamental understanding of these systems. The Research Background section will explore the state of the art of UGCs and of collaborative tagging, a meta-content gathering technique that has become pervasive in current UGC systems. As this research will discuss below, the meta-content generation process is key to maintaining order and control in UGC systems. Furthermore, the Research Background section will discuss current Software and Web Engineering approaches and why UGCs present a unique challenge that these approaches do not address. The remaining sections of the Introduction discuss the overall approach of this research, which is detailed further in the body of this thesis.
The key to understanding the UGC development challenge is to acknowledge that the most important evolutionary changes in UGC applications occur in the usage phase, i.e., through the generation of the users' meta-content. This unique evolution creates the incompatibility between UGC and existing Software & Web Engineering methodologies, which in turn has caused the developers of current UGC applications to fall back on ad-hoc development approaches. This is because Software and Web Engineering address changes in the development phase, not the usage phase. The same incompatibility exists in other engineering fields where the UGC approach is being applied; for example, Apple's product developments heavily utilize user input. This research will directly address the users' behaviors and interactions with the meta-content they generate, to find approaches that can remedy UGC and collaborative tagging applications and beyond.
1.3 Massive User Enabled Evolving Web Systems (MUEEWS)
This work will illustrate that "social media" or UGC applications are in fact Massive User Enabled Evolving Web Systems (MUEEWS). A Massive User Enabled Evolving Web System (MUEEWS) is a software system that merges the private actions (i.e., user-generated meta-content) of a large group of users into collective utilities. In turn, those collective utilities enable an evolution of the system's functionalities beyond the abilities of the initial function set. This process reflects a dramatic shift in stakeholder roles between the developer and the users from their definitions in traditional Software[14] and Web Engineering[5]. Because of this shift, MUEEWS has an evolutionary software development process with complexities that cannot be managed with current Software and Web Engineering tools alone. Traditionally, only the developer is responsible for enabling functionalities in the software[4][15]. In MUEEWS, the users also play a key role in the success of the system's functionalities, and those who can successfully harness the users' production of user-generated content will succeed.
To illustrate this, Tag Search[16, 17] will be used as an example. Prior to Tag Search, search engine developers had great difficulty handling multi-media indexing and search. Search methods that employ image recognition for categorization are difficult to program and take a large amount of memory and time to run[18]. Recognizing the need for multi-media search, Google offered Google Image and Video Search, which employ text search methods based on indexing the labels associated with the image files. However, since most images on the Web are poorly labeled, these search engines' functionalities were limited. Tag Search changed multi-media search as the world knew it, with examples such as Flickr[17] and YouTube[19]. By combining its users' privately generated tags, Tag Search was able to generate search-engine-ready indexes for the entire application[16].
Take Flickr, for example, an online photo archive that allows users to post, view, and share their own photos; the application quickly attracted many users with its large storage space and simple interfaces[17]. Given the large storage space, the users needed to index their files to keep their online archives manageable. Therefore, the users tagged their photos as a way to manage their files. Since users' tags are often similar (i.e., most users would use the tag "cat" to describe a photo with a cat in it), similar tags were used as indexes for the entire site, which translated easily into a photo search engine. Unlike current Software and Web Engineering development processes, where the users are only the target of the software end product, in MUEEWS the users are a critical component in enabling the evolution of key functionalities.
There are three key characteristics that illustrate MUEEWS' evolutionary functionalities. The first is its user-enabled evolution of functionalities: the software system's ability to evolve its functionalities as more people use the software. The second is its emergent user behavior: the ways the software system can be used will change over time. The third is its socio-technical development cycle[20], which is created by combining the first two characteristics. Emergent user behavior enables the evolution of MUEEWS' functionalities, and the evolved functionalities, in turn, drive user behavior changes when their new versions are re-presented to the users. This socio-technical cycle is the driving force behind the evolution of MUEEWS' functionalities; however, it is also the cause of MUEEWS' ad hoc development processes, since it is not a component in current Software and Web Engineering methodologies.
1.4 Thesis Organization
The power of MUEEWS comes from its ability to leverage the meta-content contributions of a large group of users to enable evolutions in its functionalities. Therefore, to address the functionality complexities[21] that arise from MUEEWS' evolution, this research will analyze those complexities using a functional complexity approach. As a means of capturing the users' meta-content generation, there is a great deal of work in Social Science and Economics that deals with the collective actions[22] of large groups. Therefore, this research will use collective action approaches to perform a theoretical analysis of MUEEWS. After these analyses, this research will use an approach from participatory action research (PAR)[23] to build a theoretical framework for the MUEEWS development process. Furthermore, this research will build a simulation and test environment that mimics a MUEEWS system for the validation of the proposed theoretical framework. Lastly, this research will generate test scenarios in the test environment to validate the hypotheses associated with the theoretical framework. This approach is detailed in Figure 1 below.
Figure 1 Research Approach Overview
CHAPTER 2: RESEARCH BACKGROUND
To establish why this topic is so critical, the Research Background section will first illustrate the current state of the art in UGC research. This section will establish that current research in UGC is focused predominantly on the analysis and expansion of existing UGC applications; on the analysis of users' social data and infrastructure, such as social networking; and on new and unique UGC applications. This research will illustrate that these studies do not focus on the developmental approach for UGCs, nor do they address the improvement of UGC meta-content generation or functionality evolution. Furthermore, since this research is focused primarily on leveraging user-generated meta-content to improve system functionalities, it will further explore the current state of the art in the meta-content gathering technique utilized in UGCs, known as collaborative tagging.
2.1 User Generated Content
The current field of UGC applications extends from the sharing of multimedia such as photos and videos, to text-based collaboration such as wikis[24], and has even extended to online games[25] and various other platforms. Essentially, any platform where users are able to create and share their content can be categorized as UGC, also known as User Created Content (UCC)[1], User Created Media (UCM)[1], or "social media"[2]. To give a brief overview of current UGC applications, the table below lists the main state-of-the-art platforms where UGCs are available, based in part on the report of the Working Party on the Information Economy (WPIE)[26]. However, the state of UGC is not limited to these examples. In fact, further below, this research will discuss some of the new and unique UGC applications, such as UGC-enabled ubiquitous computing techniques.
Table 1 UGC Platform Survey

Blogs: Web pages containing user-created entries updated at regular intervals and/or user-submitted content that was investigated outside of traditional media. Examples: popular blogs such as Blogger, BoingBoing, and Engadget; blogs on sites such as LiveJournal, MSN Spaces, CyWorld, and Skyblog.

Wikis and Other Text-Based Collaboration Formats: A wiki is an application that allows users to add, remove, or otherwise edit and change content collectively. Other sites allow users to log in and cooperate on the editing of particular documents. Examples: Wikipedia; sites providing wikis, such as PBWiki, JotSpot, and SocialText; writing collaboration sites such as Writely.

Group-Based Aggregation: Collecting links of online content and rating, tagging, and otherwise aggregating them collaboratively. Examples: sites where users contribute links and rate them, such as Digg; sites where users post tagged bookmarks, such as Del.icio.us.

Podcasting: A podcast is a multimedia file distributed over the Internet using syndication feeds, for playback on mobile devices and personal computers. Examples: iTunes, FeedBurner, iPodderX, WinAmp, @Podder, and Songbird.

Social Network Sites: Sites allowing the creation of personal profiles. Examples: MySpace, Facebook, Friendster, Bebo, Orkut, and Cyworld.

Virtual Worlds: Online virtual environments. Examples: Second Life, Active Worlds, Entropia Universe, and Dotsoul Cyberpark.

Content or File-Sharing Sites: Legitimate sites that help share content between users and artists. Example: Digital Media Project.

Forums and Bulletin Boards: Online messages threaded based on topics. Examples: classroom bulletin boards; help forums for technical or consumer services applications.
To illustrate the current state of the art in UGC research, this research must look beyond the UGC applications on the market and identify the dominant trends in UGC research. Current research topics in UGC applications have been focused primarily on the three concentrations discussed below. The first is the analysis of existing UGC applications, so that their observed benefits can be applied toward new applications and new audiences, such as targeting child-friendly content. The second examines the social infrastructure and data of these applications, such as social networking[27], to improve their user experiences or to provide new functionalities, such as entity recommendation based on social networking[28]. The third targets new and unique applications that the UGC approach has not yet been applied to but can contribute toward, such as new interfaces like touch.

In terms of creating a theoretical framework for remedying the ad-hoc developmental behaviors of MUEEWS, which is the focus of this research, the three topics discussed above are too specialized to be applied toward general theoretical methodologies and practical strategies. Some of these studies offer specific observations that could be used to enhance the overall understanding of UGC applications, which can in turn indirectly improve the quality of UGC development. However, individually, none of these studies directly addresses the overall complexities and issues that arise from the ad-hoc development of current UGC applications, as the detailed discussion below will illustrate.
Table 2 UGC Research Approach Survey

Analysis and Expansion of Existing UGC Applications. Examples: wiki applications targeting enterprise usages; community-based open source code developments; benefit analysis of shaper usage in wikis; analysis of online encyclopedias; UGC collaborative tagging applications (later section). Pros: aids in the understanding of how UGC applications can be applied in various fields; aids in the understanding of the current state of the art in UGC applications. Cons: does not examine the developmental nature of UGCs; does not address the improvement of UGC meta-content generation or UGC functionalities.

UGC User Data and Infrastructure Analysis. Examples: social network analysis; computer-supported social networks; generation of user profiles for information filtering; visualization of social networks in UGC applications; social infrastructure of UGC applications. Pros: aims to improve the users' understanding of the data and underlying infrastructure of the UGC applications. Cons: does not directly address the improvement of UGC meta-content generation or UGC functionalities.

Development of New and Unique UGC. Examples: UGC authoring for mobile multimedia content; UGC generation and usage in video/computer gaming; social navigation, social maps, and annotating space. Pros: aids in the understanding of possible UGC applications and developments in the future. Cons: does not directly address the improvement of UGC meta-content generation or UGC functionalities.
Wiki applications targeting enterprise usages
One particular research group addressed how to engineer and manage wikis[29, 30] toward particular usages (e.g., internal company usage vs. general audiences)[31]. In this research topic, the authors examined how wikis can be used to open a dialog between developers and customers regarding the products they provide. This open dialog not only allows the customers to get in touch with the corporation, but also lets them offer helpful advice to other users[31]. The overall approach is to use wikis to drive a customer-centric business. This customer-centric approach makes the needs and resources of individual customers the starting point for planning new products and services or improving existing ones.
As such, wikis are used to enable customers not only to access but also to change the organization's Web presence. Research projects such as these create opportunities for joint content development and "peer production" of Web content[31]. An increasing number of organizations are experimenting with the use of wikis and the wiki way to engage customers[31]. Although this type of research helps shed some light on the role UGC applications serve in collaborative communities, it does not examine the developmental nature of these UGC applications, or the improvement of UGC meta-content quality, which is the focus of this thesis.
Community based open source code developments
Another popular application for wikis is the open source code development community, as in the work "Adopting Open Source Software Engineering (OSSE) Practices by Adopting OSSE Tools"[32]. That study examined the wiki's usage in open source community developments. The key difference between open source development wikis and customer engagement wikis is the technical involvement of the users. However, it is the general openness and dynamic qualities of wiki and UGC applications that make them such appealing platforms for collaborative developments. Just as with the customer engagement wiki research discussed above, the open source wiki and UGC applications only serve to shed light on the improvements UGC can make toward user collaboration, and not on overall UGC development[32].
Benefit analysis of shaper usage in Wikis
Other studies have addressed issues such as the analysis of current UGC applications and trends. For example, one research project, focusing on the analysis of wikis, outlined the importance of shaping[33], which is an emergent role. A shaper is anyone who integrates or edits the work of others in wikis. This research focused on how the role of the shaper evolves naturally in wikis and does not need to be assigned. It also identified how critical the role of the shaper is to the behavior of the wiki[33]. Although such findings can benefit the understanding of UGC applications, they do not directly address the ad-hoc development issues of UGCs or the formation of their meta-content. This exemplifies the current approach in UGC research to date, which is usually too specialized to be applied widely across all UGC applications.
Analysis of online encyclopedias
This particular research presents the results of a genre analysis of two web-based
collaborative authoring environments, Wikipedia and Everything2, both of which are
repositories of encyclopedic knowledge and are open to contributions from the
public[34]. Using corpus linguistic methodologies and factor analysis of word counts for
features of formality and informality, they show that the greater the degree of post-
production editorial control afforded by the system, the more formal and standardized the
language of the collaboratively-authored documents becomes, analogous to that found in
traditional print encyclopedias[34]. These findings shed light on how users, acting
through mechanisms provided by the system, can shape the features of content in
particular ways. The researchers further identified sub-genres of web-based
collaborative authoring environments based on their technical affordances[34]. This
particular research illustrated user behavior in wikis under the constraints of the
technical tools the users were provided. Although this illustrates some user behavior changes
based on technical constraints, the constraints are very specific and do not apply across
all UGC applications.
UGC Collaborative Tagging Applications
As illustrated in the UGC platforms table above, there are many non-text based
UGC platforms, unlike the wiki research cases discussed. Therefore, to enable features
such as search, indexing, and categorization for the users of these non-text based
UGC applications, the technique of collaborative tagging is used. This technique is
currently the state of the art in UGC meta-content generation, and is used across various
platforms such as Flickr, YouTube, and MySpace, to name a few. Historical meta-content
driven applications that were developed prior to UGC applications, such as image
recognition[18], had long validated the benefits of meta-content[35, 36]. But because
they were not able to provide a sufficient amount of meta-content manually, these
systems often failed when applied at a large scale. Therefore, a great deal of research is
currently being done in the data-mining community to mine the data they need from
UGC applications[37].
These collaborative tagging applications also apply the same approaches as other
UGC applications for improving user loyalty and user experience[37-39], such as
ontologies[17]. Interest in these types of research is booming in the mobile
multimedia industry, following its realization of the potential of UGC applications to
provide alternative means of rich web interactions for users[40]. Since this research is
focused heavily on the effect meta-content has on the evolution of UGC applications,
collaborative tagging applications are of particular interest to this thesis work. However,
this thesis will illustrate below that the research in collaborative tagging, like the other
UGC research discussed above, also does not directly address the ad-hoc developmental
issues of UGC applications.
Social network analysis
The infrastructure studies in UGC have been dominated by social networking.
Many researchers have placed their research efforts on understanding and utilizing social
network effects to increase the popularity of a UGC application[27, 41]. Like other social
media sites, Flickr and YouTube allow users to designate other users as contacts and
track their activities in real time. The contacts lists from the social networking are the
backbone of many UGCs, such as Facebook or MySpace.
Many researchers believe that these social networks facilitate new ways of
interacting with information, e.g., social browsing. The contacts interface on Flickr
enables users to see the latest images submitted by their friends. Through an extensive
analysis of Flickr data, researchers showed that social browsing through the contacts' photo
streams is one of the primary methods by which users find new images on Flickr. This
finding has implications for creating personalized recommendation systems based on the
user's declared contacts lists[42]. These works tend to keep their focus on the social
network and its network effects to help maintain virtual communities in UGC
applications and not the functionality performance of UGCs [27, 43, 44].
Computer-supported social networks
Computer-supported social networks (CSSN)[27] were developed prior to current
UGC applications and explored how communication networks foster virtual
communities. CSSNs became an important basis of virtual communities and computer-supported
cooperative work in the 1990s. CSSNs fostered virtual communities that were
usually partial and narrowly focused, although some became encompassing and
broadly based. CSSNs accomplish a wide variety of cooperative work, connecting
workers within and between organizations who are often physically dispersed. CSSNs
also link workers from their homes or remote work centers to main organizational offices.
CSSNs have strong societal implications, many of which could be applied towards
current social networking UGCs to gain further understanding of the social networks
people build in these applications[27], but do not shed any light on the other aspects of
UGC such as functionality performance or meta-content generation.
Generation of user profiles for information filtering
The concept of building user profiles is of great interest to social networking
websites and other UGC applications. The focus of this topic is on the creation and
update of user profiles[45]. One method is the user-created profile, the simplest and most
natural approach, where the user specifies his/her area(s) of interest by a list of (possibly
weighted) terms. The specified terms are used to guide the filtering process. Another
common approach is the system-created profile by automatic indexing, where a set of
data items that have already been judged by the user as relevant is analyzed by software
(using stemming algorithms) in order to identify the most frequent and meaningful terms
in the text. Additionally, there is the system-plus-user-created profile, a combination of
the above two approaches, in which an initial profile is created automatically (by
automatic indexing) and the user then reviews the proposed profile and updates it (by
adding or deleting terms, and changing their weights)[45].
Other approaches include system-created profile based on learning by Artificial
Neural-Network (ANN). Based on a sample set of data items that have already been
judged relevant by the user, an ANN may be trained[45]. The inputs of the ANN are the
meaningful terms, and the outputs are the relevance judgments of the users. After
training, the ANN may serve as the user profile for future filtering. Another popular
method is rule-based filtering[45]. All of the previous methods deal with the creation of a
content-based profile; in contrast, a rule-based profile consists of a set of filtering rules.
Questioning the user on his/her information usage and filtering behavior can generate
such rules[45]. An alternative method for creating a rule-based profile for a user is to
inherit filtering rules from user-stereotypes, similar to the inheritance of a content-based
profile. This topic focuses on improving the understanding of users and not the system as
a whole.
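As a rough illustration of the weighted-term profile filtering described above, the following sketch scores documents against a hypothetical user-created profile. The terms, weights, and threshold are invented for illustration and are not drawn from the cited work[45].

```python
# Sketch of content-based filtering with a weighted-term user profile.
# Terms and weights are illustrative; a real system would derive them
# from user input or from automatic indexing of relevant items.

def score(profile, document_text):
    """Sum the profile weights of the terms that appear in the document."""
    words = set(document_text.lower().split())
    return sum(weight for term, weight in profile.items() if term in words)

# A hypothetical user-created profile: interest terms with unequal weights.
profile = {"photography": 1.0, "travel": 0.6, "cameras": 0.8}

docs = [
    "new cameras for travel photography reviewed",
    "stock market update and analysis",
]

threshold = 1.0  # deliver only documents scoring above this
relevant = [d for d in docs if score(profile, d) > threshold]
```

Updating the profile, as in the system-plus-user-created approach, would amount to the user editing the term dictionary and its weights.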
Visualization of Social network in UGC applications
In addition, there has been extensive work performed in the visualization of social
networks in UGC applications. The social network visualizations are in part based on
the visualizations of the communication networks that they rely on[46]. In recent years,
the WWW has witnessed dramatic growth in the popularity of online social networking
services, in which millions of members publicly articulate mutual "friendship" relations.
Guided by ethnographic research of these online communities, a research group has
designed and implemented a visualization system for end-user exploration and navigation
of large-scale online social networks, which is illustrated below in Figure 2[46].
The design illustrated below builds upon familiar node-link network layouts to
contribute techniques for exploring connectivity in large graph structures of UGC
applications, supporting visual search and analysis, and automatically identifying and
visualizing community structures[46]. This research provided evidence of the system's
solid usability, its capacity for facilitating discovery, and its potential for engaging the
users in furthering their social activity. There are various other visualization tools
implemented in the collaborative tagging space, which will be discussed in further detail
in the section below.
Figure 2 Vizster depicting three intersecting social networks
Social Infrastructure of UGC Applications
Some of the work that deals with the social aspects of UGC applications focused
on the effect these applications have on the social consciousness of their users, and on
using those effects to increase user participation[47]. Some historical research prior to
current UGC applications, such as work on forums, was based on much smaller groups,
with the intent of studying social consciousness in virtual communities. Some of this
research focused on game theory as a means for groups in virtual communities to
function at a self-sustaining level. Other works, such as “Voices from the Well”, sought not
only to maintain these virtual communities, but also to use them to collect data regarding
this new social terrain and to propose new means of driving users’ interactions[48].
This type of research has become particularly important in the field of health
care online communities, where the users’ perception of trust, reputation, competence, and
goodwill, currently determined by the users’ ratings and other references and
qualifications, has been shown to greatly impact a system’s success[49]. Additionally,
the development of new and unique UGC applications is growing rapidly, and there are
various fields where the application of UGC is dependent on its social infrastructure. The
social infrastructure information can serve to aid the understanding of UGC user
behaviors, but not the UGC as a whole.
UGC Authoring for mobile multimedia content
There is a wide variety of approaches for authoring UGC multimedia
content. One research project compared two common approaches used in
mobile phone based UGC multimedia authoring[50]. In the first approach, the user
primarily creates completely new content. This may include capturing content with the
phone's input devices, such as taking photographs or recording audio, as well as editing the
content with the phone. Examples of tools based on this approach include general-
purpose image, video, sound, and multimedia presentation editors. The basic design and
features of mobile tools using this approach are not so different from similar tools for
workstation environments. The second approach is based on creating new content by
personalizing or combining pieces of ready-made content. For example, a multimedia
message could be authored by selecting a ready-made message template and just
customizing the message text. A more complicated example might include authoring an
animation sequence by first selecting a ready-made animated character and then
composing a script that the character will perform in the animation from a set of ready-
made actions[50].
User interfaces in these kinds of UGC applications are usually fairly
straightforward and can be implemented through the browsers. One of the main
challenges in authoring tools based on ready-made content is the management of the
content. Users have different needs and like different styles of content. They also prefer
the content to be unique, so each piece of content is typically used only once[50].
Therefore, to keep the users interested in the tool, a very broad and continuously updated
selection of content is needed. This research found that the most straightforward way is to
keep all ready-made content in a single centrally administered place. This type of
research broadens the analysis of UGC content to include mobile interactions. However,
this type of research is too specialized to be applied towards the work in this thesis.
UGC generation and usage in video/computer gaming
“Modding” is an increasingly popular participatory practice where game fans
modify and extend officially released game titles with their own creations[51]. Research
in this area has introduced the reader to the diversity of user-created game content and
to the multifaceted online networks, referred to here as “modding scenes”, used for the
making and sharing of “mods”.
role of maintaining various online networks and managing developer involvement. Most
of these research topics are case studies devoted to a detailed analysis of the various
forms of user-created content for each particular game. In addition to the custom game
content itself, the tools modders employ in the creation of the custom content also play a
key role in their contributions[51]. The specific insights gained from studying the
particular modding scenes can be applied towards other UGC applications, but do not
assist in establishing a theoretical foundation for UGC applications.
Social Navigation, Social Maps, and Annotating Space
One common goal in mobile computing research is finding ways to make systems
more social; the research below focuses on evaluating the success of these efforts. Most
researchers believe that systems with a social element are often more dynamic, and a
better reflection of user concerns[52]. In one research topic, putting user-created content
in a tour guide resulted in a more authentic reflection of the space that is being toured.
This is particularly true when users are a cross-section of individuals with different
relationships to the space including both space experts and novices. These approach focus
on context-aware computing because the people who regularly visit a space know all
about how and when the space gets used and who inhabits it. What they say in and about
the space reflects that understanding. Another benefit is that user-created content gives
users more power and allows them to steer its use towards their own needs and
interests[52].
Systems that provide these capabilities allow people “to collectively construct a
range of resources that were too difficult or expensive or simply impossible to provide
before”[52]. Most researchers believe that the content users contribute is likely to be
qualitatively different from the factual information an institution like a museum or
university administration would develop. The creators of GeoNotes, a location-based
social information system, were thinking along these lines. They note that the “social,
expressive, and subversive” qualities of content created by users may be more interesting
than content created by administrators, which “tends to be ‘serious’ and ‘utility
oriented’”[52]. A particular topic of interest in UGC is to test these assumptions.
Although these studies offer interesting information regarding the application of
UGCs, they are too specialized to be applied towards all UGC application development.
The research topics that examine existing UGC applications do not extend their
analysis to the developmental nature of UGCs, nor do they address the improvement of
UGC meta-content generation or UGC functionalities. The research studies that examine
the UGC’s social data are heavily focused on social networking, which only attributes to
the propagation of these applications and are not process-oriented. Therefore, they do not
directly address the improvement of the UGC meta-content or UGC functionalities.
Lastly, the development of new and unique UGC applications can assist in improving the
understanding of future UGC applications, but they also do not address the improvement
of the UGC meta-content generation or functionalities, nor extend into development
methodologies. The topic of collaborative tagging, which deals more directly with meta-
content, is discussed below. Although some of the research in collaborative tagging is
used in the analysis aspects of this thesis, it will be illustrated that this research also
does not directly address the above issues upon which this thesis is focused.
2.2 Collaborative tagging
Collaborative multimedia tagging is an organizational technique that can be used
to create structure and order in any kind of information resource application. It has
recently gained a lot of attention for its application in UGC applications[53, 54] that are
changing the face of the web. UGC applications, such as YouTube[3] and MySpace[7],
have seen tremendous success due to their ability to use rich, user-generated meta-content
to further their users’ experience. Collaborative multimedia tagging is a meta-content
gathering technique that asks users to assign keywords (i.e. tags) to entities. It is defined
as collaborative because users are able to assign different tags to the same entity they
share. Since the focus of this thesis is the evolution of the meta-content, a detailed
understanding of collaborative tagging and its current state of the art research is essential.
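The basic data structure behind collaborative tagging, as defined above, can be pictured as a mapping from each shared entity to its accumulated keyword counts (its tag cloud). The following is a minimal sketch under that assumption; the entity and tag names are invented for illustration.

```python
from collections import defaultdict

# Minimal sketch of a collaborative tagging store: each shared entity
# accumulates a "tag cloud" of keyword counts contributed by many users.
# The entity and tag names below are hypothetical.

tag_clouds = defaultdict(lambda: defaultdict(int))

def tag(entity, user_tags):
    """Record one user's tags against a shared entity."""
    for t in user_tags:
        tag_clouds[entity][t.lower()] += 1

# Three different users tagging the same shared photo.
tag("photo42", ["cat", "kitten"])
tag("photo42", ["cat", "cute"])
tag("photo42", ["cat"])

# The cloud reflects the users' shared vocabulary: "cat" dominates.
cloud = dict(tag_clouds["photo42"])
```

Because different users may assign different tags to the same entity, the cloud aggregates overlapping and divergent vocabulary alike, which is the raw material from which a folksonomy emerges.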
User Generated Meta-content is the backbone of the current UGC movement
since it allows for new organizational methods (i.e. meta-content enabled search) not
available to traditional UGCs, such as online bulletin boards. Collaborative tagging is a
process where users add and share meta-content (i.e., tags) to other shared items such as
online photos. The most important contribution of this organizational method is the
concept of folksonomy[55], a term that was coined by Thomas Vander Wal. It is used to
describe what is essentially a user-generated social taxonomy, or “bottom-up social
classification”. Its characteristics are the bottom-up construction, a lack of hierarchical
structure, and the tags’ creation and use within a social context[56]. From a practical
standpoint, a folksonomy is a communal categorization or taxonomy that is formed
through the contributions of the users of a tagging system[57].
One of the most cited studies in this field is Golder and Huberman's work on the
collaborative tagging system and UGC application del.icio.us[58]. The key
contribution of their research is the illustration of folksonomy formation as observed
through the user data they gathered from del.icio.us[39]. In their research, Golder and
Huberman illustrated that each tag's percentage of use, particularly for the popular tags,
tends to stay the same after a certain amount of user contribution (i.e., around 100 tags)
[58]. Therefore, their research illustrates the process of tag usage stabilization, which is
when a folksonomy is formed. The causes they attribute to the formation of
folksonomy are limitation and shared knowledge.
In collaborative tagging systems, each entity has its own tag collection (i.e. tag
cloud); therefore, there is inherently a limit to the amount of vocabulary that could be
used to describe that entity. However, folksonomy was observed prior to the exhaustion
of the vocabulary; Golder and Huberman attribute this behavior to the shared knowledge
of users. For example, most common users would label a photo of a cat with casual
words such as cat or kitten, rather than a more obscure term such as feline[58]. In the
theoretical framework’s functional complexity analysis section, this collaborative tagging
statistical approach is used to analyze the meta-content stability. However, collaborative
tagging is not a development methodology.
Collaborative tagging does not address the ad-hoc nature of UGC development,
nor does it provide theoretical methodologies and strategies. It does focus on the meta-content
aspect of UGC applications, as well as methods for improving meta-content quality.
However, this thesis will illustrate that the current collaborative tagging research topics
do not approach the improvement of meta-content from a system view, but rather
from a vocabulary view. Therefore, the approaches below mostly attempt to
improve the meta-content quality without directly incorporating the users in the meta-content
improvement process. There has also been some work focusing on
leveraging user behaviors, such as their social networks, to indirectly improve their meta-content.
However, these applications have been shown to be inconclusive regarding their
direct improvement of meta-content quality and generation.
The research overview illustrates the types of work this thesis considered in
building the theoretical foundation. There are currently a few main approaches for
addressing the key challenges of collaborative tagging systems: formal taxonomy or
ontology, statistical and pattern analysis, social networking, and visualization[17, 59, 60],
as illustrated in Table 3 below. The direct method is the use of a formal taxonomy or
ontology. One way to use a formal ontology method with collaborative tagging is to
derive one from the tags of these systems through data mining[17, 58, 61]. Another
method is to employ ontology seeding[62], which embeds an ontology into the system
before the users begin to tag and usually asks the users for additional semantic
information to ensure that the tags they contribute follow the ontology’s conventions. The
main concern with ontology seeding is that it requires additional user contributions,
which users of a general purpose UGC application may be reluctant to provide. The
majority of ontology seeding research is currently being conducted in eLearning research
projects.
The main appeal of integrating a formal ontology and taxonomy is its ability to
give the meta-content (i.e. tags) a frame of reference, which would greatly assist in
reducing the inconsistency and ambiguity of the meta-content, otherwise known as the
tags. It would also reduce the haziness of these systems by giving them a hierarchical
structure, which removes the ambiguity caused by the confusion in abstraction levels.
Therefore, the application of an ontology could serve to improve the meta-content’s
quality. However, although formal taxonomy and ontology approaches directly address
the meta-content quality issues of collaborative tagging systems, they tend to be
relatively restrictive, which conflicts with the informality of folksonomy[8, 63].
Table 3 Collaborative Tagging Approaches

Formal taxonomy and ontology
  Pros: Directly addresses the issues of meta-content quality improvement; a well-defined
  and experienced field in databases and data mining.
  Cons: Can be too restrictive and conflict with the informality of folksonomy; does not
  address the improvement of the meta-content generation process or UGC system
  functionalities.

Statistical and pattern approaches
  Pros: Illustrate important information regarding user-generated meta-content and do not
  require additional user contribution.
  Cons: Do not directly address the improvement of the meta-content generation process
  and UGC system functionalities.

Social network approaches
  Pros: Have shown promising results in guiding user behaviors and providing additional
  user social behavioral information.
  Cons: Do not directly address the quality of the meta-content, or the improvement of the
  meta-content generation process and UGC system functionalities.

Visualization approaches
  Pros: Have shown promising results in guiding user behaviors and providing additional
  user visual behavioral information.
  Cons: Do not directly address the quality of the meta-content, or the improvement of the
  meta-content generation process and UGC system functionalities.
Additionally, some of the preliminary research data has shown that in tasks such
as search queries, ontology data-mining methodologies have not shown a significant
improvement in functionality performance over methods using only folksonomy[17].
Another popular methodology in collaborative tagging research is the use of
statistical and pattern analysis, such as the Golder and Huberman work discussed above,
which is used to find the key factors in folksonomy's formation and stabilization[58].
Typical factors used in these statistical and pattern analysis methods are tag usage
frequency, popularity, ranking, and co-word relations. These approaches are popular
because they have been shown to work well in other general web applications, such as
Google's PageRank algorithm[64] and the collaborative filtering algorithm behind
Amazon's recommendations[65]. Certain researchers have
found that, given the heavy presence and effect the users have in collaborative tagging
systems, concept-based analysis methods do not capture all of the behaviors of
folksonomy[66]. Therefore, there have been many user-focused research projects that
attempted to leverage the social networks of the users towards a better understanding of
how folksonomy is formed[38].
These social network methods are based on user-tag-object models[66], which
capture not only the relations within each of these sets but also the relations between them. This
allows researchers to utilize the social network of the users to validate the tags or
objects[67]. Furthermore, mapping tags with objects and users may assist researchers in
determining which tags the users consider to be synonyms[39]. Another approach that
utilizes information in addition to tags to improve user behaviors is visualization, which
could involve anything from displaying related objects to the users based on their tag
patterns to illustrations of the users’ social networks[68]. Both of these avenues of
research have produced an improvement in user behavior. However, because non-
ontology approaches are more indirect in how they address the challenges of improving
meta-content quality, it is difficult to validate whether or not they will improve the meta-
content generation process and functionality performance, such as tag enabled
multimedia search.
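The user-tag-object model described above can be sketched as a set of tagging triples, from which co-assignment of different tags to the same object by different users surfaces candidate synonyms. The data and threshold below are hypothetical illustrations of the idea, not a reproduction of any cited system.

```python
from collections import defaultdict
from itertools import combinations

# Sketch of the user-tag-object model: each tagging action is a triple.
# Tags that different users apply to the same object become candidate
# synonym pairs. All names and data here are invented for illustration.

triples = [
    ("alice", "cat",    "photo1"),
    ("bob",   "kitten", "photo1"),
    ("carol", "cat",    "photo2"),
    ("dave",  "feline", "photo2"),
]

tags_by_object = defaultdict(set)
for user, tag, obj in triples:
    tags_by_object[obj].add(tag)

# Count how often two distinct tags label the same object.
pair_counts = defaultdict(int)
for tags in tags_by_object.values():
    for a, b in combinations(sorted(tags), 2):
        pair_counts[(a, b)] += 1

# Any co-assigned pair is a (weak) synonym candidate in this sketch.
synonym_candidates = [p for p, n in pair_counts.items() if n >= 1]
```

A real system would weight these counts by the users' social network or reputation rather than treating all co-assignments equally.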
Ontology Approaches
In the data-mining ontology approaches, FolksAnnotation[69], a system that extracts tags
from del.icio.us and maps them to various ontology concepts, as illustrated in Figure 3
below, found that semantics can in fact be derived from tags. Although the majority of
the tags don’t fall into a formal ontology definition, they do produce a lot of latent or
implicit semantics, which the researchers could use to build appropriate ontologies
from[69]. Furthermore, these researchers found that collaborative tagging is able to
generate a great deal of contextual information that allows for more flexible concept
mapping than standard metadata[69]. A similar study from Yahoo research also
advocates the ontologically driven filtering/data-mining approach as a choice between a
pure tag model and a pure taxonomy model. Their exploratory filtering method is
modeled after various traditional multimedia and text search algorithms developed prior
to collaborative tagging[70].
However, in the case of FolksAnnotation[69] and the Yahoo study, when a search
query test was applied to the ontologically mapped datasets, the improvement over the
unchanged folksonomy datasets was small, and therefore inconclusive. Also, the Yahoo
study concluded that, due to the inconsistency of the users' tagging, which is the result of
an uncontrolled vocabulary, a search query test after the application of their ontology
methodology still produced a relatively high error rate[70]. Therefore, although the data-mining
methodology is simple and directly addresses the meta-content quality
improvement challenges of collaborative tagging, it has not been able to confirm whether
or not the resulting ontologies would yield better results in functionality performance.
Furthermore, these approaches do not address the issues of meta-content generation, only
the meta-content after it has been generated.
Figure 3 FolksAnnotation Architecture
Ontology seeding methods give researchers the ability to shape the user’s
metadata into the desired ontological or hierarchical structure, rather than search for it
like the data-mining approaches. In the Yahoo study, the researchers suggested that their
data-mining method could be improved by the use of community-supported moderation,
which requires users to contribute further information, such as semantics, along with their
tags. In the “CommonFolks” Project[62], the developers seek to combine WordNet, an
existing English ontology, with the fast and easy metadata creation of collaborative
tagging techniques. The goal of their project is to build a set of ontologies for the
annotation of learning resources used in eLearning systems. Tagging in CommonFolks is
performed in an RDF format[62], with a predicate and an object assigned to the learning
resource. This is a very important step to ensure that the metadata generated by
collaborative tagging stays consistent and can be appended in an “is a” relationship to
WordNet[62].
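A tagging action of this predicate-object kind can be pictured as an RDF-style triple. The following sketch only illustrates the general idea; the URIs and predicate name are hypothetical and are not CommonFolks' actual vocabulary.

```python
# Sketch of tagging as an RDF-style (subject, predicate, object) triple,
# in the spirit of CommonFolks[62]. The URIs and predicate below are
# hypothetical illustrations, not the project's real vocabulary.

def make_triple(resource_uri, predicate, value):
    return (resource_uri, predicate, value)

# A free-form tag becomes a predicate-object pair attached to the resource,
# keeping the metadata consistent enough to later append to an ontology
# such as WordNet via an "is a" relation.
annotation = make_triple(
    "http://example.org/resource/lesson1",   # the learning resource
    "http://example.org/vocab#hasTopic",     # predicate chosen by the tagger
    "recursion",                             # the tag itself
)

# Serialize as a single N-Triples-style line for storage or exchange.
line = '<%s> <%s> "%s" .' % annotation
```

The point of the structured form is that the tag is no longer a bare keyword: the predicate records how the tag relates to the resource.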
Other similar research efforts, such as the SemKey project[71], have also used user
contributions to improve their ontologies. The users of SemKey are allowed to state their
semantic assertions for the resource while they tag. The ontological and hierarchical
structure of this project is based on Wikipedia[24] and WordNet[72]. The SemKey
system built an ontological and hierarchical base by combining the strengths of the two
applications to minimize their weaknesses and then incorporated them into the SemKey
application. This system makes use of Wikipedia's hierarchical thesaurus structure and
WordNet's formal ontological structure. Therefore, when the users perform their tagging,
they are asked to add both types of information along with the tag. The researchers
believe that incorporating Wikipedia’s hierarchical structure allows users to offer
semantic information that is not quite as strict as the format of WordNet[72].
There are currently many other projects, such as the Semantic Halo project[73],
that are attempting to derive sets of ontological information from additional user
contributions, particularly additional semantic contributions. The main attraction of this
approach is that it is able to maintain the unrestrictive structure of folksonomy and also
integrate more formal hierarchical and ontological structures into the metadata sets.
However, the researchers behind these projects have not yet published conclusions
regarding whether or not this methodology can significantly improve the quality of the
meta-content in its formation of folksonomy. It is also uncertain whether or not users of
large-scale applications would find the overhead of contributing semantic information
appealing. Therefore, although ontological approaches directly address the issues of
meta-content quality, they are too restrictive or demand too much overhead to be applied
towards a theoretical approach. Furthermore, this research is more concerned with
building a good ontology than with the improvement of meta-content generation or
functionality performance.
Statistical and Pattern Approaches
Golder and Huberman used a stochastic urn model, which was originally used to
represent how diseases spread, to model the formation and stabilization of folksonomy.
From this model and an in-depth analysis of del.icio.us tags, they concluded that the
reasons for tag usage stabilization were limitation and shared knowledge. They believe
that vocabularies serve as limitations for users, since any specific language offers a
limited vocabulary for any particular entity. Additionally, they concluded that most users
tend to have some shared knowledge (i.e., most users would naturally tag a resource with
common overlapping synonyms). Therefore, tag usage for any entity will stabilize over
time. By studying the statistical behaviors of the collaborative tagging system, Golder
and Huberman's analysis was able to illustrate the key factors involved in stabilizing tag
usage and forming a folksonomy[58].
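The intuition behind the urn model can be sketched with a small simulation: each new contribution either imitates an existing tag, in proportion to its current use, or occasionally introduces a new one. The parameters and seed below are illustrative and are not taken from Golder and Huberman's model.

```python
import random

# Sketch of a stochastic (Polya-style) urn process: a tag is reinforced
# in proportion to its current use, with occasional novel tags entering.
# All parameters here are hypothetical illustrations.

def simulate(steps=2000, new_tag_prob=0.05, seed=1):
    random.seed(seed)
    urn = ["cat"]  # the urn starts with one occurrence of one tag
    for i in range(steps):
        if random.random() < new_tag_prob:
            urn.append("tag%d" % i)          # a novel tag enters
        else:
            urn.append(random.choice(urn))   # imitate an existing tag
    return urn

urn = simulate()
# Proportion of the initially present tag after many contributions;
# in such reinforcement processes, tag proportions tend to settle
# rather than fluctuate indefinitely.
share = urn.count("cat") / len(urn)
```

The imitation step stands in for shared knowledge (users echoing common vocabulary) and the bounded tag set for limitation, the two stabilizing factors Golder and Huberman identify.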
This thesis utilized the Golder and Huberman analysis methodology in the
functional complexity analysis of the theoretical framework, which is discussed in detail
in section 4.3 below. Statistical and pattern methodologies have been shown to work
very well in general Internet indexing and searching, such as the Google PageRank
algorithm[64] or the collaborative filtering algorithm in Amazon[65]. In fact, researchers
have introduced methodologies such as the FolkRank[70] algorithm and the
“interestingness” algorithm modeled after their general web counterparts. The
FolkRank[70] algorithm was originally based on the popularity of the tags, but evolved to
use the “interestingness” algorithm[42], which takes a page from both PageRank and
collaborative filtering by tracking not only the popularity of the tags, but also the ratings
of the users who contributed them.
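The core idea of weighting tag popularity by contributor rating can be sketched as follows. The function name, the rating scheme, and the default weight are hypothetical illustrations, not the published FolkRank or "interestingness" implementations:

```python
from collections import defaultdict

def interestingness_scores(tag_events, user_rating, default_rating=1.0):
    """Hypothetical sketch of an 'interestingness'-style ranking:
    instead of counting raw tag occurrences alone, weight each tag
    occurrence by the rating of the user who contributed it, so that
    tags from well-rated users rank higher than mere popularity."""
    scores = defaultdict(float)
    for user, tag in tag_events:
        scores[tag] += user_rating.get(user, default_rating)
    # highest-scoring tags first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this toy scheme, a tag applied twice by low-rated users can rank below a tag applied once by a highly rated user, which is the qualitative behavior the "interestingness" approach is after.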
In this same vein, other researchers have investigated the formation of
folksonomy and tag usage behavior patterns based on other factors such as frequency of
use and co-word clustering[74-76]. These researchers were able to illustrate that tagging
actually mimics behaviors quite like classification in conventional classification systems.
Therefore, some of these researchers have proposed the integration of a field known as
Semiotic Dynamics, which studies how humans or agents use their communications to
form semiotic systems, by treating tags as the basic dynamic communication entity[77]. It
is their belief that these types of research studies can be used to investigate how to assist
users in “learning” how to improve the quality of the meta-content [78, 79] in
collaborative tagging systems.
The appeal of statistical and pattern approaches is that they have been shown to
work well in other general purpose web applications. They are also very apt at providing
various types of information to enrich the understanding of these collaborative tagging
applications. However, the improvements these approaches make towards the meta-
content’s quality and generation are relatively indirect and therefore difficult to control
and standardize. The observations gained through these approaches can be applied
towards aspects of a theoretical framework, but needs additional theoretical foundation,
which is discussed in section 4.1 below.
Social Network Approaches
The social network approaches incorporate social network knowledge into
collaborative tagging systems to improve the understanding of tag relationships[79]. One
study built various ontologies from tags based on concepts and communities[66]. This
research found that although concept-oriented ontologies conform much better to a
formal structure, they tend to lose track of the relevance of the tag or resource.
Therefore, some studies have built community-based ontologies utilizing both the
metadata generated by the users and their personal social networks[80]. On a
collaborative tagging site which offers community support, such as Flickr, the users are
allowed to track and browse through their friends’ pages. Some of the current research
studies enable the users to map their own tags to other tags on their friends’ networks.
This so-called “matchmaker-based” recommendation system is able to convey social
concepts such as personal interest and trust, which can all be used to verify the validity of
a tag or object[80].
A study conducted by Mika concluded that concept-based ontologies are often
an inaccurate representation of the importance of the tag. Additionally, this study
concluded that an ontology based on a distributed community-based network contains
concepts that are “actually important to del.ici.ous™ users”[66]. Other works also
utilized community-based ontologies by mapping and tracking the community data and
incorporating them into the collaborative tagging systems[81, 82]. One of the key
advantages that community-based ontologies have over concept-based ontologies is that
the concepts are based on the strength of the links rather than the number of times the
concepts have been used as a tag. This is very valuable in accurately presenting the
importance of a tag, since concept-based ontologies can easily be swayed by a small
number of users repeatedly reusing a few tags and concepts[66].
In other social network approaches, many researchers attempted to build networks
from tracking the relations among the tags, users, and objects[83], which formed a
tripartite network [39, 81]. In these cases, folksonomy is usually described as the
formation of a collective understanding of an entity (e.g., shared photos, bookmarks)
through the communication of tagging[84]. Marlow and his fellow researchers advocate
the use of socio-technical designs[39] to resolve the collaborative tagging systems’
problems of controlling the meta-content’s quality. They proposed a model that illustrates
how to connect resources and users via tags, which illustrates the components of
folksonomy. They suggest that identifying certain portions of the users’ network that use
related resources, or tags, can be useful for inferring tag semantics and synonyms[39].
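A minimal illustration of mining this tripartite (user, tag, resource) structure is sketched below. The function and the co-occurrence heuristic are illustrative assumptions, not the specific method of Marlow et al.:

```python
from collections import defaultdict

def tag_cooccurrence(annotations):
    """Sketch of exploiting the user-tag-resource tripartite network:
    from (user, tag, resource) triples, count how often two tags are
    applied to the same resource. High co-occurrence counts are a
    simple cue for inferring tag relatedness or synonymy."""
    tags_by_resource = defaultdict(set)
    for user, tag, resource in annotations:
        tags_by_resource[resource].add(tag)
    cooccur = defaultdict(int)
    for tags in tags_by_resource.values():
        ordered = sorted(tags)  # canonical pair ordering
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                cooccur[(ordered[i], ordered[j])] += 1
    return dict(cooccur)
```

Pairs such as ("dusk", "sunset") accumulating high counts across many resources would suggest the two tags are near-synonyms, which is exactly the kind of inference the socio-technical designs discussed above aim to support.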
Although the field of social networking is growing rapidly, and has been shown to
dramatically change user behavior, its methods do not directly address the issues of meta-
content quality in the tags. Therefore, it is much more difficult to validate whether or not
these methods can truly improve meta-content generation challenges in collaborative
tagging.
Visualization Approaches
Some researchers have also incorporated the help of visualization tools[85, 86],
illustrated in Figure 4 below, such as a navigation map[68, 86] or displaying the social
network relations of the users. Works such as Cloudalicious[87] illustrate the tag clouds
or folksonomies as they develop over time for a given URL. Furthermore, other studies
have begun to incorporate the tag and entity relationship, such as displaying related
entities as hints to assist the user in finding the appropriate tags for each entity as they are
entering the tags.

Figure 4 Tag Visualization Example

For example, one study introduced an approach that combined social
tagging and visual features in order to extend the navigation possibilities of image
archives. The user interface includes an intuitive visualization of tags (i.e., related
images). Furthermore, this study made use of an editable navigation map to give the users
control over how the entities are related to each other and to the tags[68]. However, like
the social network methods, these approaches have been shown to greatly influence user
behavior, but not necessarily improve the meta-content generation or functionality
performance.
UGCs have come a long way since the days of bulletin boards and rudimentary
multimedia search engines. However, these applications still have a fair number of
challenges to overcome. Although collaborative tagging is currently a very popular meta-
content gathering method in information resource applications that has solved many
problems, it must improve its meta-content generation to ensure the confidence of its
users as these systems continue to expand. The research works discussed above mostly
address the meta-content quality and not the generation process itself. Formal taxonomy
and ontology approaches directly address the meta-content quality issues in collaborative
tagging systems. Furthermore, they can be directly applied towards other usages, such as
in the Large-Scale Concept Ontology for Multimedia (LSCOM)[88].
However, these approaches tend to be too restrictive and conflict with the main
appeal of folksonomy, which is its informality[8, 63]. Ontology seeding asks users to
provide further contributions towards the tags’ organization, which may not appeal to
users of large-scale web applications due to the overhead of such contributions. The
benefit of ontology seeding approaches is their ability to leverage the tags into a
hierarchical structure and still maintain the flexible nature of folksonomy[71]. However,
there hasn’t been enough testing performed to validate ontology seeding or data-mining
methods in the collaborative tagging field. In fact, there has been contradictory work
which illustrates that ontologies are not the most accurate representation of user-generated
meta-content.
Statistical and pattern approaches such as Golder and Huberman’s work have
produced important information regarding the meta-content in UGCs and do not require
any extra contribution from the users, but they require additional theoretical foundation
to be applied towards a theoretical framework. Social network approaches use the social
networks of the users to improve the folksonomy, and visualization approaches attempt to
guide user behavior in enriching the UGC content. However, these approaches are more
indirect than the formal taxonomy and ontology approaches in how they address the
meta-content quality improvements and therefore require further validation.
Collaborative tagging has been shown to be a very influential meta-content
gathering methodology in information resource applications such as UGCs. In addition
to the approaches detailed above, there are various other technological advancements that
could serve to address the organizational problems of information resource applications,
such as ubiquitous mobile devices or AI methodologies that could contribute user
information indirectly or automatically, without requiring explicit user inputs[89].
Although the various non-ontological methodologies discussed above do not address
improvement of the UGC meta-content and functionalities evolution directly, they do
serve to further capture the user behaviors in UGC systems. Therefore, some of those
topics are referenced in the later sections when this thesis begins discussing the analysis
of the theoretical framework in sections 4.2 and 4.3.
2.3 Software and Web Engineering
To address the ad hoc software developments in the 1960s, traditional Software
Engineering development models (e.g., waterfall) were formulated. They broke software
development processes into the following steps: requirements, design, implementation,
verification, and maintenance. The factors that impact the flow of these steps are the
schedule, budget, and user requirements of the system. Recognizing the rapid changes in
software development in the last few decades, current Software Engineering models (e.g.
Spiral, illustrated below), have changed those steps[90] to remedy the new complexities
and problems that have arisen with those changes. For example, the Spiral model iterates
through a set of requirement definitions, risk analysis, and prototyping stages, to improve
the understanding of the system requirements and limitations, before undertaking the
implementation, verification, and maintenance steps[6, 14, 91, 92].

Figure 5 Software Engineering (Spiral Model)

This new process
model allows the developers to discover and remedy problematic issues before
implementation occurs, which saves both time and budget.
Similarly, to address the Web development problems in the 1990s, Web
Engineering was developed. In Web Engineering, the key factors are the environment,
the corporate requirements, stakeholders, architecture, and breaking the project down to
sub-projects. This new methodology arose because of the quick evolution of
Web technologies and customer demands, which has led Web engineers to develop
specific process models targeting Web applications[93], with examples such as the
WebComposition Process Model [90], which breaks down each application into
components for rapid and easy reuse and re-factoring. In addition, it used an open
interface that standardized the reuse management to ensure components developed by
different developers can still be reused. By componentizing an application’s internal
infrastructure, it makes adapting to new technologies and moving from legacy systems
much easier to address. It also makes adapting to new Web trends easier, since
those trends (e.g., popular types of user interface changes) only affect certain aspects of
the software [90].
Other Web Engineers focused on changing Software Engineering approaches to
better suit the design and implementation of applications. For example, a cross-
functional evolutionary process, as illustrated in Figure 6 below, which captures the rapid
evolutionary changes that occur in the requirements and constraints of a Web system, was
developed by reincorporating traditional Software Engineering process models into this
new model[94].

Figure 6 Web Engineering (Cross-Functional)

This was Web Engineering’s way of seeking to capture a software
development approach in application projects, given the quick evolution of Web
technologies and trends. Recognizing the challenge that rapidly evolving software
systems and large-scale software systems present to Software Engineering today, some
Software Engineering researchers have begun to look for overlaps between Systems
Engineering and Software Engineering, incorporating systems engineering methods into
Software Engineering[95].
However, current Software and Web Engineering methodologies only address the
evolutions that occur based on the changes in requirements, schedules, budget, and
technology[5, 91] in the development phase. This is not the case in these UGC
applications, where the most important evolutionary changes occur in the usage phase of
the system (i.e. the generation of the users’ meta-content). Without the user-generated
meta-content, the system’s functionalities would not be able to deliver their intended
usage, which is illustrated in the photo-mapping example. However, by making users a
key stakeholder in enabling its functionalities, these systems incur a wide array of
collective action issues and complexities that cause the systems to fail. The lack of
fundamental understanding regarding the design, implementation, and maintenance
processes is what accounts for the current failure of these systems.
CHAPTER 3: THE FUNDAMENTAL RESEARCH PROBLEM
3.1 Functional Complexity Analysis
Unlike traditional software systems, MUEEWS have an ever-increasing, evolving
set of users and user-generated data, which create complexities that are very difficult to
manage in the “physical domain” (e.g., volume of data, complex communications
networks, etc.). In fact, given the evolutionary nature of MUEEWS, the physical
complexity[21, 96, 97] can grow exponentially. Due to these physical complexities, and
the lack of a solid theoretical framework for MUEEWS, the development processes of
MUEEWS are often treated in an ad-hoc manner without a clear strategy to deal with
these “complexities”. However, this section will show that if MUEEWS is approached in
terms of “functional complexity”[21], its development process can in fact be controlled
and designed in a systematic way.
Functional complexity is a “relative” concept: It evaluates how well one can
satisfy “what one wants to achieve” with “what is achievable (by what one has in
hands)”. For example, is opening a combination lock a complex problem? Only if you
do not know the combination; if you do, then the problem is quite easy. In other words,
functional complexity is the measurement of uncertainty in having a system to achieve
the desired functional requirements[98]. There are actually many different types of
functional complexity, each of which fits a specific model, and each model has specific
complexity properties that can be understood and remedied.
First, there is Time-independent Real Complexity (TRC), where not all the
function requirements can be satisfied at all times (e.g. conflicting requirements).
Second, there is Time-independent Imaginary Complexity (TIC), where there is a lack of
knowledge or understanding of the system itself, which often happens in systems with
decoupled or coupled functional requirements. Third, there is Time-dependent
Combinatorial Complexity (TCC), where the effect of future system events cannot be
predicted a priori (e.g. scheduling continuous shop jobs that affect each other’s outcome
down the assembly line). Lastly, there is Time-dependent Periodic Complexity (TPC),
where the uncertainty only exists in a finite time period (e.g. airline schedules are
periodic each day, but there are delays and unknowns during the day)[21].
If a system has Time-dependent Combinatorial Complexity, the system will
become “chaotic” or “uncontrollable” and eventually fail (see Figure 7). For such a
system to sustain itself, it must be transformed into a Time-dependent Periodic
Complexity problem (see Figure 8). This can be done by introducing a functional
periodicity into the system (i.e., a set of functional requirements of the system that repeats
itself periodically and can be re-initialized at the beginning of each period)[97]. For
example, in the airline schedules scenario discussed above, the functional periodicity is
a built-in reset: a few hours in the day with no scheduled flights, which allow
previously delayed flights to catch up. This prevents the uncertainties and
complexities of one day from carrying into the next, which would turn the problem into a
combinatorial problem[97].
Figure 7 Time-dependent Combinatorial Complexity
Figure 8 Time-dependent Periodic Complexity
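The effect of the airline-style reset can be illustrated with a toy Python sketch. The numbers and the carryover rule below are purely illustrative assumptions, not real airline data or a formal complexity measure:

```python
def worst_accumulated_delay(days, daily_delay, reset=True):
    """Toy sketch of combinatorial vs. periodic behavior: without a
    no-flight reset window, each day's residual delay carries fully
    into the next day and grows without bound (combinatorial); with
    the reset, every day re-initializes and the delay stays bounded
    (periodic)."""
    carryover = 0.0
    worst = 0.0
    for _ in range(days):
        total_delay = carryover + daily_delay  # delay experienced today
        worst = max(worst, total_delay)
        carryover = 0.0 if reset else total_delay  # reset re-initializes
    return worst
```

With the reset enabled, the worst daily delay stays constant at the single-day level; with it disabled, the worst delay grows linearly with the number of days, which is the qualitative difference between a periodic and a combinatorial system.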
3.2 Functional Complexity Analysis of MUEEWS
MUEEWS is a time-dependent problem, since the evolution and enabling of the
evolving functionalities depend on the contributions the users make over time.
Unfortunately, most developers treat MUEEWS as a Time-dependent Combinatorial
Complexity problem, causing MUEEWS’ development process to be expensive, slow, and ad-hoc.
Since every step in MUEEWS’ functionalities’ evolutions depends on previous stages of
the evolution (as reflected in the socio-technical cycles), trying to predict future states of
the functionalities in this continuous evolutionary process is the same as chasing a
moving target, which leads to combinatorial complexity over time.
Unlike the airline scheduling problem discussed in section 3.1, a simple reset
period would not dissolve the complexities that are produced in MUEEWS. However,
one can introduce other functional complexity remedies to manage its complexity. One
such example is the friction between two dry sliding metal surfaces, which continues to
increase over time due to the following: the generation of wear particles, the
agglomeration of small particles into larger particles at the interface due to the externally
applied pressure, and plowing of the sliding surfaces by the agglomerated wear
particles[99]. This is another typical case of time-dependent combinatorial complexity,
which can be stopped by preventing particle generation and agglomeration[100]. To
disrupt this combinatorial process, the creation of undulated surfaces that consist of
pockets and pads has been applied. The pockets are created to trap wear particles and thus
prevent the particle generation and agglomeration process [99, 100].
Therefore, to reformulate MUEEWS from a combinatorial problem to a periodic
one, the developers must create a new approach that periodically mitigates the
complexity growth created by the ever-increasing amount of user-generated meta-
content. As discussed in the latter parts of this thesis, the theoretical framework will
inject means of capturing the complexities produced by the evolution, similar to the metal
surface example given above. The theoretical framework will therefore produce a
sustainable system that has a periodic functional complexity. The theoretical framework
will capture the user-generated meta-content evolution and the emergent user behaviors,
thus stopping them from growing uncontrollably. The theoretical framework and the
functional complexity analysis of MUEEWS after the framework has been applied will
be discussed further in section IV below.
CHAPTER 4: MASSIVE USER ENABLED EVOLVING WEB SYSTEMS
APPROACH
4.1 Theoretical Foundation and Analysis
As discussed in section II of this thesis, the current research in UGC and
collaborative tagging applications is primarily focused on application rather than
theoretical foundation. Many of these studies discovered very important factors to
consider in building a theoretical framework, but lacked the theoretical background to
make those factors applicable towards all UGC applications. Additionally, the focus of
current UGC research is on how to leverage the meta-content created in UGCs for other
usages, such as building ontologies, rather than the meta-content generation process or
the system functionality performance. The key to understanding the meta-content
generation and functionality performance of UGCs is to understand the users’
relationship not only with their contacts, which is reflected in social networking research,
but also their relationship with the system as a whole. Below, this thesis will explore
various theories from economics and social science that capture the users’ behaviors and
leverage them towards a theoretical framework.
4.1.1 Logic of Collective Action (LCA)
A collective utility, or a collective/public good, is defined as “any good such that,
if any person in a group consumes it, it cannot feasibly be withheld from the others in that
group.” In other words, even those who do not explicitly contribute towards a
particular good cannot be excluded from sharing the consumption of that good[101]. For
example, as in the Flickr example given in Section I, even users who did not contribute any
tags cannot be withheld from the Tagged Search functionalities. Collective utilities are
just one component in the overall group behavior of MUEEWS, which will be used to
build a theoretical foundation for MUEEWS’ evolutionary process. Group behavior is a
topic that has long been studied in Social Science and Economics; in particular, the
study of Collective Action focuses on the process of aggregating individual
actions/contributions into a collective entity.
The official definition of Collective Action is the pursuit of a goal or goals by
more than one person[102]. One of the most prominent works in the subject of
Collective Action is Mancur Olson’s “Logic of Collective Action”. The Logic of
Collective Action (LCA) states, “Rational, self-interested individuals in large groups will
not act on their common interests, even if there is unanimous agreement in a group about
the common goal and methods of achieving it,” and “Unless the group of users is small,
rational, self-interested individuals will not act to achieve their common purpose.” It also
states that, “The only means of making a large group act in their own common interest is
through coercion or some special device” [101].
The way of implementing this special device is known as a separate and selective
incentive. A selective incentive is an incentive that operates, not indiscriminately like the
collective utilities, which operates on the group as a whole, but selectively towards the
individuals in the group, as a reward or punishment for specific contributions[101].
Selective incentives are those member-only rewards given exclusively to the users that
actively participate in the group. In order to offer selective incentives, the group must
have complete distribution control over a particular resource, which means they can
prevent not only outsiders, but also members within the group from utilizing that
resource. For example, an incentive such as clean air is not a resource that a group can
have complete distribution control over.
There are two common causes for why large groups do not naturally act in their
own common interest, and the cause is directly related to the type of group (i.e.,
Market Groups vs. Non-Market Groups)[103]. Market Groups refer to groups where the
collective utilities are of limited resources; hence any consumption by its members will
reduce the resources’ availability to the other members. Therefore, its group members
want to decrease the group size as much as possible, since fewer group members mean a
larger share of the consumable resource for each member. This creates competition
among the group members and has been illustrated to fail with large groups in the classic
Prisoner’s Dilemma and other such examples[96].
Non-Market Groups refer to groups where the collective utilities are not limited
(i.e., non-rival and non-excludable). Therefore, there is no direct competition among the
group members for the resources and its group members naturally want to increase the
group size, since more group members mean more people to share the costs of producing
these collective utilities that benefit all. However, Large Non-Market Groups also have
problems getting their members to collaborate, and that is due to the concept of latent
groups[101]. A latent group is a group where any change made by a single member does
not explicitly impact the other members or the system as a whole, so there is no incentive
for the members to act. In these cases, “separate and selective incentives” are needed to
stimulate the rationale individuals of the group to act in the favor of the collective
utilities.
Latent groups with selective incentives form mobilized latent groups[101].
Mobilized latent groups use selective incentives to drive user contributions for the
accumulation of collective utilities (i.e., public goods). In addition, there are two other
types of large groups where the latency issue is controlled: Federal Groups and
compulsory membership. A Federal Group is a group that is divided into a number of
small groups, each of which has its own reason for joining with the others to form a
federation representing the large group as a whole. The reason that Federal Groups do
not become latent groups is because they can utilize the small group structure to force
cohesion. Cohesion is difficult to achieve on a large-scale level, since the members
cannot be aware of everyone in the group, thus making it very difficult to apply a
cohesive force (e.g., peer pressure) across the board. Interestingly enough, large groups
usually begin as Federal Groups, because it is fairly difficult to drive a large group of
users to do something unless there is already a structure of trust in place.
However, Mancur Olson illustrates that eventually even Federal Groups will
begin to lose their ability to overcome the LCA dilemma when, at the local level, the
numbers of members become too large. Therefore, in order to achieve true longevity, all
large groups will eventually have to employ compulsory membership[101]. It is rare that
compulsory membership is the first means of organizing or driving a group to
collaborate, because the formation of compulsory membership implies that there already
exists some instrument in the organization that enforces compulsory rules. However, it is
not possible for unorganized members to maintain a large group, even if they realized the
need for cohesion. The enforcement of organization is essential to all groups, especially
large ones, and compulsory membership provides that organization[101].
4.1.2 LCA’s Analysis of MUEEWS
MUEEWS clearly falls under Non-Market Groups, since there is no limited
resource that the members of Flickr, YouTube, etc., are competing for. However, even
though Non-Market Groups inherently want to increase group size, MUEEWS still follows
the large latent group dilemma stated by LCA above. Therefore, in order to overcome the
LCA dilemma, MUEEWS developers must employ the LCA concepts discussed above.
In this section, the various ways that these LCA concepts can/cannot be
employed will be discussed, as well as examples of how they are already being
employed, and the contributions they can make towards the theoretical framework of
MUEEWS.
A suggestion offered by many MUEEWS designers is that each MUEEWS must
occupy some unique niche (i.e. specialized functionalities) in order to succeed in the
current Web Services market[104]. This suggestion is one example of how the concept
of selective incentives and the control of resources are currently being applied in
MUEEWS development. An example of poor incentive models typically given to large
Non-Market Groups is online surveys, which are a very popular method used by
companies to gather user information. This is because they are inexpensive and relatively
simple from an administrative and maintenance point of view. However, they are ill-
suited for MUEEWS because they offer small or no private incentives, only the implicit
promise of collective utilities (e.g., the survey results will be made available to users
who might find them useful in some way later). A typical example is “Please fill out this
survey so that we can improve our customer service.”
A variation of online surveys is online raffles, which have been used to fuel the
users’ contributions with the promise of a chance at a hefty prize. However, they often
fail with large groups because they are Market Group incentives, which depreciate in
value as the group size increases. Also, the biggest flaw with both of these incentives is
that they do not offer any “secondary effects” (i.e., evolutionary effects), because they
cease to be useful after only one single iteration. Another problem with treating
MUEEWS like a market survey is that surveys are only good for gathering very simple
information, and often very few people take the time to answer the surveys accurately
unless strictly supervised. A correct way of using selective incentives is illustrated as
follows: In Flickr, the users must tag their photos to utilize the system’s tag-related
functionalities. Otherwise, their photos will not be included in the Tag Search engine and
various other tag-related functions’ considerations.
The Federal Group concept is being used by existing MUEEWS
applications. For example, YouTube embeds “share” and “email to your friends” functions
for each video you watch as soon as you finish watching. This is a common method used in
Social Networking[105]. In the case of the YouTube example, the small group’s reason
for joining the “federation” is the cohesive forces of their external social network
outside of the MUEEWS. Therefore, by harnessing those external social networks, YouTube
can build a Federal Group structure embedded in its system. Beyond the external social
networks of its users, MUEEWS systems like Flickr or YouTube also create internal
Federal Groups based on common interests and other subject matters. However, just as
Mancur Olson illustrated, as these groups grow in size, the federal group structure begins
to break down, and the users begin to fall back into the paradox of LCA.
Compulsory membership is also a concept already employed in MUEEWS. For
every MUEEWS that exists, there is a specific set of guidelines the users must implicitly
follow just to use the software. They need to set up an account, contribute certain types
of data and etc just to utilize the software’s functionalities. Although the mandatory
nature of compulsory membership may concern some, compulsory membership is not
force onto the users. Only in the case where the users want to utilize the resources of the
system does compulsory membership come in to play. Interestingly enough, its been
observed that though Internet users are highly sensitive to the issue of privacy, especially
regarding government surveillance, users actually give away all kinds of information to
commercial sites without much thought[104]. Therefore, the information as demanded
by the compulsory membership component of MUEEWS has not met much resistance.
However, as stated in section 4.1.1 above, compulsory membership is generally applied
only once the system has already become a success, and is not effective in keeping the
users in the system while the system is still evolving.
Although these LCA concepts are being used in current MUEEWS, they are being
used without theoretical foundation or justification. Furthermore, these LCA concepts
are only necessary conditions for overcoming the collective action dilemma of LCA and
not a solution for how to overcome it. In Section 4.1.3, a new methodology called
Participatory Action Research (PAR) will be introduced. PAR is a methodology in
Social Science that attempts to motivate large groups by directly addressing each
individual member and incorporating their inputs into every part of the research process.
The similarities between PAR and MUEEWS and how to use PAR to build a theoretical
framework for MUEEWS will be illustrated. By doing so, the LCA conditions needed in
MUEEWS to ensure that MUEEWS can overcome the dilemma of LCA will be created,
which in turn will also manage MUEEWS’ functional complexity growth.
4.1.3 Participatory Action Research (PAR)
Participatory Action Research (PAR) directly attempts to address the latency concerns of
MUEEWS brought by the dilemma of LCA and provides the cyclic evolutionary
structure that is needed to remedy MUEEWS’ functional periodicity. PAR is a means to
mobilize a group collectively, such as a local community, to act in their own benefit. A
typical example would be an internal movement to educate the adults in lower income
communities[23]. PAR attempts to replace the traditional social science research models
where the researchers or civil service workers are directly assisting the community with
their problems. Instead, PAR locates the assistance action within the community, with the
social science researchers serving only as guides. There is a direct parallel between
MUEEWS and PAR: in both cases, the systems are attempting to overcome traditional
models where the developers/researchers are outside of the users’ community and the
users are treated simply as subjects[23].
PAR is defined as a "collective, self-reflective enquiry undertaken by participants
in social situations in order to improve the rationality and justice of their own social...
practices"[106]. Furthermore, "the approach is only action research when it is
collaborative, though it is important to realize that the action research of the group is
achieved through the critically examined action of individual group members".
Reflection is one of the most important aspects of PAR, because it enables the
community members to submit their thoughts and feedback back into the system. It is
through the submission of this feedback that each member of the
community feels that he/she has an impact on the group/community as a whole[107],
which directly addresses the latency issues in the dilemma of LCA.
The exact process of PAR is a cyclic one, where the users collectively reflect,
plan, act and observe the actions of the community as well as their own. First, from their
reflection, which is the members’ analysis process of the situation at hand, the members
are able to generate a plan. Once a plan is in place, it will be put into action, and the
members will collect their observations while the action is taking place. Then, those
observations will be used to drive their reflections for the next iteration of PAR, which
again will repeat the process of reflection, planning, action, and observation[108].
By consistently repeating this process and enforcing this structure, PAR is able to form
the means and organization for driving participation from the group members it’s
attempting to assist. This process is illustrated in the below diagram.
Figure 9 Participatory Action Research (PAR)
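The cyclic structure described above can be sketched in a few lines of code. This is a toy illustration only: the stage names come from the PAR process as described in this section, while the function `run_par` and its data shapes are invented here for illustration.

```python
# Toy sketch of the cyclic PAR process: reflect -> plan -> act -> observe,
# with each cycle's observations feeding the next cycle's reflection.
# The function name and tuple representation are invented for illustration.

def run_par(initial_observation, n_iterations):
    log = []
    observation = initial_observation
    for _ in range(n_iterations):
        reflection = ("reflect", observation)   # members analyse what was observed
        plan = ("plan", reflection)             # a plan is generated from the reflection
        action = ("act", plan)                  # the plan is put into action
        observation = ("observe", action)       # observations drive the next cycle
        log.extend([reflection, plan, action, observation])
    return log

log = run_par("community situation", 2)
stages = [stage for stage, _ in log]
print(stages[:4])  # ['reflect', 'plan', 'act', 'observe']
```

Each iteration repeats the same four stages, which is exactly the enforced structure that, per the text, drives sustained participation.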
PAR is an emergent process. Typically, current PAR proceedings begin with
“expert research” and evolve into what is known as “true PAR”. In the “expert research”
model, all authority and execution of research is controlled by the expert researcher. In
“true PAR”, authority over and execution of the research is a highly collaborative process
between the expert researchers and the members of the organization under study[23].
Furthermore, PAR is a process mostly controlled by local conditions, and one that
strongly encourages continuous learning on the part of the professional researchers and
the members of the organizations involved.
Beyond local knowledge, social science expertise is integrated into PAR to
generate new understanding for both sides of the PAR process (i.e. the researchers and the
community under research). Therefore, PAR is a trial-and-error, experimentation-and-
refinement approach to solving typical research problems[109]. The fact that PAR
actively engages the users and gives them a stake in their contribution to the knowledge
production of PAR gives them incentive to stay in the system. Eventually, as in “true
PAR”, the system will become one where the researchers operate as full collaborators
alongside the other members. Therefore, PAR is an ongoing organizational learning
process created by emphasizing co-learning, participation, and organizational
transformation[110].
The key components of PAR are as follows[23]:
1. Collaboration: which takes place between the members of the group under
study and the researcher from the problem formation to the application and
assessment
2. Incorporation of Local Knowledge: which mostly includes the knowledge and
analysis contributed by the members of the organization that is under study
3. Eclecticism and Diversity (Multidisciplinary): which means the members can
all contribute information they believe to be relevant
4. Case Orientation (i.e. build theory from cases): which means building theory
and methods in PAR is mostly a case-oriented activity, drawing comparisons,
patterns, and operational concepts from repeated case applications
5. Emergent Process: which is not only evolutionary, but also intensifying, by
gaining dimension and depth as the research progresses.
6. Linking Scientific Understanding to Social Action: which means that during
the research process, both information on how to solve the problems at hand, and
socially meaningful research results regarding the environment, behaviors,
and conditions of the members of the group, will be gathered.
4.1.4 PAR’s Application in MUEEWS
PAR will be used to build the cyclic, evolutionary theoretical framework that
reflects the changes that MUEEWS’ functionalities undergo as the user-generated meta-
content evolves. This process will both directly address the LCA latency issues and
remedy the combinatorial functional complexity problem of the current MUEEWS
framework. To find this theoretical framework, one must first identify the similarities
between PAR and MUEEWS. Then, keeping in mind the LCA latency and
functional complexity problems, those similarities can be applied in an evolutionary
process that serves the needs of both issues. As one can clearly see, the key components
of PAR are also key aspects of MUEEWS.
Collaboration exists in both systems. Incorporation of local knowledge is what
sets PAR and MUEEWS apart from other systems of their respective fields (i.e. expert
research & developer driven Web systems). PAR uses user input to change the research
solution/actions and MUEEWS uses the user-generated meta-content to drive the
evolving functionalities of the system. Both systems promote eclecticism and diversity
(i.e. multi-disciplinary) approaches since they do not limit the type of user contributions
that are allowed in the system. And clearly, MUEEWS is not a theory-driven process,
but rather a case-directed process, as the users’ behaviors are not guided by some preset
theory (i.e. emergent user behavior) but by their goals and the factors of their
environment instead. Then there is the PAR emergent process, which is not only similar
to MUEEWS’ evolutionary changes, but is also hoped to be duplicated in MUEEWS. As
for the linking of scientific understanding to social action, MUEEWS is heavily impacted
by the users’ social actions (i.e. social networking) as they affect the users’ inputs into the
technical system.
The arguments against implementing PAR in MUEEWS are most likely the
overhead in processing all the user feedbacks and the users’ need to contribute extra
information that the PAR process demands. However, there exist examples of Web users
informally utilizing the PAR process. In fact, many websites already ask for PAR-type
information. Amazon, for instance, became famous for asking for user comments and
ratings on products, which is currently one of its greatest strengths. In fact, it is the
reason that people choose Amazon as the broker for many small online retailers, from
which users would have been reluctant to purchase directly without the user
comments. Similarly to PAR, at every stage of iteration, MUEEWS users’ contributions
are weighed and factored into the design and implementation stage for the next
functionalities of MUEEWS. However, the particular format required by PAR is too
strict for casual Web service systems such as MUEEWS, and does not and should not
exist in them. Nevertheless, even though the pattern will not be followed explicitly, the
components of PAR have powerful implications for a MUEEWS system.
Furthermore, the PAR process provides the behavior suggested in Section III
regarding the injection of a functional periodicity. For each action that is performed in
PAR, much like an iteration of the metal sheet example, problems arise that need to be
addressed. Just as particles are cleared away to reduce complexity, PAR uses the
members’ inputs to clarify the plan for the next iteration of actions. As stated in Section
III, in order to introduce functional periodicity, one must implement a set of Functional
Requirements that repeat periodically to reset the functional complexity of the
MUEEWS[111]. And as illustrated above, the way to periodically reset the functional
complexity of MUEEWS is by creating an evolutionary process like that of PAR in
MUEEWS.
Therefore, this thesis will define MUEEWS’ changes in terms of a cyclic and
periodic evolutionary process modeled after the PAR process. The action
iterations in the PAR process will be redefined as implementation stages in the
functionalities’ evolution. Just as before an action can be performed in PAR, before a
functionality can be fully realized, it needs to gather the users’ contributions (i.e. the
user-generated meta-content). However, unlike PAR, MUEEWS begins with an initial
set of functionalities, including evolutionary functionalities that are not yet operating at
their full potential. Since this initial set of functionalities, however limited, is still an
action in the redefinition, the MUEEWS process begins with observation instead of reflection.
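The reordering described above, where the MUEEWS cycle opens with observation of the already-deployed initial function set rather than with reflection, can be shown with a trivial sketch. The list rotation below is purely illustrative:

```python
# The PAR cycle begins with reflection; the MUEEWS redefinition begins with
# observation, because the initial function set counts as an already-performed
# action that users first observe, then reflect on, plan around, and act upon
# (by generating meta-content). The rotation below just makes that explicit.

PAR_CYCLE = ["reflect", "plan", "act", "observe"]

def mueews_cycle():
    start = PAR_CYCLE.index("observe")
    return PAR_CYCLE[start:] + PAR_CYCLE[:start]

print(mueews_cycle())  # ['observe', 'reflect', 'plan', 'act']
```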
4.2 Theoretical Framework
4.2.1 Current MUEEWS Framework
In the current MUEEWS framework, as illustrated below, the users are often
offered an initial function set at the beginning of the evolution of MUEEWS, which
oftentimes does not function the way the application intended because of the lack of meta-
content. For example, when delicious™ first offered its search functionalities, it was
subpar at best. However, with the progression of time and the growth of its meta-content,
one can clearly see the improvement of its tag-enabled search functionalities. This is the
second stage of the MUEEWS evolution: the users’ usage of those function sets, which is
where the set of meta-content is generated. This set of meta-content, consisting of the tags
and titles the users assigned to their bookmarks in delicious™, enabled an evolution of the
function set, namely the tag-enabled search.
Figure 10 Current MUEEWS Framework
However, the user generated meta-content often contains erroneous information,
such as misspellings, random symbols, and the various meta-content quality problems
discussed in the Collaborative Tagging Section 1.2. To improve the quality of their
search engine, the delicious™ development team has implemented various means of
eliminating bad content, such as removing misspellings and random symbols.
Furthermore, various research groups have been working on means of removing bad
content with the problems of ambiguity, inconsistency, and haziness, which were also
discussed in Section 1.2. However, these approaches have met very difficult challenges,
also discussed in Section 1.2, since the developers have limited understanding of the
users’ intentions regarding the meta-content they contributed. In fact, this is a typical
example of the traditional researcher-driven model discussed in Section 4.1.3 above.
Therefore, during the phase where the developers implemented the evolutionary
changes to improve the quality of their MUEEWS system, much confusion arises since
the developers are not the ones who generated the meta-content. Therefore, when they
proceed to improve the system functionalities or meta-content quality, such as applying
ontologies onto the meta-content, or utilizing social networking to relate the meta-content
to each other, many of them fall short of fixing the users frustrations with the system.
The current state of delicious
TM
is a prime example of the traditional researcher driven
model, since various updates have been made to its meta-content in an attempt to
improve its search, but numerous user complaints remain. In fact, collaborative
tagging applications, which utilize user-generated meta-content to enable functions such
as search, all suffer from meta-content management problems.
In the theoretical foundation section above, this thesis illustrated the similarities
between MUEEWS and the typical social situations that require the PAR process. In the
current MUEEWS framework, the users only have access to their own meta-content, so
their control over the overall system is fairly limited. Just as discussed in the latency
issue of Logic of Collective Action, the users have very little incentive to improve the
meta-content since they cannot see the impact of their improvements. Thus, even
though the users have first-hand knowledge regarding the meta-content they generated,
very few in the current system would intentionally improve the meta-content. Therefore,
in the section below, this research will utilize the PAR process to build a new theoretical
framework for MUEEWS.
4.2.2 PAR-Based MUEEWS Theoretical Framework
As discussed in the Participatory Action Research (PAR) section (Section 4.1.3),
there are four stages in the PAR process: the reflection, planning, action and observation
stages. In the current MUEEWS framework shown above in Figure 10, the users’ usage
of the system is equivalent to the observation stage of the PAR process, since the users
are able to observe the changing state of the system’s meta-content driven evolution. The
meta-content generation stage is the action stage, since that is the stage where the users’
changes are registered into the system. However, the current MUEEWS framework is
missing the reflection and planning stages. Therefore, to replicate the PAR process in
MUEEWS, this thesis work injects those two stages into the PAR-based MUEEWS
theoretical framework below. This is the theoretical framework used to represent
MUEEWS’ evolutionary process, which captures how user-generated
meta-content evolves and affects system functionalities.
The researcher/developer-driven approach in the current MUEEWS framework
has already seen instances of failure as UGC applications grow more complex
and their users demand better performance. This thesis will show that the theoretical
framework illustrated in Figure 11 below, where the users drive the evolution, can greatly
improve both the meta-content generation and the functionality performance of
MUEEWS systems. Furthermore, the theoretical framework will provide a periodic
functional complexity by giving the users control over the evolution of the meta-content
during the meta-content’s generation phase.
Figure 11 PAR Based Theoretical Framework
This approach mimics the metal sliding example discussed above in Section III by
removing the ever-increasing complexities with an additional tool to achieve periodic
functional complexity. Under the current MUEEWS framework, the evolution of the
meta-content is a combinatorial functional complexity. Since the developers must
determine when to implement the evolutionary changes based on how the meta-content is
evolving, each step of the evolution is dependent on the steps before it. Therefore, the
developers are in fact chasing a moving target, which is a combinatorial functional
complexity problem.
To illustrate how the PAR based theoretical framework functions, this thesis will
use Flickr as an example. First, in Flickr, the users tag their photos for easy access
with keywords such as “cat”, as step 1 (i.e. Use the Initial Function Set).
Second, the tags are associated with the image files as the users generate their meta-content,
which is step 2 (i.e. Generates Meta-content from Usage). Third, popular user tags can
be used as common keywords to index their respective images for an image search
engine, which is step 3 (i.e. Enables Evolution of Function Set). Fourth, unlike the
current MUEEWS framework, where the developers proceed to implement the
evolutionary changes, this new framework has the developers instead offer the user a
meta-content consensus formation tool. The intent of this tool is to give the users the
ability to reflect on the meta-content of the system and then plan for how to approach
their next meta-content generation.
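The four steps of the Flickr example can be sketched as a toy program. Flickr’s actual APIs and data model are not used here; `tag_photo`, `search`, and `suggest_tags` are invented names, and the consensus-formation tool is reduced to surfacing other users’ tags for possible adoption:

```python
# Toy sketch of the four steps in the Flickr example above. Data structures
# and function names are invented for illustration only.
from collections import defaultdict

# Steps 1-2: users tag photos; the tags become meta-content tied to each image.
meta_content = defaultdict(set)          # photo_id -> set of contributed tags

def tag_photo(user, photo_id, tag):
    meta_content[photo_id].add(tag)

# Step 3: accumulated tags index images for an evolved search function.
def search(tag):
    return sorted(pid for pid, tags in meta_content.items() if tag in tags)

# Step 4: a consensus-formation tool lets a user view other users' tags for a
# photo and decide whether to adopt them as his or her own (nothing is forced).
def suggest_tags(photo_id, own_tags):
    return meta_content[photo_id] - set(own_tags)

tag_photo("alice", "p1", "cat")
tag_photo("bob", "p1", "kitten")
print(search("cat"))                 # ['p1']
print(suggest_tags("p1", ["cat"]))   # {'kitten'}
```

The last call is the reflection/planning hook: the user sees what others contributed before generating the next round of meta-content.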
The reflection and planning phases of PAR are illustrated through the steps of
viewing other users’ meta-content and then generating meta-content consensus
respectively. The addition of these two steps completes the four stages of the PAR
process. In this thesis work, the meta-content consensus formation is implemented as the
users’ determination whether or not to use the other users’ contributions as their own.
The implementation does not force the other users’ contributions on the users’ meta-
content generation. The users are simply able to reflect on the current state of the
MUEEWS’ meta-content generation and plan their next stage of action. The exact
implementation of this theoretical framework is discussed in the Simulation and Test
Environment section below. Furthermore, as discussed above, the implementation of the
PAR based theoretical framework changes MUEEWS’ meta-content evolution from a
combinatorial functional complexity into a periodic one, which will be further illustrated
in the MUEEWS Theoretical Framework’s Functional Complexity Analysis section
below.
4.3 MUEEWS Theoretical Framework Functional Complexity Analysis
4.3.1. Functional Complexity of MUEEWS Theoretical Framework
As discussed in the Functional Complexity Analysis and Theoretical Framework
sections above, the key for maintaining control of the functional complexity of
MUEEWS lies in finding its functional periodicity. Also, it was stated in the Theoretical
Framework section that by applying the PAR-based MUEEWS theoretical
framework, the functional complexity of MUEEWS can be changed from a combinatorial
complexity into a periodic one. The current MUEEWS framework produces a
combinatorial functional complexity because the user-generated meta-content evolves
exponentially without a methodology to control the evolution. The current MUEEWS
framework implements MUEEWS’ evolutionary changes after the meta-content has been
generated. Therefore, it cannot remedy the functional complexities of the meta-content
generation process, which produces the combinatorial functional complexity of
MUEEWS.
In the current MUEEWS framework, the developers are the ones applying the
evolutionary changes. Since the users control how the meta-content is being generated,
the developers must determine when to implement these changes from outside of the
evolutionary system. This determination process is essentially chasing a moving target
(i.e. the evolution of the meta-content) and therefore generates a combinatorial functional
complexity. However, by implementing the theoretical framework discussed above, the
users are given the ability to control the way the meta-content is being formed during the
meta-content generation process. As in the metal sliding example discussed
in Section III above, the functional complexity of the meta-content’s evolution is
removed as it is generated, thereby forming a periodic functional complexity.
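The contrast drawn above between compounding combinatorial growth and a periodically reset complexity can be illustrated with a toy model. The growth rate and reset period below are arbitrary placeholders, not measurements from any MUEEWS:

```python
# Toy contrast between the two complexity behaviours: without a reset, per-step
# complexity keeps compounding (each step depends on the steps before it);
# with a periodic reset (the PAR-style consensus step), it is cleared each cycle.

def complexity_over_time(steps, growth_per_step, reset_every=None):
    level, trace = 0.0, []
    for t in range(1, steps + 1):
        level += growth_per_step          # each step builds on the previous ones
        if reset_every and t % reset_every == 0:
            level = 0.0                   # periodic reset of functional complexity
        trace.append(level)
    return trace

combinatorial = complexity_over_time(10, 1.0)
periodic = complexity_over_time(10, 1.0, reset_every=4)
print(max(combinatorial))  # 10.0: grows without bound
print(max(periodic))       # 3.0: bounded by the reset period
```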
4.3.2 MUEEWS Functional Periodicity Determinants
There are essentially two factors that contribute to the functional periodicity of
MUEEWS: the User Usability Tolerance (UUT) and the Meta-Content Sufficiency
(MCS). The UUT measures a user’s usability performance during usage of the system.
The MCS distinguishes whether or not enough meta-content is available for the system to
function as intended. The UUT is a key factor in determining the functional periodicity
of MUEEWS because in order to ensure that sufficient meta-content is generated, the
user must continue to use the software. If the system’s functionalities become too
difficult for users, they simply abandon the system, ending their contributions to the
meta-content. The MCS reflects the evolution of the meta-content, which also impacts
the functional periodicity of MUEEWS. UUT and MCS are related because the UUT
growth in MUEEWS is shaped by the evolution of the meta-content. The stability of the
meta-content growth directly impacts the UUT growth, which is discussed further below.
4.3.2.1 User Usability Tolerance
The significance of the UUT factor in MUEEWS is best illustrated by using the
example of tagging. When a user first begins to use the tagging function in MUEEWS, it
is very easy to remember what tags he or she may have used for his or her entities,
avoiding overlap and confusion. But as time goes on and the meta-content he or she
contributed grows, it becomes increasingly difficult. Therefore, as the length of time and
the amount of meta-content in a MUEEWS system increase, the usability performance of
Figure 12 MUEEWS Combinatorial Complexity for UUT (plot of probability density over the Functional Requirements space, showing the initial functional complexity curve, the functional complexity after a phase in evolution moving out of the design range, and the UUT limit)
the user also worsens for the tagging functionalities. The usability values used to
determine the UUT, such as user task time and incompleteness, which are discussed
further in the Detailed UUT Analysis and the Simulation and Test Environment sections
below, will increase dramatically.
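As a rough illustration of the UUT determinant described above, the sketch below models a user’s task time as growing with the volume of his or her contributed tags and flags the point where it passes a tolerance limit. All constants and the linear growth model are invented for illustration only:

```python
# Toy model of the UUT: as a user's contributed meta-content grows, recalling
# previously used tags takes longer, and the user abandons the system once the
# estimated task time passes a tolerance limit. All numbers are placeholders.

def uut_exceeded(tag_count, task_time_per_tag=0.05, base_time=1.0, limit=5.0):
    """Return True once the estimated task time passes the tolerance limit."""
    estimated_task_time = base_time + task_time_per_tag * tag_count
    return estimated_task_time > limit

print(uut_exceeded(10))    # False: few tags, tasks stay easy
print(uut_exceeded(200))   # True: recall over many tags exceeds tolerance
```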
There is a certain limit of usability difficulty that users are willing to tolerate
before simply abandoning the system, as illustrated in Figure 12 above. Therefore, if a
system’s functionalities exceed the UUT limit of the users, they will begin to abandon the
system because it’s too difficult for them to perform their tasks effectively. In cases
where the UUT behavior exceeds the UUT limit, the MUEEWS system will cease to
grow and eventually be retired from the market. Therefore, in order to control the growth of
usability performance problems, one must contain the usability difficulty growth in the
MUEEWS by creating a functional periodicity through the application of the theoretical
framework, which was discussed in detail in the Theoretical Framework section above.
The meta-content’s evolution is what creates the fluctuations in the users’ usability.
Therefore, to prevent the UUT limit from being exceeded, the
developers must find a way to manage the meta-content’s growth.
As discussed in the Theoretical Framework section, to manage the meta-content
growth, the developers can follow the PAR based theoretical framework of MUEEWS.
That will create stability and control in the meta-content growth, which will in turn
contain the UUT growth. The periodic functional complexity created by the theoretical
framework is illustrated in Figure 13 as the UUT moving out of the design range of the
functionalities being used and then moving back as the MCS stabilizes. However, as this
research will discuss further below, the UUT cannot be reset until there is enough
meta-content to stabilize the meta-content’s growth. Therefore, for each entity in the
MUEEWS system, there is initially a functional complexity growth in the UUT that
moves it out of the design range of the functionalities. Then, as the meta-content
becomes sufficient to stabilize, the UUT growth will move back into the design
range as illustrated in Figure 13.
Figure 13 MUEEWS Periodic Complexity for UUT (plot of probability density over the Functional Requirements space, showing the initial functional complexity curve moving out of the design range after a phase in evolution and moving back into the design range after a phase in functional periodicity, bounded by the MCS limit)
As discussed in the Theoretical Framework section above, the key in producing a
functional periodicity lies in a user’s ability to form a consensus with other users as he or
she progresses with his or her own tasks in the system. This consensus formation
requires giving users the ability to review the state of the system, the impact of their
contributions on the system, as well as the other users’ contributions before they
progress to the next stage of meta-content generation. In the Simulation and Test
Environment below, it will be illustrated how critical the consensus formation is in
maintaining the functional periodicity of MUEEWS. The application of the theoretical
framework of MUEEWS helps the users see the strengths and flaws of the meta-content
that has been provided, which allows them to actively promote the stabilization of meta-
content generation.
The consensus formation process improves the quality of the meta-content and
makes the generation process more stable, as illustrated in the Results and Validation
section below. The exact implementation of this user consensus formation will be
discussed further below in the Simulation and Test Environment section. As stated
above, the UUT’s complexity growth is dependent on the MCS’ complexity growth.
Therefore, the theoretical framework’s improvements to the meta-content’s stability and
MCS also lead to an improvement in the UUT complexity. In the implementation
section of this thesis, the user consensus formation is restricted to one entity at a time to
keep the task simple for the user. Implementations and tests will be used to validate the
theoretical framework, which is discussed below in the implementation and test sections.
4.3.2.2 Meta-Content Sufficiency
Similar to the UUT, as users use the MUEEWS software, the meta-content will
grow uncontrollably, as illustrated in Figure 12 for the UUT. The growth in meta-content
promotes the growth of its MCS. However, unlike the UUT, the growth in MCS
increases a user’s ability to leverage user-generated meta-content in ways that will
improve usability performance. For example, if users cannot remember the exact tags
they generated for an entity, they can simply use synonyms to search for the entity.
However, in order for a synonym to be associated with the entity, either someone else
or the user himself must have tagged the entity with it in the past. This requires a minimal
limit of sufficiency for the meta-content, unlike the UUT limit, which is a maximum limit.
However, visualization of both the UUT and the MCS limits would be similar to what is
shown in Figure 12.
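The minimal-sufficiency idea in the synonym example can be sketched as a set-coverage check: a synonym-based search only works if some user previously attached that synonym to the entity. The notion of a “required vocabulary” below is a simplifying assumption introduced here for illustration, not a construct from the thesis:

```python
# Toy check of Meta-Content Sufficiency (MCS): an entity's tags are "sufficient"
# once they cover the vocabulary users are expected to search with. Both the
# data and the coverage criterion are invented placeholders.

def mcs_reached(entity_tags, required_vocabulary):
    """Minimal MCS limit: the entity's tags must cover the search vocabulary."""
    return required_vocabulary <= entity_tags   # subset test on sets

tags = {"cat", "kitten"}
print(mcs_reached(tags, {"cat"}))            # True
print(mcs_reached(tags, {"cat", "feline"}))  # False: 'feline' was never contributed
```

Note that this is a minimum threshold: adding tags beyond coverage never flips the result back to insufficient, matching the observation below that further growth past the MCS limit does not drastically change the system’s state.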
Also, unlike with the UUT, exceeding the MCS limit does not directly threaten
system functionality performance. However, as illustrated in the MUEEWS’ operational
functional complexity section below, it is essential to reach the minimal limit of MCS as
quickly as possible due to its relationship with UUT. In addition, once the MCS limit is
achieved, further growth of the meta-content would not drastically change the overall
state of the MCS of the system. As this thesis will illustrate in the Detailed MCS section
below, when the meta-content of an entity has stabilized, the growth patterns of the MCS
remain the same throughout the usage of the system. Therefore, reaching the MCS limit
signals the stability of the meta-content; this in turn can be used to improve the UUT
performance, which creates the periodic behavior illustrated in Figure 13. The
relationship between UUT and MCS is discussed further below.
4.3.2.3 MUEEWS Operational Functional Periodicity
As discussed above, the relationship between the MCS limit and UUT limit
determines the functional periodicity of MUEEWS’ theoretical framework. The UUT
limit cannot be brought back into the design range of the functionality’s space, as
illustrated in Figure 13, until enough meta-content has been provided. For example, in
order for most users to find an entity based on its tags, the tags must reflect those users’
internal definition of the entity. For the massive number of users in MUEEWS, such a
definition would have to be one that covers all of the users’ internal vocabulary associations
with that entity. Until that is achieved, the UUT limit for tagging and searching would
continue to grow out of the design range. The growth in UUT is caused by the growth in
meta-content. For example, unstabilized meta-content can cause searches to return
incorrectly associated entities, which would confuse the user more than if the tags
had not been contributed. Therefore, it is essential to meet the MCS limit requirements as
soon as possible to avoid exceeding the UUT, which is illustrated in Figure 14 below.
Figure 14 MUEEWS Functional Periodicities (plot of probability density over the Functional Requirements space, showing the functional complexity moving out of the design range after a phase in evolution and back in after a phase in functional periodicity; the operational functional periodicity lies between the MCS and UUT limits)
The UUT and MCS limits are the determinants of the functional periodicity of
MUEEWS. In the Simulation and Test section, the means of leveraging meta-content to
achieve the MCS limit and therefore improve the usability performance will be discussed
in further detail. Since the functional complexity of MUEEWS is determined by the
UUT and the MCS, its periodicity is the combination of their periodicities, which is
shown in Figure 14 above. The acceptable or operational functional periodicity of
MUEEWS lies in the range between the UUT and MCS limits. Of course, depending on
the particular application, both the shape and curvature of these figures can vary
dramatically from what is shown here. In fact, developers for each MUEEWS must
evaluate these curves on a case-by-case basis.
But one thing is certain: the MCS limit must come before the UUT limit in order
for the functional periodicity to remain operational. An ideal example of the MUEEWS
functional periodicity is illustrated in Figure 14 above. In instances where the UUT limit
comes before the MCS limit, it is infeasible to find an operational MUEEWS functional
periodicity. This particular scenario is illustrated in Figure 15 below. In such cases, the
MUEEWS is guaranteed to lead to system failure since users will leave the system when
the UUT limit is passed. The meta-content generation, which provides the means for
meeting the MCS limit, will come to a stop. Therefore, the system will never achieve the
MCS required to move back into the operational design range as illustrated in Figure 14
above. The system will then become a combinatorial complexity problem.
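The ordering condition above, that the MCS limit must be reached before the UUT limit is exceeded, can be expressed as a small feasibility check. The growth curves passed in below are arbitrary placeholders standing in for the application-specific curves the developers would have to evaluate:

```python
# Toy feasibility test for an operational functional periodicity: the step at
# which the MCS limit is met must precede the step at which the UUT limit is
# exceeded (the Figure 14 case); otherwise users abandon the system before the
# meta-content can stabilize (the Figure 15 case).

def periodicity_is_operational(mcs_at_step, uut_at_step,
                               mcs_limit, uut_limit, horizon=1000):
    mcs_step = next((t for t in range(horizon) if mcs_at_step(t) >= mcs_limit), None)
    uut_step = next((t for t in range(horizon) if uut_at_step(t) > uut_limit), horizon)
    return mcs_step is not None and mcs_step < uut_step

# Feasible: sufficiency arrives quickly while tolerance erodes slowly.
print(periodicity_is_operational(lambda t: 2 * t, lambda t: 0.1 * t, 10, 20))  # True
# Infeasible: tolerance is exhausted long before sufficiency is reached.
print(periodicity_is_operational(lambda t: 0.1 * t, lambda t: 2 * t, 20, 10))  # False
```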
The close integration between the MCS and UUT limits is the reason that the
current MUEEWS framework, which applies meta-content improvement strategies from
outside of the meta-content generation process, leads to system failures. Although these
strategies can potentially improve the quality of the meta-content and meet the MCS
limit, they are unable to ensure that the system can avoid exceeding the UUT limit. Since
these developer-driven strategies are implemented outside of the MUEEWS system, the
developers are often unable to estimate if UUT limit will already be exceeded when the
evolutionary changes they implemented takes effect on the system. If the UUT limit has
already been reached, the users have already stopped generating new meta-content, which
represents the failure of the system.
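The feasibility condition described above — that the MCS limit must be reached before the UUT limit — can be sketched as a simple check. The function name and the example numbers below are hypothetical, chosen only to illustrate the relationship, and are not part of the thesis's implementation:

```python
# Hypothetical sketch of the feasibility condition for a MUEEWS' functional
# periodicity. All names and numbers here are illustrative assumptions.

def operational_periodicity_exists(time_to_reach_mcs: float,
                                   time_to_exceed_uut: float) -> bool:
    """An operational functional periodicity requires the MCS limit to be
    met before the UUT limit is exceeded (the Figure 14 case); otherwise
    the system is guaranteed to fail (the Figure 15 case)."""
    return time_to_reach_mcs < time_to_exceed_uut

# Example: meta-content stabilizes after ~60 contributions per entity, while
# users tolerate the growing complexity for ~100 contributions.
print(operational_periodicity_exists(60, 100))   # feasible case
print(operational_periodicity_exists(120, 100))  # infeasible case
```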
Figure 15 Infeasible MUEEWS’ Functional Periodicity
[Figure: probability density plotted against functional requirements, showing the initial functional complexity curve, the functional complexity after a phase in evolution moving out of the design range, the UUT and MCS limits, and the resulting infeasible functional periodicity.]
To remedy cases such as the one illustrated in Figure 15, this thesis applied the theoretical framework discussed above. Since this methodology improves the quality of the meta-content during the generation phase, it greatly increases the likelihood that the MCS and UUT limits will fall within the operational functional periodicity shown in Figure 14 above. By leveraging the growing meta-content to improve its own quality, the UUT and MCS limits can be kept within the operational range, making the evolving MUEEWS system manageable. The UUT limit is not exceeded because the users have sufficient meta-content to meet the new needs that the evolution of the meta-content creates. The MCS limit is met more quickly because the users’ contributions are being leveraged to improve the quality and stability of the meta-content. To ensure that the MCS and UUT behaviors follow the pattern illustrated in Figure 14 rather than that of Figure 15, one must further analyze both limits, which is done below.
4.3.3 Detailed Analysis of User Usability Tolerance
In most cases, the UUT curve is the main limiting factor, or determinant, for a MUEEWS’ functional periodicity, as illustrated in Figures 14 and 15 in Section 4.3.2 above. One can allow the meta-content to grow as long as its limit occurs before that of the UUT. User complexity and usability are subjects that have been researched extensively in the fields of User Interface (UI) design and Human-Computer Interaction (HCI). Usability is the measurement of the ease with which people can employ a particular tool or other human-made object in order to achieve a particular goal. User complexity is the complexity of a device or system from the point of view of the user. Both subjects are affected by user experience, user interface design, and many other physical and psychological factors. In the case of MUEEWS, the consideration rests with usability. The main focus of the UUT is the point at which usability degrades so far that users can no longer use the software to achieve their goals and therefore stop using it.
In other, more narrowly targeted applications, such as medical 3D visualization, the topic of usability is much more confined and definite, as it is defined by a range for the manipulation precision that the 3D simulation environment should have [112]. But this is not the case in MUEEWS. It is necessary that MUEEWS develop functionalities with universal usability. Universal usability refers to the design of information and communications products and services that are usable by every citizen. Professor Ben Shneiderman, who has advocated this concept, provides a more practical definition of universal usability: “…having more than 90% of all households as successful users of information and communications services at least once a week” [113]. Usability testing generally involves measuring how well test subjects respond in four areas: time, accuracy, recall, and emotional response [114]. Although time and accuracy are fairly easy to measure, recall and emotional response are far more subjective [115].
The time factor essentially measures how long it takes a user to complete the outlined tasks. Accuracy generally measures how many mistakes users make while trying to complete these tasks. Recall is reflected by how much a person remembers after periods of non-use, and emotional response measures how a person feels about the tasks completed [116]. Recall and emotional response are the most difficult factors to quantify for the UUT, since the responses can be very subjective. A typical example of how these usability factors are used today is the navigation pattern and zoomable interface study performed by Hornbaek, Bederson, and Plaisant, in which the usability factors were defined in the following way.
Accuracy was calculated as the number of answers that were correct, partially correct (i.e. some of the steps of the tasks were followed correctly), and wrong. For the recall task, they measured the number of correct indications, corrected with a penalty for guessing (the number of wrong guesses divided by the number of wrong-answer possibilities for the question). Task completion time was measured as the time subjects used to complete the actual task, excluding both the time used for the initial reading of the task and the time used for entering answers. Preference (i.e. emotional response) was determined by a subject’s indication of which interface he preferred using and by the reason the subject gave for this indication [117].
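As a hedged illustration, the factor definitions above can be written out as small functions. The function names, the half-credit weighting for partially correct answers, and the sample numbers are my assumptions for the sketch, not the cited study's actual implementation:

```python
# Illustrative sketch of the usability measures described above; names,
# weights, and sample values are assumptions, not the study's own code.

def recall_score(correct: int, wrong: int, wrong_possibilities: int) -> float:
    """Correct indications corrected with a penalty for guessing: the number
    of wrong guesses divided by the number of wrong-answer possibilities."""
    return correct - wrong / wrong_possibilities

def task_time(total: float, reading: float, answering: float) -> float:
    """Completion time excludes initial task reading and answer entry."""
    return total - reading - answering

def accuracy(correct: int, partial: int, wrong: int) -> float:
    """Fraction of tasks answered correctly; partial answers count half
    (the half weight is an assumption for illustration)."""
    return (correct + 0.5 * partial) / (correct + partial + wrong)

# One hypothetical test session:
print(recall_score(correct=8, wrong=3, wrong_possibilities=3))  # 7.0
print(task_time(total=95.0, reading=20.0, answering=15.0))      # 60.0
print(accuracy(correct=6, partial=2, wrong=2))                  # 0.7
```

Emotional response, being a stated preference, would be recorded as a categorical rating rather than computed.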
MUEEWS’ UUT is evolving, and thus so is its usability. As users contribute more and more, the MUEEWS becomes harder to use. This is largely due to an increase in the required user experience (e.g. the need to remember many user-defined tags). The user interface can also become more complex and cumbersome (e.g. if the user-defined tags can be found through browsing, then there are more tags to browse through). Evolutionary changes in usability can be tracked remotely using the usage data logs, from which one can extract the usability factor measurements. Although there is an assortment of Web log analysis tools available [118], most of them, like NetTracker, Webtrends, Analog, and SurfAid, only provide limited statistical analysis of Web log data [119]. For example, a typical report has entries in this form: during time period t, there were n clicks on this particular Web page p [120].
Newer products use more sophisticated and complex analytic means but are generic, require significant manual intervention, and often resort to sampling due to the large size of Web logs [120]. The most commonly used method to evaluate access to Web resources, or user interest in them, is to count page accesses, or “hits”. However, this is not sufficient and often not correct [120]. Web server log files customarily contain the domain name (or IP address) of the request, the name of the user who generated the request (if applicable), the date and time of the request, the method of the request (GET or POST), the name of the file requested, the result of the request (success, failure, error, etc.), the size of the data sent back, the URL of the referring page, the identification of the client agent, and a cookie—a string of data generated by an application and exchanged between the client and the server [120, 121].
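The log fields just listed can be pulled apart with a small parser. The sketch below assumes an Apache-style “combined” format extended with a cookie field; any given server's actual format (and the sample line) will differ, so the pattern is illustrative rather than definitive:

```python
# Minimal parser for the log fields listed above, assuming an Apache-style
# combined log format extended with a trailing cookie field (an assumption).
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>GET|POST) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)

# A hypothetical log line in the assumed format:
line = ('192.0.2.1 - alice [10/Aug/2008:13:55:36 -0700] '
        '"GET /tags/cat.html HTTP/1.0" 200 2326 '
        '"http://example.com/start.html" "Mozilla/4.08" "session=abc123"')

m = LOG_PATTERN.match(line)
print(m.group('user'), m.group('path'), m.group('status'))
# prints: alice /tags/cat.html 200
```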
Therefore, extensive data preparation is needed before usability information can be extracted via the various data-mining algorithms [118]. To do so, it is essential to identify a set of server sessions from the raw usage data. This allows the following information to be extracted: the exact accounting of who accessed the site, what pages were requested and in what order, and how long each page was viewed [118]. Only through this session data can this research extract the usability data of time, accuracy, and recall, since each user’s usability factors are unique and each session must be uniquely identified. The emotional response will be measured by ratings and comments [65], as is the case with most current MUEEWS. These four factors will be combined into the UUT analysis measure. The exact implementation of these usability performance factors will be discussed further in the Test and Simulation section below.
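The session-identification step described above can be sketched as grouping each user's requests by an inactivity timeout. The 30-minute cutoff and the data layout are assumptions for illustration, not the thesis's actual preprocessing pipeline:

```python
# Sketch of server-session identification from raw usage data, using an
# assumed 30-minute inactivity timeout to split sessions.
from datetime import datetime, timedelta

def sessionize(hits, timeout=timedelta(minutes=30)):
    """hits: list of (user, timestamp, page) tuples sorted by timestamp.
    Returns a dict mapping each user to a list of sessions, where each
    session is the ordered list of pages viewed."""
    sessions, last_seen = {}, {}
    for user, ts, page in hits:
        # Start a new session on first sight or after a long gap.
        if user not in sessions or ts - last_seen[user] > timeout:
            sessions.setdefault(user, []).append([])
        sessions[user][-1].append(page)
        last_seen[user] = ts
    return sessions

t0 = datetime(2008, 8, 10, 12, 0)
hits = [("alice", t0, "/home"),
        ("alice", t0 + timedelta(minutes=5), "/tags"),
        ("alice", t0 + timedelta(hours=2), "/home")]  # long gap -> new session
print(sessionize(hits)["alice"])  # [['/home', '/tags'], ['/home']]
```

With sessions in hand, per-session task time and page ordering follow directly, which is what the time, accuracy, and recall measures require.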
4.3.4 Detailed Analysis of Meta-Content Sufficiency
MUEEWS’ uniqueness lies in its ability to harness user contributions to enable its functionalities and/or generate new ones. Prior to MUEEWS, historical applications, such as image indexing and search, attempted to achieve the same purpose by harnessing inputs from a small sample user group or by hard-coding the necessary meta-content. The user groups tended to number, at most, in the hundreds or thousands, and the hard-coded data was either insufficient or failed to reflect the users’ real needs [36]. Other studies performed their analysis with raw data, but the data was often too difficult to process efficiently and effectively enough to provide the targeted functionalities, such as image differentiation [122]. These historic applications are prime examples of insufficient meta-content levels, which is why those studies failed to gain much ground until the introduction of MUEEWS [123, 124].
Since MUEEWS systems’ content is produced by a massive number of end-users
rather than a few application developers or maintainers, the sufficiency of the meta-
content is entirely dependent on their contributions. Since the users of a MUEEWS
system are simultaneously the consumers and the generators of the meta-content, the
MCS must be both determined and achieved by the users themselves. Therefore, the key to establishing order in the meta-content is to create a method for guiding meta-content creation as users perform their contribution tasks. This reflects the PAR process discussed in Section 4.2.2, where users reflect on their own and others’ contributions before moving on to the next task. This reflection process allows users to
form a consensus on the meta-content that is currently available before generating more
content. This process and its effects are discussed in further detail in the Results and
Validation section.
4.3.4.1 Meta-Content Stability Statistical Analysis
As discussed in the research background section above, one of the most important contributions to the study of UGC and collaborative tagging systems is the concept of folksonomy, a term first coined by Thomas Vander Wal. It is used to describe what is essentially a user-generated social taxonomy, or “bottom-up social classification.” Its characteristics are bottom-up construction, a lack of hierarchical structure, and creation and use within a social context [56]. From a practical standpoint, a folksonomy is essentially a communal categorization, or taxonomy, that is formed through the contributions of the users of a tagging system. Several studies, such as Golder and Huberman’s work on the del.icio.us™ system, illustrated the formation of folksonomy, which created stability and organization. Similar studies were performed on Flickr and also illustrate that a folksonomy can be formed through this process. Furthermore, these studies illustrated that the formation of a folksonomy in fact creates stability.
One of the most cited works in folksonomy research is the del.icio.us™ tag study performed by Golder and Huberman. The key contribution of their research is the illustration of folksonomy formation as observed through the user data they gathered from del.icio.us™. In their research, Golder and Huberman illustrate that each tag’s percentage of use, particularly for the popular tags, tends to stay the same after a certain amount of user contribution (i.e. around 50–100 bookmarks), as shown below in Figure 16. Their research illustrates that after a certain amount of contribution, the tag usage behavior stabilized, which is when the folksonomy was formed. This is also how this research will determine the stability of the MCS, as illustrated further in later sections.
Figure 16. Tag Stability Statistical Analysis
[Figure: percentage of tag use plotted against the number of bookmarks as a measurement of time, showing the usage percentages of two tags (e.g. cat and kitten) stabilizing after sufficient contributions.]
In Figure 16, the X-axis represents increments of time based on the number of bookmarks added to one website, such as www.kitten.com, by many different users. The Y-axis is the usage percentage of a tag, which is the number of times a tag is used divided by the total number of times that all tags are used. For example, for www.kitten.com, the tag cat may be used 20 times while the other tags, such as kitten or feline, are used a total of 80 times; the usage percentage for the tag cat would then be 20%. At first, the tag growth and usage behavior in del.icio.us™ seems chaotic, as can be seen in the section before the tagging usage stabilizes. But with a sufficient amount of tag contributions, the users on the site were able to converge on a set of common popular tags. After 100 or so common tags were defined, tag usage percentages stabilized even though the popularity of the bookmark continued to grow.
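The percentage calculation in the example above, and a simple stability check of the kind this research uses for the MCS, can be sketched as follows. The stability window and tolerance are assumed values chosen for illustration, not thresholds from the Golder and Huberman study:

```python
# Illustrative computation of the tag-usage percentages plotted in Figure 16;
# the stability window and tolerance below are assumptions.
from collections import Counter

def tag_percentages(tag_counts: Counter) -> dict:
    """Usage percentage: times a tag is used divided by total tag uses."""
    total = sum(tag_counts.values())
    return {tag: count / total for tag, count in tag_counts.items()}

def is_stable(history, tag, window=3, tolerance=0.02):
    """A tag's usage is considered stable if its percentage varies by less
    than `tolerance` across the last `window` snapshots over time."""
    recent = [snap.get(tag, 0.0) for snap in history[-window:]]
    return max(recent) - min(recent) < tolerance

# Snapshot for www.kitten.com: "cat" used 20 times, all other tags 80 times.
snapshot = tag_percentages(Counter({"cat": 20, "kitten": 50, "feline": 30}))
print(round(snapshot["cat"], 2))  # 0.2, i.e. 20%
```

Tracking such snapshots as bookmarks accumulate reproduces the X-axis of Figure 16, and the point where `is_stable` first holds for the popular tags marks the formation of the folksonomy.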
However, Guy and Tonkin conducted a small-scale study and found that 40% of tags in Flickr and 28% of tags in del.icio.us™ were erroneous. Furthermore, although folksonomy is a naturally occurring phenomenon, it is currently not being leveraged to improve the MCS or the UUT of MUEEWS systems. Also, since there is no explicit user consensus formation regarding the usage of the tags, the users are not aware of the improvement or damage they are potentially doing to the system with their contributions. Therefore, the users’ poor contributions often begin affecting the system and frustrating themselves and other users before this naturally occurring phenomenon of folksonomy can form. One can only imagine that these problems will become even more prominent with the expansion of these systems into more complex content, ever-changing application bases, and ever-growing user expectations.
Therefore, it can be observed that user consensus, or folksonomy, in MUEEWS systems is naturally achievable, but often does not occur before the users have reached the UUT limit of the system. The reason is that it can take users a long time to establish MCS, unless the entity is extremely popular. This is especially true when users are not informed about the system’s contents, other users’ contributions, and even their own contributions and their effect on the system as a whole. In cases where a folksonomy is not achieved, the system’s functionalities will not perform as intended, which often creates more and more usability performance problems for users as time progresses. These problems occur because poorly tagged entities confuse users, and more and more of them are generated over time.
Many MUEEWS system designers have already seen the effects of this phenomenon and have attempted to remedy it through various means. For example, in Wikipedia, users are asked to actively participate in editing content contributed by others and even to help make sense of that content. Wikipedia also actively posts new empty topics to drive user participation, which could very well provide a means to fill the gaps in the folksonomy formed in collaborative tagging systems. These are all means of targeting the MCS problem of MUEEWS systems, which stems from a lack of user consensus formation at the necessary times. This topic and the proposed resolutions are discussed in further detail in the next section, where this thesis clearly defines its hypotheses and the methods of validating them.
4.4 Hypotheses of the Simulation and Test Environment
4.4.1 Hypotheses of MUEEWS Theoretical Framework
The functional complexity analysis of MUEEWS’ theoretical framework,
described above, established that finding the limits of the MCS and UUT is critical to
maintaining the functional periodicity of a MUEEWS system. If the MUEEWS
application is not maintained within these limits, the system is vulnerable to performance
degradation during the usage phase of the software system. As discussed in the functional complexity analysis, the degradation is caused by the system’s inability to adjust to the evolution of the user-generated meta-content. The evolution of the meta-content occurs from both the users’ progressive addition of more meta-content and their responses to the meta-content shared by other users. However, since
there is no guidance for this evolution, the changes that occur in the user-generated meta-
content can be sporadic, leading to instability, system performance degradation, and
overall system failure.
Since the meta-content evolution occurs in the usage phase, in particular, the
generation phase, stabilizing the meta-content during the generation phase is crucial for
addressing the degradation problems. To promote meta-content stability, users must be
guided in their generation of the meta-content so that it is produced in a consistent and
stable way. As discussed in the Theoretical Framework section above, in order to
overcome the LCA latency issues in massive user systems such as MUEEWS, users must
reflect on each other’s contributions as these contributions (i.e., the meta-content) evolve.
It is very difficult for the developers to provide such guidance because MUEEWS
systems contain millions of users, all with different styles, purposes, and needs. Such
variation results in disparate views regarding how to associate meta-content with shared
content.
Therefore, the users themselves must be able to guide each other in their meta-content generation. However, since the instability of the meta-content is created by differing ideas on how to associate meta-content with the shared content, users must be able to reach a consensus as they guide each other. The hypothesis of this research, which will be illustrated with the test scenarios in the Test Scenarios and Validation section below, is that in order to ease the instability of MUEEWS, users must form a consensus as they guide each other during meta-content generation. The purpose of the simulation and test environment, which is also discussed in more detail below, is to show that injecting a reflection tool, in the form of a consensus formation tool, as the users generate meta-content can ease the system performance degradation of MUEEWS.
In section 4.3.2.3 above, it was shown that the usability values of the MUEEWS
must be kept under the UUT limit or the MUEEWS functionalities will move outside of
its Functional Requirements range and frustrate the users to the point where they will
abandon the system. However, it was also shown in section 4.3.2.2 that the meta-content
must reach a sufficient level before MCS stability can be established. Therefore, to reduce a MUEEWS system’s chances of system failure, one must address both the meta-content stability issue and the UUT performance issues of the current MUEEWS framework.
Therefore, there are two hypotheses that this research will attempt to validate. The first is that the new theoretical framework of MUEEWS, which focuses on the insertion of a reflection stage, can dramatically improve the stability of the meta-content, and thus the MCS. The exact implementation of the reflection stage, which takes the form of a consensus formation tool, will be discussed in further detail below. The second hypothesis is that the improvement of the MCS also creates an improvement in UUT performance and therefore decreases the chance of system failure for MUEEWS. The simulations and tests below will show that allowing users to cooperatively reach a consensus promotes stability in the meta-content and the MUEEWS system overall. As illustrated in the sections below, the users were given a simulation and test environment that recreated a MUEEWS environment with targeted applications and test scenarios to illustrate these hypotheses.
4.4.2 Hypotheses Analysis with MUEEWS Functional Complexity
To further examine these hypotheses, this section will discuss the relationship
between these hypotheses and the functional complexity of MUEEWS’ theoretical
framework as illustrated above. The theoretical framework of MUEEWS focuses on the
injection of a user reflection phase into the current framework of MUEEWS. The
hypotheses are essentially trying to validate that the reflection phase creates a periodic
functional complexity for MUEEWS, as discussed above. The operational periodic functional complexity of MUEEWS depends on the speed with which MCS is formed and the likelihood of the UUT exceeding its limit. Therefore, to illustrate the first hypothesis, which states that the theoretical framework of MUEEWS improves the MCS, the tests below must show that the MUEEWS simulation with the theoretical framework implemented stabilizes the meta-content faster than the simulation without it.
To validate the second hypothesis, which states that the improvement of MCS also translates to an improvement of UUT, this research must show that the system utilizing the consensus formation tool improves the UUT performance of the MUEEWS system. Furthermore, this research will illustrate that without consensus formation, the
MCS’ evolution can lead to growing usability problems which can in turn lead to user
abandonment and eventually, failure of the system. Therefore, the usage of the consensus
formation tools also decreases the likelihood of user abandonment. The simulation and
test environments described below create the basis for validating these two hypotheses.
However, the validation of these hypotheses is further explored in the Test Scenarios and
Validation section below.
4.4.3 Hypotheses Validation Methodology
Based on the analysis above, there are three validation goals this thesis must
achieve to prove the validity of the hypotheses. First, this research will validate that the
UUT performance is directly related to the likelihood of user abandonment. This
addresses the original assumption of this research that the UUT performance is related to
the system failure of MUEEWS. Second, this research will validate that the simulation
environment with the consensus formation tool creates meta-content stability faster than
the simulation environment without. Third, this research will validate that the
improvement of the meta-content in the simulation environment with the consensus
formation tool produces better UUT values in its respective test environment than the test
environment without.
To validate the hypotheses regarding the impact of the theoretical framework,
there are two simulation environments and two respective test environments where one
set of the environments utilizes the consensus formation tool, and the other one does not.
The MCS and the UUT of both sets of environments will be compared in order to
validate the above hypotheses. To validate the assumption regarding the relationship of
the UUT performance and the likelihood of system failure, there is a test scenario where
the meta-content is manually populated and the aim of the test is to gather UUT values based on varying the difficulty of the tests. The reason this research included this test scenario, which is a controlled environment with no tag generation, is to ensure that the UUT values used to determine the relationship between the UUT and the probability of system failure are not influenced by the meta-content generation.
4.5 Simulation & Test Environment
The simulation and test environments use the same portal interface, which is illustrated below in Figure 17. This portal interface is based on a typical content management system with a standard UGC GUI that has been configured and reprogrammed to fit the needs of the two environments utilized in this research. The key functionalities of the portal interface itself are to provide support to the users for account management and to serve as the main GUI for the users’ interactions with the two environments.
Figure 17 Portal Homepage
The users’ contributions are gathered through pages in the simulation
environment. These contributions are then used in the test environment to gather their
usability performance information.
4.5.1 Simulation Environment
The simulation environment is an application that mimics a MUEEWS system. In
a MUEEWS system, there are generally two key components: a content management system and a content sharing system. The content management system allows users to manage the
content they contributed to the MUEEWS system. The content sharing system allows
users to view each other’s content. In the simulation environment, the content is limited to URLs or websites, much like the del.icio.us™ application. Furthermore, since the number of users for these two environments may be fairly limited, the users were not asked to post their own URLs or webpages, since the disparity in their content could make it too difficult to generate enough meta-content for each entity.
However, since both of the environments are simulations designed to study
particular tagging or usability behavioral responses from the users, the testing group size
was determined for sampling purposes and not for recreating an actual MUEEWS
environment. The test scenarios that require a large quantity of user meta-content
contributions will be simulated, which is discussed in further details in the test
environment section below. In the simulation environment, the users are working from a
common base set of URLs or websites, and are asked to generate meta-content (i.e. tags)
for those URLs. The exact implementation of the simulation environment is discussed in
detail below.
There are two versions of the simulation environment; one without the application
of the theoretical framework, and one with. As discussed in the hypothesis validation
section above, these two simulation environments are used to generate tagging behavior
comparisons. The application of the theoretical framework is implemented as a
consensus formation tool, which is discussed in detail in the sections below. In the
simulation environment without the theoretical framework, the users were given the typical tools provided in MUEEWS content management systems, such as search, tagging, and the ability to add, edit, or delete their contributions. Therefore, that simulation environment represents the current MUEEWS framework, in which the users do not explicitly form a consensus on the meta-content they contribute.
The intent of the simulation environment is not only to capture the users’ meta-content contributions, but also to create test data for the test scenarios discussed in the sections below. The test scenarios require two simulation setups; therefore, there are two
different versions of the simulated environment presented to users. In one version, the
users are provided all the typical tools in the MUEEWS application listed above, but they
will not be given any consensus formation tools. In a second version, users are provided
consensus formation tools during the generation of their meta-content (i.e. tags). This is
necessary to establish meta-content generation scenarios to investigate the hypotheses: whether meta-content sufficiency, or stability, occurs more quickly with the use of consensus formation tools, and whether that stability eases the performance degradation of the MUEEWS system overall.
4.5.1.1 Simulation Environment without Theoretical Framework
To use the simulation environment system without the theoretical framework, users must register for and maintain their own accounts using their login and password. Their
contributions (i.e. their tags and the URLs they have tagged) are illustrated through their
account pages, which they have sole access to, but the impact those contributions have in
the system overall will be illustrated in the collaborative tagging pages of the simulation
application, which all users can access. In addition to gathering the user-generated meta-
content as the users utilize the simulation system, the simulation environment also keeps
track of the MCS growth and stability by tracking the meta-content. This information is used both to determine the MCS limit of the MUEEWS system and to test the hypothesis that meta-content sufficiency, or stability, occurs more rapidly with the use of consensus formation tools.
4.5.1.1.1 Simulation Environment without Theoretical Framework Architecture
The simulation environment architecture shown in Figure 18 below is the version without the consensus formation tool (i.e. without the application of the theoretical framework). This is a standard web architecture, with the main division made between the client and server sides of the application across the HTTP protocol. Each user is represented as a different client interacting with the system. Since most of the processing occurs on the server side, it is further divided into database (DBMS), SQL, application, and GUI layers to express more detail regarding the application process. Aside from the application functionalities, there is also a service manager that handles some of the backend processing, such as sending mail, maintaining user accounts, and various other functionalities required in a MUEEWS system.
Figure 18 Simulation Environment Architecture
For the simulation environment without the consensus formation tool, the application focuses mainly on gathering user-generated meta-content (i.e. tags) and logging user tagging behavior data. The general user information is logged in the user account information, but the users’ actions during tag generation are logged separately in the user tagging behavior data. The user-generated meta-content is stored in a separate data structure, and the system-generated content, i.e. the URLs the users are given to tag, is stored in its own separate data structure. Each of these data structures is connected, through SQL, to an application component and a GUI component that together manage the corresponding functionalities of that data structure.
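The four data structures just described could be realized, for example, as the following relational schema. Every table and column name here is a hypothetical illustration of the described layout, not the thesis's actual DDL:

```python
# Hypothetical SQLite schema mirroring the four data structures described
# above (user accounts, tagging behavior logs, user-generated tags, and
# system-provided URLs); all names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_account (
    user_id  INTEGER PRIMARY KEY,
    login    TEXT UNIQUE NOT NULL,
    password TEXT NOT NULL
);
CREATE TABLE system_url (                 -- system-generated content to tag
    url_id INTEGER PRIMARY KEY,
    url    TEXT NOT NULL
);
CREATE TABLE tag (                        -- user-generated meta-content
    tag_id  INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES user_account(user_id),
    url_id  INTEGER REFERENCES system_url(url_id),
    label   TEXT NOT NULL
);
CREATE TABLE tagging_behavior (           -- per-action behavior log
    event_id INTEGER PRIMARY KEY,
    user_id  INTEGER REFERENCES user_account(user_id),
    action   TEXT NOT NULL,               -- e.g. add / edit / delete
    at       TEXT NOT NULL
);
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```

Keeping the behavior log in its own table, keyed by user, matches the text's separation of account information from per-action tagging behavior data.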
One of the main functionalities of this simulation environment is to maintain user account information, not just for the users’ access permissions but for tracking their tagging behaviors. The main GUI component for this purpose is the user account/login page, which keeps the users’ account information in the correct components and data structures. This component also keeps track of which users contributed what type of information in the tagging behavior data logging components and data structures. Then there is the collaborative tagging GUI, which connects the components and data structures of the tagging behavior data logging, the tagging capability, and the URL loading and selection. This component, illustrated in Figure 20 below, is how the users are able to add, edit, or delete their meta-content contributions. The last component is the loading and selection of the URL, which takes the URL information stored in the database and renders it for the users as they begin their tagging task. The URL is displayed as a popup, which is shown in Figure 19 below.
Figure 19 Popup with URL information
4.5.1.1.2 Simulation Environment without Theoretical Framework Interface
The URL that the user is tagging will be included in a popup, shown above in Figure 19, as the tagging page, illustrated in Figure 20 below, loads. On the tagging page, the user is able to add tags separated by spaces, as in del.icio.us™, through the form illustrated below. The users are also able to delete their contributions by clicking on the [X] next to the tags they have added. The users can utilize this interface repeatedly until they are satisfied with their contributions, at which point they can click on the Go to the next link selection to add meta-content to a new URL. They may also navigate away from the page to go to their account page to edit other URLs they have already made contributions to, or take the tests offered through the test environment. To eliminate overlap, the users are never tested on the same URLs they have tagged; this is discussed in further detail in the test environment section below.
Figure 20 Simulation Application Tagging without Consensus Formation Tool
4.5.1.2 Simulation Environment with Theoretical Framework
The simulation environment with the theoretical framework applied via the
consensus formation tool has one primary focus. As discussed in the Hypotheses
Analysis with Functional Complexity section, the theoretical framework’s primary goal is
to introduce a functional periodicity in MUEEWS. The way to achieve this is through the
stabilization and improvement of the meta-content in the MUEEWS system. The
theoretical framework of MUEEWS states that allowing the users to reflect on the state of
the system empowers them to improve the system. Therefore, the consensus formation
tool at its core is to enable this reflection process. The consensus formation tool
illustrates to the users popular meta-content that the other users of the system have
already contributed towards a particular entity. The users are then able to reinforce the
usage of this meta-content by adding it to their own meta-content contribution, or by
denying it.
4.5.1.2.1 Simulation Environment with Theoretical Framework Architecture
The implementation of the consensus formation tool requires quite a few additions to the
architecture of the simulation environment. At the beginning of the simulation
environment's usage, there is very little meta-content in the system, so the consensus
formation tool has no data to recommend as popular tag suggestions. Therefore,
meta-content was imported from the delicious™ API to populate the consensus
formation tool. This required some handling of XML exchanges by the service manager
and an additional component that handles the importing and display of this meta-content
in the consensus formation tool. This suggestion accompaniment is then integrated into
the collaborative tagging component from the original simulation environment, which is
illustrated below in Figure 21.
Figure 21 Simulation Environment with Consensus Formation Tool
[Architecture diagram: users reach client user pages (HTML, JavaScript) over the
Internet (WWW, TCP/IP, HTML, XML, JavaScript) via HTTP. The server (PHP, Perl)
comprises a GUI layer (user account login/registration, collaborative tagging component
with suggested tagging capability, URL loading and tracking), an application layer (user
account manager, data logging for tagging behavior, tagging capability, URL selection
and loading, and a service manager handling mail), and SQL/DBMS storage (user
account information, user tagging behavior data, system generated content (URLs), user
generated meta-content (tags), and imported meta-content (suggested tags)). The
Delicious XML and JSON APIs supply the imported tag suggestions.]
4.5.1.2.2 Simulation Environment with Theoretical Framework Interface
As illustrated above in Figure 19, the user is given the URL in a popup as they
begin their tagging tasks in both simulation environments. The interface of this version
of the simulation environment has been changed to incorporate the consensus formation
tool, which is implemented as a tag suggestion tool. As illustrated below in Figure 22, the
users are able to add the tag suggestions to their list of meta-content contributions by
clicking on the [+] button next to the suggested tag. This not only enables the users to tag
the entity much more quickly, but also helps all of the users form a consensus regarding
which tags are most important to assign to an entity. This is further illustrated in the test
scenarios and the results and validation sections below.
Figure 22 Simulation Application Tagging with Consensus Formation Tool
The above simulation environment allows the users to reflect on other users’
contributions before progressing further in the meta-content evolution, as discussed in the
Theoretical Framework section above. This process allows the users to form a consensus
without falling into the latency issues of LCA. The latency issues of LCA in a massive
group, such as those in MUEEWS, are caused by the users' inability to see each other's
contributions and by their inability to see how their own contributions affect the system
as a whole. However, by incorporating the users' contributions in the meta-content
generation phase as the means of guidance, the users are able to see the other users'
contributions while they make decisions about their own meta-content generation.
Furthermore, they are also able to see how their own contributions affect the system as a
whole, since the list of popular tags is updated based on the users' meta-content
contributions.
The implementation of the consensus formation tool is a popular tag or tag cloud
recommendation function, which recommends the tags that have been used the most by
all users who have tagged the particular URL the current user is tagging. This concept is
based on the tag cloud function used in the del.icio.us™ application, which is used to
show the users their own tag usage (i.e. which tags they used the most) overall, as well as
the system's most popular tags (i.e. which tags all users used the most). Since the users
can reflect on these tag recommendations before deciding whether or not to select
them or to create their own during the meta-content generation process, this replicates the
PAR process illustrated in the theoretical framework section above. In the PAR process,
the group members reflected on the overall state of the group’s progress before moving
onto the next stages of interaction. Keeping true to this process, the implementation of
the consensus formation tool ensures that the users’ contributions are being reinforced as
the meta-content evolves.
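The recommendation logic described above is essentially a frequency count over the tags all users have attached to the URL being tagged. A minimal sketch, assuming a hypothetical in-memory layout rather than the simulation environment's actual SQL store:

```python
from collections import Counter

def popular_tags(tag_assignments, url, top_n=5):
    """Recommend the tags used most often for `url` by all users.

    `tag_assignments` maps (user, url) -> set of tags; the consensus
    formation tool shows the most common tags across all users who
    tagged this URL, ordered by usage.
    """
    counts = Counter(
        tag
        for (user, u), tags in tag_assignments.items()
        if u == url
        for tag in tags
    )
    return [tag for tag, _ in counts.most_common(top_n)]
```

Each time a user reinforces a suggestion, their contribution re-enters `tag_assignments`, so the next user's recommendation list already reflects it, which is the continuous update the PAR process relies on.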
There are some implicit consensus formations that occur in MUEEWS systems,
since users are likely to tag an entity, whether it is a URL or a picture, with some similar
tags. However, this type of uninformed consensus formation could take a long time, at
which point the users may have already passed their UUT limit. Therefore, a consensus
formation tool can help the users meet their MCS limit more quickly and help stabilize
the meta-content as they progress in their generation phase. Currently, many users of
del.icio.us™, YouTube, and similar systems have reported frustrating experiences due to
slow search speeds and confusion during searches caused by poorly tagged URLs. The
work in this thesis aims to target these problems explicitly with the use of the theoretical
framework.
4.5.2 Test Environment
The test environment has two purposes. The first is to provide a means for
simulating the various test scenario conditions for the validation of the test goals. This
aspect will be covered in more detail in the test scenario and validation section below
(Figure 23). The second is to gather the UUT limit information, which, as discussed in
the Hypotheses Analysis with Functional Complexity section above, is essential for
determining the Functional Periodicity of the MUEEWS system. The tests are based on
the search function of the simulation environment. Since search is one of the most
important functionalities that tagging or meta-content generation assists with, the effect
of the test conditions on this function is therefore very significant and should accurately
reflect the problems and respective usability tolerance limits users have when using
MUEEWS systems.
4.5.2.1 Test Environment Architecture
Unlike the simulation environment, there is essentially a single architecture for
the test environment. The only variation the users are affected by in the test environment
is the data that is stored in the data structures. Like the simulation environment, the test
environment also adheres to the basic web architecture, with a client/server structure over
the HTTP protocol. In addition, the server side also has the basic breakdown of DBMS,
SQL, application, and GUI layers. The other components it shares with the simulation
environment are a service manager, which handles the mail system and other
communication protocols, and a user account manager.
Figure 23 Test Environment Architecture
The user account manager is still responsible for keeping track of the users' system
sessions, just as in the simulation environment. However, the other components are vastly
different. The test loading and tracking component randomly generates a test with a URL
to search for and brings up the respective meta-content that URL has associated with it.
The meta-content is gathered from the simulation environment's database; whether the
test contains the meta-content from the data structures generated with the consensus
formation tool or without it is randomly determined. The reasoning behind this is to
allow the barrier of entry to be recorded in both instances of the test environment. The
search capability, which is illustrated below in Figure 24, combines both the
randomization of the test URLs and the loading of the meta-content for each of the tests
the user participates in.
Figure 24 Test Application Start Page
As the users take the tests, their test behaviors are recorded in the user test
behavior component, which records the test URL, the user, and the version of meta-data
the user was given to work with. In addition, the data logging of the user test behavior
components also records the usability performances of each particular test, which is the
main goal of the test environment. Lastly, whether or not the users were able to complete
the test, they are able to comment on and rate the test they just took in the test
ratings and comments tracking component. As this research will discuss in the Results
and Validation section below, the comments and ratings of the users are another
component used to determine their usability performance and a way for the users
to express their UUT limitations.
4.5.2.2 Test Environment Implementation
When the users begin the tests, they are first shown a tutorial to eliminate the
effect of unfamiliarity on the early tests as much as possible. The tests themselves
consist of showing the user the contents of randomly selected websites, without the URL
itself, and asking the user to find the corresponding URL through the search feature of the
simulation environment, as illustrated in Figure 24. The test environment requires the
users to login in order to track their test performance or usability performance results,
which include test completion time, correctness, and completeness. In addition to the test
results, the test environment also keeps track of the users' test feedback, such as user test
ratings and comments, to help this research further understand the usability tolerance of
the users as they progress through the tests.
Once the user has seen the contents of the URL, they can use keywords, which
they believe are associated with the URL, in the search function to find their target, as
they would in any search feature. The search function returns the results of the
keywords, illustrated in Figure 25, from the simulation environment’s application
database, not the Internet. If there are no search results or the results are not what the
users are looking for, they always have the option of searching again. Once the user sees
a URL they find promising, they can click on it to see if they have located the correct
URL. If they have found the correct URL, the test is completed. If not, they will be
prompted to search again until they find the correct URL or give up on the test. If they
choose to give up on the test, they can click on the Skip This Test button on any of the
pages.
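The search step described above can be sketched as ranking the stored URLs by how many of the query keywords appear among their accumulated tags; the index layout and function names here are illustrative assumptions, not the actual search implementation:

```python
def search(index, keywords):
    """Rank URLs by how many query keywords match their tags.

    `index` maps url -> set of tags drawn from the simulation
    environment's database (not the live Internet, as in the tests).
    """
    keywords = {k.lower() for k in keywords}
    scored = [(len(keywords & {t.lower() for t in tags}), url)
              for url, tags in index.items()]
    # Keep only URLs matching at least one keyword, best matches first.
    return [url for score, url in sorted(scored, reverse=True) if score > 0]
```

Under this sketch, a poorly tagged URL simply never surfaces for the keywords a tester guesses, which is the mechanism behind the abandoned searches discussed below.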
Figure 25 Test Application Search Results Page
On every page in the test environment, the user always has the option to quit the
current test, and move onto another test. Although this means that their tests will not be
counted, the test results indicate that users do in fact give up searching for a URL after a
certain time frame. This phenomenon clearly indicates a UUT limit; although there is a
range to the limit, it is relatively consistent. Additionally, the users are asked to rate and
comment on the test, as illustrated in Figure 26, which gives this research more in-depth
knowledge of usability issues. In order to establish a clear UUT limit, there are other test
factors that must be taken into consideration, which are discussed further in the test
factors section below.
Figure 26 Test Application Rating and Comments Page
4.5.2.3 Test Factors and Response Variables
The test factors are the method of investigating both the UUT limit of the users
and the test scenario conditions discussed in the later sections. The four key test factors
are test completion time, correctness, completeness, and the test ratings and comments.
The test time refers to the time the user takes to complete the search and identify each test
URL. The correctness test factor is the number of times users guessed incorrectly and
had to use the search function again. The completeness test factor is measured by the
users' inability to complete a task, which is reflected through abandonment of the task.
Additionally, test ratings and comments are very helpful indications of a user's gauge of
the difficulty of the search. These factors are combined to determine the usability of the
application as well as the UUT limit.
In the next section this research will discuss the test scenarios, which are based on
varying the MCS, by switching between the two versions of the simulation environment,
which is then reflected in the test environment when the users begin their tests. The MCS
is determined by the tag stability level, which is in turn determined by the tag usage
percentage. After a certain number of contributions, which generally involves a URL
being tagged 50-100 times, the URL tag usage percentage will stabilize and remain
stabilized thereafter. URLs with those characteristics are therefore considered stabilized,
or as having sufficient meta-content, and will generate different UUT results than URLs
that have not stabilized. This will be discussed in further detail in the test scenario and
validation section below.
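The stabilization criterion above can be made concrete: a tag's usage percentage is its share of all tag assignments on a URL, and a URL counts as stabilized once those shares stop shifting between snapshots. A sketch; the 0.05 tolerance is an illustrative choice, not a value from this research:

```python
from collections import Counter

def tag_usage_percentage(tag_events):
    """Share of all tag assignments on one URL taken by each tag."""
    counts = Counter(tag_events)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def is_stabilized(earlier_events, later_events, tolerance=0.05):
    """A URL's meta-content is treated as stable when no tag's usage
    percentage moved more than `tolerance` between two snapshots of
    its tag assignments."""
    before = tag_usage_percentage(earlier_events)
    after = tag_usage_percentage(later_events)
    return all(abs(after.get(t, 0) - before.get(t, 0)) <= tolerance
               for t in set(before) | set(after))
```

Comparing successive snapshots in this way is how the 50-100 tagging threshold manifests: early snapshots differ sharply, while later ones change only marginally.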
4.6 Test Scenarios & Validation
4.6.1 Test Scenarios and Test Goals
As stated in the Simulation and Test Environment Section above, there are
essentially two hypotheses the test scenarios are trying to illustrate. The first goal is to
validate that without consensus formation, the user-generated meta-content’s evolution
can create system performance degradation, which is reflected in growing usability
problems. These problems can further lead to user abandonment and therefore, failure of
the system. The second goal is to validate that consensus formation tools (e.g., popular
tag recommendation and tag cloud sharing) can promote meta-content sufficiency
growth. In turn, this will create the stability that is needed to resolve the system
performance degradation of MUEEWS. Users were given both a simulation environment
and a test environment that recreate a MUEEWS environment and the test scenarios.
These test scenarios will show that by allowing the users to guide each other into a
consensus, the consensus tools are able to promote stability in the meta-content, and thus
in the MUEEWS system overall.
There are essentially three test scenarios. The first is the base case or untagged
test scenario, which allows users to perform their searches based on keywords generated
by the system. The second is the unorganized tagging test scenario, which allows users
to perform their searches based on the tags they generated without any consensus tools.
The third is the organized tagging test scenario, which allows users to perform their
searches based on their generated tags with consensus tools enabled. The tags that are
used in the latter two test cases are generated in the two respective simulation
environments discussed above.
4.6.2 User Usability Tolerance Limit Analysis Test Scenario
4.6.2.1 Control or No-Tag Test Scenario
The control or base test scenario, where user generated content is not used, is
intended to gather usability performance values for information regarding the relationship
between UUT and the probability of system failure. The main goal of this test scenario is
to find the UUT for the test factors’ values (i.e. the test completion time, the correctness,
the completeness, and the test ratings and comments) and the effect they have on the
probability of system failure. The base case test scenario data should reflect the general
usability data observed with a search task test in a MUEEWS system. In particular,
this test scenario is interested in finding the UUT limit discussed above, which should
stay consistent regardless of which test scenario is in use. The base test scenario gets
searchable data (i.e. keywords) and sends users a test with a randomly selected link from
the database, asking them to find it using the generated keywords.
Figure 27 Base Case Test Scenario
In the base case test scenario, illustrated in Figure 27 above, the interaction
between the testing environment and the user is fairly simple. The tests are generated
randomly and keywords are determined by the developer. Users are simply taking the
tests and generating usability data. Since users use no tagging input in this test scenario,
this test case is not related to the tagging scenarios. However, this base case test scenario
provides some preliminary results regarding user behavior during search task
performance and the users’ responses to the test environment. As these early results
indicate, there is a clear usability limit for the search task in the typical MUEEWS system
environment. Also, the users’ own ratings and inputs fluctuate in correspondence with
this usability limit. These results, as well as the other test scenario results are discussed
in further detail in the Results and Validation section below.
4.6.3 Hypotheses Validation Test Scenarios
4.6.3.1 Unorganized Tag Test Scenario
As discussed in the Simulation and Test Environment section, users are asked to
generate meta-content in two different simulation setups. In the first version, users are
provided all the typical tools in the MUEEWS application listed above, but are not given
any consensus formation tools. This is the unorganized tagging test scenario. In the
second version, users are provided consensus formation tools during the generation of
their meta-content (i.e. tags). This is the organized tagging test scenario. These two
simulations and test scenarios are necessary to establish whether or not MCS, or
meta-content stability, occurs more quickly with the use of consensus formation tools. In
addition, these two test scenarios will be used to determine whether or not these created
stabilities ease the system performance degradation of the MUEEWS system overall.
The unorganized tagging test scenario is used to illustrate how poorly organized
meta-content generation can degrade a system’s functionalities and usability. The
searchable meta-content (i.e. keywords) is generated from the users’ tagging activities in
the simulated environment where they are not presented with any consensus formation
tools. The test environment is the same as the base case, where a randomly selected link
is used in the search tests. However, the URLs, as well as their associated content will
only be selected from the set of URLs that have unorganized tags as their meta-content.
The number of tags each link has in the test environment is limited to a set number so that
the performance of the unorganized tagging test scenario is not simply worse than the
organized scenario because there is less overall meta-content.
However, the tags are not limited in the simulation environment as their growth
patterns are tracked to determine the MCS differences between unorganized and
organized tagging. Aside from overall tag usage breakdown, another growth pattern of
interest to this research is the tag usage percentage discussed in the Simulation and Test
Environment section above. The tag usage percentage is used to illustrate the formation
of convergence, as it is in direct relation to how often a user selects a tag to represent an
entity, which is a website in this particular case. The tag growth patterns, as well as the
test data from all of the test scenarios’ data, are presented in further detail in the Results
and Validation section below.
Figure 28 Unorganized Tagging Test Scenario
In Figure 28, it is evident that the interaction between the simulation
environments and the user is no longer as simple as in the base test case. User-generated
tags are input into the simulation environment, from which the testing
environment receives its data. This process reflects the tagged search functionalities that
are quite common among current MUEEWS systems. Users’ inputs are being leveraged
directly to fulfill system functionalities of the MUEEWS system. In this case, the users’
inputs are directly impacting the testing environment where the search task’s usability
information is gathered. In this scenario, users are not shown other users’ inputs as they
progress through the tagging task, which means there is no opportunity for users to form
a consensus as they tag.
The hypothesis is that the performance of unorganized tagging is worse than that of
the organized tagging test scenario because the content will stabilize much more slowly
without user consensus formation. User consensus formation simply cannot occur when
users are uninformed regarding each other's contributions. In addition, the users
performing these tagging tasks are not tested on the same URLs. This is set in place to
prevent the users' search test results from being influenced by the memorization of their
own tags for a particular link. Users are both generating tags and taking tests, but the
URLs they tag and the URLs they test have no overlap.
4.6.3.2 Organized Tagging Test Scenario
In the organized tagging test scenario, the users' search tests are based on the
tags they generated with the use of the consensus formation tool, which is implemented as a
tag cloud or popular tag recommendations. The popular tag recommendations are
generated by presenting the existing tags that are already associated with a particular
entity, which in this case is a URL, ordered by the tag usage percentage. This
recommendation is updated continuously as the users progress in their tagging using the
simulation environment. Therefore, this recommendation tool allows users to see the
contributions of other users before making a decision about their own contributions.
Furthermore, by illustrating the tags that have the highest tag usage percentage, the users
are shown the tags that most of the users have implicitly agreed should be associated with
the entity.
The organized tagging test scenario is used to show that consensus tools are a
means to promote meta-content stability and avoid system degradation. Therefore, as
discussed above in Meta-Content Sufficiency section, the tag usage percentage for both
the organized and unorganized tagging will be tracked consistently to illustrate their MCS
growth. The hypothesis of this research is that user consensus in MUEEWS is directly
related to the meta-content stability, which in turn directly influences the usability of
MUEEWS. In the Results and Validation section below, the MCS comparisons between
unorganized and organized tagging will be discussed in detail along with all the other
data from these test scenarios.
The search task in this test scenario is still based on a randomly selected link
used in the search tests. However, the URLs will only be selected from the set of URLs
that have organized tags as their meta-content. Therefore, since the organized test
scenario’s search tests only get their searchable meta-content from user tagging with the
use of consensus formation tools, the test results will reflect its respective system
degradation levels or UUT performance data compared to that of the unorganized tagging
test scenario. These values will be compared and analyzed in the Results and Validation
section below. Furthermore, the implications of these results and their analysis will be
used to extract the conclusions regarding the hypotheses of this research.
As shown in Figure 29, not only is the relationship between the user and the
system more complex than in the base case test scenario with the addition of the simulation
environment, but a relationship between the simulation environment and the consensus
tool was also added. This relationship reflects the consistent updates that the simulation
environment receives when the users take the recommendations from the consensus tool.
Also, the relationship illustrates the updates that the consensus formation tool receives
when the users confirm its recommendations in the simulation environment. This
reflects the PAR process that was discussed in the Theoretical Framework section. As
discussed in that section, the users not only can see their own impacts on the system as a
whole, but they can also see the other users' contributions, which allows the users to
overcome the LCA of a large group.
The organized test scenario also uses imported tags from outside sources. In
this case, the outside source is delicious™. As discussed in the Simulation and Test
Environment section, the users are given the URLs for which they will generate tags
and upon which they will perform their search tests. The chosen URLs were selected
from delicious™ and Google not only to be popular URLs related to a subject, but also to
contain sufficient tags and tag usage stability values that could be imported from
delicious™. The hypotheses of this research are not only that user consensus
can assist in meta-content stability, but also that meta-content stability
prevents degradation of system usability performance. This implementation essentially
simulates a MUEEWS environment with millions of users' inputs for the test user as
they progressively tag the URLs they are assigned.
Figure 29 Organized Tagged Test Scenario
Although the tags used in the consensus formation tool are imported from
outside sources, the tags being used in the test environment for this scenario are those
selected by the test users. Furthermore, if the test user does not agree with the
recommendations, he or she can simply ignore them and still submit his or her own
meta-content, even if it isn't among the recommendations. Therefore, this implementation
preserves the PAR process's ability to allow the users to reflect on the other users'
contributions before submitting their own meta-content and to see their own contributions
impact the system. The respective usability values of this test case reflect those of more
quickly stabilized meta-content and the effect that stabilization can have on the UUT.
Therefore, this test scenario also illustrates the latter part of the hypotheses of this
research, which is to show that MCS can resolve system degradation and poor UUT
performance. The Results and Validation section will discuss these outputs in further
detail below.
4.6.4 Test Scenarios Definition
There are three main test scenarios. The first test scenario is a control or base
case test scenario where no tag generation is involved and the users are simply tested on
their search capabilities based on keywords in a Google-search-like environment. The
base case test scenario is used to find the UUT limit of a MUEEWS system and is not
used to validate the above hypotheses. The second scenario is an unorganized tagging
test scenario, which represents a MUEEWS system where users perform their tagging
without consensus formation tools. The unorganized tagging test scenario is used to
represent the hypothesis that insufficient MCS can lead to exceeding the UUT limit
and potentially to system failure. The third scenario is an organized tagging test scenario,
where the users perform their tagging with consensus formation tools. The organized
tagging test scenario is used to represent the hypothesis that consensus formation tools
improve the MCS and therefore the UUT of the MUEEWS system. The
consensus formation tool, which is implemented as tag recommendations, is driven by
imported tags with sufficient MCS from delicious™ as well as the users' inputs, which is
discussed in further detail below.
The control or base case test scenario is only the interaction between the test
and the user, with no meta-content input from the user. Since no tags were generated for
this test scenario, its test data only reflects the users' usability values. Therefore, the test
data from the control test scenario was only used to determine how the usability
variables impact the UUT limit. The key determinants of the UUT limit are the result
variables: time, rating, completeness, and correctness. As mentioned above, it is the
assumption of this research that incompleteness reflects the UUT limit being exceeded.
Therefore, the relationship of time, rating, and correctness with completeness
determines the limits for the UUT. In the test results findings, it was determined that
correctness did not have a strong relationship with completeness. Therefore, time and
rating were determined to be the key variables in the UUT analysis. The details of these
test results are discussed further below in the Test Results and Validation section.
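One simple way to operationalize this analysis: treating completeness as the outcome, estimate the UUT time limit as the smallest completion time at or beyond which most tests were abandoned. This heuristic and the sample records below are illustrative, not the dissertation's actual procedure:

```python
def uut_time_limit(tests):
    """Estimate a UUT time limit from (completion_time, completed) records.

    Returns the smallest observed time at or beyond which the majority of
    tests were abandoned (completed == False), or None if abandonment
    never dominates.
    """
    times = sorted(t for t, _ in tests)
    for threshold in times:
        beyond = [done for t, done in tests if t >= threshold]
        if beyond and sum(beyond) / len(beyond) < 0.5:
            return threshold
    return None
```

The same scheme extends to rating, the other key variable, by thresholding on the rating value instead of the completion time.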
The unorganized tagging test scenario has tag inputs from the user. However,
no consensus formation tools are provided. The purpose of this test scenario is to
illustrate the MCS behavior and UUT values of the tags generated without consensus
formation tools. This test scenario will attempt to confirm the hypothesis that tags
generated without consensus formation tools take longer to converge, and
therefore may not meet the MCS limit in time. As discussed above, the sufficiency of
MCS is determined by the tag usage percentage stability. Also, this research hopes to
illustrate the hypothesis that insufficient MCS makes exceeding the UUT limit more
likely. Therefore, this research will illustrate the improvement of the UUT and MCS
through the use of the consensus formation tool. The improvement will be illustrated
through a comparative analysis of the MCS and UUT values in the organized and
unorganized tagging test scenarios.
As discussed in the Simulation and Test Environment section, the organized tagging
test scenario has much more complicated user interaction than the other two test
scenarios discussed above. First, the simulated tagging environment provides tagging
recommendations in the form of consensus formation tools. The consensus formation
tool makes tag recommendations based on the most popular tags in the system. Second,
the users generate their tags either based on the tag recommendations or based on their
own inputs. Lastly, the tests are generated with the tags that the users inputted into the
test environment. The consensus formation tools use imported meta-content from
delicious™. The imported meta-content is generated only from URLs that have
sufficient MCS. The meta-content that is deemed sufficient was generated by thousands
of users. By including this meta-content in the consensus formation tools, the users are
interacting with the input from the users of delicious™, thereby simulating a tagging
environment with thousands of user inputs.
The purpose of the organized test scenario is to illustrate the MCS and UUT
behaviors of tags generated with the consensus formation tool. The organized test
scenario will confirm the hypothesis that the PAR process, in which users review
other users' contributions before making their own, improves the MCS of the tag
generation. It will illustrate that tags generated with this methodology are more likely to
converge, and therefore meet the MCS limit more quickly. Furthermore, it will illustrate
that tests utilizing tags generated with consensus formation tools are less likely to exceed
the UUT limit. In the Test Results and Validation section, the specifics of these
validations will be discussed in detail. The variation of unorganized vs. organized
tagging is a variation of MCS levels, which is the test factor used to generate the result
variables, or the UUT performance values.
4.6.5 Statistical Validation for Test Data
In statistics, analysis of variance (ANOVA) is a collection of statistical models,
and their associated procedures, in which the observed variance is partitioned into
components due to different explanatory variables. The initial techniques of the analysis
of variance were developed by the statistician and geneticist R. A. Fisher in the 1920s
and 1930s, and are sometimes known as Fisher's ANOVA or Fisher's analysis of
variance, due to the use of Fisher's F-distribution as part of the test of statistical
significance. The type of ANOVA used for validation in this research is one-way
ANOVA, which tests for differences among two or more independent groups. Typically,
however, one-way ANOVA is used to test for differences among three or more groups,
with the two-group case relegated to the t-test (Gossett, 1908), which is a special case of
the ANOVA. The relation between the two statistics is F = t^2.
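The F = t^2 relation can be checked numerically with a small pure-Python sketch (the two sample groups are made-up illustrative values, not the dissertation's data):

```python
import math

def pooled_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def one_way_f(a, b):
    """One-way ANOVA F statistic for two groups."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    grand = (sum(a) + sum(b)) / (na + nb)
    ssb = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2          # between groups, df = 1
    ssw = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)  # within, df = N - 2
    return (ssb / 1) / (ssw / (na + nb - 2))

a = [44.9, 52.0, 38.5, 61.2, 47.3]   # illustrative test times, no tool
b = [22.5, 30.1, 18.9, 25.4, 27.0]   # illustrative test times, with tool
t, f = pooled_t(a, b), one_way_f(a, b)
print(round(f, 6), round(t ** 2, 6))  # the two values coincide: F = t^2
```

For any two groups, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, which is why the two-group case can be handled by either test.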
The ANOVA will be used to determine whether the
number of testers, the experiment setup, and the data points generated are sufficient to
support statistically significant conclusions. The main comparison will be between the usability
performance data of the various test scenarios discussed above. The
ANOVA will determine whether the variances between these test cases are
significant enough to support the hypotheses. The analysis results and exact approach are
further discussed in the Results and Validation section below. These results will be used
to confirm the hypothesis and test goals discussed in the above sections. The results will
have to validate that system degradation can happen in open systems such as
MUEEWS, where the meta-content is user-generated. Furthermore, the results will have
to show that stable and well-organized meta-content can mitigate system degradation,
and that user consensus tools can be used to stabilize meta-content.
4.7 Results & Validation
4.7.1 Test Hypothesis Definition
As discussed in the sections above, functional periodicity is the key for
maintaining control of the functional complexity of MUEEWS. The two key components
of the functional periodicity of MUEEWS are the UUT and the MCS. The UUT limit is
the users' tolerance limit for the usability difficulties they encounter during their usage of
the MUEEWS system. The MCS is the meta-content sufficiency, which is assessed by
how well the meta-content supports the functionalities of the MUEEWS system. The
UUT is measured in terms of the users’ usability performance, i.e. time, completeness,
and correctness. The MCS is measured in terms of the stability of the tag usage
percentages.
It is the assumption of this research that user abandonment is an indicator of the
UUT exceeding its limit. The first hypothesis states that exceeding the UUT limit leads
to failure of the MUEEWS system, and that insufficient MCS is the primary cause of
exceeding the UUT limit. Not only does user abandonment indicate that the system is
performing its functionalities poorly, but it also cuts off the supply of user-generated
meta-content. The second hypothesis states that improving MCS through user consensus
formation tools is a means of improving the UUT of the MUEEWS system. A decline in
meta-content contribution means that the system has no future meta-content to leverage
for the improvement of its UUT. In that particular case, the system will continue to
perform its functions poorly until the system fails unless there is sufficient MCS to
improve the UUT.
4.7.2 Tests Results
The test results are broken down into three major categories: the results used in
defining the UUT limit behavior, which were predominantly drawn from the base case test
scenario; the results used in drawing conclusions about user tagging behavior; and the
results used to observe user test behavior. The
UUT limit behavior test results focused on the relationships the completeness data has
with the other test result variables. The tagging behavior data is essentially the MCS
data, which is based on the tag usage behaviors generated from the unorganized and
organized tagging environments. The test behavior data is based on the test result
variables (i.e. time, rating, incompleteness) from the unorganized and organized tagging
scenarios, which represent a variation in the test factor, the MCS.
4.7.2.1 Organized vs. Unorganized Simulation Environment Results for Tagging
The tagging behavior test results were essentially a comparative analysis of the
user tagging behaviors and MCS generation behaviors between the unorganized and
organized tagging environments. The comparison was setup to ensure that the user data
generation was consistent across the two environments. This research ensured that the
same number of overall tag sessions was recorded in both tagging environments
and that users never tagged the same URL more than once across the two environments.
This eliminates scenarios in which users have better ideas about what tags to generate for
a link because they have already generated tags for it.
Furthermore, this research ensured that half of the users were introduced to the
tagging environment with consensus formation tools first, and half to the environment
without, to ensure that the barrier of entry impacted both tagging environments equally.
This research also gave the users a tutorial on both environments before they began, to
further eliminate the effects of the barrier of entry. The comparison findings concluded
that fewer unique tags were generated in the organized than in the unorganized tagging
environment, as illustrated in Figure 30 below. Since the number of tags generated
overall was the same across both environments, this comparison shows greater
convergence in the organized tagging environment than in the unorganized one.
Figure 30 Number of Unique Tags per Link [chart: number of tags per link ID, without vs. with consensus tool]
As stated in the Test Scenario Review section above, MCS sufficiency is
determined by tag usage stability. Tag usage stability is the number of times a tag was
used divided by the number of users that have authored tags. Only 70 tag sessions were
recorded in the experiment, since each user tagged each entity only once; compared with
the thousands illustrated in the delicious™ example above, the behavior indicated here is
much less stable than that illustrated by delicious™. However, this comparison is still
able to reveal that tags generated with consensus formation tools had much more stable
behavior than the tags generated without. This is illustrated below in Figure 31 and
Figure 33.
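The tag usage percentage computation described above can be sketched as follows; the session data is hypothetical, and the stability judgment itself (how little the percentages change between snapshots) is left to the analyst:

```python
from collections import Counter

def tag_usage_percentages(sessions):
    """Usage percentage of each tag after every tagging session for one URL.

    sessions: ordered list of tag sets, one set per user session.
    Returns one snapshot per session mapping each tag to
    (times the tag was used) / (tagging sessions so far); MCS is judged
    by how quickly these percentages stop changing.
    """
    counts, snapshots = Counter(), []
    for i, tags in enumerate(sessions, start=1):
        counts.update(set(tags))
        snapshots.append({t: counts[t] / i for t in counts})
    return snapshots

# Hypothetical tag sessions for one URL:
snaps = tag_usage_percentages([{"news"}, {"news", "tech"}, {"news"}, {"news", "tech"}])
print(snaps[-1])  # -> {'news': 1.0, 'tech': 0.5}
```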
However, since there are a great number of data points, which makes the pattern of tag
usage growth hard to see, Figure 32 and Figure 34 illustrate the tag usage developments
of the top tags only, which shows the variation between the two test scenarios much
more clearly. One of the hypotheses of this research states that the theoretical framework
of MUEEWS, which is implemented in the consensus formation tool or the tag
recommendation tool, can greatly impact the MCS. The significant improvement in tag
convergence in the organized tagging environment is a definite indicator of the validity of this
hypothesis, which is discussed in further detail in the hypothesis confirmation section
below.
Figure 31 Tag Stability without Consensus Formation Tool [chart: tag stability (0–100%) per URL ID, 21 tag series]
Figure 32 Tag Stability without Consensus Formation Tool, Top Tags [chart: tag stability (0–80%) per tag session, top 6 tag series]
Figure 33 Tag Stability with Consensus Formation Tool [chart: tag stability (0–100%) per URL ID, 22 tag series]
Figure 34 Tag Stability with Consensus Formation Tool, Top Tags [chart: tag stability (0–100%) per tag session, top 6 tag series]
The sufficiency of MCS is measured by tag usage percentage stability, as
illustrated in the Test Scenarios sections above. This research showed, through the
tagging environment data above, that users generated much more stable tag usage
percentages in the tagging environment with consensus formation tools than in the one
without. This shows a faster convergence of the tags and therefore MCS sufficiency, as
indicated by the tag usage percentage stability. Since the simulation environment with
the consensus formation tool is able to reach the MCS limit more quickly, it is less likely
to exceed the UUT limit, as discussed in the functional periodicity section above. This is
further illustrated in the Organized vs. Unorganized Test Environment Results section
below.
4.7.2.2 Control or No-Tag Test Scenario Results
From the UUT limit behavior test results, this research found that there was not a
strong relationship between test incompleteness and test correctness; the data was too
varied to draw any conclusion from. However, this research found the relationship
between test time and completeness to be very significant: an increase in a test's
average test time has a dramatic influence on the probability of incompleteness. This
research also found a significant relationship between test rating and completeness,
where an increase in a test's average rating has a dramatic influence on the probability of
incompleteness. Therefore, the conclusion was that test time is the best and most
consistent indicator of when the UUT limit is being reached, with test ratings also playing
a key role.
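One simple way to locate the jump in the probability of incompleteness, in the spirit of the analysis above, is to bin the tests by average time and look for the first sharp increase. This is only an illustrative sketch: the bin width, jump threshold, and data points below are assumptions, not the dissertation's procedure or measurements.

```python
def incompleteness_by_bin(tests, width=15.0):
    """Probability of incompleteness, binned by average test time.

    tests: list of (avg_time_seconds, completed_bool) pairs.
    Returns {bin_lower_edge: probability_of_incompleteness}.
    """
    bins = {}
    for time, completed in tests:
        bins.setdefault(int(time // width), []).append(completed)
    return {b * width: 1 - sum(done) / len(done) for b, done in sorted(bins.items())}

def estimate_uut_limit(prob_by_bin, jump=0.3):
    """First bin edge where the incompleteness probability jumps sharply."""
    prev = 0.0
    for edge, p in prob_by_bin.items():
        if p - prev >= jump:
            return edge
        prev = p
    return None

# Illustrative data: completion degrades once times pass ~45 seconds.
data = [(20, True), (28, True), (38, True), (41, True),
        (48, False), (52, False), (55, True), (70, False)]
probs = incompleteness_by_bin(data)
print(estimate_uut_limit(probs))  # -> 45.0
```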
Figure 35 Correctness vs. the Probability of Incompleteness [scatter chart with polynomial trend: probability of incompleteness vs. correctness]
Figure 36 Time vs. the Probability of Incompleteness [scatter chart with polynomial trend; annotated limit values: time 45, probability 0.111]
Figure 37 Rating vs. the Probability of Incompleteness [scatter chart with polynomial trend; annotated limit values: rating 2.667, probability 0.111]
The figures above clearly indicate where the UUT limit is being reached by the
jump in the probability of incompleteness. Based on that jump, this research was able to
determine the UUT limit information for test time and rating, as illustrated in
the above three figures. This UUT limit information will be further applied toward
determining the improvement in the probability of exceeding the UUT limit in the
organized vs. unorganized test scenario results below. The UUT limit information will
be marked on the figures that present the comparison between the two test scenarios.
4.7.2.3 Organized vs. Unorganized Test Environment Results
The test behavior results are a comparative analysis between the users’ UUT
behaviors in the test environment that utilized tags with consensus formation tools versus
the test environment without. The comparison was set up to ensure that the two test
environments were subjected to the same standards. It was ensured that users were
never given the same test twice across the two testing environments, which could unfairly
sway their performance. Additionally, the tests were randomly selected from both test
environments; therefore the barrier of entry is recorded in both tests. The comparison
findings concluded that the average test time shows a significant improvement in the
tests that used tags generated with the consensus formation tools, as illustrated in Figure
38 below.

Figure 38 Test Time Averages [chart: average test time per URL ID, without vs. with consensus tool; UUT limit of 45 seconds marked]

In addition, the test rating average and incompleteness do not show quite as sharp a
contrast, but they still show an improvement in tests that used tags generated with
consensus formation tools. Therefore, there is a direct connection between the
improvement of MCS sufficiency and the use of consensus formation tools.
The statistical analysis of the test time averages, shown below in Table 4,
illustrated that the overall average test time for the test environment that used tags
generated without consensus formation tools was 44.9, whereas the overall average test
time for the test environment with consensus formation tools was 22.5. Furthermore, the
maximum average test time without consensus formation tools was 185.2, whereas that of
the testing environment with consensus formation tools was 52.7. As stated in the UUT
limit results section above, this research found that tests averaging over 45 seconds in test
time may result in incompletion. Therefore, higher test time averages not only indicate
poor user performance, but also greatly increase the probability of incompleteness. This
is discussed in detail below in the completeness data analysis.

Table 4 Statistical Analysis of Test Time Average
The test rating average, although not as dramatic an improvement as the test
time averages, still illustrates a significant improvement in testing environments using
tags generated with consensus formation tools, as illustrated below in Figure 39.
The statistical analysis of the test rating averages, shown below in Table 5, illustrated that
the overall average rating for the tagging environment without consensus formation tools
was 2.2, whereas the overall average rating for the tagging environment with consensus
formation tools was 1.8. Although the overall rating averages are fairly close
between the two test environments, an examination of Figure 39 below shows a
significantly higher number of rating averages above 2.67 in the
testing environment that used tags generated without consensus formation tools.

Figure 39 Test Average Rating [chart: average test rating per URL ID, without vs. with consensus tool; UUT limit of 2.667 marked]

Table 5 Statistical Analysis of Test Average Rating

As stated in the UUT limit discussion above, this research found that tests that had average
ratings over 2.67 in difficulty resulted in incompletion. Therefore, higher rating averages
not only indicate poor user performance, but also greatly increase the probability of
incompleteness. The implications of these findings are discussed further below in the
completeness data analysis.
Test incompleteness is the key result variable that determines the UUT limit, as
discussed in the Hypothesis Review section above. As can be seen in Figure 40
below, the test completeness result variable illustrates a significant improvement in
testing environments using tags generated with consensus formation tools. In Figure
40, the tests where the users gave up are illustrated in terms of the probability of
incompleteness. There were only 2 incomplete tests in the testing environment where the
tags were generated with consensus formation tools, versus considerably more in the test
environment without the consensus formation tool. Furthermore, the probability of
incompleteness was much higher for the testing environment where the tags were
generated without consensus formation tools than for the one with them. This is reflected
in both Figure 40 and the statistical analysis in Table 6.
Figure 40 Test Incompleteness [chart: probability of incompleteness per URL ID, without vs. with consensus tool]

Table 6 Statistical Analysis of Test Incompleteness
Based on an examination of Figure 40 and Table 6 above, it can be seen
that many more tests were incomplete in the testing environment using tags
generated without consensus formation tools. This research stated in the Hypotheses
section that exceeding the UUT limit causes user abandonment and can eventually
lead to failures of the MUEEWS system. Therefore, based on the results in Figure
40, this research can conclude that the test environment which used tags generated with a
consensus formation tool is much less likely to exceed the UUT limit and cause system
failure.
This research illustrated above, in the tagging behavior test results, that tags
generated with user consensus formation tools have significantly improved MCS
sufficiency. Therefore, there is a direct connection between the improvement of MCS
sufficiency and the use of consensus formation tools. Furthermore, in the test time and
test rating analysis above, this research illustrated that the UUT behaviors significantly
improved with the improvement of the MCS. Therefore, this research can conclude that
the UUT behaviors are significantly improved with the use of consensus formation tools.
Lastly, UUT incompleteness also showed significant improvement with the use of
tags generated with the consensus formation tool, i.e. with improved MCS. The validation
of the two hypotheses by these results is discussed in further detail in the Hypothesis
Validation section below.
4.7.3 Statistical Validation of Results
The test results illustrated above are validated with the ANOVA statistical
analysis; since only two sample sets are being compared, this reduces to the t-test.
For the statistical validation settings, the assumption is that the variances
between the two sets are different for these sets of data points. The data being compared
is the test environment results, or the UUT performance results, from the organized and
unorganized test scenarios. The key point this statistical validation is trying to establish
is that the two datasets are in fact disparate enough to draw valid conclusions from.
As illustrated in the section below, the two data sets were shown to be different
enough to support a statistically sound comparison analysis.
4.7.3.1 Organized vs. Unorganized Simulation Environment Results for Tagging
The t-test analysis was performed on the unique-tags-per-link datasets for the
organized and unorganized tagging results. The p-value of a t-test indicates the
probability of observing, by chance, a mean difference between the groups as high as the
one observed. In other words, a p-value is a measure of how much evidence the dataset
provides against the null hypothesis, which represents the hypothesis that the
two datasets show no change or no effect. Therefore, the lower the p-value, the more
significant the difference between the groups is. Based on a t-value of 2 and a degree of
freedom of 59, since the number of observations is 59, the p-value had to be lower than
0.050947 for a two-tailed probability and lower than 0.025474 for a one-tailed probability
in order to be considered significant. As illustrated below in Table 7, the p-values verified
that the datasets from the tests with the consensus tools are significantly better than those
without for all the result variables this research considered (i.e. time, rating, and
completeness).
Table 7 Statistical Analysis for Unique Tag per Link
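Since the validation settings assume unequal variances between the two sets, the underlying statistic is Welch's t. The following is a sketch with made-up unique-tags-per-link numbers, not the dissertation's measurements:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom,
    assuming unequal variances as in the validation settings above."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Illustrative unique-tag counts per link (hypothetical, not the thesis data):
without_tool = [30, 25, 28, 33, 27, 31, 29, 26]
with_tool    = [12, 15, 10, 14, 13, 11, 16, 12]
t, df = welch_t(without_tool, with_tool)
print(abs(t) > 2.0)  # -> True; exceeds the critical value of ~2 used in the text
```

The statistic is then compared against the t-distribution with the computed degrees of freedom to obtain the p-values reported in the tables.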
4.7.3.2 Organized vs. Unorganized Test Environment Results
The t-test analysis was also performed on the test time, test rating, and test
incompleteness results. Based on a t-value of 2 and a degree of freedom of 59, since the
number of observations is 59, the p-value had to be lower than 0.050947 for a two-tailed
probability and lower than 0.025474 for a one-tailed probability in order to be considered
significant. The results are illustrated below in Table 8 for test average time, Table 9 for
test average rating, and Table 10 for test incompleteness. The p-values verified that the
datasets from the tests with the consensus tools are significantly better than those without
for all the result variables this research considered (i.e. time, rating, and completeness).
Table 8 T-test Analysis for Test Time
Table 9 T-test Analysis for Test Rating
Table 10 T-test Analysis for Test Incompleteness
In conclusion, the unorganized tagging test scenario represents insufficient
MCS, whereas the organized tagging scenario represents sufficient, or more sufficient, MCS.
Essentially, the variation of unorganized vs. organized tagging, i.e. not using vs. using
the consensus formation tool, is a variation in
MCS sufficiency, which is the test factor this research varied to generate results for
the result variables: time, ratings, and completeness. Since the statistical analysis
indicated that the variation in the result variables was significant given the
change in the test factor, this research can state that the improvements seen
with the consensus formation tool are significant. Therefore, the hypothesis that
improvement of the MCS improves the UUT, and thereby enforces the functional
periodicity of MUEEWS, is confirmed. This is discussed further in the
Hypothesis Validation section below.
4.7.4 Hypotheses Validation
4.7.4.1 User Usability Tolerance Validation
As stated in the Functional Complexity Management section above, the UUT
limit determines the functional periodicity of MUEEWS; it is the cutoff for the
functional periodicity of MUEEWS. It is the assumption that reaching the UUT limit is
equivalent to users quitting out of performing a function, i.e. abandoning a
test in the test scenario. The first hypothesis is that exceeding the UUT limit causes
user abandonment, which was illustrated with the completeness result variable of the
tests. Furthermore, this research was able to illustrate that test time and rating, along with
incompleteness, were the most consistent indicators of when the UUT limit has been
exceeded.
4.7.4.2 Meta-Content Sufficiency Validation
It is the second hypothesis that the use of consensus formation tools can
promote the improvement of MCS. As seen in the Test Results section,
although the same number of tags was used overall, the users generated fewer unique tags
with the consensus formation tools, which indicates convergence of the tags.
Furthermore, the second hypothesis also states that the growth of MCS prevents exceeding the
upper bound of the UUT limit. To validate this hypothesis, this research illustrated
that incompleteness is greatly reduced with the use of consensus formation tools, and that
tests using tags generated without the consensus formation tools showed poor test times
and ratings compared to tests with consensus formation tools.
4.7.4.3 Functional Periodicity Validation
Furthermore, this research illustrated the second hypothesis that insufficient
MCS is a primary cause of UUT behaviors that exceed the UUT limit. The use of
consensus formation tools was the means of controlling the variation in MCS sufficiency,
where the use of consensus formation tools represented more sufficient MCS and the lack
of use of the consensus formation tools represented insufficient MCS. Therefore, by
showing that the simulation and test environments that used tagging with consensus
formation tools had significantly more convergence in tag usage and were less likely
to exceed the UUT limit, this research illustrated the hypothesis. The hypothesis also
stated that the use of consensus formation tools can promote the sufficiency of MCS.
The sufficiency of MCS is measured by tag usage percentage stability, as illustrated in the
Test Scenarios section above. This research showed through the tagging environment
results that users generated much more stable tag usage percentages in the tagging
environment with consensus formation tools. This shows a faster convergence of the tags
and therefore the formation of MCS sufficiency.
Furthermore, this research showed that sufficient MCS improves UUT behavior.
It is the hypothesis that the MCS directly impacts the UUT and therefore partially
determines the functional periodicity of MUEEWS. This research showed through the
result variables that the tests using tags generated with the consensus formation tools
show significant improvements in the UUT measurements. Also, insufficient MCS,
i.e. the tests that used tags generated without consensus formation tools, led to more
incompleteness. Furthermore, this research illustrated that all the test result variables,
such as time, completeness, and ratings, showed a significant improvement with more
sufficient MCS. Further discussion of how these results bear on the
hypotheses and theoretical framework of MUEEWS appears in the Conclusions
section below.
CHAPTER 5: CONCLUSION & CONTRIBUTION
5.1 Conclusions
The Theoretical Foundation section illustrated that UGC or “social media”
applications are in fact Massive User Enabled Evolving Web Systems (MUEEWS),
where user-generated meta-content are used to enable evolving functionalities of the
system. By making users a key stakeholder in enabling its functionalities, MUEEWS
creates an evolution in its functionalities that occurs during the usage phase of the
software. This research illustrated in the Research Background section that current
Software and Web Engineering models do not explain the effects of this phenomenon,
and that current UGC research tends to target specific applications or elements and is difficult
to apply extensively. Therefore, this research set out to explore the various elements of
this evolution and the means of improving its functional performance and complexity.
In the Research Background section, this thesis identified various current state-of-the-art
research efforts that have focused on improving the meta-content of MUEEWS
systems. The majority of these efforts focused on means of improving the meta-content
offline, which follows the more traditional software engineering approach of
taking the users out of the equation. In those scenarios the developers are the sole
contributors to the improvement of the meta-content. Although these methodologies
produced some interesting understanding of UGC meta-content, they take place
offline. Therefore, these approaches only serve to leverage the meta-content for other
usages, not to improve the meta-content for the users of those UGC
applications.
Therefore, the method proposed in this research deals both with the meta-content
and the users directly to improve the quality of the meta-content. The overall approach
focuses on user consensus formation over the meta-content they are generating. In the
specific experiment designed to validate its hypotheses, this research analyzed the users
coming to a consensus regarding tags they generated for URL storage and indexing. This
approach affects the meta-content of the MUEEWS system as the users generate their
meta-content, and the users are able to see the evolved meta-content as they continue in
the meta-content contribution process. This methodology allows the improvement of the
meta-content to occur during the usage phase of the MUEEWS system and also directly
impacts the users’ experience with the system.
The current lack of fundamental theoretical understanding creates many
complexities for MUEEWS developers and users in both the development and usage
phase. Using the LCA and functional complexity analysis, this research found that
MUEEWS systems are prone to falling into the LCA paradox and creating
combinatorial functional complexity, both of which result in poor system functional
performance, degradation, and possible failures. In the analysis of the theoretical
foundation of MUEEWS, this research found that MUEEWS fits the description of a
latency group, where the group size is so large that the users cannot see the effects of their
contributions on the system as a whole, and therefore have no incentive to contribute.
In MUEEWS, the cause of the combinatorial functional complexity is the constant
influx of users’ contributions. Furthermore, in MUEEWS systems, the users have the
clearest understanding regarding the meta-content they generated, yet they have no
incentives to act due to latency. Therefore, the users must be directly involved in
deciding on how the content is evolving and used in the systems they participate in. This
is the exact argument the PAR researchers are making regarding social projects where the
PAR process should be applied. Therefore, this research inserted the PAR process’
foundations into the theoretical framework for MUEEWS, which resulted in a process
that allows users to reflect on each other’s contributions as they contribute to the system.
As illustrated in the Simulation and Test Environment section, one such implementation
this research utilizes to validate the theoretical framework is a tag recommendation tool.
The key for maintaining control of the functional complexity of MUEEWS lies in
finding its functional periodicity. There are essentially two factors that contribute to the
functional periodicity of MUEEWS: the MCS and UUT. The MCS distinguishes whether
or not enough meta-content is available for the system to function as intended. The UUT
measures a user’s usability performance during usage of the system. The UUT is also the
key cutoff factor in determining the functional periodicity of MUEEWS, because in
order to ensure that sufficient meta-content is generated, the user must continue to use the
software. If the software tasks become too difficult for users, they simply stop using the
software, ending their contribution to the meta-content.
In the theoretical framework’s functional complexity analysis section, it was
shown that the usability values of the MUEEWS must be kept under the UUT limit, or the
MUEEWS functionalities will move outside their design range and frustrate the users
to the point where they abandon the system. It was also shown that the meta-content
must reach a sufficient level before MCS stability can be established. Therefore, the
assertion of this research is that without the application of the theoretical framework, the
MCS’ evolution can lead to system performance degradation and, eventually, failure of
the system. Accordingly, the three hypotheses of this research are that consensus
formation tools can promote stable MCS growth, that stable MCS can improve the UUT,
and that UUT improvement can reduce the probability of system failure in MUEEWS.
The simulation environment is an application that mimics a MUEEWS system.
The test environment mimics particular functionalities in MUEEWS systems that utilize
the meta-content generated by the users. There were two versions of the simulation
environment, or the tagging environment: one where users were given user consensus
formation tools and one where they were not. The consensus formation tool is
implemented as a popular tag, or tag cloud, recommendation function, which
recommends the tags that have been used most often by all users who have tagged the
particular URL the current user is tagging. Since users can reflect on these tag
recommendations before deciding whether to select them or create their own during the
meta-content generation process, the tool replicates the PAR process underlying the
theoretical framework.
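The popular-tag recommendation function just described can be sketched in a few lines. The following Python is an illustrative reconstruction only, not the thesis's actual implementation; the `tag_log` structure and the function name are assumptions made for the example.

```python
from collections import Counter

def recommend_tags(tag_log, url, top_n=5):
    """Recommend the tags applied most often to `url` by all prior users.

    tag_log: list of (user, url, tag) tuples recorded as users tag URLs.
    Returns up to top_n tags ordered by descending usage count.
    """
    counts = Counter(tag for user, u, tag in tag_log if u == url)
    return [tag for tag, _ in counts.most_common(top_n)]

# Hypothetical tagging history for one URL
log = [
    ("alice", "http://example.com", "design"),
    ("bob",   "http://example.com", "design"),
    ("bob",   "http://example.com", "axiomatic"),
    ("carol", "http://example.com", "design"),
    ("carol", "http://example.com", "tagging"),
]
print(recommend_tags(log, "http://example.com"))
```

A user tagging `http://example.com` would thus see "design" recommended first, and could accept it or type a new tag, which feeds back into `tag_log` for the next user.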
There were also two versions of the test environment: one using the data from the
simulation environment with the consensus formation tool and one using the data without
it. The test environment has two purposes. The first is to provide a means for simulating
the various test scenario conditions for the validation of the test goals. The second is to
gather the UUT limit information, which is essential for determining the functional
periodicity of the MUEEWS system. The tests are based on the search function of the
simulation environment, since search is one of the most important functionalities that
tagging, or meta-content generation, assists with. The four key result variables are test
completion time, correctness, completeness, and the test ratings and comments. These
result variables are combined to determine the UUT of the application as well as the UUT
limit. The test factor this research varies to generate different test results is MCS
sufficiency. This is used to validate the hypothesis that MCS sufficiency determines the
UUT of the system.
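As a rough illustration of how the four result variables might be combined into a single usability measure, the sketch below normalizes each to the range [0, 1] and averages them. The equal weighting, the time budget, and the `UUT_LIMIT` threshold are assumptions for illustration only, not the thesis's actual formula.

```python
def uut_score(completion_time_s, correctness, completeness, rating,
              time_budget_s=120.0):
    """Combine the four result variables into one usability score in [0, 1].

    correctness, completeness: fractions in [0, 1]
    rating: user rating on a 1-5 scale
    Equal weighting is an illustrative assumption, not the thesis formula.
    """
    time_score = max(0.0, 1.0 - completion_time_s / time_budget_s)
    rating_score = (rating - 1) / 4.0
    return (time_score + correctness + completeness + rating_score) / 4.0

# Hypothetical cutoff below which users are assumed to abandon the system
UUT_LIMIT = 0.5

session = uut_score(completion_time_s=60, correctness=0.9,
                    completeness=1.0, rating=4)
print(session, session >= UUT_LIMIT)
```

A developer monitoring sessions this way could flag the system whenever the average score drifts toward the limit, which is the practical point of determining the functional periodicity.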
There are essentially two goals the test scenarios are designed to illustrate. The first
goal is to validate that, without consensus formation, the evolution of user-generated
meta-content can create system performance degradation, which is reflected in growing
usability problems. The second goal is to validate that consensus formation tools (e.g.,
popular tag recommendation and tag cloud sharing) can promote meta-content
sufficiency growth. There are three test scenarios. The first is the base case, or untagged
test scenario, which allows users to perform their searches solely based on keywords
generated by the developers. The second is the unorganized tagging test scenario, which
allows users to perform their searches based solely on the tags they generated without any
consensus tools; it is used to validate test goal one. The third is the organized tagging test
scenario, which allows users to perform their searches based on their generated tags with
the use of consensus formation tools, utilizing meta-content imported from URLs in the
del.icio.us™ system; it is used to validate test goal two.
The MCS, or tagging, test results showed that fewer unique tags were generated in
the organized tagging environment than in the unorganized one. Since the total number of
tags generated was the same across both environments, this comparison shows greater
convergence in the organized tagging environment than in the unorganized.
Furthermore, using the tag stability statistical analysis presented by Golder and
Huberman, this research was able to verify that the organized tagging environment
generated much more stable tagging behavior than the unorganized one. The UUT tests,
which attempt to validate the relationship between MCS sufficiency and UUT
performance, showed that the overall average UUT variables measured with tags
generated without consensus formation tools had much worse performance.
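The unique-tag comparison can be expressed as a simple convergence ratio: for the same total volume of tags, a smaller unique-tag vocabulary means greater convergence. The sketch below and its sample data are illustrative only and are not the thesis's measured results.

```python
def convergence(tags):
    """Fewer unique tags for the same total volume -> greater convergence.

    Returns 1 - unique/total, so 0 means every tag is unique and values
    near 1 mean users converged on a small shared vocabulary.
    """
    return 1.0 - len(set(tags)) / len(tags)

# Hypothetical tag streams of equal length from the two environments
unorganized = ["web", "www", "site", "page", "html", "internet"]
organized   = ["web", "web", "web", "design", "design", "web"]

print(convergence(unorganized))  # 0.0: six tags, all unique
print(convergence(organized))
```

Equal totals matter here: comparing environments with different tag volumes would confound convergence with sheer activity, which is why the thesis holds the overall tag count constant across the two environments.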
Test incompleteness is the key result variable that determines the UUT limit.
The test completeness results show a significant improvement in the testing environment
using tags generated with consensus formation tools. There was only a minimal
probability of incompleteness in the testing environment where the tags were generated
with consensus formation tools, versus a range of probabilities, many of which were quite
high, in the test environment without them. Furthermore, this research statistically
validated the data using t-tests, which showed that the difference between the two
datasets, with and without consensus formation tools, is statistically significant.
Therefore, this research is able to validate the hypothesis that user consensus formation
improves MCS sufficiency, and that MCS sufficiency improves UUT performance,
making the system less likely to exceed the UUT limit.
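The t-test comparison can be reproduced with a short Welch's two-sample t statistic, which does not assume equal variances between the two groups. The per-test incompleteness probabilities below are hypothetical stand-ins, not the thesis's data, and the function name is an assumption.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic for groups with unequal variances.

    A large |t| indicates the two groups (e.g., incompleteness rates with
    and without consensus formation tools) differ beyond chance.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / math.sqrt(va + vb)

# Hypothetical per-test incompleteness probabilities (not the thesis data)
without_tools = [0.40, 0.35, 0.50, 0.45, 0.30]
with_tools    = [0.05, 0.00, 0.10, 0.05, 0.00]

print(round(welch_t(without_tools, with_tools), 2))
```

In practice the statistic would be compared against a t distribution (e.g., via `scipy.stats.ttest_ind` with `equal_var=False`) to obtain a p-value; the stdlib version above only shows the computation itself.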
Through the test results, this research was able to validate that the application of
the theoretical framework, implemented as a consensus formation tool, significantly
improved the MCS stability of the meta-content generation process. Furthermore, this
research was able to illustrate that both the UUT performance and the probability of
incompleteness improved with the application of the theoretical framework. Unlike the
current MUEEWS practice of applying meta-content improvement strategies after the
meta-content has been generated, with the application of the theoretical framework users
are able to improve the quality of the meta-content as they use the system. Therefore, the
ad-hoc nature of meta-content generation is contained within the process itself. The field
of UGC is still expanding at an impressive rate. The work illustrated in this thesis can be
expanded in various ways to make further contributions to the UGC field. Some of these
contributions are discussed below in the Future Work section.
5.2 Contributions
Information resource applications such as MUEEWS have come a long way since
the days of bulletin boards and rudimentary multimedia search engines. However, these
applications still have a fair number of challenges to overcome. The key challenge in
these applications is managing their functional complexity. MUEEWS not only have an
ever-expanding user base and user-generated content; the system's functionalities and
their performance also evolve as the users contribute their meta-content. This creates a
great deal of complexity that is very difficult to manage. The PAR-based theoretical
framework is the solution proposed by this research for dealing with the combinatorial
functional complexity of MUEEWS.
Other research efforts have focused on the meta-content of these systems. Formal
taxonomy and ontology approaches in collaborative tagging systems attempt to leverage
the meta-content directly by applying or deriving ontologies from it. However, these
approaches tend to be too restrictive and conflict with the main appeal of MUEEWS
systems, which is their informality [63, 125]. Other methods, such as ontology seeding,
ask users to provide further ontological contributions toward the tags' organization,
which may not appeal to users of large-scale web applications due to the overhead of such
contributions. Furthermore, there has not been enough testing performed to validate
ontology seeding or data-mining methods in the collaborative tagging field. More
fundamentally, these methodologies do not affect the meta-content of the MUEEWS
system during its usage phase; therefore, the improvements they make do not affect the
users' meta-content or the functionalities' performance.
Statistical and pattern approaches, such as Golder and Huberman's work, which
this research has discussed extensively, are able to generate a great deal of information
regarding the meta-content and do not require any extra contribution from the users.
However, these approaches also deal with the meta-content outside of the system, and the
information derived from these statistical approaches does not improve the meta-content
directly. Furthermore, the insights derived from these methods do not impact the users'
meta-content generation and therefore cannot directly improve the meta-content's
evolutionary process. Social network approaches use the users' social networks to
improve the understanding of the users of the MUEEWS system, and visualization
approaches improve the users' understanding of their own usage behaviors. However,
these approaches are indirect in their impact on the meta-content.
The approach of this research is to build a theoretical framework that captures
how the meta-content in MUEEWS evolves and affects the system as a whole. The tag
recommendation tool is only one example of a way to improve meta-content evolution in
MUEEWS systems. With the theoretical framework and its functional complexity
analysis established, UGC developers are able to assess the status of their UGC systems'
evolution and avoid system failure. Furthermore, they can use the principles of the
theoretical framework to identify various ways to improve the quality of their user-generated
meta-content. The unique contribution of this work is that it illustrates an
effective methodology for leveraging the users of the system to improve the meta-content
they generate, rather than relying on outside sources. This approach has been shown
above not only to make significant improvements to the meta-content and its
corresponding system functionality performance, but also to keep the users engaged in
the system by allowing them to see the effects of their contributions.
The key in applying the theoretical framework of MUEEWS is to focus on the
various means of aiding the users in forming a consensus on how entities should be
represented in their shared space. Another approach, aside from making tag
recommendations, is to give users editing rights to the meta-content, as they have in
wiki systems. In addition to these two approaches, there are various other technological
advancements that could serve to address the complexity problems of MUEEWS, such as
ubiquitous mobile devices or AI methodologies that could contribute and confirm user
information indirectly or automatically, without requiring explicit user input [89].
Therefore, the field of MUEEWS systems is still open to various research and
development efforts, all of which could serve to meet the challenges discussed in this
thesis and make various other improvements.
CHAPTER 6: FUTURE WORK
The future work of this research will be pursued in two directions: theory
refinement and simulation implementation improvement. To further refine the theoretical
framework discussed above, this research will be extended to consider other large-group
behavioral theories as well as collaboration and negotiation theories. Further experiments
will be designed to target those theories and approach them with the analysis procedure
discussed above. Furthermore, the consensus formation of the users will be further
analyzed based on various negotiation and collaboration theories. The research along this
line will be used to further improve the consensus formation as well as the meta-content
generation. In addition, these research topics will improve the current understanding of
the evolution of meta-content in the field of UGC.
The simulation implementation improvement has various aspects, some targeting
methodology and others targeting the implementation itself. One such improvement
could focus on various methods, other than recommendations, by which users can
interact with the meta-content. This future work can focus on the implementation of new
consensus formation tools to further extend the users' impact on the state of the meta-content
in the system. The key to this future work is giving the users more feedback
control over the MUEEWS systems' meta-content, since the more control the users have,
the greater the improvement they can make to the meta-content. However, the usability
balance must be kept so that the users are not spending more time learning the tools than
using the system. Therefore, future implementations will focus on tools that web users
are already familiar with.
For example, one such approach is to implement a wiki-style editing environment
for meta-content such as tags and titles. For each entity in the UGC application, the user
can edit the entire meta-content of that entity, as they can in Wikipedia. There will be edit
rollback and various other wiki features to ensure that the users' contributions cannot be
intentionally harmed by other users. This approach also provides an interesting venue for
the application of the various collaboration and negotiation theories discussed above,
since such topics often arise when users directly affect each other's content. In addition,
giving the users full editing control further extends the PAR concepts within the
theoretical framework discussed, as it further empowers the users to act on their own
behalf and enables them to see the direct effects their contributions have on the system as
a whole.
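A minimal sketch of such a revision history with rollback might look like the following. The class and method names are hypothetical and only illustrate the rollback idea, not a full wiki implementation.

```python
class MetaContentHistory:
    """Wiki-style revision history for one entity's meta-content.

    Every edit is appended as a new revision, and rollback restores an
    earlier revision by appending it again, so no contribution is ever
    destroyed and vandalism can always be undone.
    """

    def __init__(self, author, content):
        self.revisions = [(author, content)]

    @property
    def current(self):
        return self.revisions[-1][1]

    def edit(self, author, content):
        self.revisions.append((author, content))

    def rollback(self, index):
        # Restoring an old revision is itself recorded as a new revision,
        # keeping the full audit trail intact
        author, content = self.revisions[index]
        self.revisions.append((author, content))

# A tag/title record is vandalized and then rolled back
h = MetaContentHistory("alice", {"title": "Axiomatic Design", "tags": ["design"]})
h.edit("mallory", {"title": "SPAM", "tags": []})
h.rollback(0)
print(h.current["title"])  # the original title is restored
```

Keeping every revision, rather than overwriting in place, is the design choice that makes direct user-to-user editing safe enough to deploy in a shared meta-content space.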
The pure implementation improvements for the system would be to expand the
user base of the system and give the users more interaction tools, so that they are able to
contribute more types of content and have more control over their accounts.
Furthermore, the simulation and test environments can be expanded to illustrate their
MCS and UUT results automatically as the users use the system, so that the users as well
as the developers can more closely monitor the evolution of the meta-content as the
system evolves. These are simply a few examples of the potential future work in this
research. The field of UGC is expanding at a rapid pace, extending into mobile devices
and various other new and unique applications. The gap between the technical
advancement and the understanding of the behavior of these systems is therefore
widening, which makes this research and its future work very important to the field of
UGC applications.
BIBLIOGRAPHY
69. Al-Khalifa, H., Davis, HC. FolksAnnotation: A Semantic Metadata Tool for
Annotating Learning Resources Using Folksonomies and Domain Ontologies. in
Proceedings of the Second International IEEE Conference on Innovations in
Information Technology. 2006. Dubai, UAE: IEEE Computer Society.
78. Al-Khalifa, H.S., H.C. Davis, and L. Gilbert. Creating Structure From Disorder:
Using Folksonomies To Create Semantic Metadata. in In Proceedings of the 3rd
International Conference on Web Information Systems and Technologies
(WEBIST). 2007. Barcelona, Spain: Springer.
68. Aurnhammer, M., P., Hanappe, L. Steels, Augmenting Navigation for
Collaborative Tagging with Emergent Semantics. iswc2006.semanticweb.org, 5th
International Semantic Web Conference, 2006.
62. Bateman, S., C., Brooks, G., McCalla, Collaborative Tagging Approaches for
Ontological Metadata in Adaptive E-Learning Systems. In the Proceedings of the
Fourth International Workshop on Applications of Semantic Web Technologies
for E-Learning (SW-EL 2006) in conjunction with 2006 International Conference
on Adaptive Hypermedia and Adaptive Web-Based Systems (AH2006), June 20,
2006. Dublin, Ireland. pp. 3-12
74. Begelman, G., P. Keller, and F. Smadja, Automated tag clustering: Improving
search and exploration in the tag space, in Collaborative Web Tagging
Workshop, 15th International World Wide Web Conference. 2006.
20. Bodzin, A.M., J.C.P., A Study of Preservice Science Teachers Interactions with a
Web-Based Forum. Electronic Journal of Science Education, 1998
6. Boehm, B., Guidelines for Lean Model-Based (System) Architecting & Software
Engineering (LeanMBASE). 2006.
14. Boehm, B., D.P., M.A., Avoiding the Software Model-Clash Spiderweb.
Computer, 2000. 33(11).
91. Boehm, B., Software Engineering Economics. 1981, Upper Saddle River, NJ:
Prentice Hall.
92. Boehm, B., Software cost estimation with Cocomo II. 2000: Prentice Hall.
95. Boehm, B., A View of Future Systems and Software Engineering.
http://sunset.usc.edu/events/2006/CSSE_Convocation/presentations/BoehmFuture
.PPT, 2006
25. Bonk, C.J., V.P.D., Massive Multiplayer Online Gaming: A Research Framework
for Military Training and Education. 2005, Office of the Under Secretary of
Defense for Personnel and Readiness.
28. Boyd, D., Jeffery Potter, Social Network Fragments: An Interactive Tool for
Exploring Digital Social Connections. Conference on Computer Graphics and
Interactive Techniques, 2003
115. Brajnik, G. Automatic web usability evaluation: what needs to be done. In Proc.
Human Factors and the Web, 6th Conference, Austin TX
52. Burrell, Jenna, G.K.G., Kiyo Kubo, and Nick Farina, Context-Aware Computing:
A Test Case, in UbiComp 2002: Ubiquitous Computing. 2002. p. 647-653.
77. Cattuto, C., Loreto, V., Pietronero, L., Collaborative Tagging and Semiotic
Dynamics. In the Proceedings of the National Academy of Sciences, 2007
86. Choy, S.-O., A. K. Lui. Web Information Retrieval in Collaborative Tagging
Systems. in IEEE/WIC/ACM International Conference on Web Intelligence (WI
2006). 2006. Hong Kong: IEEE Computer Society.
73. Dix, A., S., Levialdi, A., Malizia, Semantic Halo for Collaboration Tagging
Systems. Workshop on the Social Navigation and Community based Adaptation
Technologies, 2006.
13. Dobson, K., D.B., W.J., J.D., H.I., Creating Visceral Personal and Social
Interactions in Mediated Spaces. Conference on Human Factors in Computing
Systems, 2001
44. Donath, J., D.B., Public displays of connection. BT Technology Journal, Vol 22,
No 4, October 2004
114. Dumas, J.S., J.R., A Practical Guide to Usability Testing. 1999: ACM Press.
49. Ebner, W., J.M.L., H.K. Trust in Virtual Healthcare Communities: Design and
Implementation of Trust-Enabling Functionalities. in 37th Hawaii International
Conference on System Sciences. 2004.
34. Emigh, William, S.C.H. Collaborative Authoring on the Web: A Genre Analysis
of Online Encyclopedias. in 38th Annual Hawaii International Conference on
System Sciences (HICSS'05). 2005. Hawaii.
123. Farrar, S., T.L., linguistic ontology for the semantic web. Glot International, 2003.
Vol 7, No 3.
72. Fellbaum, C., ed, An Electronic Lexical Database. MIT Press, 1998.
124. Fergus, R., P.P., A.Z., A Visual Category Filter for Google Images. 10th
International Conference on Computer Vision (ICCV), 2005.
53. Figallo, C., Rhine, N., Tapping the Grapevine: User-Generated Content.
econtent, 2001.
4. Flickr, www.flickr.com.
90. Gaedke, M., Development and Evolution of Web-Applications Using the
WebComposition Process Model. Lecture Notes in Computer Science, 2001.
58. Golder, S., B.A. Huberman, Usage patterns of collaborative tagging systems.
Journal of Information Science, 2006.
116. Gould, J.D., C.L., Designing for Usability: Key Principles and What Designers
Think. ACM, 1985.
109. Greenwood, D.J., W.F.W., I.H., Participatory Action Research as a Process and
as a Goal. Human Relations, 1993.
9. Grossman, L., Invention of the Year. TIME, 2006.
60. Gruber, T., Ontology of Folksonomy: A Mash-up of Apples and Oranges.
International Journal on Semantic Web and Information, 2007
82. Halpin, H., V Robu, H Shepherd, The complex dynamics of collaborative tagging.
in Proceedings of the 16th international conference on World Wide Web, 2007.
63. Hammond, T., T., Hannay, B., Lund, J., Scott, Social Bookmarking Tools (I). D-
Lib Magazine, dx.doi.org, 2005.
85. Hassan-Montero, Y., V., Herrero-Solana. Improving Tag-Clouds as Visual
Information Retrieval Interfaces. in InSciT2006 conference. 2006. Merída.
46. Heer, J., D.B., Vizster: Visualizing Online Social Networks., in Proceedings of the
2005 IEEE Symposium on Information
104. Hinchcliffe, D. Architectures of Participation: The Next Big Thing. 2006 August
http://web2.wsj2.com/architectures_of_participation_the_next_big_thing.htm.
117. Hornbæk, K., B.B.B., P.C., Navigation Patterns and Usability of Zoomable User
Interfaces with and without an Overview. ACM Transactions on Computer-
Human Interaction, 2002. 9(4): p. 362–389.
70. Hotho, A., R Jaschke, C Schmitz, G Stumme, FolkRank: A Ranking Algorithm for
Folksonomies., Proceedings of the Conceptual Structures Tool, 2006.
50. Jokela, T. Authoring tools for mobile multimedia content. in Multimedia and
Expo, 2003. ICME '03. 2003.
5. Kappel, G., E.M., B.P., S.R., W.R., Web Engineering - Old wine in new bottles.
Web Engineering, Springer Berlin, Heidelberg, 2004
47. Kelly, S., Uberoi, C.S., S.F. Designing for improved social responsibility, user
participation and content in on-line communities. in Conference on Human
Factors in Computing Systems. 2002.
106. Kemmis S., R.M., The Action Research Reader. 1988: Deakin University.
107. Kemmis S., R.M., Participatory Action Research. Handbook of Qualitative
Research. 2000: Sage Publications.
75. Kipp, M.E.I.a.D.G.C. Patterns and Inconsistencies in Collaborative Tagging
Systems: An Examination of Tagging Practices. in Proceedings Annual General
Meeting of the American Society for Information Science and Technology. 2006.
Austin, Texas.
36. Kiyoki, Y., T.K, T.H., A Metadatabase System for Semantic Image Search by a
Mathematical Model of Meaning. Sigmod Record, 1994. 23(4).
40. Koskinen, I. User-generated content in mobile multimedia: empirical evidence
from user studies. in ICME. 2003.
81. Lambiotte, R., Ausloos, M., Collaborative tagging as a tripartite network.
Lecture Notes in Computer Science, 3993:1114–1117, 2005.
51. Laukkanen, T., Modding scenes: Introduction to user-created content in computer
gaming. Hypermedia Laboratory Net Series, 2005. 9.
42. Lerman, K., L. Jones. Social browsing on flickr. in Proc. of International
Conference on Weblogs and Social Media (ICWSM-07),. 2007.
29. Leuf, B., and Cunningham, W., The Wiki Way: Collaboration and Sharing on the
Internet. 2001, Boston: Addison-Wesley.
24. Lih, A., Wikipedia as Participatory Journalism: Reliable Sources? Metrics for
evaluating collaborative media as a news resource. Nature, 2003.
65. Linden, G., B.S., J.K., Amazon.com Recommendations: Item-to-Item
Collaborative Filtering. IEEE, 2003.
33. Majchrzak, A., C.W., D.Y., The Role of Shapers in Virtual Firm-based Practice
Networks, under review at Management Science, 2007.
10. Mappr. Available from: http://www.mappr.com/.
71. Marchetti, A., Tesconi, M., Ronzano, F., Rosella, M., Salvatore, M. SemKey: A
Semantic Collaborative Tagging System. in WWW2007. 2007. Banff, Canada.
38. Marlow, C., M.N., D.B., M.D., HT06, Tagging Paper, Taxonomy, Flickr,
Academic Article, ToRead. in Proceedings of the seventeenth conference on
Hypertext and hypermedia, 2006
39. Marlow, C., M.N., D.B., M.D., Position Paper, Tagging, Taxonomy, Flickr,
Article, ToRead. WWW2006, Edinburgh, UK, 2006
57. Mathes, A., Folksonomies - Cooperative Classification and Communication
Through Shared Metadata. in Proceedings of Computer Mediated
Communication LIS590CMC, 2004.
108. McTaggart, R., Principles for Participatory Action Research. Adult Education
Quarterly, 1991. 41(168).
67. Michlmayr, E., and S.Cayzer, Learning User Profiles from Tagging Data and
Leveraging them for Personal(ized) Information Access. WWW, 2007.
66. Mika, P. Ontologies are us: A unified model of social networks and semantics. in
In Proceedings of the Fourth International Semantic Web Conference (ISWC
2005). 2005. Galway, Ireland.
16. Miller, P., Web 2.0: Building the New Library. Ariadne, 2005. 46.
118. Mobasher, B., R.C., J.S., Automatic Personalization Based on Web Usage
Mining. Communications of the ACM, 2000. 43(8).
7. MySpace, www.myspace.com.
88. Naphade, M., Smith, J., Tesic, J., Chang, S., Hsu, W., Kennedy, L., Hauptmann,
A., Curtis, J., Large Scale Concept Ontology for Multimedia. IEEE Multimedia,
2006. 13(3): p. 86-91.
94. Norton, K., S., Applying Cross-Functional Evolutionary Methodologies to Web
Development. WebEngineering 2000, ed. S.M.a.Y. Deshpande: Springer-Verlag
Berlin Heidelberg.
89. O’Hara, K., Kindberg, T., Glancy, M., Baptista, L., Sukumaran, B., Kahana, G.
and Rowbotham, J., Collecting and Sharing Location-based Content on Mobile
Phones in a Zoo Visitor Experience. CSCW, 2007. 16(1-2): p. 11-44.
80. Ohmukai, I., M. Hamasaki and H. Takeda. A Proposal of Community-based
Folksonomy with RDF Metadata. in The 4th International Semantic Web
Conference (ISWC2005). 2005. Galway, Ireland.
101. Olson, M., Logic of Collective Action. 1965: Harvard University Press.
12. Panoramio. www.panoramio.com
79. Paolillo, J.C., S. Penumarthy. The Social Structure of Tagging Internet Video on
del.icio.us. in 40th Annual Hawaii International Conference on System Sciences
(HICSS 2007). 2007. Waikoloa, Big Island, Hawaii: IEEE Computer Society.
11. Photogalaxy. www.photogalaxy.com
112. Preim, B., C.T., W.S., O.P. Integration of Measurement Tools in Medical 3d
Visualizations. in IEEE Visualization. 2002. Boston, MA.
19. Press, A., Now Starring on the Web: YouTube. Wired News, 2006.
15. Pressman, R.S., Software Engineering: A Practitioner's Approach with Bonus
Chapter on Agile Development. 5 ed. 2003: McGraw-Hill
Science/Engineering/Math.
84. Quintarelli, E. Folksonomies: power to the people. in ISKO Italy-UniMIB
meeting. 2005. Italy.
32. Robbins, J.E., Adopting Open Source Software Engineering (OSSE) Practices by
Adopting OSSE Tools, in Making Sense of the Bazaar: Perspectives on Open
Source and Free Software, B.F. J. Feller, S. Hissam & K. Lakham (Eds.)
Sebastopol, Editor. 2003, O’Reilly & Associates.
41. Rockwell, R., An infrastructure for social software. Spectrum, IEEE, 1997. 34(3):
p. 26-31.
122. Rodríguez, A., N.G., D.M.S., O.T., Automatic Analysis of the Content of Cell
Biological Videos and Database Organization of Their Metadata Descriptors.
IEEE Transactions on Multimedia, 2004. 6(1).
64. Rogers, I., The google pagerank algorithm and how it works.
http://www.iprcom.com/papers/pagerank/, 2002.
87. Russell, T. cloudalicious: folksonomy over time. in Proceedings of the 6th
ACM/IEEE-CS joint conference on Digital libraries. 2006.
22. Sandler, T., Collective action: Theory and applications: University of Michigan
Press. 1992
102. Sandler, T., Collective action: Theory and applications. 1992: University of
Michigan Press.
17. Schmitz, P. Inducing Ontology from Flickr Tags. in Proceedings of IW3C2. 2006.
Edinburgh, UK.
76. Shen, K.a.L.W., Folksonomy as a Complex Network.
http://arxiv.org/abs/cs/0509072, 2005.
35. Shimizu, H., Y.K., A.S., N.K., A Decision Making Support System for Selecting
Appropriate Online Databases. IEEE, 1991.
113. Shneiderman, B., C.P., Designing the User Interface: Strategies for Effective
Human-Computer Interaction: Addison-Wesley.
48. Smith, M.A., Voices from the WELL: The Logic of the Virtual Commons,
Department of Sociology, UCLA, California.
37. Spiliopoulou, M., Improving the Effectiveness of a Web Site with Web Usage
Mining. Lecture Notes In Computer Science, 1999. 1836: p. 142 - 162
2. Spinoza, B., Social-isms. Spectrum Online, 2007.
121. Srivastava, J., R.C., M.D., P.N.T., Web usage mining: discovery and applications
of usage patterns from Web data. ACM SIGKDD Explorations Newsletter, 2000.
Vol 1, No 2.
56. Sturtz, D.N., Communal Categorization: The Folksonomy. INFO622: Content
Representation, 2004.
21. Suh, N.P., Equilibrium and Functional Periodicity: Fundamental Long-Term
Stability Conditions for Design of Nature and Engineered Systems. Research in
Engineering Design, 2004. 15.
96. Suh, N.P., A Theory of Complexity, Periodicity and the Design Axioms. Research
in Engineering Design, 1999. 11(2).
97. Suh, N.P. Complexity in Engineering. in Dn Keynote Paper. 2005
98. Suh, N.P., Role of Information and Complexity in Engineering Design. American
Physical Society, Annual APS March Meeting, Indianapolis, Indiana, March 18 -
22, 2002
99. Suh, N.P., Tribophysics. 1986., Englewood Cliff, N. J.: Prentice-Hall.
100. Suh, N.P., S.K., Surface Engineering. Annals of CIRP, 1987. 36: p. 403-408.
111. Suh, N.P. Application of Axiomatic Design to Engineering Collaboration and
Negotiation. in 4th International Conference on Axiomatic Design. 2006.
103. Svendsen, G.T., U.S. interest groups prefer emission trading: A new perspective.
Public Choice, 1999 101(1-2 ).
54. Thurman, N., Participatory journalism in the mainstream: Attitudes and
implementation at British news websites. 2006.
105. Tosh, D., B.W. Creation of a learning landscape: weblogging and social
networking in the context of e-portfolios. Technical report, University of
Edinburgh, 2004
45. Tsvi, Kuflik, P.S. Generation of user profiles for information filtering — research
agenda. in Annual ACM Conference on Research and Development in
Information Retrieval. 2000. Athens, Greece: ACM.
43. Viegas, F., B., D.B., D.H.N., J.P., J.D., Digital Artifacts for Remembering and
Storytelling: PostHistory and Social Network Fragments. in Proceedings of the
37th Annual Hawaii International Conference on System Sciences, 2004
8. Voss, J., Collaborative thesaurus tagging the Wikipedia way. Collaborative Web
Tagging Workshop, 2006.
125. Voss, J., Collaborative thesaurus tagging the Wikipedia way. Wikimetrics
research papers, 2006. 1(1).
110. Wadsworth, Y., What is Participatory Action Research. Action Research Issues
Association, 1993.
30. Wagner, C., Wiki: A technology for conversational knowledge management and
group collaboration. Communications of the AIS, 2004. 13(9): p. 265–289.
31. Wagner, C., A.M., Enabling Customer-Centricity Using Wikis and the Wiki Way.
Journal of Management Information Systems, 2006. 23(3): p. 17-44.
55. Wal, T.V., Explaining and Showing Broad and Narrow Folksonomies. 2005.
93. Warren, P., C.B., M.M. The Evolution of Websites. in 7th International Workshop
on Program Comprehension. 1999.
27. Wellman, B., J.S., D.D., L.G., M.G., C.H., Computer Networks as Social
Networks: Collaborative Work, Telework, and the Virtual Community. Annual
Review of Sociology, 1996.
23. Whyte, W.F., Participatory Action Research. 1991: Sage Publications.
1. Wikipedia. User-generated content. 2007, http://en.wikipedia.org/wiki/User-
generated_content.
26. Working Party on the Information Economy, Participative Web: User-Created
Content. SourceOCDE Science et technologies de l'information, 2007. 2007(15):
p. i-128.
59. Wu, X., Zhang, L., Yu. Y. Exploring Social Annotations for the Semantic Web. in
WWW2006. 2006. Edinburgh, Scotland.
61. Xu, Z., Yun Fu, Jianchang Mao, and Difu Su, Towards the Semantic Web:
Collaborative Tag Suggestions. Collaborative Web Tagging Workshop at
WWW2006, 2006.
18. Yee, K.-P., K.S., K.L., M.H. Faceted metadata for image search and browsing. in
SIGCHI conference on Human factors in computing systems. 2003.
3. YouTube, www.youtube.com.
119. Zaiane, O., Web Usage Mining for a Better Web-Based Learning Environment. in
Proceeding of Conference on Advanced Technology for Education, 2001.
120. Zaiane, O.R., L. J., Towards evaluating learners' behaviour in a Web-based
distance learning environment. Advanced Learning Technologies, 2001.
83. Zhang, L., X. Wu and Y. Yu, Emergent Semantics from Folksonomies: A
Quantitative Study. Journal on Data Semantics 2006. VI(4090): p. 168-186.
ABSTRACT
User-Generated Content (UGC) websites [1], or "social media" websites [2], such as YouTube [3] or Flickr [4], have gained tremendous success through their ability to use user-generated meta-content to further the user experience. However, current UGC developments resemble the ad-hoc web developments of the 1990s and the software developments of the 1960s [5]. History has shown where these approaches can lead, and there are already failed UGC developments caused by such uninformed approaches. In this thesis, the research is geared toward finding a theoretical framework that targets the unique behavior of UGC systems, and toward discovering means to remedy their problems and complexities.
Asset Metadata
Creator: Li, Qingfeng Anna (author)
Core Title: Massive user enabled evolving web system
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 07/02/2008
Defense Date: 05/22/2008
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tag: collaborative tagging, OAI-PMH Harvest
Language: English
Advisor: Lu, Stephen C.-Y. (committee chair), Boehm, Barry W. (committee member), Jin, Yan (committee member)
Creator Email: qingfenl@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m1316
Unique Identifier: UC1181286
Identifier: etd-Li-20080702 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-86974 (legacy record id), usctheses-m1316 (legacy record id)
Legacy Identifier: etd-Li-20080702.pdf
Dmrecord: 86974
Document Type: Dissertation
Rights: Li, Qingfeng Anna
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu