MODELING SOCIAL AND COGNITIVE ASPECTS OF USER BEHAVIOR IN
SOCIAL MEDIA
by
Jeon-Hyung Kang
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2015
Copyright 2015 Jeon-Hyung Kang
Dedication
Dedicated to my family and friends.
Acknowledgements
This dissertation would not have been possible without the help and support of
my family, friends, and colleagues.
Foremost, I would like to thank my advisor Kristina Lerman for her advice and
support. For the past seven years, Kristina has been an exemplary teacher, mentor,
and collaborator. I also thank my qualifying exam and dissertation committee, Ramakant
Nevatia, François Bar, Yan Liu, and Aram Galstyan, for their insightful comments
and suggestions. I spent great summers at IBM Research, SRA-SV, and Yahoo
Labs. During this time, I got to know various fields and had enjoyable
interactions with many other employees. Thank you for hosting
excellent and enriching summer internships. I would like to thank Jeffrey Nichols,
Juhan Lee, and Zornitsa Kozareva for their immense advice and support.
I am fortunate to have interacted with an amazing group of friends and colleagues,
inside and outside ISI, and I would like to recognize their contribution
to my dissertation. Discussions with them have greatly influenced this work. I
have greatly enjoyed my interactions with: Anon Plangprasopchok, Rumi Ghosh,
Farshad Kooti, Suradej Intagorn, Xiaoran Yan, Yoonsik Cho, Bo Wu, Mohsen
Taheriyan, Aman Goel, Shubham Gupta, Gowri Kumaraguruparan, Jason Riesa,
and Ashish Vaswani. All of you have been a great influence on my research through
discussion, collaboration, and constructive criticism.
I thank all my friends and family for their support. I especially thank my
parents, Younghwan Kang and Younghee Park, who have given me a lifetime of love
and care. Thank you to all my friends; your emotional support has salvaged my
sanity on many occasions. A special thanks to Jangwon Kim, Jukyung Kim, Mina
Park, Soyun Kim, Woojoo Lee, Songhee Yoo, Dayung Koh, Younggon Kim, Mia
Lee, and Angela Park. Last but not least, I want to thank everyone whom I forgot
to mention.
Contents
Dedication ii
Acknowledgements iii
List of Tables viii
List of Figures ix
Abstract xiii
1 Introduction 1
1.1 Information in Online Social Networks . . . . . . . . . . . . . . . . 1
1.1.1 Modeling User Behavior . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Understanding Access to Information . . . . . . . . . . . . 4
1.2 Motivation and Applications . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Information Adoption in Social Media . . . . . . . . . . . . 5
1.2.2 Network Structure, Cognitive Constraints and User Effort
on Information Access in Social Media . . . . . . . . . . . . 7
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Dissertation overview . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Overview and Survey of Related Works 16
2.1 Social Media Overview . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Digg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Network Structure and Information . . . . . . . . . . . . . . . . . . 19
2.3 Information Diffusion in Social Media . . . . . . . . . . . . . . . . . 21
2.4 Modeling Users with Recommender Systems . . . . . . . . . . . . . 22
2.5 Social and Cognitive Factors in Social Media . . . . . . . . . . . . . 24
3 Modeling Social Factors 26
3.1 Effect of The Social Ties . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 Interactions and Proximity . . . . . . . . . . . . . . . . . . . 28
3.1.2 Data Sets (Digg 2009 and Twitter 2010) . . . . . . . . . . . 31
3.1.3 Experimental Results: Activity Prediction . . . . . . . . . . 33
3.2 Modeling Limited Attention over Ties . . . . . . . . . . . . . . . . . 34
3.2.1 Social Recommendation Setting . . . . . . . . . . . . . . . . 35
3.2.2 LA-LDA Model . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.3 Learning Parameters . . . . . . . . . . . . . . . . . . . . . . 38
3.2.4 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.5 Data Sets (Digg 2010) . . . . . . . . . . . . . . . . . . . . . 42
3.2.6 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.7 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Improving Explanatory Power by Integrating Words . . . . . . . . . 52
3.3.1 LA-CTR Model . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.2 Learning Parameters . . . . . . . . . . . . . . . . . . . . . . 57
3.3.3 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.4 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.5 Experimental Results: Evaluation on Vote Prediction . . . . 60
3.4 Social Influence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4 Modeling Cognitive Factors 65
4.1 Modeling Position Biases . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.1 Vip Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.2 Learning Parameters . . . . . . . . . . . . . . . . . . . . . . 71
4.1.3 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.1.4 Data Sets (Twitter 2012) . . . . . . . . . . . . . . . . . . . . 73
4.1.5 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.1.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Improving Explanatory Power by Integrating Words . . . . . . . . . 81
4.2.1 Learning Parameters with Stochastic Optimization . . . . . 86
4.2.2 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2.3 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2.4 Experimental Results: User-Item Adoption Prediction . . . . 88
4.3 Visibility vs Item Fitness vs Personal Relevance . . . . . . . . . . . 91
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5 Scalable Mining of Social Data 96
5.1 Probabilistic Models for Social Data Mining . . . . . . . . . . . . . 98
5.1.1 Probabilistic Matrix Factorization . . . . . . . . . . . . . . . 98
5.1.2 Collaborative Topic Regression . . . . . . . . . . . . . . . . 100
5.2 Learning Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.1 Inference for PMF . . . . . . . . . . . . . . . . . . . . . . . 102
5.2.2 Inference for CTR . . . . . . . . . . . . . . . . . . . . . . . 106
5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.3.1 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.2 Performance of SGFS using PMF . . . . . . . . . . . . . . . 110
5.3.3 Performance of SGFS using CTR . . . . . . . . . . . . . . . 112
5.3.4 Performance of hybrid SGFS using CTR . . . . . . . . . . . 114
5.3.5 Scalability of distributed SGFS . . . . . . . . . . . . . . . . 115
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6 Information Access in Online Social Networks 121
6.1 Structural Bottlenecks to Information Access . . . . . . . . . . . . . 122
6.1.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.1.2 Access to Information . . . . . . . . . . . . . . . . . . . . . 129
6.1.3 Bottlenecks to Information Access . . . . . . . . . . . . . . . 138
6.2 User Effort and Network Structure . . . . . . . . . . . . . . . . . . 142
6.2.1 Data Sets (Twitter 2014) . . . . . . . . . . . . . . . . . . . . 143
6.2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.2.3 Information and Network Structure . . . . . . . . . . . . . . 146
6.2.4 Increasing Exposure to Diverse Information . . . . . . . . . 147
6.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7 Conclusions and Future Directions 156
7.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . 157
7.2 Vision for the Future Works . . . . . . . . . . . . . . . . . . . . . . 159
7.2.1 Recommending Diverse Information . . . . . . . . . . . . . . 159
7.2.2 Scalable Approach for Real-time Analysis . . . . . . . . . . . 161
7.2.3 Understanding Cognitive Factors of Users in Access to
Information . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Reference List 163
Appendix 174
List of Tables
3.1 Some of the proximity measures used for network analysis, including
four proposed in this dissertation . . . . . . . . . . . . . . . . . . . 29
3.2 Evaluation of predictions by different proximity measures in the
Digg and Twitter data sets. Lift is defined as % change over baseline. 34
3.3 Average deviation between empirical user interests and those learned
by each topic model. N_x is the number of user interests and N_z is
the number of item topics. . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Average precision of each model's predictions. . . . . . . . . . . . . 51
4.1 Model parameters used in this study. . . . . . . . . . . . . . . . . . 74
4.2 Model parameters used in this study. . . . . . . . . . . . . . . . . . 86
4.3 Overall prediction performance comparison using Precision@x (P@x),
Recall@x (R@x), and normalized DCG@x (nDCG@x) on the Twitter dataset. 89
4.4 Cascade size, expected values, and descriptions for YouTube video URLs 92
4.5 Cascade size, expected values, and descriptions for news article URLs . . 93
6.1 Variables used in the study. . . . . . . . . . . . . . . . . . . . . . . 123
6.2 Pairwise correlations between variables in the Digg 2009 and 2010
data sets. Note that asterisk (**) shows statistically significant
correlation with p<.01. . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3 Variables used in the study. . . . . . . . . . . . . . . . . . . . . . . 144
6.4 Keywords associated with the top 10 topics of users in different
positions within the network. Users are divided into two populations
based on their network diversity (ND). . . . . . . . . . . . . . . . . 146
A.1 Synthetic dataset characteristics for different parameter values. . . . 182
List of Figures
2.1 Screenshots of social news aggregator Digg. . . . . . . . . . . . . . 17
2.2 Screenshots of microblogging service Twitter. . . . . . . . . . . . . 18
3.1 The LA-LDA model. . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Model Selection: Log-likelihood for different numbers of interests and
topics for (a) ITM and (b) LA-LDA on the Digg 2009 data set. . . 45
3.3 The average deviation of learned user-interest and item-topic distributions
for different limited-attention parameter values on synthetic data. The top
two figures show the average deviation between the learned and actual user
interests when one attention parameter is fixed at 0.05 and the other varies
over 0.05, 0.1, 0.5, and 1.0; the bottom two figures show the corresponding
average deviation for the item topics. . . . . . . . . . . . . . . . . . 47
3.4 Topic distribution in Digg 2009 and Digg 2010 data set. . . . . . . . 48
3.5 The LA-CTR model. . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.6 Recall of in-matrix prediction for the (a) Digg 2009 and (b) Digg 2010
data sets by varying the number of recommended items (@X) with
200 topics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.7 One example user from the Digg 2009 data set. We show the top 5 topics
in his learned user profile u_i and the top 3 topics in the learned social
influence from each of his three friends. We also show the bag of words
that represents each topic. . . . . . . . . . . . . . . . . . . . . . . . 62
3.8 Recall of in-matrix prediction for Digg 2009 by varying the number
of recommended items (@X) with 50 topics. . . . . . . . . . . . . . 62
4.1 The Vip model (user topic profiles u, item topic profiles, item
fitness, personal relevance of an item to a user, visibility to the user v,
expected number of new posts the user received, and adoption r). N is
the number of users and M is the number of items. . . . . . . . . . . 67
4.2 (a) Recall of user-item adoption prediction with different numbers
X of recommended items. The number of topics was fixed at 30.
(b) Average recall@3 of user-item adoption prediction for different
activity levels (based on the number of adoptions) of users with 30
topics. Error bars indicate one standard deviation above and below. . 76
4.3 Different models predict cascade growth after the first k adoptions
for three URLs: (a) technology news article, (b) YouTube video, and
(c) political news article. . . . . . . . . . . . . . . . . . . . . . . . . 78
4.4 (a) Average correlation between the actual and predicted cascade
growth values for URLs with different cascade sizes. (b) Root-mean-square
error (RMSE) between cascade growth values predicted by each model
and the values actually observed. . . . . . . . . . . . . . . . . . . . 80
4.5 Our model with user topic (u) and item topic profiles, the item's
personal relevance and visibility to the user (v), item fitness, the
expected number of new posts the user received, and item adoption
(r). The topic model part has the topic distribution of an item and
a distribution over words from a vocabulary of size M. N is the
number of users, and D is the number of items. . . . . . . . . . . . 82
4.6 (a) Cascade size vs. the expected values of item fitness E(I) of all
items adopted through friends' recommendations. (b) Cascade size
vs. expected values of item fitness plus personal relevance E(I+P) for
all adopters. The size and color of each circle represent the expected
value of that item's visibility. . . . . . . . . . . . . . . . . . . . . . 92
5.1 Graphical representation of the (a) PMF and (b) CTR models. It captures
the user interests (U) and item topics (V) for recommendation,
and item topics to explain contents. The user-item ratings (R)
and the words (W) of the items are observed variables. . . . . . . . 98
5.2 Held-out set prediction RMSE (root mean squared error) using gradient
descent, Gibbs sampling, and Stochastic Gradient Fisher Scoring
(SGFS) inference algorithms on MovieLens data. . . . . . . . . . . . 111
5.3 Evaluation results on the KDD Cup 2012, CiteULike, and Twitter
2012 data sets (Section 4.1.4) using Distributed SGFS. Top column
reports likelihood and bottom column reports recall@X. . . . . . . 113
5.4 Recall@X evaluation results on (a) KDD Cup 2012 (b) CiteULike,
and (c) Twitter 2012 using distributed hybrid SGFS. . . . . . . . . 115
5.5 The average CPU time per epoch with different sample sizes on the
KDD 2012 data set. The number of threads is fixed at 1. . . . . . . 116
5.6 The average CPU time per epoch with different numbers of threads
on the KDD 2012 data set. The sample size is fixed at 200. . . . . . 117
6.1 Amount of novel information (NRI_i) a user is exposed to as a function
of the number of active friends (S_i) in the Digg 2010 data set.
The line represents the total number of distinct stories in the data
set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.2 Novel information in a user's network in the Digg 2010 data set. (a)
The total amount of novel information that the user's friends (NRI_frds)
and the user (NRI) are exposed to as a function of average friend
activity (or channel bandwidth B). Solid symbols show smoothed
data, and the line represents the total amount of information in
the network (number of distinct stories in the data set). (b) Novel
information rate as a function of friend activity. . . . . . . . . . . . 131
6.3 Scatterplot showing network diversity vs. average friend activity (channel
bandwidth) for Digg users who are divided into three populations
based on the number of friends in the Digg 2010 data set. The plot
demonstrates the diversity-bandwidth trade-off. . . . . . . . . . . . 133
6.4 (a) Topical diversity (TD) and (b) novelty (NRI) of information to
which Digg users are exposed as a function of their network diversity
(ND) and average friend activity (or channel bandwidth B) in the
Digg 2010 data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.5 Amount of new information injected into the network by users (colored
symbols) in different positions of network diversity (ND) and
friend activity (B). Symbol size represents the relative number of
seeded stories. Seeding users are divided into classes based on the
number of friends. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.6 (a) Total amount of novel information that a user's friends are exposed
to (NRI^frds_i) as a function of the user's network diversity (ND_i) and
friend activity (B_i) in the Digg 2010 data set. (b) Fraction of
novel information adopted (FNAR_i) by friends (NRI_i / NRI^frds_i)
as a function of network diversity and friend activity in the Digg
2010 data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.7 Number of stories adopted by the user as a function of the number
of active friends (S) in the Digg 2010 data set. . . . . . . . . . . . . 141
6.8 Diversity of received information as a function of a user's network
size. Users are divided into four populations based on their effort:
red circles represent the most active users (who post more than 5.3
tweets per day on average), green stars represent the 2nd quartile
(3.1 ≤ O_i < 5.3), black triangles represent the 3rd quartile (1.9 ≤ O_i < 3.1),
and blue squares represent the bottom-quartile users (who post
fewer than 1.9 tweets per day on average). We discretize values into
equal-sized bins for each quartile. . . . . . . . . . . . . . . . . . . . 148
6.9 Friend topic diversity (FTD_i) of a user as a function of the network
diversity (ND_i) in the Twitter 2014 data set. We show the
average of FTD_i for users with the same network diversity (ND_i),
with their standard deviation ranges in grey. Users in higher
network diversity positions tend to be exposed to more diverse information,
with active users receiving more diverse information regardless
of their position in the network structure. We group ND values
into equal-sized bins and compute the mean of both ND and FTD
within each bin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.10 Network diversity (ND) as a function of the number of active friends
(S) in the Twitter 2014 data set. We use equal-sized bins for each
class. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.11 Histograms of network diversity (ND) of users in the Twitter 2014
data set. Users are divided into two populations based on their
effort (O). The peak of the top 50% of users is higher than that of the
bottom 50%, while the bottom 50% tend to have higher ND. . . . . 152
Abstract
The spread of information in an online social network is a complex process that
depends on the nature of the information, the structure of the network, and the
behavior of social media users. Understanding this process will allow us to forecast
information diffusion for early detection of trending topics and to mitigate the
problem of information overload by filtering out irrelevant information. Probabilistic
models can be used to learn users' preferences from their historical information
adoption behaviors and, in turn, recommend new relevant items or predict how far
a given item of information will spread. However, current models ignore social and
cognitive factors that shape user behavior. One such factor is attention, the cognitive
mechanism that integrates perceptual features to select the items the user will
consciously process. Research suggests that people have limited attention, which
they divide non-uniformly over all incoming messages from their social contacts.
We propose a collaborative topic regression model that learns which of their social
contacts users pay attention to, and use it to analyze user decisions to spread items
recommended by their online friends.
Another consequence of limited attention is that people attend more to items
near the top of their message stream than to items lower down, which take more effort
to discover. We use visibility to capture the effects of limited attention. The visibility
of an item depends on its position in a message stream and is determined by
a number of factors, including the number of new messages arriving in the user's
stream and the frequency with which the user visits the site. We propose a probabilistic
model that accounts for users' limited attention in their information adoption behavior.
The model incorporates the user's interests, and the popularity and visibility of items
to the user. We use the model to study information spread on a popular social
media site. By accounting for the visibility of items, we can learn a better, more
predictive model of user interests and item topics. This work shows that models of
user behavior that account for cognitive factors can better describe and predict
individual and collective behavior in social media.
Another central topic of my dissertation is understanding how users can increase
their access to diverse and novel information in online social networks. Social
scientists have developed influential theories about the role of network structure in
information access. However, previous studies of the role of networks in information
access were limited in their ability to measure the diversity of information.
Furthermore, it is not clear how these theories generalize to online networks, which
differ from real-world social networks in important respects, including the asymmetry
of social links and users' limited capacity to manage a huge volume of information. We
study the interplay between network structure, the effort Twitter users are willing
to invest in engaging with the site, and the diversity of information they receive
from their contacts. We address this problem by learning the topics of interest to
social media users, applying the proposed models to the messages they share with their
followers. We confirm that users in structurally diverse network positions, which
bridge otherwise disconnected regions of the follower graph, are exposed to more
diverse information. In addition, we identify user effort as an important variable
that mediates access to diverse information in social media. These findings indicate
that the relationship between network structure and access to information in
networks is more nuanced than previously thought.
Chapter 1
Introduction
1.1 Information in Online Social Networks
Online social media has emerged as an important platform for social interaction and
information exchange. Specifically, social media sites, such as Facebook, Twitter
and Digg, allow users to create social networks by subscribing to, or following,
other users. When a user posts or shares a message, this message is broadcast to
all followers, who may themselves choose to share it with their own followers, and
so on, enabling the message to spread over the social network. The main interest
of my research has been in understanding the mechanisms of access to information
(i.e., messages that users received from their friends) and information adoption
(i.e., messages that users share or re-share with their followers) in social media.
1.1.1 Modeling User Behavior
Information spread in social media is believed to be a complex process that depends
on the nature of the information [Romero et al., 2011b], the structure of the
network [Bakshy et al., 2012, Weng et al., 2013], the strength of social influences
[Bakshy et al., 2011, Romero et al., 2011a], as well as user interests and topic
preferences [Agarwal and Chen, 2009, Koren et al., 2009, Yu et al., 2009, Agarwal and
Chen, 2010, Wang and Blei, 2011, Chua et al., 2012]. However, our research suggests
that information spread in social media depends on a number of relatively
simple factors, including who posted the information, how the site displays information,
how users navigate it to find items of interest, and the likelihood of sharing
an item upon exposure. Understanding this process will allow us to mitigate
the problem of information overload by recommending only relevant information
(i.e., items or messages containing text, images, or videos) to users. Recommender
systems can play an important role in helping social media users find relevant
information by suggesting information of potential interest to them. While
social recommender systems are defined as any recommender systems that target
social media domains [Guy and Carmel, 2011], such as items [Salakhutdinov and
Mnih, 2008b, Koren et al., 2009, Agarwal and Chen, 2010, Wang and Blei, 2011,
Chua et al., 2012], tags [Sigurbjörnsson and Van Zwol, 2008], users [Agarwal and
Bharadwaj, 2013, Chen et al., 2009a], and communities [Chen et al., 2009b], we
focus on item recommendation in this dissertation. With large amounts of social
media data and massive computing power available, we see opportunities to study
users' information adoption in social media, based on the history of user behavior.
We propose conceptually simple and general models of user behavior that can be
modified for, or applied to, any user-item recommender system in other domains.
The challenges in mining user behavior from social media data arise mainly from
three aspects: the characteristics of social media data, the flexibility of the model
to capture characteristics of human behavior, and the scalability of computation.
Sparseness and Observability of Social Media Data
Social data is often highly heterogeneous and characterized by a long-tailed
distribution [Lerman and Ghosh, 2010, Kang and Lerman, 2012b]. For social
recommendation applications, this translates into a few users sharing many items and many
users sharing few items. Similarly, when it comes to the contents, each item may
be limited to a certain number of characters, and as a result abbreviated syntax
is often introduced. Consequently, the texts are very noisy, with broad topics and
repetitive, less meaningful content. Another characteristic of social media data
is unobservable information. When a user does not adopt an item, this action can
be interpreted in two ways: either the user was aware of the item but did not like
it, or the user would have liked the item but never saw it.
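To make the long-tail characterization concrete, the following minimal sketch (entirely synthetic data; all counts and the Zipf-like weighting are illustrative assumptions, not statistics from the data sets studied here) draws user activity from a heavy-tailed distribution and measures both the sparsity of the resulting user-item matrix and the share of adoptions contributed by the most active users:

```python
import random
from collections import Counter

random.seed(7)

# Synthetic adoption log: user activity follows a Zipf-like distribution,
# so a few users account for a large share of all shared items.
n_users, n_items, n_adoptions = 1000, 500, 5000
weights = [1.0 / (rank + 1) for rank in range(n_users)]  # Zipf weights
users = random.choices(range(n_users), weights=weights, k=n_adoptions)
adoptions = [(u, random.randrange(n_items)) for u in users]

# Sparsity: fraction of user-item cells that are never observed.
observed = set(adoptions)
sparsity = 1.0 - len(observed) / (n_users * n_items)

# Share of all adoptions contributed by the 1% most active users.
activity = Counter(u for u, _ in adoptions)
top = sum(c for _, c in activity.most_common(n_users // 100))
print(f"sparsity: {sparsity:.3f}, top-1% user share: {top / n_adoptions:.2f}")
```

Even with only a thousand users, the top 1% contribute a disproportionate share of adoptions while the vast majority of user-item cells remain unobserved, which is exactly the regime the models in this dissertation must handle.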
Model Flexibility
There are many different kinds of social media services: online social networks,
like Facebook, Twitter, and LinkedIn; social bookmarking services, like Delicious;
photo sharing sites, like Flickr, Picasa, and Instagram; and social news aggregators,
like Digg and Reddit. Data from different services differ in their format and
types. In general, most social media websites provide a platform for building a social
network among users by posting and sharing items. However, other information,
such as user profiles, descriptions of items, or the source of an item adoption, will
not always be available. A model of user behavior must be flexible, so that it can
be extended easily to take into account other features and aspects, including
cognitive and social factors.
Scalability and Online Methods
The rapid growth of the amount of social media data, in the form of videos,
microblog posts and other items shared on social media, creates a great demand
for computationally effective and efficient approaches. At the same time, with the
advance of research, models become more complex to account for various aspects of
user behavior in social media. To handle complex models and huge volumes of real-time
data, a proposed approach should be scalable while not fitting only the head of the
long-tailed distribution, which mainly involves popular items and active users in
social media.
To this end, Bayesian probabilistic models are appealing for social data mining.
They allow researchers to incorporate prior knowledge, capture uncertainty in
the hidden parameters, and avoid overfitting. Bayesian models [Salakhutdinov and
Mnih, 2008b] can also be easily extended to include additional sources of evidence,
such as words [Wang and Blei, 2011] or social network structure [Purushotham
et al., 2012]. Furthermore, several scalable inference algorithms have been
developed to estimate the hidden variables of Bayesian models in a tractable
way [Andrieu et al., 1999, Zinkevich et al., 2010, Bradley et al., 2011, Mimno
et al., 2012, Gemulla et al., 2011, Welling and Teh, 2011, Ahn et al., 2012].
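As one illustration of why such models scale, probabilistic matrix factorization with zero-mean Gaussian priors on the latent factors reduces, under MAP inference, to minimizing a regularized squared error that simple stochastic gradient steps can optimize one rating at a time. The sketch below uses synthetic ratings generated from planted factors; the dimensions, learning rate, and prior strength are arbitrary illustrative choices, not values used in this dissertation:

```python
import math
import random

random.seed(0)

# Planted ground-truth factors used only to generate synthetic ratings.
n_users, n_items, k = 20, 30, 5
true_U = [[random.gauss(0, 0.5) for _ in range(k)] for _ in range(n_users)]
true_V = [[random.gauss(0, 0.5) for _ in range(k)] for _ in range(n_items)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Observe about half the user-item cells, with a little Gaussian noise.
ratings = [(u, i, dot(true_U[u], true_V[i]) + random.gauss(0, 0.05))
           for u in range(n_users) for i in range(n_items)
           if random.random() < 0.5]

# PMF under MAP: the Gaussian prior appears as the L2 term `reg`.
U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
lr, reg = 0.05, 0.01

for epoch in range(100):
    for u, i, r in ratings:
        err = r - dot(U[u], V[i])
        for f in range(k):
            uf, vf = U[u][f], V[i][f]
            U[u][f] += lr * (err * vf - reg * uf)  # stochastic MAP update
            V[i][f] += lr * (err * uf - reg * vf)

rmse = math.sqrt(sum((r - dot(U[u], V[i])) ** 2
                     for u, i, r in ratings) / len(ratings))
mean = sum(r for _, _, r in ratings) / len(ratings)
base = math.sqrt(sum((r - mean) ** 2 for _, _, r in ratings) / len(ratings))
print(f"PMF RMSE {rmse:.3f} vs mean-baseline RMSE {base:.3f}")
```

Because each update touches only one rating and two factor vectors, this style of inference parallelizes and streams naturally, which is what the scalable algorithms cited above exploit in more principled, fully Bayesian forms.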
1.1.2 Understanding Access to Information
People use their social contacts to gain access to information in social networks
[Granovetter, 1973, Burt, 2004], which they can then leverage for personal
advantage. However, information in social networks is non-uniformly distributed,
leading sociologists to explore the relationship between an individual's network
position and the novelty and diversity of information she receives through her
social contacts.
Studies of social and organizational networks identified the importance of so-called
brokerage positions, which link individuals to otherwise unconnected people
[Granovetter, 1973, Burt, 1995, 2005, Aral and Van Alstyne, 2011]. By spanning
distinct communities, brokerage positions expose individuals to novel and diverse
information, which leads to new job prospects [Granovetter, 1973] and higher
compensation [Burt, 1995, 2004]. However, the links that connect individuals in
brokerage positions to the rest of the network generally represent weaker relationships
(i.e., acquaintances rather than close friends) [Granovetter, 1973, Onnela et al.,
2007]. The less frequent interactions along these "weak" links limit the amount
of information flowing to individuals [Aral and Van Alstyne, 2011]. Thus, those
who are able, and willing, to invest greater effort in social interactions will manage
more connections, thereby increasing the volume of information they receive
through those links [Aral and David, 2012, Miritello et al., 2013]. Specifically, Aral
and Van Alstyne [2011] showed that individuals can increase the diversity and novelty
of information they receive via email either by placing themselves in brokerage
positions, or by communicating more frequently with their social contacts.
However, it is not clear how these theories generalize to online networks, which
differ from real-world social networks in important respects, including the asymmetry
of social links, broadcast communication, and whether users have sufficient control
to manage their social connections and process information. To this end, we use
social media data sets and define a set of network and information variables to
characterize access to information in the online social network.
1.2 Motivation and Applications
1.2.1 Information Adoption in Social Media
While social media allows individuals and organizations to communicate and
exchange information more widely, broadly, frequently, and in real time than
traditional communication channels, more information is created than social media
users can process. As a result, the problem of information overload has been
drastically exacerbated. For social media users, certain friend requests, messages, or
postings on their social network are unwanted information, or social spam [Brown
et al., 2008, Jagatic et al., 2007, Zinman and Donath, 2007]. Learning user behavior,
user preferences and interests, topics of items and the relations between them,
identifying communities of like-minded users, and so on could be used to filter the
vast amount of new user-generated content and deliver to users only the relevant,
timely, or interesting items. Furthermore, understanding an individual's behavior in
a social network is important since it decides the extent and speed of information
diffusion. Predicting information diffusion in its early stage is of immense practical
and commercial interest. Prediction can guide the design of more effective
marketing and public awareness campaigns.
Collaborative filtering methods examine the item recommendations of many people to discover their preferences and recommend new items that were liked by similar people. User-based [Herlocker et al., 1999] and item-based [Sarwar et al., 2001, Karypis, 2000] recommendation approaches have been developed for predicting how people will rate new items. Matrix factorization-based latent-factor models [Salakhutdinov and Mnih, 2008b, Koren et al., 2009] have shown promise in creating better recommendations by incorporating personal relevance into the model. Probabilistic topic modeling techniques were merged with collaborative filtering to improve the explanatory power of recommendation tools [Agarwal and Chen, 2010, Wang and Blei, 2011]. Social recommendation systems have been proposed that apply matrix factorization techniques to both users' social networks and their item rating histories, either without using the content of items [Ma et al., 2008, Yang et al., 2011] or with it [Purushotham et al., 2012]. However, existing approaches largely ignore cognitive aspects of user behavior in social media.
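To make the matrix-factorization approach concrete, the following sketch (illustrative only; it is not the implementation of any of the cited models, and the toy ratings, dimensions, and hyperparameters are invented) learns latent user and item factors by stochastic gradient descent on observed ratings, predicting a missing rating as the inner product of the corresponding factors:

```python
import random

random.seed(0)

# Toy observed ratings: (user, item, rating). All values are made up.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2      # k latent factors
lr, reg = 0.02, 0.02               # learning rate, L2 regularization

P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    # A missing rating is estimated by the inner product of latent factors.
    return sum(P[u][f] * Q[i][f] for f in range(k))

def mae():
    # Mean absolute error on the observed ratings.
    return sum(abs(r - predict(u, i)) for u, i, r in ratings) / len(ratings)

before = mae()
for epoch in range(500):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)  # gradient step, user factor
            Q[i][f] += lr * (err * pu - reg * qi)  # gradient step, item factor
after = mae()
```

After training, `predict(u, i)` can be queried for user-item pairs with no observed rating, which is the basic recommendation mechanism the latent-factor models above build upon.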
1.2.2 Network Structure, Cognitive Constraints and User Effort on Information Access in Social Media
Social media provides us with new data for testing and generalizing information brokerage theories. In contrast to email and phone interactions, where information is exchanged between a pair of social contacts, social media users broadcast information to all their contacts. Bakshy et al. [2012] showed that weak links collectively deliver more novel information to Facebook users, even though they interact infrequently with these contacts. These findings suggest that an easy way for social media users to increase their access to diverse information is by creating more links, e.g., by following other users. However, cognitive (and temporal) constraints limit an individual's capacity to manage social interactions [Dunbar, 1992, Goncalves et al., 2011b, Miritello et al., 2013] and process the information they receive [Weng et al., 2012, Hodas and Lerman, 2012]. In addition, social media users vary greatly in the effort they expend engaging with the site, leading to a large variation in user activity, as measured by the number of messages posted on the site [Wilkinson, 2008]. The impact of this variation on the information individuals receive and their position in the network is not known for social media. Do users who are able (or at least willing) to be more active on the site receive more diverse information? Do they curate their social links so as to move themselves into network positions that provide more diverse information? Furthermore, previous studies of the role of networks in individuals' access to information were limited in their ability to measure the diversity of information, using bag-of-words representations [Aral and Van Alstyne, 2011] or predefined categories [Kang and Lerman, 2013b] for this task. In this dissertation, we study the interplay between network structure, user activity, and information content. Understanding the role of network structure and user effort in information access can guide the design of better recommender systems.
1.3 Contributions
In this dissertation, we address a number of important questions regarding the
information adoption and access of social media users. We answer these questions
by proposing probabilistic approaches for modeling social and cognitive factors and
studying the relationship between network structure and information access.
The main questions this dissertation focuses on are the following.
Q1. How can we construct psychologically motivated predictive models of online human behavior from social media data? (Chapter 3 and Chapter 4)
Researchers have proposed algorithms for social recommendation, in which a user is likely to adopt an item based on her latent topic profile, the latent topic profiles of items, and the social network structure among users [Koren et al., 2009, Ma et al., 2008, Yang et al., 2011, Wang and Blei, 2011]. However, people have finite attention due to the brain's limited capacity for mental effort [Kahneman, 1973, Rensink et al., 1997]. Limited attention was shown to constrain users' online social interactions [Goncalves et al., 2011b] and the spread of information [Hodas and Lerman, 2012, Kang and Lerman, 2013b] in social media. While existing models of information spread largely ignore the psychological and cognitive aspects of user behavior, we begin to construct psycho-socially motivated predictive models of online human behavior from social media data.
We begin by considering social factors affecting behavior in online social networks. While users divide their limited attention over their friends [Hodas and Lerman, 2012], some friends receive a larger share of their attention than others [Gilbert and Karahalios, 2009]. We propose a model [Kang et al., 2013] that captures a user's limited attention, which she must divide among all those she follows. Specifically, the proposed model includes a notion of social influence, that is, the fact that the user may preferentially pay more attention to some, e.g., close friends [Gilbert and Karahalios, 2009], depending on the topics of information.
Another cognitive factor affecting behavior is position bias. Due to this cognitive bias, the amount of attention an item receives strongly depends on its position on a screen or within a list of items. Position bias is known to affect the answers people select in response to multiple-choice questions [Payne, 1951, Blunch, 1984], where on the screen they look [Buscher et al., 2009, Counts and Fisher, 2011], and the links on a web page they choose to follow [Craswell et al., 2008, Huberman, 1998]. As a further consequence of position bias, items near the top of a user's social media stream are more salient and, therefore, more likely to be viewed than items in lower positions [Hogg et al., 2013, Hodas and Lerman, 2012, Hogg and Lerman, 2012].
Furthermore, given that data on the information adoption behavior of social media users is only partially observable, it is not clear whether a user does not adopt an item because she was aware of it but did not like it, or because she would have liked it but never saw it. Since position data and the items viewed by the user are often not directly available, we estimate an item's visibility from the user's information load. This quantity measures the number of items a user has to inspect before finding a specific item to adopt, and it is given by the number of new messages arriving in the user's stream and the frequency with which the user visits the stream [Hogg et al., 2013]. The greater the number of new messages in the stream, either because the user follows more people or because she rarely visits her stream, the more effort it takes to discover an item, and therefore the less visible the item is. Accounting for item visibility [Kang and Lerman, 2015b] improves learned models of user behavior.
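The information-load idea can be sketched operationally. In the snippet below, the arrival rates, visit intervals, and the exponential decay of attention with stream position are all illustrative assumptions, not quantities estimated in this dissertation:

```python
import math

def information_load(arrival_rate, hours_between_visits):
    """Expected number of new messages awaiting the user at each visit,
    assuming messages arrive at a constant rate (illustrative assumption)."""
    return arrival_rate * hours_between_visits

def visibility(position, attention_span=20.0):
    """Illustrative assumption: the chance that an item at a given depth in
    the stream is inspected decays exponentially with its position."""
    return math.exp(-position / attention_span)

# A user following many active friends (100 msgs/hr, visits every 6 hours)
# faces a far larger load than a light user (5 msgs/hr, visits hourly),
# so any single item in the heavy user's stream is far less visible.
heavy = information_load(100, 6)   # messages per visit for the heavy user
light = information_load(5, 1)     # messages per visit for the light user
```

Under this sketch, the same item is much less likely to be inspected by the heavy user, which is exactly the confound between "saw but disliked" and "never saw" that the text describes.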
Contributions:
- We close the gap between behavioral and statistical recommendation models by studying social and cognitive factors of social media users.
- We develop probabilistic approaches (LA-LDA and LA-CTR) that learn from noisy social data how users divide their attention over their social contacts and topics.
- We propose the Vip model to properly account for cognitive factors given only partially observable user behavior in social media.
- We show that psychologically motivated predictive models of user behavior can more accurately predict whether users adopt items recommended by friends than models that ignore social and cognitive factors.
- We show that the psycho-socially motivated model provides more in-depth insight into information spread in social media by allowing us to independently assess the impacts of visibility, item fitness, and personal relevance on information diffusion in social media.
Q2. How can user behavior models be efficiently learned from large volumes of real-time, noisy social media data? (Chapter 5)
Social media data is often highly sparse, heterogeneous, and characterized by long-tailed distributions [Lerman and Ghosh, 2010, Kang and Lerman, 2012b]. For social recommendation applications, high data sparsity translates into a few users rating many items and many users rating few items. Furthermore, a growing amount of social data has become available for analysis in recent years, and as research advances, models become more complex to account for various aspects of user behavior in social media. To handle complex models and huge volumes of real-time data, a proposed approach should be scalable while not overfitting to the head of the long-tailed distribution, which mainly involves popular items or active users in social media. Researchers have used a variety of inference procedures to learn model parameters efficiently from huge volumes of real-time data. The Stochastic Gradient Fisher Scoring (SGFS) method was recently proposed for efficient inference. This method samples from a Bayesian posterior using a small number of data samples in each iteration, instead of the entire data set, to speed up the inference process.
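To illustrate the minibatch idea, the sketch below uses stochastic gradient Langevin dynamics, a close relative of SGFS that omits the Fisher-information preconditioning; the inference problem (a Gaussian mean with a broad Gaussian prior) and all settings are invented for illustration:

```python
import math
import random

random.seed(0)

# Toy inference problem: sample the posterior of a Gaussian mean theta,
# given data x_i ~ N(theta, 1) and a broad prior theta ~ N(0, 100).
N = 1000
theta_true = 2.0
data = [random.gauss(theta_true, 1.0) for _ in range(N)]

def sgld(n_iter=5000, batch_size=50, eps=1e-3):
    """Each step uses a minibatch estimate of the log-posterior gradient,
    rescaled by N / batch_size, plus injected Gaussian noise."""
    theta, samples = 0.0, []
    for _ in range(n_iter):
        batch = random.sample(data, batch_size)
        grad_prior = -theta / 100.0
        grad_lik = (N / batch_size) * sum(x - theta for x in batch)
        theta += 0.5 * eps * (grad_prior + grad_lik) \
                 + random.gauss(0.0, math.sqrt(eps))
        samples.append(theta)
    return samples

samples = sgld()
burned = samples[1000:]                      # discard burn-in
posterior_mean = sum(burned) / len(burned)
```

The key property shown here is that each update touches only `batch_size` data points rather than all `N`, which is what makes this family of samplers attractive for the data volumes described above; SGFS additionally preconditions the update with an estimated Fisher information matrix.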
In this dissertation, we explore the feasibility of using distributed SGFS for scalable social data mining, specifically focusing on the task of social recommendation using probabilistic matrix factorization (PMF) [Salakhutdinov and Mnih, 2008b] and the collaborative topic regression model (CTR) [Wang and Blei, 2011]. We verify that SGFS often outperforms other inference methods on dense data, but that it fails in the sparse "long tail", where there are not enough instances for it to learn parameters. We propose a hybrid SGFS inference algorithm [Kang and Lerman, 2013c] that takes the sparseness of social media data into account. To overcome the sparseness of user-item adoption data in social media, we also extend all proposed models to incorporate the text of items and show better performance on the user-item adoption prediction task.
Contributions:
- We extend the social (LA-LDA) and cognitive (Vip) user behavior models to take into account the textual information of items, to overcome the sparsity of social media data and improve explanatory power as well as prediction accuracy.
- We explore the feasibility of Stochastic Gradient Fisher Scoring (SGFS) for social data mining.
- We propose hybrid distributed SGFS (hSGFS) and evaluate its performance on a variety of social data sets.
- We find that hSGFS is better able to predict held-out items in data sets that have a long-tailed distribution.
Q3. How does the interplay between network structure and user behavior affect users' access to novel and diverse information in social media? (Chapter 6)
Social media provides us with new data for testing and generalizing information brokerage theories. First, we study the interplay between network structure and information content by analyzing how users of the social news aggregator Digg adopt stories recommended by friends, i.e., users they follow. We measure the impact that different factors, such as network position and activity rate, have on access to novel information [Kang and Lerman, 2013d].
While users can increase the volume of information they receive by adding more friends, individuals have a limited capacity to manage their social connections and process information. In this dissertation, we also use data from the microblogging site Twitter to study the interplay between network structure, the effort Twitter users are willing to invest in engaging with the site, and the diversity of information they receive from their contacts. Previous studies of the role of networks in individuals' access to information were limited in their ability to measure the diversity of information: they used bag-of-words representations [Aral and Van Alstyne, 2011] or predefined categories [Kang and Lerman, 2013d] for this task. This is problematic because measuring the diversity of information flexibly enough requires representing information as a mixture of topics. In contrast, we learn the topics of interest to social media users from the messages they share with their followers using the Vip model. We use the learned topics to measure the diversity of information users receive from their contacts. This enables us to study the factors that affect the diversity of information in online social networks.
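One simple way to operationalize such a topic-based diversity measure (an illustrative choice, not necessarily the exact formulation used later in the dissertation) is the Shannon entropy of the average topic mixture of the messages a user receives:

```python
import math

def information_diversity(topic_mixtures):
    """Shannon entropy of the average topic distribution of received
    messages. Each mixture is a list of topic probabilities summing to 1."""
    k = len(topic_mixtures[0])
    n = len(topic_mixtures)
    avg = [sum(m[t] for m in topic_mixtures) / n for t in range(k)]
    return -sum(p * math.log(p) for p in avg if p > 0)

# A user whose friends post on a single topic sees low diversity;
# a user with topically varied friends sees high diversity.
narrow = information_diversity([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
broad = information_diversity([[1.0, 0.0, 0.0],
                               [0.0, 1.0, 0.0],
                               [0.0, 0.0, 1.0]])
```

Entropy is maximal (log of the number of topics) when received content is spread evenly over topics, which matches the intuition that mixture-of-topics representations capture diversity more flexibly than predefined categories.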
Contributions:
- We investigate the relation between the structure of the social network, the content of information, the effort of users, and the activity of their friends.
- We validate the existence of a trade-off between network diversity and friends' activity in online social networks.
- We show that increasing friend activity affects access to novel information, while increasing network diversity provides access to more topically diverse friends, but not the other way around.
- We show that structural bottlenecks limit access to novel information in online social networks. A user can improve her information access by linking to active users, though this becomes less effective as the number of friends, or their activity, grows, due to structural network constraints.
- We show that the amount of novel information a user is exposed to increases as she adds more friends, but saturates quickly. Similarly, linking to friends who are more active improves access to novel information, but as redundancy increases, higher friend activity can no longer increase the amount of novel information accessible to the user.
- We find that user effort is an important variable mediating access to information in networks. Users who invest more effort into their activity on the site not only place themselves in more structurally diverse positions within the network than less engaged users, but they also receive more diverse information when located in similar network positions.
1.4 Dissertation overview
The outline of this dissertation is as follows. In Chapter 2, we begin with an overview of the social media sites studied here, the social news aggregator Digg and the microblogging site Twitter. We review related work on information adoption, models for predicting information cascade size in social media, and the role of network structure in information access. We also review work on the social and cognitive aspects of social media users. In Chapter 3, we focus on modeling the social aspects of user behavior in social media and propose the limited attention models LA-LDA and LA-CTR, Bayesian probabilistic models that integrate users' limited attention and social ties. In Chapter 4, we focus on modeling the cognitive aspects of user behavior in social media. We propose Vip, a Bayesian probabilistic model that jointly considers visibility, item fitness, and personal relevance, motivated by the mechanisms of item adoption in social media. The proposed research models item visibility in information adoption to properly account for attention in the unobservable part of user behavior data in social media. In Chapter 5, we explore the feasibility of using distributed Stochastic Gradient Fisher Scoring for scalable social data mining, specifically focusing on the task of social recommendation using probabilistic matrix factorization (PMF) [Salakhutdinov and Mnih, 2008b] and the collaborative topic regression model (CTR) [Wang and Blei, 2011]. We chose PMF and CTR because they serve as bases for several other Bayesian models used in social data mining applications [Salakhutdinov and Mnih, 2008a, Koren et al., 2009, Ma et al., 2008, Purushotham et al., 2012, Kang and Lerman, 2013a, Yang et al., 2011]. In Chapter 6, we study how social media users can increase their access to novel and diverse information using the proposed models. We validate the role of network structure in information access by analyzing how social media users share information. We also identify the importance of user effort in overcoming structural bottlenecks to accessing diverse information. Finally, in Chapter 7, we summarize the overall ideas of the dissertation and outline a vision for future research.
Chapter 2
Overview and Survey of Related Work
In this chapter, we review the two main social media services, Digg and Twitter, used in this dissertation. Next, we review previous work on information adoption, cascade size prediction, the role of network structure in information access, and the social and cognitive aspects of social media users.
2.1 Social Media Overview
2.1.1 Digg
Digg (http://digg.com) is a social news aggregator, which at its peak had over 3.8 million monthly U.S. unique visitors[1]. Digg allowed registered users to submit links to news stories and other users to vote for web content by "digging" stories they found interesting. When a user voted for a story, all of her followers could see the vote; the information was thus broadcast and shared with all her followers. Digg also allowed users to follow the activity of other users to see the stories they submitted or dug recently. A user's social stream (Figure 2.1) shows the stories her friends (the people she follows) submitted or voted for. The follow links were not necessarily reciprocated: a user b who follows user a can see the messages a posts,
[1] "digg.com - Quantcast Audience Profile". Quantcast.com. July 16, 2012.
Figure 2.1: Screenshots of social news aggregator Digg.
but not vice versa. We refer to a as the friend of b, and b as the follower of a. At the time we collected our data, of the tens of thousands of daily submissions, Digg picked about a hundred to feature on its front page. A newly submitted story went to the upcoming stories list, where it remained for 24 hours or until it was promoted to the front page by Digg, whichever came first.
Figure 2.2: Screenshots of microblogging service Twitter.
2.1.2 Twitter
Twitter (http://twitter.com) is a popular social networking site, with over 288 million monthly active users[2], that allows registered users to post and read short
text messages (at most 140 characters). This size restriction challenges users to
write well, but at the same time it promotes clever use of language and makes
tweets very easy to scan. Like Digg, Twitter allows users to follow the activity of
others to see the messages they have posted or retweeted recently. Twitter's big appeal is its unreciprocated friendship links: a user b who follows user a can see the messages a posts, but not vice versa. We refer to a as the friend of b, and b as the follower
[2] https://about.twitter.com/company, as of February 2015
of a. Being a follower on Twitter is equivalent to being a fan on Digg. Registered users can also retweet or comment on another user's post, which usually contains a URL to online content. Posting a link on Twitter is analogous to submitting a new story on Digg, and retweeting the post is analogous to voting for it. When a user posts or retweets a message, it is broadcast to all her followers, who are then able to see it in their own streams (Figure 2.2). The size restriction and unreciprocated friendship links have made Twitter a popular tool for rapidly tracking and scanning posts from hundreds of interesting friends.
2.2 Network Structure and Information
Sociologists have long argued that network structure influences information diffusion. The theoretical argument known as "the strength of weak ties" [Granovetter, 1973] explored the relationship between social links and the information people receive along those links. Specifically, Granovetter argued that weaker social ties, representing infrequent social interactions, mostly provide users access to new and novel information, which leads to new social and economic opportunities [Uzzi, 1997, Reagans and Zuckerman, 2001, Reagans and McEvily, 2003, Allen, 2003]. Evidence for the weak-ties argument has been found in job search [Granovetter, 1973], business relations [Coleman, 1988, Aral and Van Alstyne, 2011], inter-business networks [Uzzi, 1996], and social capital [Coleman, 1988]. Tie strength, which can be computed from the frequency of communication or from position in the network, has also been studied by various researchers. Burt [Burt, 1995, 2004, 2005] argued that weak ties act as bridges between different communities. Individuals with many such ties are in what he termed "brokerage positions" in the network, which allow them to access, and benefit from, novel information residing in diverse sources. In the social network domain, empirical research on mobile phone calls [Onnela et al., 2007] and email communication [Iribarren and Moro, 2011, Aral and Van Alstyne, 2011] has also supported these theories of information diffusion. In social media, research by [Grabowicz et al., 2011, Centola and Macy, 2007, Centola, 2010] supported the weak-ties argument about the nature of interactions on a network and its structure, whereas others have challenged these theoretical arguments through empirical study [Zhao et al., 2010].
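Burt's notion of brokerage has a standard quantitative form, the network constraint: a node spanning otherwise disconnected contacts has low constraint. The sketch below computes it for an unweighted graph; the equal tie weights and the toy graphs are assumptions of this illustration:

```python
def burt_constraint(graph, i):
    """Burt's network constraint for node i in an undirected, unweighted
    graph given as {node: set(neighbors)}. Lower constraint indicates a
    brokerage position spanning otherwise disconnected contacts."""
    neighbors = graph[i]

    def p(a, b):
        # Proportion of a's ties invested in b (equal weights assumed).
        return (1.0 / len(graph[a])) if b in graph[a] else 0.0

    total = 0.0
    for j in neighbors:
        # Direct investment in j plus indirect investment through shared
        # contacts q.
        indirect = sum(p(i, q) * p(q, j) for q in neighbors if q != j)
        total += (p(i, j) + indirect) ** 2
    return total

# Broker: node 0 bridges two otherwise unconnected contacts.
star = {0: {1, 2}, 1: {0}, 2: {0}}
# Closed triad: everyone is connected to everyone, so node 0 is constrained.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

Here the broker in `star` has lower constraint than the same node in `triangle`, matching Burt's argument that bridging disconnected communities yields access to diverse sources.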
While the theoretical arguments focused on information diffusion in networks, Aral and Van Alstyne [2011] integrated frequency of contact by introducing the concept of channel bandwidth. They observed that a trade-off between network diversity and channel bandwidth exists when accessing both novel and diverse information in an email communication network. The importance of network position for maximizing information diversity and novelty was argued again in [Aral and David, 2012]. They also pointed out that the limited volume of communication in a diverse network is not due to relationship maintenance costs, but mostly due to the nature of the relationships in social networks. In social media, Bakshy et al. [2012] showed that, although strong ties are individually more influential, weak ties increase the diversity of information received. In this dissertation, we validate Aral's argument about the trade-off between network diversity and bandwidth for access to novel and diverse information in online social networks. We [Kang and Lerman, 2013d] showed that increasing the activity of the social media friends a user follows affected how much novel information the user received from them, while increasing network diversity provided access to more topically diverse information, but not the other way around. Moreover, given the greater freedom users have in online social networks, we study the contribution of user effort to information diversity, as well as the diversity-bandwidth trade-off.
2.3 Information Diffusion in Social Media
Social data can be studied to understand the diffusion of information, specifically, how far information will spread and which information will go viral. Information cascades have been studied in terms of content [Berger and Milkman, 2009, Leskovec et al., 2009], temporal patterns of popularity [Szabo and Huberman, 2010], social contagion effects in social networks [Suh et al., 2010, Aral and Walker, 2011, Kitsak et al., 2010, Jamali, 2009], and the cognitive limits of users [Lerman and Hogg, 2012, Hogg and Lerman, 2012, Hogg et al., 2013, Lehmann et al., 2012, Weng et al., 2012]. Weng et al. [2013] studied the virality of memes with respect to structural trapping effects and social reinforcement on Twitter. The authors showed empirically that while most memes indeed behave like complex contagions, a few viral memes spread across many communities like diseases. Using several features from user and community interactions, they predicted the virality of memes based on early spreading patterns. In this dissertation, we show that simple models of user behavior that account for human cognitive biases can better describe and predict both individual and collective behavior in social media. Predicting the popularity of newly submitted items has been studied in much previous research [Hogg et al., 2013, Hogg and Lerman, 2012, Lerman and Hogg, 2012]. These studies incorporated into stochastic models the various mechanisms through which a web site's user interface and structure affect user behavior, focusing specifically on the importance of visibility in social media. The effect of tie strength and social reinforcement on information spread in social media has been studied in [Bakshy et al., 2012]. However, all these approaches largely ignore one of the most important components: personal relevance.
2.4 Modeling Users with Recommender Systems
Recommendation systems have long been studied to predict a user's rating for a previously unseen item. One of the most successful techniques in recommendation is collaborative filtering, which examines the item recommendations of many people to discover their preferences and recommend new items that were liked by similar people. User-based [Herlocker et al., 1999] and item-based [Sarwar et al., 2001, Karypis, 2000] recommendation approaches have been developed for predicting how people will rate new items. Matrix factorization-based latent-factor models [Salakhutdinov and Mnih, 2008b, Koren et al., 2009] have shown promise in creating better recommendations by incorporating user interests into the model. However, all these approaches ignore the content of items in the recommendation task.
Recently, probabilistic topic modeling techniques were merged with collaborative filtering to improve the explanatory power of recommendation tools [Agarwal and Chen, 2010, Wang and Blei, 2011]. Content analysis based on probabilistic topic modeling has been incorporated into collaborative filtering [Agarwal and Chen, 2010] to account for user rating and item popularity biases in the data. The authors show better prediction performance through regularization of both user and item factors via user features and item content. The Collaborative Topic Regression (CTR) model [Wang and Blei, 2011] combines a collaborative filtering model with probabilistic topic modeling. Both models do a good job of using item content for recommendation; however, neither takes the social structure of users into account.
In the social recommendation setting, Chua et al. [2012] have extended LDA so that a user is likely to adopt an item based on the latent interests of other users. Social correlations between users are learned from their item adoption choices, rather than specified explicitly through the follower graph. Social recommendation systems have been proposed that apply matrix factorization techniques to both users' social networks and their item rating histories [Ma et al., 2008], without using the content of items. In CTR-smf [Purushotham et al., 2012], the authors integrate CTR with social matrix factorization models to take social correlation between users into account. However, these works exploit the homophily effect in social media to smooth users' interests toward those of their friends, instead of directly learning how users allocate their attention over their friends. Hence, depending on the degree of similarity among connected users, social matrix factorization may not perform consistently well.
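The homophily-based smoothing used by social matrix factorization can be sketched as a penalty pulling a user's latent factors toward the average of her friends' factors. The snippet below is an illustration, not any of the cited models; `beta` and the toy factors are invented:

```python
def social_gradient(P, user, friends, beta=0.1):
    """Gradient (w.r.t. p_u) of the social regularizer
    beta * || p_u - mean(p_f for f in friends) ||^2 for a single user.
    P maps user ids to latent factor lists; friends is a list of user ids."""
    k = len(P[user])
    if not friends:
        return [0.0] * k
    avg = [sum(P[f][d] for f in friends) / len(friends) for d in range(k)]
    # Derivative of the squared penalty: pulls p_u toward the friend average.
    return [2.0 * beta * (P[user][d] - avg[d]) for d in range(k)]

# Toy latent factors for three users; user 0 follows users 1 and 2.
P = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
g = social_gradient(P, 0, [1, 2])
```

Subtracting this gradient during training shrinks the distance between connected users' factors, which is precisely the smoothing behavior (and its weakness when connected users are actually dissimilar) described in the paragraph above.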
Attention in online interactions has been modeled in [Kang et al., 2013] to directly learn users' limited attention over their friends and topics. However, this model was applied to dyadic data [Agarwal and Chen, 2009, Koren et al., 2009, Yu et al., 2009] without providing any explanation of the learned topics. The LA-CTR model proposed in this dissertation makes two important advances. First, it incorporates the content of items and is therefore able to provide explanatory power for its recommendations, as well as to recommend new relevant items. Furthermore, the proposed model is able to learn the degree of influence of other users when modeling limited attention.
However, all these approaches to recommendation systems largely ignore the psychological and cognitive aspects of user behavior. Furthermore, the item adoption mechanism in existing social recommendation models is based on the personal relevance of items, without taking into account the fitness of items.
2.5 Social and Cognitive Factors in Social Media
The growing abundance of social media data has allowed researchers to start asking questions about the nature of influence, how social ties interact with the content of items, and how these affect the transmission of items in social networks [Bakshy et al., 2011, Romero et al., 2011b]. Homophily, which refers to the tendency of similar individuals to link to each other, is a strong organizing principle of social networks. Numerous studies found that people tend to be friends with others who belong to the same socio-economic class [Feld, 1981, McPherson et al., 2001], and they tend to follow others in social media who have similar interests [Kang and Lerman, 2012a]. In the context of information exchange on networks, this means that the content users are exposed to in social media depends on their position in the network. Users embedded within a community of strongly tied individuals are likely to share information on topics that other community members are interested in, while users in brokerage positions that bridge different communities receive information on more diverse topics. In this dissertation, we study the link between topic-based homophily and network structure.
Recently, researchers have demonstrated the importance of attention in online interactions. Specifically, the importance of cognitive limits, as reflected in the number of friends [Hodas and Lerman, 2012] and users' activities [Goncalves et al., 2011a], has been studied in social networks. Cognitive constraints on social interactions provide an interesting perspective on the structure and function of social networks. Dunbar argued that people have a limited ability, defined by their brain's capacity, to manage social interactions, which gives rise to a maximum social group size [Dunbar, 2003]. Although social media was believed to expand the size of human social networks, research showed that the maximum number of friends that Twitter users interact with is around 100-200 [Goncalves et al., 2011a], similar to the Dunbar number. Cognitive constraints could also explain the findings of [Aral and Van Alstyne, 2011, Aral and David, 2012], namely that cognitive constraints create a trade-off between the complexity of social interactions (given by network diversity) and the intensity of interactions along structurally complex links, resulting in the "diversity-bandwidth trade-off." Specifically relevant to this dissertation are studies showing that visibility and social network structure constrain information diffusion [Hodas and Lerman, 2012] and the spread of novel and diverse information [Kang and Lerman, 2013b] in social media.
The model proposed in this dissertation incorporates these ideas into a joint model with interest and personal relevance for collaborative recommendation. In this dissertation, we identify the importance of friends' activity for access to novel information and show that structural diversity is also highly related to content diversity. Further, we argue that the sparse activity observed in highly diverse networks stems mainly from differences in interests and from cognitive limits, and that it influences information diffusion in the social network. Unlike previous research, we examine how users vary in their capacity for social interactions (or activity), and how this capacity defines their level of engagement with the social media site itself and their access to diverse information.
Chapter 3
Modeling Social Factors
People who are "close" in some sense in a social network are more likely to perform similar actions than more distant people [McPherson et al., 2001]. Homophily refers to the tendency of individuals in a social system to link with others who are similar to them rather than those who are less similar. Many studies have verified homophily along demographic dimensions, such as age, location, and occupation, not only in real-world social networks [McPherson et al., 2001, Leskovec and Horvitz, 2008, Kossinets and Watts, 2009] but also online [Mackinnon, 2006, Kwak et al., 2010, Kang and Lerman, 2012b]. The community structure that homophily imposes on a social network may, in turn, through the processes of influence [Christakis and Fowler, 2007] and selection [Crandall et al., 2008], cause linked individuals to become even more similar [Kang and Lerman, 2012a]. Over time, preferential linking will structure the network in such a way as to make the behavior of individuals [Lerman et al., 2012], the properties of dynamic processes on networks, and even their future structure [Liben-Nowell and Kleinberg, 2007], more predictable.
Structural proximity and homophily are important social factors; however, social
media users have finite attention, which limits the number of incoming messages
from friends they can process. Attention is the mechanism that controls how we
process incoming stimuli and decide what activities to engage in [Kahneman, 1973,
Rensink et al., 1997], and it was recently shown to be an important factor in online
human behavior [Goldhaber, 1997, Wu and Huberman, 2007, Weng et al., 2012].
Actions, such as reading a tweet, browsing a web page, or responding to email,
require mental effort, and since the human brain's capacity for mental effort is limited,
so is attention. As a consequence, the more stimuli users have to process, the
smaller the probability they will respond to any one stimulus, since they must
divide their limited attention over all incoming stimuli [Hodas and Lerman, 2012].
However, users rarely divide their attention uniformly [Counts and Fisher, 2011,
Gilbert and Karahalios, 2009, Huberman et al., 2009]. Some friends may receive
a greater share of a user's attention due to familiarity, trust, social closeness,
or influence. While some of the social cues that guide attention are difficult to
quantify, others, like influence, have been intensively studied.
In this chapter, we begin by showing how network structure helps predict information
adoption in social media (Section 3.1). Next, we focus on modeling information
adoption by integrating users' limited attention over their friends and topics into
probabilistic models (Section 3.2, Section 3.3.1).
3.1 Effect of Social Ties
The strength of social ties [Granovetter, 1973] is more precisely defined through
network proximity, which captures the degree to which people are "close" to each other.
Network proximity has been used to predict which interactions among users are
likely to occur in the future [Liben-Nowell and Kleinberg, 2007]. In addition to
standard proximity measures used in the link prediction task, such as neighborhood
overlap, we introduce new measures that model different types of interactions
that take place between people. The degree to which a node is reachable depends
not only on network topology, but also on the nature of interaction between the
nodes [Ghosh et al., 2011a]. One-to-one interactions, such as web surfing or phone
conversations, can be described as a random walk. However, in social media, rather
than pick one neighbor to whom to transmit a message, users broadcast it to all
neighbors. Also, since users' capacity to respond to incoming messages from network
neighbors is limited by their finite attention [Hodas and Lerman, 2012, 2014],
this may further change the nature of interactions in social media. We proposed
proximity measures [Lerman et al., 2012] that take into account the one-to-many
and limited-attention interactions between nodes.
We study this claim empirically using data about URL adopting activity on
the social media sites Digg and Twitter. We show that the structural proximity of two
users in the follower graph is related to the similarity of their activity, i.e., how many
URLs they both adopt. We also show that given friends' activity, knowing their
proximity to the user can help better predict which URLs the user will adopt. We
compare the performance of different proximity measures on the activity prediction
task and find that proximity measures that take into account the limited-attention
nature of interactions in social media lead to substantially better predictions.
3.1.1 Interactions and Proximity
Intuitively, network proximity measures the likelihood a message starting at node
u will reach another node v, regardless of whether an edge exists between them.
The greater the number of paths connecting them, the more likely they are to share
information, and the closer they are in the network. Proximity measures used in
previous studies [Liben-Nowell and Kleinberg, 2007, Lü and Zhou, 2010] include
the number of common neighbors (CN), the fraction of common neighbors, or Jaccard
(JC) coefficient, and the Adamic-Adar (AA) score, which weighs each common
neighbor by the inverse of the log of its degree. Table 3.1 gives their definitions using
the directed neighborhoods of $u$ and $v$: $\Gamma = \Gamma_{out}(u) \cap \Gamma_{in}(v)$ and $\Gamma' = \Gamma_{in}(u) \cap \Gamma_{out}(v)$.
Here, $\Gamma_{out}(u)$ represents the set of out-neighbors of node $u$, which in social media
corresponds to the set of followers of $u$. Similarly, $\Gamma_{in}(u)$ represents the set of in-neighbors
(friends) of $u$. The out-degree of $u$ is $d_{out}(u) = |\Gamma_{out}(u)|$ and the in-degree
is $d_{in}(u) = |\Gamma_{in}(u)|$.

Table 3.1: Some of the proximity measures used for network analysis, including
four proposed in this dissertation

metric    definition
CN        $CN = \frac{1}{2}\left(|\Gamma| + |\Gamma'|\right)$
JC        $JC = \frac{1}{2}\left[\frac{|\Gamma_{out}(u) \cap \Gamma_{in}(v)|}{|\Gamma_{out}(u) \cup \Gamma_{in}(v)|} + \frac{|\Gamma_{out}(v) \cap \Gamma_{in}(u)|}{|\Gamma_{out}(v) \cup \Gamma_{in}(u)|}\right]$
AA        $AA = \frac{1}{2}\left[\sum_{z \in \Gamma} \frac{1}{\log(d(z))} + \sum_{z' \in \Gamma'} \frac{1}{\log(d(z'))}\right]$
RW        $RW = \frac{1}{2}\sum_{z \in \Gamma} \frac{1}{d_{out}(u)\,d_{out}(z)} + \frac{1}{2}\sum_{z \in \Gamma'} \frac{1}{d_{out}(v)\,d_{out}(z)}$
LA RW     $LA\,RW = \frac{1}{2}\sum_{z \in \Gamma} \frac{1}{d_{out}(u)\,d_{in}(z)\,d_{out}(z)\,d_{in}(v)} + \frac{1}{2}\sum_{z \in \Gamma'} \frac{1}{d_{out}(v)\,d_{in}(z)\,d_{out}(z)\,d_{in}(u)}$
EPI       $EPI = \frac{1}{2}\left(|\Gamma| + |\Gamma'|\right)$
LA EPI    $LA\,EPI = \frac{1}{2}\sum_{z \in \Gamma} \frac{1}{d_{in}(z)\,d_{in}(v)} + \frac{1}{2}\sum_{z \in \Gamma'} \frac{1}{d_{in}(z)\,d_{in}(u)}$
The likelihood a message will reach $v$ from $u$ depends, however, not only on
the number of paths, but also on the nature of the dynamic process by which
messages spread on the network [Ghosh et al., 2011a]. Different dynamic processes
will lead to different notions of proximity, even in the same network. Consider
first a random walk, or what we call a conservative process. Koren et al. [2007]
introduced cycle-free effective conductance as a measure of proximity. This is a
global metric that computes the probability that a random walk starting at $u$ will reach $v$
through any path in the graph. In most cases we are interested in local measures,
which depend only on the neighborhoods of $u$ and $v$. They are not only easier
to compute, but also do not require knowledge of the full graph, e.g., the entire
Twitter follower graph. To go from $u$ to $v$, the random walker first needs to pick
an edge that will take it from $u$ to a common neighbor $z$ it shares with $v$ (which it will
do with probability $1/d_{out}(u)$); then it has to pick an edge that will take it to $v$
(which it will do with probability $1/d_{out}(z)$). Symmetrizing, we obtain the metric $RW$
in Table 3.1. This measure is almost identical to the metric shown by Zhou et al.
[2009] to perform best on the missing link prediction task in the electric power
grid, router-level Internet graph, and US air transportation networks, all of which
have conservative interactions.
People have finite attention, which limits their capacity to respond to incoming
stimuli. Social media users divide their attention among all friends, which limits
their ability to respond to a specific friend (for simplicity, we assume that attention
is evenly divided among friends). This alters the interactions. Now, in order for
a message to get from $u$ to a common neighbor $z$, it must not only go over the
correct out-link from $u$, but $z$ must also pay attention to the in-link, which it will
do with probability $1/d_{in}(z)$. This leads to the limited-attention random walk metric
($LA\,RW$) in Table 3.1.
Now imagine that messages flow via one-to-many broadcasts. For a message
to get from $u$ to $v$, first $u$ broadcasts it to its neighbors, including $z$, and then $z$
broadcasts it. The probability of getting the message to $v$ is one; therefore, epidemic
proximity ($EPI$) simply counts the neighborhood overlap. Finite attention can
also play a role in epidemic interactions. Following the logic above, we derive the
limited-attention version $LA\,EPI$. In undirected graphs, it is identical to the conservative
metric.
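As a concrete illustration, the local measures above can be computed directly from the directed neighborhood sets. The sketch below is ours, not from the dissertation; it assumes the graph is given as dictionaries mapping each node to its follower (out-neighbor) and friend (in-neighbor) sets, and computes CN, RW, and the two limited-attention variants.

```python
# Local proximity measures from Table 3.1, assuming the network is given
# as dicts mapping each node to its sets of out-neighbors (followers)
# and in-neighbors (friends).

def proximity(out_nbrs, in_nbrs, u, v):
    """Return CN, RW, LA RW, and LA EPI proximity between nodes u and v."""
    d_out = lambda n: len(out_nbrs[n]) or 1   # guard against division by zero
    d_in = lambda n: len(in_nbrs[n]) or 1

    gamma = out_nbrs[u] & in_nbrs[v]          # common neighbors on u -> z -> v
    gamma_p = in_nbrs[u] & out_nbrs[v]        # common neighbors on v -> z -> u

    cn = 0.5 * (len(gamma) + len(gamma_p))    # CN (and EPI) just count overlap

    # Conservative random walk: pick an out-edge of u, then of z.
    rw = 0.5 * sum(1.0 / (d_out(u) * d_out(z)) for z in gamma) \
       + 0.5 * sum(1.0 / (d_out(v) * d_out(z)) for z in gamma_p)

    # Limited-attention random walk: z and v also attend to each incoming
    # link with probability 1/d_in.
    la_rw = 0.5 * sum(1.0 / (d_out(u) * d_in(z) * d_out(z) * d_in(v)) for z in gamma) \
          + 0.5 * sum(1.0 / (d_out(v) * d_in(z) * d_out(z) * d_in(u)) for z in gamma_p)

    # Limited-attention epidemic: broadcasts always arrive, but each
    # recipient attends with probability 1/d_in.
    la_epi = 0.5 * sum(1.0 / (d_in(z) * d_in(v)) for z in gamma) \
           + 0.5 * sum(1.0 / (d_in(z) * d_in(u)) for z in gamma_p)

    return cn, rw, la_rw, la_epi
```

Passing all-ones degrees (a single chain u -> z -> v) makes all four measures agree, which is a quick sanity check on the formulas.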
3.1.2 Data Sets (Digg 2009 and Twitter 2010)
Digg 2009
Digg allows users to submit links to and recommend news stories by voting on, or digging,
them. A newly submitted story goes to the upcoming stories list, where it remains
for 24 hours, or until it is promoted to the front page by Digg, whichever comes
first. Of the tens of thousands of daily submissions, Digg picks about a hundred
to feature on its front page, based on how popular each story is in the community.
We used the Digg API to collect the complete voting record for all stories promoted
to Digg's front page in June 2009. The data associated with each story contains the
story's anonymized id, the submitter's anonymized id, and the list of voters with the time
of each vote. We also collected the time each story was promoted to the front
page. At the time this data set was collected, Digg was assigning stories to one of
8 topics (Entertainment, Lifestyle, Science, Technology, World & Business, Sports,
Offbeat, and Gaming) and one of 50 subtopics (World News, Tech Industry News,
General Sciences, Odd Stuff, Movies, Business, Politics, etc.). In
total, the data set (available at http://www.isi.edu/lerman/downloads/digg2009.html) contains over 3 million votes on 3,553 front page stories. Of
the 139K voters in the data set, more than half followed at least one other user.
We retrieved their user names and reconstructed the follower graph of active users.
This graph contained 70K nodes and more than 1.7 million edges.
Twitter 2010
Twitter's Gardenhose streaming API provides access to a portion of real-time user
activity, roughly 20%-30% of all user activity (as of November 2010, Gardenhose is
restricted to 10% of real-time content). We used this API to collect tweets
over a period of three weeks. We focused on tweets that included a URL in the
body of the message, usually shortened by some URL shortening service, such as
bit.ly or tinyurl. In order to ensure that we had the complete tweeting history of
each URL, we used Twitter's search API to retrieve all tweets associated with that
URL. Then, for each tweet, we used the REST API to collect friend and follower
information for that user. The data collection process resulted in more than 3 million
tweets, which mentioned 70K distinct shortened URLs. There were 816K users in
our data sample, but we were only able to retrieve follower information for some
of them, resulting in a graph with almost 700K nodes and over 36 million edges.
Retweeting activity in our sample encompassed diverse behaviors, from spreading
newsworthy content to orchestrated human- and bot-driven campaigns that
included advertising and spam. We used a novel method to automatically classify
these behaviors [Ghosh et al., 2011b] by characterizing the dynamics of retweeting
with two information-theoretic features. The first feature is the entropy of the
distinct user distribution, and the second feature is the entropy of the distinct time
interval distribution. We showed that these two features alone were able to accurately
separate activity into meaningful classes. High user entropy implies that
many different people retweeted the URL, with most people retweeting it once.
High time interval entropy implies the presence of many different time scales, which
is a characteristic of human activity. In this dissertation, we focus on those URLs
from the data set which are characterized by high (>3) user and time interval
entropies. These parameter values are associated with the spread of newsworthy
content and exclude robotic spamming and manipulation campaigns driven by a few
individuals. This left us with a data set containing 3,798 distinct URLs retweeted
by 542K distinct Twitter users.
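The two features reduce to plain Shannon entropies of the empirical user and inter-tweet-interval distributions. A minimal sketch (function names and the tuple format are ours; the >3 threshold is from the text, while any binning of intervals is an implementation choice):

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def retweet_features(tweets):
    """tweets: list of (user_id, timestamp) pairs for one URL.
    Returns (user entropy, time-interval entropy)."""
    users = [u for u, _ in tweets]
    times = sorted(t for _, t in tweets)
    intervals = [b - a for a, b in zip(times, times[1:])]
    return shannon_entropy(users), shannon_entropy(intervals)
```

URLs whose two entropies both exceed 3 would be kept as organically spreading content under the scheme described above.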
3.1.3 Experimental Results: Activity Prediction
Social media users tend to be similar to their friends, which means that they tend
to vote for URLs their friends vote for on Digg, retweet on Twitter, and so on.
While friends' activity can be a useful predictor of a user's actions, we claim that
knowing the local structure of the follower graph can enhance predictive power. In
other words, while social media users tend to act like their friends, they are more
likely to act like their closer friends.
We evaluate this claim by predicting URL forwarding on the Digg 2009 data set
and the Twitter 2010 data set. The task can be stated as follows: given the follower
graph and the URLs that a user's friends forward (retweet), predict which stories
the user retweets. We construct a prediction vector $p$ for a user, whose values
represent the probability that the user's friends retweet the $i$-th URL, weighted by each friend's
proximity. To compute precision and recall of the prediction, we construct a vector
of URLs the user actually retweeted. We compare proximity-based prediction to a
baseline that weighs friends' activity uniformly, without regard to proximity to the
user. We measure performance as improvement over the baseline (lift).
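A minimal sketch of this prediction scheme (function and variable names are ours): each friend's forwarded URLs are weighted by that friend's proximity to the user, and lift is the percent change over the uniform-weight baseline.

```python
def predict_adoptions(friend_activity, proximity, k=10):
    """Rank URLs for a user by proximity-weighted friend adoption.
    friend_activity: dict friend -> set of URL ids the friend forwarded.
    proximity: dict friend -> proximity to the user (setting all weights
    to 1.0 recovers the uniform baseline)."""
    total = sum(proximity.values())
    scores = {}
    for friend, urls in friend_activity.items():
        for url in urls:
            scores[url] = scores.get(url, 0.0) + proximity[friend] / total
    return sorted(scores, key=scores.get, reverse=True)[:k]

def precision_recall(predicted, actual):
    """Precision and recall of the predicted URL list."""
    hits = len(set(predicted) & set(actual))
    return hits / len(predicted), hits / len(actual)

def lift(value, baseline):
    """Percent improvement over the baseline."""
    return 100.0 * (value - baseline) / baseline
```

Running the ranking twice, once with the proximity weights and once with uniform weights, and comparing the resulting precision and recall via `lift` reproduces the evaluation protocol described above.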
Table 3.2 compares the prediction performance of the different proximity measures.
Limited-attention versions of the proximity measures result in the greatest lift both in
precision and recall. This is because they account for the nature of communication
in social media.
Table 3.2: Evaluation of predictions by different proximity measures in the Digg
and Twitter data sets. Lift is defined as % change over baseline.

(a) Digg 2009
            base    CN, EPI   JC      AA      RW      LA RW   LA EPI
precision   0.032   0.027     0.033   0.027   0.028   0.039   0.034
recall      0.172   0.248     0.174   0.250   0.272   0.195   0.174
pr lift %   0       -15.0     3.3     -14.7   -11.1   22.1    7.7
re lift %   0       44.2      1.1     45.5    57.9    13.3    1.3

(b) Twitter 2010
            base    CN, EPI   JC      AA      RW      LA RW   LA EPI
precision   0.105   0.091     0.120   0.093   0.094   0.133   0.125
recall      0.094   0.090     0.102   0.091   0.097   0.113   0.106
pr lift %   0       -14.1     14.1    -12.0   -10.7   25.9    18.5
re lift %   0       -4.8      8.4     -3.4    2.8     19.7    12.3
3.2 Modeling Limited Attention over Ties
It is widely believed [Lazarsfeld and Katz, 1955] that recommendations of influential
people sway the decisions of others, and much of the research effort has
centered on methods to identify such individuals. While a great variety of methods
were proposed to identify such influentials, see for example [Trusov et al.,
2010, Bakshy et al., 2011, Ver Steeg and Galstyan, 2012], the more popular of
these rely on the structure of the follower network. Global measures of influence
include centrality measures, such as PageRank [Page et al., 1999] or betweenness
centrality [Freeman, 1977], while local measures that take into consideration only
the neighborhood of a particular user include degree centrality.
While social ties are important in social media [Bakshy et al., 2012, 2011,
Romero et al., 2011b, Kang and Lerman, 2013b], studies modeling information
adoption in social media have mainly focused on the personal relevance of items to
users [Salakhutdinov and Mnih, 2008b, Koren et al., 2009, Wang and Blei, 2011,
Purushotham et al., 2012], and not on social ties. Social media users may preferentially
pay more attention to some of their friends depending on the topics of the
items posted. For example, children might have strong ties with their parents, but
they may be less likely to share content from their parents due to the lack of relevance.
In this section, we propose a limited attention model [Kang et al., 2013], which
incorporates users' limited, non-uniformly divided attention over their friends and topics
in online social networks.
3.2.1 Social Recommendation Setting
Before we describe our LA-LDA model, we begin by describing the social recommendation
scenario that we are modeling. We assume an idealized social media
setting, in which there are users who recommend items to each other and adopt them.
Users have interests $X$, and items have topics $Z$, with users more likely to adopt
items whose topics match their interests. In addition, users have friends (or followers),
which are denoted by directed links between users, $f(u_i, u_j)$.
The social recommendation model that we propose is dynamic, and there are a
number of actions that we model. A user $i$ can share an item $j$ at time $t$, denoted
$sh(i, j, t)$. An item could be a link to an online resource that a user shares by
tweeting it on Twitter or by submitting or voting for it on Digg. We assume that
when an item is shared by user $i$, the recommendation is broadcast to all of user $i$'s
followers. A user $i$ can adopt a recommended item $j$ at time $t$, denoted $ad(i, j, t)$,
for example, by retweeting the link on Twitter or voting for it on Digg.
We also introduce the notion of a seed. Seed users are the first adopters who
introduce new items into the dynamic system. So, for any item $j$, there is a set
of users $seeds(j)$, and there are seeding events, $seed(j, i, t)$ for each $i \in seeds(j)$,
which represent initial adoptions of item $j$. Then, adoptions diffuse through the
social network along follower links, based on users' interests.
Finally, what sets our model apart from previous models for social recommendations
is that we also model users' attention. Users may not be able to pay
attention to all the items their friends share with them.

Figure 3.1: The LA-LDA model.

They have limited attention, which we denote $v(i)$ for the volume of attention of user $i$, and with that
limited attention, they will only attend to certain items, $at(i, j, t)$. After attending
to an item, then, as described above, they may decide to adopt the item, and, having
adopted an item, they may decide to share it. Once an item is shared,
the limited-attention diffusion process continues to unfold.
3.2.2 LA-LDA Model
Given the social recommendation setting described, we introduce a topic model,
LA-LDA, that takes the salient elements, including the limited attention of
users, into account. Our model consists of four key components, which model user
interests ($\theta_i$), item topics ($\psi_j$), users' attention to their friends on different interests
($\rho_{(i,x)}$), and users' limited attention over interests and friends ($\rho_i$). We assume there are $N_i$ users, $N_j$ items,
and each user $i$ follows $N_{friend(i)}$ friends. Our topic model has fixed dimensions:
each user has $N_x$ interests, and each item has $N_z$ topics.
The LA-LDA model is presented in graphical form in Figure 3.1. There are
several parts to the model representation: the user level ($\theta$, $\rho$), the item level ($\psi$), and the interest-topic
level ($\Phi$). $\alpha$, $\eta$, $\beta$, and $\tau$ are global hyperparameters. Each adoption of
an item $j$ by a user $i$, denoted $a(i, j)$, has an associated item topic $z$ and user
interest $x$; it also has an associated user $Y$, which denotes the friend(s) whose
recommendations were adopted. Variables $a(i, j)$ and $Y$ are observed, while $x$ and
$z$ are hidden.
User $i$'s interest profile $\theta_i$ is a distribution over $N_x$ interests. Similarly, item
$j$'s topic profile $\psi_j$ is a distribution over $N_z$ topics. Each user pays attention to
different friends depending on the interest, so that for user $i$ and interest $x$, there
is an interest-specific distribution $\rho_{(i,x)}$ over $friends(i)$. The distribution of user
$i$'s attention over both the $N_x$ interests and $friends(i)$ is captured by $\rho_i$. Finally, each
interest $x$ and topic $z$ pair has an adoption probability $\Phi_{(x,z)}$ over items.
The model includes a number of hyperparameters which affect the specifics of
the limited attention model:
$\eta$: determines the prior on users' limited attention to their friends depending on
the interest. Users may pay attention to all their friends uniformly for a
particular interest (large $\eta$) or listen to a select set of friends (small $\eta$).
$\alpha$: determines the prior interest proportions and the sparsity of the user's interest
distribution (large values produce more interests; small values, more focused
user interests).
$\beta$: determines the prior topic proportions and the sparsity of topics for items (large
values produce more topics, small values fewer topics).
$\tau$: determines the prior adoption propensity of an interest and topic pair (large values
produce uniform probability over items, small values produce fewer item
adoptions).
The generative process for item adoption through a social network can be formalized
as follows:

For each user $i$:
    Generate $\theta_i \sim \mathrm{Dirichlet}(\alpha)$
    For each interest $x$:
        Generate $\rho_{(i,x)} \sim \mathrm{Dirichlet}(\eta)$
For each item $j$:
    Generate $\psi_j \sim \mathrm{Dirichlet}(\beta)$
For each interest $x$:
    For each topic $z$:
        Generate $\Phi_{(x,z)} \sim \mathrm{Dirichlet}(\tau)$
For each user $i$:
    For each adopted item $j$:
        Choose interest $x \sim \mathrm{Multinomial}(\theta_i)$
        Choose friend to pay attention to $y \sim \mathrm{Multinomial}(\rho_{(i,x)})$
        Choose topic $z \sim \mathrm{Multinomial}(\psi_j)$
        Choose item $j \sim \mathrm{Multinomial}(\Phi_{(x,z)})$
3.2.3 Learning Parameters
The inference procedure for our model follows the derivation of the equations for
collapsed Gibbs sampling, since we cannot compute the posterior distribution directly
because of the summation in the denominator. By constructing a Markov chain, we
can sample sequentially until the sampled parameters approach the target posterior
distributions. In particular, we sample all variables from their distribution by
conditioning on the currently assigned values of all other variables. To apply this
algorithm, we need the full conditional distribution, which can be obtained by a
probabilistic argument (the detailed derivation is provided in the Appendix).
The Gibbs sampling formulas for the variables are:

$$P(Z_{(i,j)} = z \mid Z_{-(i,j)}, X, Y, A_i) \propto \frac{n^{z}_{-(i,j)} + \beta}{n^{(\cdot)}_{-(i,j)} + N_z \beta} \cdot \frac{n^{x,z}_{-(i,j)} + \tau}{n^{x,\cdot}_{-(i,j)} + N_i \tau}$$

$$P(X_{(i,j)} = x \mid X_{-(i,j)}, Y, Z, A_i) \propto \frac{n^{x}_{-(i,j)} + \alpha}{n^{(\cdot)}_{-(i,j)} + N_x \alpha} \cdot \frac{n^{y}_{-(i,j)} + \eta}{n^{(\cdot)}_{-(i,j)} + N_{friends(i)} \eta} \cdot \frac{n^{x,z}_{-(i,j)} + \tau}{n^{\cdot,z}_{-(i,j)} + N_i \tau} \quad (3.1)$$
where $n^{z}_{-(i,j)}$ is the number of times topic $z$ is assigned to item $j$ excluding
the current assignment of $Z_{(i,j)}$, $n^{x,z}_{-(i,j)}$ is the number of times topic $z$ is assigned
to item $j$ under the user interest assignment $x$, excluding the current item topic
assignment of $Z_{(i,j)}$, $A_i$ is the set of items adopted by user $i$, and $j$ is a member of
the items in $A_i$. The first ratio expresses the probability of topic $z$ for item $j$,
and the second ratio expresses the probability of item $j$'s adoption under the item
topic assignment $z$ and user interest assignment $x$.
In the second equation, $n^{x}_{-(i,j)}$ is the number of times user $i$ pays attention to
interest $x$ excluding the current assignment of $X_{(i,j)}$, and $n^{y}_{-(i,j)}$ is the number of
times user $i$ pays attention to friend $y$ excluding the current assignment of $X_{(i,j)}$.
The first ratio expresses the probability of user $i$ paying attention to interest $x$, and
the second ratio expresses the probability that user $i$ pays attention to friend $y$. Our
model allows the algorithm to learn each user's interests by taking the
limited attention into account from a local perspective, while adoption is given by the user's interest
and the item's topic assignment from a global perspective. Note that hyperparameters
such as $\alpha$, $\beta$, and $\eta$ could be vector-valued; however, to keep the model simple, we
use symmetric Dirichlet priors. We estimate $\theta$, $\psi$, $\rho$, and $\Phi$ with sampled values
in the standard manner.
When the social network information is not provided, LA-LDA can be simplified
by removing the friend variable $y$ and the attention distributions $\rho$ from the presented model. We used this simplified model as
one of the comparison models (ITM) in our evaluation. In addition, the LA-LDA
model can be easily extended to include descriptions of items in the topic model
describing items ($\psi$).
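As an illustration, the first line of Eq. (3.1) turns into a short function over the count arrays that the sampler maintains. Names and the normalizer constants here follow our reconstruction of the garbled equation and are illustrative, not the dissertation's own code.

```python
import numpy as np

def topic_posterior(n_z_j, n_xz, n_x_dot, beta, tau, N_z, N_norm):
    """Conditional P(Z = z | rest) for one adoption of item j by user i
    with current interest assignment x, following Eq. (3.1).
    n_z_j[z]  : count of topic z on item j (excluding current assignment)
    n_xz[z]   : count of topic z under interest x (excluding current)
    n_x_dot   : total count under interest x (excluding current)"""
    left = (n_z_j + beta) / (n_z_j.sum() + N_z * beta)      # topic of item j
    right = (n_xz + tau) / (n_x_dot + N_norm * tau)         # adoption term
    p = left * right
    return p / p.sum()
```

A Gibbs sweep then draws a new z from this distribution for every adoption, updates the counts, and resamples x and y analogously from the second line of Eq. (3.1).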
3.2.4 Prediction
In this section we demonstrate the utility of the LA-LDA model. We evaluate the
LA-LDA model on the user-item adoption task on several data sets. We first show
results on synthetic data generated using the social recommendation diffusion
model described in Appendix A.3. Next, we apply the LA-LDA model to data
from the social news aggregator Digg (Section 2.1.1). We compare our results
with a variety of alternative models, including:

LDA: Originally introduced for document mining and text analysis [Blei et al.,
2003], LDA automatically learns a compressed representation, or hidden
topics, of a document corpus by analyzing the co-occurrence of words in documents.
Each topic is represented as a probability distribution over words.
LDA has been used for collaborative filtering, e.g., in movie recommendation, by
representing users as mixtures of probabilistic interests, with each interest a
distribution over movies. We adapt this to our social recommendation
setting as follows: in the generative model, each user chooses an interest
and then chooses an item to adopt given the selected interest.

ITM: Originally introduced to model social annotation [Plangprasopchok and Lerman,
2010], ITM takes into account variations in individual users' vocabulary
through two hidden variables (user interests and item topics) which
together generate annotations for items. When modeling social recommendation
with ITM, we represent users as a mixture of probabilistic interests $\theta$
and items as a mixture of probabilistic topics $\psi$. Each user adopts an item
based on the adoption probability given by the sampled interest and topic.

After all the parameters are learned, the LDA, ITM, and LA-LDA models can be
used to compute the probability that user $i$ votes for each story $j$ in the test set, given
training data $D$. For LDA, the probability of the vote on item $j$ is the probability
of adopting $a_j$ (we drop the $i$ subscript to avoid cluttering the notation):
$$P(a_j \mid D) = \int \sum_{x} P(a_j \mid x)\, P(x \mid \theta)\, P(\theta \mid D)\, d\theta \quad (3.2)$$

The probability of a user's vote is based solely on his interest profile $\theta$, which is
learned from the training data $D$.
For ITM, the probability that user $i$ votes for story $j$ is obtained by integrating
over the posterior Dirichlet distributions of $\theta$ and $\psi$:

$$P(a_j \mid D) = \int \int \sum_{x,z} P(a_j \mid z, x)\, P(z \mid \psi)\, P(x \mid \theta)\, P(\psi \mid D)\, P(\theta \mid D)\, d\theta\, d\psi \quad (3.3)$$

where the probability of the user's vote is decided by the user interest profile $\theta$ and the story
topic profile $\psi$.
Finally, in the LA-LDA model, the probability that user $i$ votes for story $j$ is:

$$P(a_j \mid D) = \int \int \sum_{x,y,z} P(a_j \mid x, z)\, P(z \mid \psi)\, P(x, y \mid \rho)\, P(\psi \mid D)\, P(\rho \mid D)\, d\rho\, d\psi \quad (3.4)$$

where the probability of a user's vote is decided by the distribution $\rho$ of the user's
limited attention over friends and interests and the story's topic profile $\psi$.
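In practice these integrals are approximated from the Gibbs samples; with posterior-mean point estimates, Eqs. (3.2) and (3.3) reduce to small matrix products. A sketch with our reconstructed symbols (function names are ours):

```python
import numpy as np

def lda_vote_prob(theta_i, p_item_given_x):
    """Point-estimate version of Eq. (3.2): P(a_j) = sum_x P(a_j | x) theta_x.
    theta_i: (n_x,) user interest profile; p_item_given_x: (n_x, n_items)."""
    return p_item_given_x.T @ theta_i                    # (n_items,)

def itm_vote_prob(theta_i, psi_j, phi_j):
    """Point-estimate version of Eq. (3.3) for a single item j:
    P(a_j) = sum_{x,z} P(a_j | x, z) P(z | psi_j) theta_x.
    phi_j: (n_x, n_z) adoption probability of item j for each pair."""
    return float(theta_i @ phi_j @ psi_j)
```

Eq. (3.4) has the same shape with theta replaced by the attention distribution rho, summing additionally over the friend variable y.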
3.2.5 Data Sets (Digg 2010)
Synthetic Cascades Data Set
We generated a synthetic cascades data set according to the rules of diffusion
for social recommendation in Section 3.2.1. We assume an idealized social media
setting, with users who recommend items to each other and adopt them. Users have
interests, and items have topics, with users more likely to adopt items whose topics
match their interests. In addition, each user has friends and can see the items
friends adopted. The synthetic cascades are dynamic and describe a number of
user actions.
Our synthetic data generator uses a follower graph from the Digg 2009 data (Section
3.1.2) to specify the social network component; the data is available online at
http://www.isi.edu/lerman/downloads/digg2009.html. We
used the social network links among the top 5,000 most active users in the Digg 2009 data set,
who are followed by on average 81.8 other users (maximum 984 and median 11 users).
We begin generating synthetic data by creating $N_i$ items according to the LA-LDA
item topic model and $N_u$ users according to the LA-LDA user interest model.
Further details of the synthetic cascades data set are described in Appendix A.3.
Digg 2010
The Digg 2010 data set [Sharara et al., 2011] contains information about both
story submitting histories and the diggs of 11,942 users over a six-month period
(Jul - Dec 2010). At the time the data was collected, Digg assigned stories to 10 topics
(Entertainment, Lifestyle, Technology, World News, Offbeat, Business, Sports,
Politics, Gaming, and Science), replacing the Digg 2009 "World & Business" category
with "World News," "Business," and "Politics."
Before a story is promoted to the front page, it is visible on the upcoming
stories queue and to the submitter's followers through the friends' interface, which
shows users stories that their friends have submitted and voted on. With each new
vote, the story becomes visible to the voter's followers. We examine only the votes
that the story accrues before promotion to the front page. During that time, the story propagates mainly via friends' votes (or
recommendations), although some users could discover the story on the upcoming
stories page, which received tens of thousands of submissions daily. After promotion,
users are likely to be exposed to the story through the front page and
vote for it independently of friends' recommendations. In the Digg 2009 data set
(Section 3.1.2), 28K users voted for 3,553 stories, and in the Digg 2010 data set,
4K users voted for 36,883 stories before promotion. We focused the data further
by selecting only those users who voted at least 10 times, resulting in 2,390 users
(who voted for 3,553 stories) in the 2009 data set and 2,330 users (who voted on
22,483 stories) in the Digg 2010 data set.
We use tf-idf to choose the top 3K distinct words from the titles and user-supplied
summaries of stories in the Digg 2009 data set and 4K distinct words
in the Digg 2010 data set as the vocabularies. We narrowed the data further by
selecting only those users who voted at least 10 times and items containing more
than 10 words, resulting in 1.4K users (who voted for 3K stories) in the Digg 2009
data set and 1.8K users (who voted on 18K stories) in the Digg 2010 data set.
3.2.6 Model Selection
Model selection involves making choices for the parameters of our model. LA-LDA
has six parameters: the number of interests ($N_x$) and topics ($N_z$), and the hyperparameters
$\alpha$, $\beta$, $\eta$, and $\tau$.
The choice of hyperparameters can have implications for inference results. While
our algorithm can be extended to learn the hyperparameters, here we fix them and
focus on the consequences of varying the number of topics and interests. We used
the same hyperparameter value of 0.1 for all of $\alpha$, $\beta$, $\eta$, and $\tau$. The number
of topics and interests defines the granularity level of the model. We vary the
granularity level by changing the values of $N_x$ and $N_z$ from 5 to 800. To estimate
the performance of the model, we compute the likelihood of the training set given
the model for different combinations of parameters.
We took samples at a lag of 100 iterations after discarding the first 1000 iterations;
both algorithms stabilize within 2000 iterations. Figure 3.2 shows the
calculated values of log-likelihood obtained by the ITM and LA-LDA models on the
Digg 2009 data set (the Digg 2010 data set has similar trends) for different parameter
values. Higher likelihood indicates better performance. The best performance is
obtained for $N_x = 10$ interests and $N_z = 200$ topics in the Digg 2009 data set and
$N_x = 30$ interests and $N_z = 200$ topics in the Digg 2010 data set for both ITM and
LA-LDA. Using the same model selection approach for LDA results in the best performance
for 200 interests in the Digg 2009 data and 500 interests in the Digg 2010 data.

Figure 3.2: Model Selection: Log-likelihood for different numbers of interests and
topics for (a) ITM and (b) LA-LDA on the Digg 2009 data set.

LA-LDA outperforms both LDA and ITM for all combinations of interests
and topics on training-data log-likelihood.
3.2.7 Experimental Results
Evaluation on Synthetic Data
We investigate how well we are able to recover the user interests from the synthetic
data set (Section 3.2.5). We learn hidden variables that represent user interests
and item topics by learning theLA-LDA (or LDA) model from synthetic data.
Since the hidden variables are set during the data generation process, we can then
evaluate the performance of the learned models by comparing the distribution of
the learned variables to their actual distributions. We measure the similarity
of the two distributions by the average deviation between the Jensen-Shannon
divergences (JSD) of their vectors:

Δ(A, B) = Σ_{i=1}^{n} Σ_{i'=i+1}^{n} | JSD(θ^A_i, θ^A_{i'}) − JSD(θ^B_i, θ^B_{i'}) |   (3.5)

where A in this case is the ground truth, B is one of our learned models (LA-LDA
or LDA), n is the number of users, and θ_i is user i's distribution over
interests. The average deviation is small when the two models A and B are
similar, without regard to the indexing of the interests. The average deviation
can be calculated in a similar manner for the topic distributions of items.
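The average deviation of Eq. 3.5 can be computed directly from the two sets of interest vectors. A minimal sketch; since only pairwise JSD values are compared, the measure is insensitive to how the interest dimensions are indexed:

```python
from math import log2

def jsd(p, q):
    # Jensen-Shannon divergence between two discrete distributions (base 2)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def avg_deviation(A, B):
    """Eq. 3.5: compare the pairwise-JSD structure of the ground-truth
    rows (A) against that of the learned rows (B)."""
    n = len(A)
    return sum(abs(jsd(A[i], A[j]) - jsd(B[i], B[j]))
               for i in range(n) for j in range(i + 1, n))
```

A perfectly recovered model gives a deviation of zero, even if its interest indices are a permutation of the ground truth's.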
For comparison, we learned two different LDA models, one for user interests and
one for item topics. We learn the LDA for the interest distributions of users by
viewing a user as a document and items as terms in the document, and we learn
the LDA for the topic distributions of items by treating an item as a document
and users as terms in the document. We also ran LA-LDA to learn both
distributions in accordance with that model. For all three models, we perform
1000 iterations, and then evaluate samples every 100 iterations. The algorithms
converge within 2000 iterations. For generating the synthetic data, we set
v_g = 2, both hyperparameters to 0.1,
Figure 3.3: The average deviation of user interests and item topics for
different limited-attention parameter values on synthetic data. The top two
figures show the average deviation between learned and actual user interests
when (a) one attention parameter is fixed to 0.05 and the other takes values
0.05, 0.1, 0.5, and 1.0, and (b) the roles of the two parameters are reversed.
The bottom two figures show the average deviation between learned and actual
item topics when (c) the first parameter is fixed to 0.05 and (d) the second
parameter is fixed to 0.05.
and S = 30%, and varied each of the two limited-attention parameters over 0.05,
0.1, 0.5, and 1.0. We applied the same hyperparameters used to generate the
synthetic data in the models. The average deviations between the learned and
actual interests ((a) and (b)) and item topics ((c) and (d)) in the synthetic
data sets are shown in Fig. 3.3.
With large values of the interest-attention parameter, users allocate their
attention uniformly over interests, so they are likely to adopt items spanning a
variety of interests. Because of this adoption tendency, it is hard to
distinguish their interests. For small values, users pay attention to a limited
number of interests, and more can be learned from their adoption behavior. That
is why both LDA and LA-LDA perform better for small values. Similarly, large
values of the social-attention parameter cause users to spread their attention
uniformly over their friends, while small values focus users' attention on a
smaller subset
Figure 3.4: Topic distribution in the Digg 2009 and Digg 2010 data sets.
of their friends. With large values, the average deviations of both models are
high, whereas for lower values both models perform better. In all four cases,
LA-LDA is superior to LDA in learning the interest distributions of users and
the topic distributions of items, for all parameter values.
Evaluation on Learned User Interests on Digg Data Sets
Digg assigns a topic to each story from a predefined set of topics. Figure 3.4
shows the distribution of Digg-assigned topics in our data sets, that is, the
percentage of stories assigned to each topic. In both data sets, "Offbeat,"
"Entertainment," "Lifestyle," and "Technology" were the most popular topics,
while "Sports" and "Gaming" were the least popular. Overall, there is no
dominant topic in either data set, and the popularity rankings of the topics are
almost identical.

The topics assigned to stories by Digg provide useful evidence for evaluating
topic models. We represent a user's story preferences by constructing an
empirical Digg topic interest vector, which gives the fraction of votes she made on each
Table 3.3: Average deviation between empirical user interests and those learned
by each topic model. N_x is the number of user interests and N_z is the number
of item topics.

          N_x   N_z   Digg 2009    N_x   N_z   Digg 2010
LA-LDA    10    200   15.11        50    200   28.71
ITM       10    200   36.38        50    200   36.01
LDA       200   n/a   37.72        500   n/a   55.43
Digg topic. This empirical interest vector gives us a gold standard for
evaluating the user interests learned by different topic models. However, we
cannot compare the learned and empirical interest vectors directly, since they
have different dimensionality and indexing of interests. Again, we use Eq. 3.5
to compute the average deviation. Table 3.3 shows the average deviations of user
interests learned by LA-LDA, ITM, and LDA in the Digg 2009 and Digg 2010 data
sets. In both data sets, LA-LDA outperforms the other models, in the sense that
it learns user interests that are closer to the gold standard obtained from Digg
story topic assignments.
Evaluation on Story Adoption on Digg Data Sets
Next, we evaluate our proposed topic models by measuring how well they allow us
to predict individual votes. There are 257K pre-promotion votes in the Digg 2009
data set and 1.5 million votes in the Digg 2010 data set, with 72.34 and 68.20
average votes per story, respectively. For our evaluation, we randomly split the
data into training and test sets, and performed five-fold cross validation. To
generate the test set, we use the held-out votes (positive examples) and augment
them with stories that friends of users shared but that were not adopted by the
user (negative examples). Depending on a user's and their friends' activities,
there are different numbers of positive (N^i_pos) and negative (N^i_neg)
examples in the test set. The average percentage of positive examples in the
test set is 0.73% (max 18%, min 0.02%, and median 0.13%), suggesting that
friends share many stories that users do not end up voting for. This makes the
prediction task extremely challenging, with less than a one-in-a-hundred chance
of successfully predicting votes if stories are picked randomly.
We train the models on the data in the training set and evaluate the performance
of the models on the prediction task using mean average precision (MAP). The
average precision [4] at N^i_pos for each user is:

Average Precision@N^i_pos = ( Σ_{k=1}^{N^i_pos} Prec@k ) / N^i_pos   (3.6)

where Prec@k is the precision at cut-off k in the list of votes ordered by their
likelihood. The mean average precision over N_i users is the average of the
average precision of each user.
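Under a literal reading of Eq. 3.6, average precision is the mean of Prec@k over the cut-offs k = 1, ..., N_pos. A small sketch with hypothetical inputs (a ranked item list per user and the set of held-out positive votes):

```python
def precision_at_k(ranked, positives, k):
    # fraction of the top-k ranked items that are held-out positive votes
    return sum(1 for item in ranked[:k] if item in positives) / k

def average_precision(ranked, positives):
    # Eq. 3.6: mean of Prec@k over cut-offs k = 1 .. N_pos
    n_pos = len(positives)
    if n_pos == 0:
        return 0.0
    return sum(precision_at_k(ranked, positives, k)
               for k in range(1, n_pos + 1)) / n_pos

def mean_average_precision(per_user):
    # per_user: one (ranked_items, positive_set) pair per user
    return sum(average_precision(r, p) for r, p in per_user) / len(per_user)
```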
Vote Prediction Results
Table 3.4 compares the average precision of LA-LDA on the vote prediction task
with a baseline that performs random selection, with LDA, and with ITM. We
divide users into five categories depending on their activity level, measured by
the number of votes they made in the training set. The first category includes
all users, the second category includes those users who voted for at least 7.5%
of stories in the training set, and the remaining categories include users who
voted for at least 15%, 20%, and 25% of stories, respectively. While LA-LDA
outperforms the baseline methods in all cases, its comparative advantage
improves with user activity level. When there is little information about user
interests, the precision of all methods ranges
4. http://en.wikipedia.org/wiki/Information_retrieval
Table 3.4: Average precision of each model's predictions.

                       Digg 2009 Data Set                       Digg 2010 Data Set
MAP             All    >7.5%  >15%   >20%   >25%         All    >7.5%  >15%   >20%   >25%
random          0.019  0.048  0.062  0.081  0.109        0.011  0.036  0.056  0.074  0.105
LDA             0.021  0.044  0.062  0.079  0.111        0.018  0.042  0.056  0.077  0.112
ITM             0.022  0.110  0.153  0.208  0.269        0.024  0.136  0.176  0.223  0.237
LA-LDA          0.022  0.116  0.168  0.239  0.320        0.038  0.137  0.188  0.246  0.315
Submitter       0.038  0.087  0.114  0.129  0.152        0.028  0.048  0.075  0.098  0.126
Max             0.079  0.096  0.124  0.143  0.171        0.070  0.073  0.108  0.131  0.162
ITM+Submitter   0.024  0.090  0.131  0.155  0.189        0.038  0.084  0.112  0.149  0.182
ITM+Max         0.026  0.098  0.147  0.182  0.2365       0.048  0.124  0.164  0.213  0.244
from 1% to 3%. As the amount of information about user interests, as expressed
through the votes they make, grows, the performance of all models improves, but
that of LA-LDA improves much faster. LA-LDA correctly predicts more than 30% of
the votes made by the most active users, compared to 11% for randomly guessed
votes. The performance of ITM is competitive with LA-LDA, suggesting that
learning hidden user interests improves predictions of user behavior.
The results above suggest that LA-LDA outperforms other models because, in
addition to user interests, it also takes into account how users distribute
their attention over friends. People use a variety of heuristics in allocating
their attention, including familiarity, social tie strength, and influence, and
LA-LDA attempts to learn individual allocations from the data. However, one may
ask whether a simple attention allocation heuristic could predict votes as well
as LA-LDA, but at a reduced computational cost. Here we answer this question by
studying the effect of the influence heuristic, which prefers either the most
influential voter (Max) or the influence of the submitter (Submitter). It is
widely believed that recommendations of influential people sway the decisions of
others; therefore, it is reasonable to expect that people may allocate more of
their attention to the more influential people. Many methods have been proposed
to identify such influentials [Trusov et al., 2010, Cha et al., 2010, Ghosh and
Lerman, 2010, Bakshy et al., 2011, Ver Steeg and Galstyan, 2012]; we use user
degree, or the number of followers the user has, as a measure of influence. In
addition to being easy to compute, this measure has been widely used to
represent an individual's centrality or influence. In Table 3.4 we present the
results of four experiments studying the effect of the influence heuristic on
the prediction task. In the first experiment, predicted votes for each user are
sorted based on the influence of the submitter, the first user to post the story
on Digg. In the second experiment, they are sorted based on the influence of the
most influential (max) voter. The third experiment investigates the effect of
including either influence heuristic in the ITM model. In this case, the vote
probability given by Eq. 3.3 is multiplied by the relative influence (with
respect to the most influential user in the network) of the submitter or max
voter. When there is little information from which to learn user interests, a
simple heuristic, namely that a user votes for a story if a very influential
user voted for it, works quite well to predict users' votes, three to four times
better than random guessing. However, as LA-LDA receives more information about
user interests, it is able to learn a fine-grained model of user interests that
outperforms the simpler influence-based models.
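The two influence heuristics can be sketched as simple re-ranking rules. The data layout below (story records with `submitter` and `voters` fields, plus a follower-count map) is assumed for illustration only:

```python
def rank_by_influence(stories, followers, mode="max"):
    """Order candidate stories by the degree (follower count) of either
    the submitter or the most influential voter, highest first."""
    def score(story):
        if mode == "submitter":
            return followers.get(story["submitter"], 0)
        # "max" heuristic: degree of the most influential voter so far
        return max((followers.get(v, 0) for v in story["voters"]), default=0)
    return sorted(stories, key=score, reverse=True)
```

Both variants need only the follower graph, which is why they serve as cheap baselines against the learned attention model.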
3.3 Improving Explanatory Power by Integrating Words
In this section, we close the gap between behavioral and statistical
recommendation models by presenting LA-CTR [Kang and Lerman, 2013a], a model
that extends the collaborative topic regression model CTR [Wang and Blei, 2011]
by integrating heterogeneous social influences. Attention in online interactions
was modeled in a previous section [Kang et al., 2013] to directly learn users'
limited attention over their friends and topics. However, that model was applied
to dyadic data [Agarwal and Chen, 2009, Koren et al., 2009, Yu et al., 2009]
without providing any explanation of the learned topics. Furthermore, LA-LDA
assumed a simple user-item adoption mechanism, where item adoption is decided
based on sampled interest and topic pairs, rather than on combinations of
multiple user interests and item topics, as matrix factorization does. Also,
LA-LDA learned only from adopted items, while non-adopted items also explain a
user's preferences. In this section, we propose a model that makes important
advances. First, it incorporates the content of items, and is thus able to
provide explanatory power for its recommendations, as well as to recommend
previously unseen items. Second, it learns a user's preferences from both the
adopted and non-adopted items of each user. Furthermore, the proposed model is
able to learn the degree of influence of other users in modeling limited
attention.
Like most social recommendation models, LA-CTR uses the item adoptions of all
users and the content of items as the basis for its recommendations. However,
LA-CTR differs from existing models in two important ways. First, the model
captures a user's limited attention, which she must divide among all the friends
she follows. Second, the model includes a notion of influence, that is, the fact
that the user may preferentially pay more attention to some friends, e.g., close
friends [Gilbert and Karahalios, 2009]. Moreover, the model allows the strength
of influence to vary according to the user's topical interests.
Figure 3.5: The LA-CTR model.
3.3.1 LA-CTR Model
We introduce a limited attention collaborative topic regression model (LA-CTR)
that learns user preferences from observations of the items they adopt on a
social media site. Our model captures the behavior of social media users by
introducing three novel elements that model users' social influence (φ), their
topic profiles (u), and how much attention they pay to other users (s), which we
also refer to as the strength of ties. Two additional elements in the model are
the item's topic profile for explaining recommendations (v) and for explaining
the item's contents (θ) [Wang and Blei, 2011]. Finally, adoptions (r) are the
observed variables, which are equal to one when the user adopts an item and zero
otherwise.
Figure 3.5 presents LA-CTR in graphical form. Each variable is drawn from a
normal distribution defined as follows:

u_i ∼ N( 0, λ_u^{-1} I_K )
s_i ∼ N( 0, λ_s^{-1} I_N )
φ_il ∼ N( g_φ(s_il u_i), c_il^{-1} I_K )
r_ijl ∼ N( g_r(φ_il^T v_j), (c^r_ijl)^{-1} )   (3.7)
where N is the number of users, D is the number of items, and K is the number of
topics. The symbol T refers to the transpose operation. We define g_φ and g_r as
linear functions for simplicity. The precision parameters c_il and c^r_ijl serve
as confidence parameters for the influence φ_il and the rating r_ijl. The
confidence parameter c_il represents how much attention user i pays to user l.
In social media, users adopt new items primarily based on their friends'
recommendations, i.e., by seeing the items their friends adopt. However, users
may also be exposed to items from outside of their social network, depending on
the site's interface. We use c_il to model the structure of the social network:
we set it to a high value a_φ when user l is a friend of user i and to a low
value b_φ when he is not (a_φ > b_φ > 0). Similarly, when user i does not adopt
item j (i.e., his rating for j is zero), this can be interpreted in two ways:
either user i was aware of item j but did not like it, or user i would have
liked item j but never saw it. We set c^r_ijl as a confidence parameter for user
i's rating on item j via user l and set it to a high value a_r when r_ijl = 1
and to a low value b_r when r_ijl = 0 (a_r > b_r > 0). In this chapter, we use
the same confidence parameter values, a_r = a_φ = 1.0 and b_r = b_φ = 0.01, for
all c_il and c^r_ijl.
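The confidence scheme can be sketched as follows; this is an illustrative reading, not the model's actual implementation, and the rating confidences are collapsed to a user-item matrix for brevity:

```python
import numpy as np

def confidence_matrices(friend_adj, ratings,
                        a_phi=1.0, b_phi=0.01, a_r=1.0, b_r=0.01):
    """c_il is high (a_phi) when l is a friend of i and low (b_phi)
    otherwise; the rating confidence is high (a_r) for observed adoptions
    and low (b_r) for zeros."""
    c_social = np.where(friend_adj > 0, a_phi, b_phi)  # N x N
    c_rating = np.where(ratings > 0, a_r, b_r)         # N x D
    return c_social, c_rating
```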
Like CTR, we use the hidden topic model LDA [Blei et al., 2003] to capture the
item's topic distribution θ, which is represented as a K-dimensional vector. In
LDA, the topic distribution (θ_{1:D}) of each document is viewed as a mixture of
multiple topics, with each topic (β_{1:K}) a distribution over words from a
vocabulary of size M. We assume a generative process in which documents are
created by drawing words from their topic distributions. The latent variable
ε_j ∼ N(0, λ_v^{-1} I_K) captures the differences between topics that explain
the contents of documents and those that explain recommendations. Depending on
the choice of λ_v, item j's topic distribution θ_j is perturbed to create the
latent vector v_j, which can be similar to θ_j or diverge from it:

v_j ∼ N( θ_j, λ_v^{-1} I_K )   (3.8)
The generative process for item adoption through a social network can be
formalized as follows:

For each user i:
  Generate u_i ∼ N(0, λ_u^{-1} I_K)
  Generate s_i ∼ N(0, λ_s^{-1} I_N)
  For each user l:
    Generate φ_il ∼ N(g_φ(s_il u_i), c_il^{-1} I_K)
For each item j:
  Generate θ_j ∼ Dirichlet(α)
  Generate ε_j ∼ N(0, λ_v^{-1} I_K) and set v_j = θ_j + ε_j
  For each word w_jm:
    Generate the topic assignment z_jm ∼ Mult(θ_j)
    Generate the word w_jm ∼ Mult(β_{z_jm})
For each user i:
  For each friend l the user pays attention to:
    For each adopted item j:
      Choose the rating r_ijl ∼ N(φ_il^T v_j, (c^r_ijl)^{-1})
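To make the generative story concrete, the continuous part of the process can be simulated forward; the sizes and precisions below are illustrative, and the LDA word-generation step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 4, 6, 3                      # users, items, topics (toy sizes)
lam_u = lam_s = lam_v = 1.0            # illustrative precisions

U = rng.normal(0, 1 / np.sqrt(lam_u), (N, K))           # user profiles u_i
S = rng.normal(0, 1 / np.sqrt(lam_s), (N, N))           # tie strengths s_il
theta = rng.dirichlet(np.ones(K), size=D)               # item topic proportions
V = theta + rng.normal(0, 1 / np.sqrt(lam_v), (D, K))   # v_j = theta_j + eps_j

# phi_il is centered on s_il * u_i; use the mean for a noiseless sketch
Phi = S[:, :, None] * U[:, None, :]                     # N x N x K
R_mean = np.einsum('ilk,jk->ijl', Phi, V)               # E[r_ijl] = phi_il^T v_j
```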
Here λ_u = σ_r²/σ_u², λ_v = σ_r²/σ_v², λ_s = σ_r²/σ_s², and λ_φ = σ_r²/σ_φ².
Note that the latent vectors u, v, and φ lie in a shared K-dimensional space.
3.3.2 Learning Parameters
To learn the model parameters, we follow the approaches of CTR [Wang and Blei,
2011] and CTR-smf [Purushotham et al., 2012] and develop an EM-style algorithm
to calculate the maximum a posteriori (MAP) estimates. MAP estimation is
equivalent to maximizing the complete log likelihood (ℓ) of U, S, φ, V, R, and θ
given λ_u, λ_s, λ_φ, λ_v, and β. We set the Dirichlet prior α to 1.
ℓ = − (λ_u / 2) Σ_{i=1}^{N} u_i^T u_i − (λ_v / 2) Σ_{j=1}^{D} (v_j − θ_j)^T (v_j − θ_j)
    + Σ_{j=1}^{D} Σ_{t=1}^{W(j)} log ( Σ_{k=1}^{K} θ_{jk} β_{k, w_jt} )
    − (λ_s / 2) Σ_{i=1}^{N} s_i^T s_i
    − Σ_{i=1}^{N} Σ_{j=1}^{D} Σ_{l=1}^{N} (c^r_ijl / 2) (r_ijl − φ_il^T v_j)^2
    − (λ_φ / 2) Σ_{i=1}^{N} Σ_{l=1}^{N} c_il (φ_il − s_il u_i)^T (φ_il − s_il u_i)   (3.9)
We use gradient ascent to estimate the MAP solution and iteratively optimize the
variables {u_i, v_j, φ_il, s_i} and the topic proportions θ_j. Given a current
estimate, we take the gradient of ℓ with respect to u_i, v_j, s_i, and φ_il and
set it to zero. The derived update equations are the following:
u_i ← ( λ_u I_K + λ_φ (s_i^T C_i s_i) I_K )^{-1} λ_φ Φ_i C_i s_i
v_j ← ( λ_v I_K + Φ C^r_j Φ^T )^{-1} ( Φ C^r_j R_j + λ_v θ_j )
s_i ← ( λ_s I_N + λ_φ (u_i^T u_i) C_i )^{-1} λ_φ C_i Φ_i^T u_i
φ_il ← ( λ_φ c_il I_K + V C^r_il V^T )^{-1} ( V C^r_il R_il + λ_φ c_il s_il u_i )   (3.10)
where C_i and C^r_il are diagonal matrices built from the confidence parameters
c_il and c^r_ijl. We define S as an N×N matrix, Φ as a K×N² matrix, and R_j as
the vector of r_ijl values over all pairs of users i and l for the given item j.
Given the updated variables {u_i, v_j, φ_il, s_i}, the topic proportions θ_j are
updated by applying Jensen's inequality [Wang and Blei, 2011]:
ℓ(θ_j) ≥ Σ_{m=1}^{W(j)} Σ_{k=1}^{K} q_jmk ( log θ_jk β_{k, w_jm} − log q_jmk )
         − (λ_v / 2) Σ_{j=1}^{D} (v_j − θ_j)^T (v_j − θ_j) = ℓ(θ_j, q_j)   (3.11)
where q_jmk = q(z_jm = k), and ℓ(θ_j, q_j) gives a tight lower bound of ℓ(θ_j).
The optimal q_jmk satisfies q_jmk ∝ θ_jk β_{k, w_jm}. Given the updated
variables {u_i, v_j, φ_il, s_i, θ_j}, we can optimize β:

β_kw ∝ Σ_j Σ_m q_jmk 1[w_jm = w]   (3.12)

where the indicator 1[w_jm = w] is one if and only if word w_jm equals w.
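The coordinate updates for q_jmk and β can be sketched as one EM-style pass; `docs[j]` (the list of word ids in item j) and the smoothing constant are illustrative assumptions:

```python
import numpy as np

def update_q_and_beta(theta, beta, docs, V):
    """One pass over Eqs. 3.11-3.12: q_jmk ∝ theta_jk * beta_{k, w_jm},
    then beta_kw ∝ sum of q over occurrences of word w."""
    K = theta.shape[1]
    new_beta = np.zeros((K, V))
    for j, words in enumerate(docs):
        for w in words:
            q = theta[j] * beta[:, w]          # unnormalized q_jmk over k
            q /= q.sum()
            new_beta[:, w] += q                # accumulate Eq. 3.12 counts
    new_beta += 1e-12                          # avoid empty-topic division
    return new_beta / new_beta.sum(axis=1, keepdims=True)
```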
3.3.3 Prediction
After all the parameters are learned, the LA-CTR model can be used for both
in-matrix and out-of-matrix prediction, with either the user's social influence
(φ) or interest (u). As Wang and Blei [2011] mention, in-matrix prediction
refers to predicting a user's rating on an item that has been rated at least
once by other users, while out-of-matrix prediction refers to predicting a
user's rating on a new item that has no rating history. In this dissertation, we
demonstrate the utility of the LA-CTR model with both the user topic profile (u)
and the social influence (φ) hidden variables, using in-matrix prediction.
For in-matrix prediction with social influence (φ), the prediction of user i's
rating for item j via friend l is obtained by point estimation with the optimal
variables (φ*, u*, v*, θ*):

E[r_ijl | D] ≈ E[φ_il | D]^T ( E[θ_j | D] + E[ε_j | D] )
r*_ijl ≈ (φ*_il)^T v*_j   (3.13)

where the rating of user i is determined by the social influence of friend l,
φ*_il, and the item topic profile v*_j.
For in-matrix prediction with user-topic vectors (u), the point-estimate
prediction of user i's rating for item j is:

E[r_ij | D] ≈ E[u_i | D]^T ( E[θ_j | D] + E[ε_j | D] )
r*_ij ≈ (u*_i)^T v*_j   (3.14)

where the prediction of the user's rating is determined by the user topic
profile u*_i and the item topic profile v*_j.
We apply the LA-CTR model to data from the social news aggregator Digg (the Digg
2009 data set (Section 3.1.2) and the Digg 2010 data set (Section 3.2.5)). We
compare our results to state-of-the-art alternative models: CTR and CTR-smf.
3.3.4 Model Selection
For collaborative topic regression (CTR), we set the parameters K = 200,
λ_u = 0.01, λ_v = 100, a = 1, and b = 0.01 by grid search on held-out
recommendations. The precision parameter λ_v balances how far the item's latent
vector v_j diverges from the topic proportions θ_j. We vary λ_v ∈ {0.001, 0.01,
0.1, 1, 10, 100, 1000}, where a larger λ_v increases the penalty for v_j
diverging from θ_j. For collaborative topic regression with social matrix
factorization (CTR-smf), we choose the parameters
Figure 3.6: Recall of in-matrix prediction for the (a) Digg 2009 and (b) Digg
2010 data sets, varying the number of recommended items (@X) with 200 topics.
using grid search on held-out recommendations. We set the parameters K = 200,
λ_u = 0.01, λ_v = 100, λ_q = 10, λ_s = 2, a = 1, and b = 0.01. For LA-CTR, we
choose the parameters similarly to the CTR and CTR-smf approaches, by grid
search on held-out recommendations. We set the parameters K = 200, λ_u = 0.01,
λ_v = 100, λ_φ = 1, λ_s = 0.01, a_φ = 1, a_r = 1, b_φ = 0.01, and b_r = 0.01.
3.3.5 Experimental Results: Evaluation on Vote Prediction
We present each user with X items sorted by their predicted rating score, and
evaluate based on the fraction of items that the user actually voted for. We
define recall@X for a user as:

recall@X = (number of the top X items the user votes for) / (total number of items the user votes for)   (3.15)

and we average the recall values of all users to summarize the performance of
the algorithm. Note that a better model provides higher recall@X at each X. We
use five-fold
cross validation and compare the performance of Influence to three baseline
models: CTR, CTR-smf, and Relevance. The CTR baseline uses the user-topic and
item-topic vectors learned by CTR to select the X most relevant items for
recommendation. The CTR-smf baseline chooses items from the user-topic and
item-topic vectors learned by CTR-smf, where social matrix factorization is
applied to take into account the homophily principle in social networks. The
Influence and Relevance methods use the social influence (φ) and user-topic (u)
vectors of LA-CTR, respectively, to compute the most relevant items for
recommendation.
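The recall@X metric of Eq. 3.15 and its per-user average can be sketched as:

```python
def recall_at_x(recommended, voted, X):
    # Eq. 3.15: fraction of a user's votes that appear in the top-X list
    if not voted:
        return 0.0
    return len(set(recommended[:X]) & set(voted)) / len(set(voted))

def mean_recall_at_x(per_user, X):
    # per_user: one (ranked_recommendations, voted_items) pair per user
    return sum(recall_at_x(r, v, X) for r, v in per_user) / len(per_user)
```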
Figure 3.6 shows recall@X for in-matrix prediction as we vary the number of top
items X over {20, 40, ..., 200}. All four models improve as the number of
returned items increases. Influence consistently outperformed the other three
models, which demonstrates the importance of modeling social influence.
Comparing the user-topic vectors learned by the three models (CTR, CTR-smf, and
LA-CTR), Relevance consistently performed better than CTR and CTR-smf.
Interestingly, CTR-smf did not stand out in our experiments, owing to the low
homophily among friends in our data set. Note that the recall scores of
Influence at X below 100 already exceed the recall@200 of CTR-smf.
3.4 Social Influence

An important advantage of LA-CTR is that it can explain latent topic spaces at
both coarse-grained and fine-grained levels: (1) the user topic space (u_i) and
(2) the social influence topic space (φ_il), using the topics learned from the
corpus. Furthermore, it also explains the effect of the strength of ties to
friends (s_i) on a user's item adoption decisions. In Figure 3.7, we show one
example from the Digg 2009 data set. With LA-CTR
Figure 3.7: One example user from the Digg 2009 data set. We show the top 5
topics in the learned user interest vector u_i (topics 1, 3, 15, 18, and 47) and
the top 3 topics in the learned social influence φ_il for each of his three
friends (topics 3, 13, 47; 1, 13, 22; and 15, 18, 47, respectively). We also
show the bag of words that represents each topic:

Topic 1: pic, christmas, guy, holiday, girl, hot, cool, tree, season, mark,
geek, santa, mario, gallery, hottest, super, gadget, festive
Topic 3: found, city, dead, body, fall, lost, walk, teen, zombie, beach, pound,
fish, feet, shark, swim, bone, pool, smash, breath
Topic 15: team, player, point, run, season, pick, center, football, score, play,
field, start, nfl, nba, fan, fantasy, league, injury, talent, coach, yard
Topic 18: guide, culture, lady, collection, style, tradition, beer, suit,
fashion, survive, town, hair, spirit
Topic 47: women, kid, sex, men, children, young, parent, baby, child, toy,
mother, adult, relationship, mom, birth, violent, dad
Figure 3.8: Recall of in-matrix prediction for Digg 2009, varying the number of
recommended items (@X) with 50 topics.
we can find the top n matching topics by ranking the elements of his topic
vector u_i. In this example, user i is interested in topics 1, 3, 15, 18, and
47. Furthermore, we can also explain the strength of user i's ties to friends,
as well as friend l's social influence on user i over the topic space. Here,
user i pays attention to the last user mostly on the sports topic (topic 15),
while he pays attention to the first two users on other topics.
3.5 Conclusion
We use network proximity to capture how "close" people are to each other in a
social network. In addition to standard proximity measures, such as neighborhood
overlap, we introduced new measures that model different types of interactions
that take place between people. We studied this claim empirically using data
about URL forwarding activity on the social media sites Digg and Twitter. We
showed that the structural proximity of two users in the follower graph is
related to the similarity of their activity, i.e., how many URLs they both
adopt. We also showed that, given friends' activity, knowing their proximity to
the user helps better predict which URLs the user will forward. We compared the
performance of different proximity measures on the activity prediction task and
showed that measures that take into account the limited-attention nature of
interactions in social media lead to substantially better predictions.
In addition to structural proximity, we investigated the importance of users'
interests and limited social attention on URL re-sharing activity in social
networks. We introduced LA-LDA, a novel hidden topic model that takes into
account social media users' limited attention. We showed that our proposed model
learns more accurate user models from users' social networks and item adoption
behavior than models that do not take topical relationships into account. We
carried out two evaluations of the proposed model. The first was on realistic
simulated data, to analyze the competing trade-off between users' interests and
social attention. Next, we analyzed voting on news items on the social news
aggregator Digg and showed that our proposed model better predicts held-out
votes than alternative models that do not take limited attention into account.
Finally, we improved on LA-LDA and studied modeling users' social influence and
limited attention in online social media by proposing LA-CTR. We showed that, by
taking into account the content of items as well as users' social influence, the
proposed model outperforms other state-of-the-art algorithms on the item
adoption prediction task. Our model provides not only interpretable user topic
profiles, but also a fine-grained description of how much attention the user
pays to others and on which topics. Such a description could help explain why
people follow others and how information propagates over an online social
network.
One disadvantage of LA-CTR is its model complexity, since the size of the
user-item rating space is N²D. However, due to the high sparsity of the social
network and the user-item matrix, we did not experience any major slowdown.
Furthermore, with a small number of topics, LA-CTR already outperformed the
other models. In Figure 3.8, we show the recall of in-matrix prediction for the
Digg 2009 data set with 50 topics. Not only does the LA-CTR model outperform
both CTR and CTR-smf with 50 topics, it also outperforms both CTR and CTR-smf
with 200 topics. In other words, increasing the size of the topic space and the
descriptive power of the competing models did not lead to better performance
compared to LA-CTR.
Our work demonstrates the importance of modeling psychological factors, such as
attention, in social media analysis. These results may apply beyond social media
and point to the fundamental role that limited attention plays in social
communication. People do not have infinite time and patience to read all the
status updates or scientific articles on topics they are interested in, to see
all the movies, or to read all the books.
Chapter 4
Modeling Cognitive Factors
In psychology, attention is the mechanism that integrates perceptual and
cognitive factors to select which sensory inputs a person will consciously
process [Kahneman, 1973]. However, the world presents far more information than
people have the capacity to examine. As a result, humans have evolved to use
cognitive heuristics to decide quickly what information to pay attention to
[Kahneman, 1973, 2011]. Since our brain's capacity for mental effort is limited
[Kahneman, 2011], we tend to choose options that require little time and effort.
A consequence of this cognitive heuristic, called position bias, is the strong
effect that presentation order (item ranking) has on individual choices.
Cognitive scientists have found that position bias affects the answers people
select in response to multiple-choice questions [Payne, 1951, Blunch, 1984],
where on the screen they look [Buscher et al., 2009, Counts and Fisher, 2011],
and the links on a web page they choose to follow [Craswell et al., 2008,
Huberman, 1998]. Also as a consequence of position bias, items near the top of a
user's social media stream are more salient, and therefore more likely to be
viewed, than items in lower positions [Hogg et al., 2013, Hodas and Lerman,
2012, Hogg and Lerman, 2012]. Therefore, a model of information adoption must
include item salience. To distinguish position-based salience from other
psychological effects, we refer to it as the item's visibility. In this chapter,
we propose a conceptually simple mechanism of information adoption in social
media that accounts for cognitive factors of the user, position bias, and item
fitness.
4.1 Modeling Position Biases
4.1.1 Vip Model
We propose Vip [Kang and Lerman, 2015b], a probabilistic model that captures the
three basic ingredients of information spread in social media: an item's
visibility, its fitness, and its personal relevance to the user. Vip is based on
social recommendation models, whose goal is to recommend only relevant items to
users [Salakhutdinov and Mnih, 2008b, Koren et al., 2009, Wang and Blei, 2011,
Purushotham et al., 2012, Kang and Lerman, 2013a, Kang et al., 2013]. In social
recommendation, each user is assigned a vector of topics, and each item also has
some topics. Once these hidden vectors are learned from the history of user item
adoptions, it is possible to calculate how relevant an item is to the user.

Social media users may adopt an item even if they had not earlier demonstrated a
sustained interest in its topics. This is often the case with viral,
general-interest items, such as breaking news or celebrity gossip. We use the
term interestingness or fitness to describe an item's propensity to be adopted
upon exposure.
The key innovation of Vip is introducing visibility into the generative model of
item adoption. Visibility conceptually simplifies the mechanisms of information
spread and explains away some of the complexity associated with it, for example,
the network effects observed in [Bakshy et al., 2011, Romero et al., 2011a,b].
Visibility explicitly takes into account the process of information discovery in
social media. Online social networks are directed, with users following the
activities of their friends. A user's message stream contains a list of items
her friends adopted or "recommended" to her, chronologically ordered by their
adoption time, with the most recent item at the top of the stream. We consider a
user to be exposed
Figure 4.1: The Vip model: user topic profiles (u), item topic profiles (θ),
item fitness (δ), personal relevance of an item to the user (ρ), visibility to
the user (v), the expected number of new posts the user received (η), and
adoptions (r). N is the number of users and M is the number of items.
as soon as the item enters her stream; however, exposure does not guarantee that
the user will actually view the item. The probability of viewing, i.e., visibility,
depends on the item's position in the user's stream [Hodas and Lerman, 2012]. Due
to a cognitive bias known as position bias [Payne, 1951], a user is more likely to
attend to items near the top of the screen than those deeper in the stream [Buscher
et al., 2009]. Hence, items in top stream positions have higher visibility. Below we
discuss a method to quantitatively account for this visibility.
Figure 4.1 graphically represents the Vip model. It considers a user $i$ with a
user-topic vector $u_i$, and an item $j$ with an item-topic vector $\theta_j$. Vip generates an
adoption of item $j$ by user $i$ as follows:
$$r_{ij} \sim \mathcal{N}\left(v_i\, g_r(\rho_{ij} + \eta_j),\; c_{ij}^{-1}\right) \qquad (4.1)$$
where $\eta_j \sim \mathcal{N}(0, \lambda_\eta^{-1})$ is the fitness (or interestingness) of item $j$, which represents
the probability of adoption given that the user viewed it [Hogg and Lerman, 2012, Wang
et al., 2013]. The precision parameter $c_{ij}$ serves as confidence for adoption $r_{ij}$; $v_i$
represents the visibility of the item to the user, and $\rho_{ij}$ represents user $i$'s interest in item $j$.
We define $g_r$ as a linear function for simplicity. Note that one of the key properties
of Vip lies in how a user chooses among items that have the same visibility. We assume that
the user adopts items that are either relevant to her ($\rho_{ij}$) or interesting in general ($\eta_j$).
Users discover items by browsing through their message stream. As argued
above, the position of an item in the stream determines its visibility, i.e., its likelihood
of being viewed. However, an item's exact position is often not known. Instead, we
estimate its average visibility from the available data. This quantity depends on
the user's information load, or the flow of messages into the user's stream. The greater
the number of new messages a user receives between visits to the site, the less likely
the user is to view any specific item. Following [Hogg et al., 2013], we estimate the
visibility of an item to user $i$ as:
$$v_i \propto \sum_{L}\, G\!\left(\tfrac{1}{1+\delta_i};\, L\right)\left(1 - IG(\mu, \lambda;\, L)\right) \qquad (4.2)$$
The first factor gives the probability that $L$ newer posts have accumulated in user
$i$'s stream since the arrival of a given item. The accumulation of items is
a competition between the rate at which friends post new messages to the user's stream
and the rate at which the user visits the stream to read the messages. The ratio $\delta_i$ of these
rates gives the expected number of new messages in a user's stream since item
$j$'s arrival. Taking friends' activity and the user's activity each to be a Poisson process,
the competition gives rise to a geometric distribution with success probability
$p = 1/(1+\delta_i)$: $G = (1-p)^L p$. We will revisit how we estimate $\delta_i$ in Section 4.1.5.
The second factor of Eq. 4.2 gives the probability that user $i$ will navigate to
at least the $(L+1)$'st position in her stream to view the item. This is given by the
upper cumulative distribution of an inverse Gaussian $IG$ with mean $\mu$, shape
parameter $\lambda$, and variance $\mu^3/\lambda$:

$$IG(\mu, \lambda; L) = \exp\!\left(-\frac{\lambda (L-\mu)^2}{2\mu^2 L}\right)\left(\frac{\lambda}{2\pi L^3}\right)^{1/2}. \qquad (4.3)$$
This distribution has been used to describe the "law of surfing" [Huberman, 1998],
and it represents the probability that the user will view $L$ items on a web page before
stopping. Therefore, the upper cumulative distribution of $IG$ gives the probability that the
user will view at least $L$ items, hence navigating to the $(L+1)$'st position in her stream.
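As a concrete sketch, the visibility of Eq. 4.2 can be evaluated numerically. The snippet below is a minimal Python illustration under stated assumptions: the inverse-Gaussian CDF is computed with the standard closed form in terms of the normal CDF, the sum over positions is truncated at an arbitrary cutoff, and the $\delta_i$ values are invented; it is not the dissertation's exact numerical procedure.

```python
import math

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ig_cdf(x, mu, lam):
    """Inverse-Gaussian CDF, closed form in terms of the normal CDF."""
    if x <= 0:
        return 0.0
    a = math.sqrt(lam / x)
    return (norm_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * norm_cdf(-a * (x / mu + 1.0)))

def visibility(delta_i, mu=14.0, lam=14.0, max_l=500):
    """Eq. 4.2: sum over stream positions L of
    P(L newer posts accumulated) * P(user browses past position L)."""
    p = 1.0 / (1.0 + delta_i)                      # geometric success probability
    total = 0.0
    for pos in range(max_l):
        geom = (1.0 - p) ** pos * p                # G(1/(1+delta_i); L)
        surf = 1.0 - ig_cdf(float(pos), mu, lam)   # upper CDF: browses past L
        total += geom * surf
    return total

# A user with fewer new messages between visits is more likely to see any item.
print(visibility(2.0) > visibility(50.0))
```

With the law-of-surfing parameters used later in this chapter ($\mu = \lambda = 14$), visibility decays as the expected number of intervening posts grows, matching the intuition that heavy information load buries items.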
We calculate the personal relevance of item $j$ to user $i$ as:

$$\rho_{ij} = g(u_i^T \theta_j) \qquad (4.4)$$
where the symbol $T$ refers to the transpose operation, $u_i$ represents the topic profile of
user $i$, $\theta_j$ represents the topic profile of item $j$, and $g$ is a linear function for simplicity.
We represent the topic profiles of users and items in a shared low-dimensional space
as follows:
$$u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K) \qquad \theta_j \sim \mathcal{N}(0, \lambda_\theta^{-1} I_K) \qquad (4.5)$$
where $K$ is the number of topics. Note that if we only use personal relevance
($\rho$) and ignore visibility and interestingness, the Vip model reduces to the probabilistic
matrix factorization (PMF) model [Salakhutdinov and Mnih, 2008b], which learns
latent topics from user-item adoption behaviors.
The generative process for item adoption through a social stream can be formalized
as follows:

For each user $i$:
  Generate $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$
  Generate $v_i \propto \sum_l G(1/(1+\delta_i); l)(1 - IG(\mu, \lambda; l))$
For each item $j$:
  Generate $\theta_j \sim \mathcal{N}(0, \lambda_\theta^{-1} I_K)$
  Generate $\eta_j \sim \mathcal{N}(0, \lambda_\eta^{-1})$
For each user $i$:
  For each recommended item $j$ from friends:
    Generate the adoption $r_{ij} \sim \mathcal{N}\left(v_i\, g_r(\rho_{ij} + \eta_j),\; c_{ij}^{-1}\right)$
Here $\rho_{ij} = u_i^T \theta_j$, $\lambda_u = \sigma_r^2/\sigma_u^2$, $\lambda_\theta = \sigma_r^2/\sigma_\theta^2$, and $\lambda_\eta = \sigma_r^2/\sigma_\eta^2$. Lack of adoption by
user $i$ of item $j$ ($r_{ij} = 0$) can be interpreted in two ways: either the user saw the item
but did not like it, or the user did not see the item but may have liked it had she seen
it. While other models partly account for lack of knowledge about non-adoptions
using smoothing [Wang and Blei, 2011], we properly model the visibility of items to
users. We set $c_{ij}$ as a confidence parameter for user $i$'s adoption of item $j$ via
her friend, and set it to a high value $a_r$ when $r_{ij} = 1$; when $r_{ij} = 0$, we set it to a low
value $b_r$ for items recommended by friends and $c_r$ for the rest ($a_r > b_r > c_r > 0$).
In this dissertation, we use the confidence parameter values $a_r = 1.0$, $b_r = 0.03$,
and $c_r = 0.01$ for $c_{ij}$. We define $g_r$ as a linear function for simplicity. The total
probability of the model is:

$$P(U, V, \Theta, \eta \mid R) = P(U)\, P(\Theta)\, P(V)\, P(\eta)\, P(R \mid \Theta, \eta, V) \qquad (4.6)$$
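To make the generative story concrete, here is a minimal simulation of Vip's sampling process. The dimensions, precision hyperparameters, and visibility values below are all invented for illustration; in the model, the visibilities would come from Eq. 4.2 rather than being fixed by hand.

```python
import random

random.seed(0)
K, N_USERS, M_ITEMS = 3, 4, 5        # topics, users, items (toy sizes)
lam_u = lam_theta = 1.0              # precision hyperparameters
lam_eta = 4.0
c_ij = 1.0                           # confidence (precision) of each adoption

def gauss_vec(k, precision):
    return [random.gauss(0.0, precision ** -0.5) for _ in range(k)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# u_i ~ N(0, lam_u^{-1} I_K); theta_j ~ N(0, lam_theta^{-1} I_K); eta_j ~ N(0, lam_eta^{-1})
U = [gauss_vec(K, lam_u) for _ in range(N_USERS)]
THETA = [gauss_vec(K, lam_theta) for _ in range(M_ITEMS)]
ETA = [random.gauss(0.0, lam_eta ** -0.5) for _ in range(M_ITEMS)]
V = [0.9, 0.5, 0.3, 0.7]             # fixed visibilities, stand-in for Eq. 4.2

# r_ij ~ N(v_i * (rho_ij + eta_j), c_ij^{-1}), with g_r the identity
R = [[random.gauss(V[i] * (dot(U[i], THETA[j]) + ETA[j]), c_ij ** -0.5)
      for j in range(M_ITEMS)] for i in range(N_USERS)]
print(len(R), len(R[0]))  # one adoption signal per user-item pair
```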
4.1.2 Learning Parameters
To learn the model parameters, we develop coordinate ascent, an EM-style algorithm,
to iteratively optimize the variables $\{u_i, \theta_j, \eta_j\}$ and calculate the maximum a
posteriori estimates. MAP estimation is equivalent to maximizing the complete
log likelihood ($\ell$) of $U$, $V$, $\Theta$, $\eta$, and $R$ given $\lambda_u$, $\lambda_\theta$, $\lambda_\eta$, $\mu$, and $\lambda$:
$$\ell = -\frac{\lambda_u}{2}\sum_i^N u_i^T u_i - \frac{\lambda_\theta}{2}\sum_j^M \theta_j^T \theta_j - \frac{\lambda_\eta}{2}\sum_j^M \eta_j^T \eta_j + \sum_i^N \log\!\left(\sum_l^L \frac{1}{\delta_i + 1}\left(\frac{\delta_i}{\delta_i + 1}\right)^{l}\left(1 - IG(\mu, \lambda; l)\right)\right) - \sum_i^N \sum_j^M \frac{c_{ij}}{2}\left(r_{ij} - v_i(\rho_{ij} + \eta_j)\right)^T \left(r_{ij} - v_i(\rho_{ij} + \eta_j)\right) \qquad (4.7)$$
Given a current estimate, we take the gradient of $\ell$ with respect to $u_i$, $\theta_j$, and $\eta_j$
and set it to zero. The update equations are:
$$u_i \leftarrow \left(\lambda_u I_K + v_i\, \Theta C_i \Theta^T v_i\right)^{-1} \Theta C_i \left(v_i R_i - v_i v_i \eta\right)$$
$$\theta_j \leftarrow \left(\lambda_\theta I_K + U V C_j V U^T\right)^{-1} U C_j \left(V R_j - \eta_j V V \mathbf{1}_N\right)$$
$$\eta_j \leftarrow \left(\lambda_\eta + v^T C_j v\right)^{-1} v^T C_j \left(R_j - V U^T \theta_j\right)$$

where $C_j$ is a diagonal matrix with the confidence parameters $c_{ij}$. Item visibility to user
$i$, $v_i$, is represented as a diagonal matrix $V$ or in vector format as $v$. We define
$\Theta$ as a $K \times M$ matrix, $U$ as a $K \times N$ matrix, and $R_j$ as the vector of $r_{ij}$ values over all
users $i$ for the given item $j$.
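As a sanity check on these updates, the sketch below runs the coordinate-ascent loop for a single topic dimension ($K = 1$), where the matrix inverses reduce to scalar division. All data (visibilities, adoptions, confidences) are invented, so this only illustrates the shape of the computation; each block update exactly minimizes its quadratic sub-problem, so the objective cannot increase.

```python
import random

random.seed(1)
N, M = 6, 8                      # users, items (toy sizes)
lam_u, lam_theta, lam_eta = 0.001, 0.001, 1.0
v = [random.uniform(0.1, 1.0) for _ in range(N)]                 # visibilities
r = [[random.choice([0.0, 1.0]) for _ in range(M)] for _ in range(N)]
c = [[1.0 if r[i][j] else 0.03 for j in range(M)] for i in range(N)]

u = [0.1] * N
theta = [0.1] * M
eta = [0.0] * M

def objective():
    """Negated log likelihood of Eq. 4.7 (to be minimized)."""
    fit = sum(c[i][j] / 2 * (r[i][j] - v[i] * (u[i] * theta[j] + eta[j])) ** 2
              for i in range(N) for j in range(M))
    reg = (lam_u / 2 * sum(x * x for x in u)
           + lam_theta / 2 * sum(x * x for x in theta)
           + lam_eta / 2 * sum(x * x for x in eta))
    return fit + reg

before = objective()
for _ in range(20):
    for i in range(N):   # update u_i given theta, eta
        num = sum(c[i][j] * v[i] * theta[j] * (r[i][j] - v[i] * eta[j]) for j in range(M))
        den = lam_u + sum(c[i][j] * (v[i] * theta[j]) ** 2 for j in range(M))
        u[i] = num / den
    for j in range(M):   # update theta_j given u, eta
        num = sum(c[i][j] * v[i] * u[i] * (r[i][j] - v[i] * eta[j]) for i in range(N))
        den = lam_theta + sum(c[i][j] * (v[i] * u[i]) ** 2 for i in range(N))
        theta[j] = num / den
    for j in range(M):   # update eta_j given u, theta
        num = sum(c[i][j] * v[i] * (r[i][j] - v[i] * u[i] * theta[j]) for i in range(N))
        den = lam_eta + sum(c[i][j] * v[i] ** 2 for i in range(N))
        eta[j] = num / den
print(objective() < before)  # coordinate updates never increase the objective
```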
4.1.3 Prediction
After parameters are learned, Vip can be used to predict both item adoptions
by a user and how many times a given item will be adopted by exposed users,
i.e., predict the growth of its cascade. For user-item adoption prediction, user $i$'s
adoption of item $j$ recommended by a friend is obtained by point estimation with
the optimal variables $\{\theta^*, u^*, v^*, \eta^*\}$:
$$E[r_{ij} \mid \mathcal{D}] \approx E[v_i \mid \mathcal{D}]^T \left(E[\rho_{ij} \mid \mathcal{D}] + E[\eta_j \mid \mathcal{D}]\right), \qquad r^*_{ij} \approx v^*_i \left({u^*_i}^T \theta^*_j + \eta^*_j\right) \qquad (4.8)$$
where $\mathcal{D}$ is the training data. The adoption probability is decided by the user visibility
$v_i$, user topic profile $u_i$, item topic profile $\theta_j$, and item fitness $\eta_j$.
The cascade growth prediction task predicts the expected number of new adoptions
following the first $k$ adoptions of an item. Prediction also needs the item
adoption history of the users exposed by the first $k$ adopters, i.e., the followers of
the $k$ adopters. In this dissertation, we use the first five adoptions as training data
to test any $k$ ($k > 5$) instead of retraining the model. The expected number of
adoptions of item $j$ from the exposed followers of the first $k$ adopters is:
$$E[r_j \mid \mathcal{D}_5, k] \approx \sum_{i \in fol(Adopters(j,k))} E[v_i \mid \mathcal{D}]^T \left(E[\rho_{ij} \mid \mathcal{D}] + E[\eta_j \mid \mathcal{D}]\right) \approx \sum_{i \in fol(Adopters(j,k))} v^*_i \left({u^*_i}^T \theta^*_j + \eta^*_j\right) \qquad (4.9)$$
where $\mathcal{D}_5$ is the training data, which includes the first five adoptions of the test item $j$
as well as the adoptions of all other items, and $fol(Adopters(j,k))$ is the set of followers
exposed by the first $k$ adopters. We compute the expected number of adoptions
at time $k$ using the visibility $v_i$, user topic profile $u_i$, item topic profile $\theta_j$, and item's
popularity $\eta_j$ learned from $\mathcal{D}_5$ for item $j$.
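The cascade estimate of Eq. 4.9 is simply a sum of per-follower adoption scores over the exposed followers. A toy sketch follows, where the follower lists, learned profiles, visibilities, and fitness value are all invented:

```python
# hypothetical learned parameters for a toy network
followers_of = {"a": ["u1", "u2"], "b": ["u2", "u3"], "c": ["u4"]}
v = {"u1": 0.8, "u2": 0.4, "u3": 0.6, "u4": 0.9}          # visibility v_i
u_prof = {"u1": [0.2, 0.1], "u2": [0.0, 0.3], "u3": [0.5, 0.1], "u4": [0.1, 0.1]}
theta_j = [0.4, 0.2]                                       # item topic profile
eta_j = 0.15                                               # item fitness

def expected_new_adoptions(adopters):
    """Eq. 4.9: sum of v_i * (u_i . theta_j + eta_j) over exposed followers."""
    exposed = set()
    for a in adopters:
        exposed.update(followers_of.get(a, []))
    return sum(v[i] * (sum(x * y for x, y in zip(u_prof[i], theta_j)) + eta_j)
               for i in exposed)

print(round(expected_new_adoptions(["a", "b"]), 4))  # → 0.506
```

Because the sum runs only over followers of the adopters seen so far, the prediction naturally grows or stalls as the cascade reaches (or fails to reach) new parts of the network.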
4.1.4 Data Sets (Twitter 2012)
Twitter 2012
Twitter offers an Application Programming Interface (API) for data collection.
The Twitter 2012 data set [Kang and Lerman, 2015b] contains tweets including
a URL, collected to monitor information spread over the social network from Nov 2012 to
Jul 2013. We start by monitoring potential initial URLs containing http://t.co$^{1}$
from the streaming APIs$^{2}$ and collect all tweets containing them. Since the total
volume of tweets containing every URL that we capture is very large, we focus on
popular and broadly shared URLs. We select initial URLs based on the heuristic
that URLs that appear more often in the streaming APIs will be more
popular on Twitter as a whole.
Accordingly, we selected as our initial URLs those that appeared more than once within 5 days
of their initial appearance in the streaming APIs. That reduced the number
of tweets from 55 million to 115K in the data collection. Next, we collected the
whole history of each URL using the Twitter REST APIs$^{3}$ to reconstruct the entire
information sharing history. We kept monitoring these seed URLs until there
were no more tweets containing them within five days of their last appearance
in the Twitter REST APIs. This yielded 12.5M tweets from 9.5M users. For this
experiment, we removed all URLs that were tweeted by fewer than 10 distinct
users, which resulted in 27K URLs.
We selected users who adopted at least 10 URLs, either independently (by tweeting)
or by retweeting a friend's recommendation (2,907 users). We then found all URLs
$^{1}$Twitter shortens all links posted in tweets using its t.co service to protect users
from malicious sites and other harmful activity.
$^{2}$https://dev.twitter.com/docs/streaming-apis
$^{3}$https://dev.twitter.com/docs/api
Table 4.1: Model parameters used in this study.

Parameters                   Value
number of topics             $K = 30$
user topic profile           $\lambda_u = 0.001$
item topic profile           $\lambda_\theta = 0.001$
item fitness                 $\lambda_\eta = 100000$
law of surfing               $\mu = 14.0$, $\lambda = 14.0$
views per URL post           7.6
typical URL posting rate     1.4
these users adopted through their friends' recommendations (cascade adoptions),
as well as all other users participating in resharing these URLs through friends'
recommendations. This resulted in 4,835 URLs, 79,066 users, and 421,917 tweets.
The sparseness of the data set is 99.89%, with users adopting an average of 1.4 URLs
and a median of 1 URL. The distribution of cascade sizes of these URLs in
our data is heavy-tailed, with relatively few URLs becoming very
popular while most spread to just a handful of users.
4.1.5 Model Selection
First, we study how the parameters of Vip affect the overall performance of user-item
adoption prediction using recall@3 (see below). We use the same "law of
surfing" parameters, $\mu = 14.0$ and $\lambda = 14.0$, as [Hogg et al., 2013, Hogg and
Lerman, 2012] did in their study of Twitter and another social media site. The
expected number of new posts including a URL that user $i$ received, $\delta_i$, is computed
as $rate_i^{(url\ posts\ received)} / rate_i^{(visits)}$. The rate $rate_i^{(posts\ received)}$ is proportional to the
number of friends ($N_{frd(i)}$) that $i$ follows and their average posting frequency [Hodas
and Lerman, 2012, Hogg et al., 2013]. To estimate the posting frequency of all users,
we would have to track all their behaviors. Instead of tracking all users, we estimate it
using the typical URL posting rate of users from our data: $rate_i^{(posts\ received)} =
1.4\, N_{frd(i)}$. User $i$ visits Twitter at a rate $rate_i^{(visits)}$. This number is not available;
however, we expect it to be proportional to a number we do observe: the number
of posts of user $i$ ($N_{posts(i)}$). [Hogg et al., 2013] estimated that the average number of
visits per post was 38 for Twitter users. Also, since around 20% of tweets include
a URL [Chaudhry et al., 2012], the visit rate of user $i$ becomes $rate_i^{(visits)} =
7.6\, N_{posts(i)}$.
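Under these estimates, $\delta_i$ reduces to simple arithmetic. A sketch for a hypothetical user follows; the constants 1.4 and 7.6 are the rates derived above, while the friend and post counts are invented:

```python
def expected_new_url_posts(n_friends, n_posts,
                           url_post_rate=1.4, visits_per_post=7.6):
    """delta_i = rate(url posts received) / rate(visits), with
    rate(posts received) = 1.4 * N_frd(i) and rate(visits) = 7.6 * N_posts(i)."""
    return (url_post_rate * n_friends) / (visits_per_post * n_posts)

# a hypothetical user following 380 accounts who has posted 50 times
print(round(expected_new_url_posts(380, 50), 2))  # → 1.4
```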
For the PMF model, we vary the parameters $K \in \{10, 30, 50, 100, 200\}$,
$\lambda_u \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$, and $\lambda_\theta \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$
using grid search on validation recommendations. Throughout this dissertation,
we set the parameters $K = 30$, $\lambda_u = 0.001$, $\lambda_\theta = 0.001$, $a_r = 1$, $b_r = 0.03$, and $c_r = 0.01$
both for PMF and for Vip, as these performed the best for PMF. For the fitness
parameter, we vary $\lambda_\eta \in \{0, 0.001, \ldots, 100000\}$ while we fix the other parameters:
$\lambda_\theta = 0.001$ and $\lambda_u = 0.001$. In this dissertation, we set $\lambda_\eta = 100000$.
4.1.6 Experimental Results
User-Item Adoption Prediction
In the prediction task, we sort the items by $r_{ij}$, the predicted probability of adoption by
user $i$, and calculate the fraction of the $X$ top-ranked items that the user actually
adopted. A user may not adopt an item either because she is not aware of it or
because she does not like it. This makes it difficult to use precision to evaluate
prediction results. Instead, we use recall@X to measure the model's performance on
the prediction task (Eq. 3.15). We average recall values over all users to summarize the
performance of the prediction algorithm.
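The recall@X computation can be sketched as follows; the ranked list and adopted set are invented:

```python
def recall_at_x(ranked_items, adopted, x):
    """Fraction of the user's adopted items found in the top-x ranked list."""
    if not adopted:
        return 0.0
    hits = sum(1 for item in ranked_items[:x] if item in adopted)
    return hits / len(adopted)

# items sorted by predicted adoption probability r_ij, highest first
ranked = ["url3", "url7", "url1", "url9", "url2"]
adopted = {"url7", "url2"}
print(recall_at_x(ranked, adopted, 3))  # → 0.5 (url7 found, url2 missed)
```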
We divide each user's adopted items into five folds and construct the training
set and the test set. We use five-fold cross validation and compare the performance of
Vip to three baseline models: Random, Fitness, and Relevance. The Random
Figure 4.2: (a) Recall of user-item adoption prediction with different numbers $X$ of
recommended items. The number of topics was fixed at 30. (b) Average recall@3
of user-item adoption prediction for different activity levels (based on the number
of adoptions) of users with 30 topics. Error bars indicate one standard deviation
above and below.
baseline chooses items at random from among the items in $i$'s stream, i.e., items
adopted by $i$'s friends. The Fitness baseline uses the item fitness values ($\eta$) learned
by Vip to recommend the $X$ highest-fitness items. The Relevance baseline bases
its recommendations on the user and item topic profiles learned by PMF to select the $X$
most relevant items for recommendation.
Figure 4.2 (a) shows the overall performance on the user-item adoption prediction
task when we vary $X$, the number of recommendations made by each model.
Note that a better model should provide higher recall@X for different $X$. Vip
outperforms all baselines, but the improvement is especially dramatic when the
number of recommended items is small: recall@3 was 0.30, 0.17, 0.16, and 0.12
for the Vip, Relevance, Fitness, and Random models, respectively. Note that as
the number of recommendations ($X$) increases, the recall of all models improves,
however, at the expense of precision.
Figure 4.2 (b) shows how prediction performance on the user-item recommendation
task varies with user activity level, that is, how the number of items adopted
by the user in the training set affects recall@3 on the test set. The performance
of the Random baseline, which recommends three randomly chosen items from
the user's stream, does not vary with user activity level, as expected. Similarly, the
Fitness baseline does not vary significantly with activity, since it depends only
on the propensity of the item to spread. Both Vip and Relevance improve
with increasing user activity, as they can learn better user-topic profiles with more
training data. Note that for low-activity users, whose interests are not well known,
recommending items based on personal Relevance performs about the same as
picking items based on their fitness; but as more can be learned about a user's preferences,
Relevance outperforms picking items based on their fitness (Fitness)
or picking them randomly from the user's stream (Random). Vip handily outperforms
the competition over all user activity levels. This shows that accounting
for visibility improves the predictability of user-item adoptions in social
media far more than using personal relevance or item fitness alone.
Cascade Size Prediction
In this evaluation, we compute the expected number of new adoptions of item
$j$ after the first $k$ adoptions of the item. Since new adoptions can only come
from the followers of these $k$ adopters, this task involves calculating the probability that
each follower will adopt item $j$. Due to data sparsity, we apply leave-one-out cross-validation.
Specifically, we use the first $k = 5$ adopters of a single item $j$, as well
as the adoptions of all other items, as the training set, and the remaining adoptions of
item $j$ beyond the first $k = 5$ as the test set.
Figure 4.3: Different models' predictions of cascade growth after the first $k$ adoptions
for three URLs: (a) a technology news article, (b) a YouTube video, and (c) a political
news article.
We compute the expected number of adoptions using the Vip, Relevance,
and Visibility models. The Visibility baseline uses the visibility values ($v$) of users
to compute the expected number of new adoptions from among the followers of the first
$k$ adopters. The Relevance baseline bases its predictions on the user- and
item-topic vectors learned by PMF to compute the expected number of adoptions.
We introduce another baseline, Followers, that estimates the expected number
of adoptions as a fixed fraction of the number of followers who have been exposed
to item $j$ but have yet to adopt it.
Figure 4.3 compares predictions of cascade growth made by different models
for three different items: (a) a technology news article, (b) a YouTube video, and
(c) a political news article. Each model predicts the number of new adoptions for
an item at different times, i.e., after $k$ adoptions, using the model learned from
the first five adoptions ($k = 5$). We rescale these predictions using the ratio of
predicted to actual values at $k = 5$. The solid blue line shows the actual number
of adoptions (Actual) among the followers of the first $k$ adopters. Note that
Followers represents the upper bound on the number of new adoptions at time
$k$, and Actual can go up or down depending on the number of followers exposed
by the $k$ adopters. For the news article (Figure 4.3 (a)), Vip predicts cascade
growth almost perfectly over its entire lifetime, while the Relevance model performs
similarly to the Visibility and Followers models. Our interpretation of this
example is that the joint modeling of visibility and personal relevance is important,
since neither the Relevance nor the Visibility model can capture the decreasing
cascade growth as more users are exposed. Cascade growth prediction for the YouTube
video shows a similar pattern; however, the network plays a bigger role than in
Figure 4.3 (a), since the number of exposed followers grows dramatically after
about 20 adoptions. In Figure 4.3 (c), the political news article propagated over
bigger and more diverse communities, since the number of followers keeps increasing,
and neither the Relevance nor the Visibility model predicts the trends correctly.
The strong performance of Vip and Relevance emphasizes the importance of
modeling user interests in cascade prediction.
To quantify the overall prediction performance, we compute the correlation
between the actual cascade growth values (Actual) and each model's predicted cascade
growth values over all $k$, and average the correlation over 100 URLs (Figure 4.4
(a)). These URLs have different cascade sizes. Vip outperformed the others on cascade
growth trend prediction for URLs of every cascade size. However, for URLs with
small cascades, none of the models predicts the cascade growth trend correctly.
Note that only the Vip model is able to predict the cascade growth trend for URLs
with cascade size $\geq 20$, while the other models all have negative average correlations.
The strong performance of Visibility for cascade size $\geq 30$ emphasizes the
importance of modeling visibility in cascade growth trend prediction.
Figure 4.4: (a) Average correlation between the actual and predicted cascade
growth values for URLs with different cascade sizes. (b) Root-mean-square error
(RMSE) between the cascade growth values predicted by each model and the values
actually observed.
The average correlation between the actual cascade growth and the model-predicted
growth captures the predictive performance for cascade growth trends. However,
it cannot capture the actual differences between the predicted and actual cascade
sizes, so we also measured the root-mean-square error (RMSE) between the cascade
size predicted by each model and the actual size in Figure 4.4 (b). Vip performs
best on RMSE over all URLs for cascades of any size. From both the RMSE and
average correlation evaluations, Vip better predicts cascade growth trends as well
as cascade size over all $k$. Most of the unpredictability of cascade growth comes
from the heterogeneity of exposed users in different network positions [Kang and
Lerman, 2013b]. For example, if an item is adopted by a user who bridges different
communities, new users will be exposed who may not otherwise have seen the item.
This may breathe new life into the cascade and keep it growing for a longer period.
4.2 Improving Explanatory Power by Integrating Words
In the previous section, we proposed a model that captures the three basic ingredients
of information spread in social media: an item's visibility ($v$) to a user, its
fitness or virality ($\eta$), and its (personal) relevance ($\rho$) to the user. While the
model improves on previous models, it applies normal distribution assumptions when
modeling binary responses, uses the full user-item adoption matrix, and provides no
description of the learned latent topic space. In this section, we model the binary
responses (adopted vs. unadopted items) of social media users with a multinomial
logit model. Our stochastic inference algorithm allows us to learn from randomly
sampled negative (not adopted) and positive (adopted) dyads without overfitting
to the positive ones. This handles large numbers of user-item dyads and can be distributed for
efficient computation. Furthermore, with the help of a probabilistic topic model,
we can provide an interpretable low-dimensional representation of information.
Figure 4.5 graphically represents our model [Kang and Lerman, 2015a].
Item visibility. When a user's message stream is delivered as a list of items,
the process of item discovery is biased by the position of each item in the list. A
user is more likely to see items near the top of the list than those deeper in the
stream [Lerman and Hogg, 2014]. Hence, items in top stream positions have higher
visibility. Since we do not know an item's exact position, we estimate the
average visibility of items to user $i$ as follows:
$$v_i \propto \sum_{L}\, G\!\left(\tfrac{1}{1+\delta_i};\, L\right)\left(1 - IG(\mu, \lambda;\, L)\right) \qquad (4.10)$$
Figure 4.5: Our model with user topic ($u$) and item topic ($\theta$) profiles, item's
personal relevance ($\rho$) and visibility to the user ($v$), item fitness ($\eta$), expected number
of new posts the user received ($\delta$), and item adoption ($r$). The topic model part has the
topic distribution ($\phi$) of an item and a distribution ($\beta$) over words from a vocabulary
of size $M$. $N$ is the number of users, and $D$ is the number of items.
The first factor gives the probability that user $i$ discovers an item, depending on
the number of items in her stream. The greater the number of new messages a
user receives between visits to the site, the less likely the user is to view any
specific item. Thus, average visibility depends on the frequency with which the user visits the
site and the rate of posts received. This competition between the rate at which friends
post new messages to the user's stream and the rate at which the user visits the stream to
read the messages is modeled by a geometric distribution with success probability
$p = 1/(1+\delta_i)$: $G = (1-p)^L p$. The ratio $\delta_i$ of these rates gives the expected number
of new messages in a user's stream. The second factor gives the probability that
user $i$ will navigate to at least the $(L + 1)$-th position in the stream to view the item.
This is estimated by the upper cumulative distribution of an inverse Gaussian $IG$
with mean $\mu$, shape parameter $\lambda$, and variance $\mu^3/\lambda$:
$$IG(\mu, \lambda; L) = \exp\!\left(-\frac{\lambda (L-\mu)^2}{2\mu^2 L}\right)\left(\frac{\lambda}{2\pi L^3}\right)^{1/2}. \qquad (4.11)$$
Item virality. Social media users adopt items even if they had not earlier demonstrated
a sustained interest in their topics. This is often the case with viral, general-interest
items, such as breaking news or celebrity gossip. Thus, we use "virality"
to represent an item's propensity to spread upon exposure:

$$\eta_j \sim \mathcal{N}(0, \sigma_\eta^2) \qquad (4.12)$$
Item relevance. We calculate the personal relevance of an item $j$ to user $i$ as:

$$\rho_{ij} = g(u_i^T \theta_j) \qquad (4.13)$$
where the symbol $T$ refers to the transpose operation, $u_i$ represents the topic profile
of user $i$, $\theta_j$ represents the topic profile of item $j$, and $g$ is a linear function for
simplicity. The profiles are drawn as:

$$u_i \sim \mathcal{N}(0, \sigma_u^2 I_K) \qquad \theta_j \sim \mathcal{N}(0, \sigma_\theta^2 I_K) \qquad (4.14)$$

where $K$ is the number of topics.
We use a widely known text mining algorithm, Latent Dirichlet Allocation
(LDA) [Blei et al., 2003], which analyzes the co-occurrence of words in documents
to learn the hidden topics representing those documents. In our case, LDA
captures the item's topic distribution $\phi$, which is represented as a $K$-dimensional
vector in the recommendation model. The topic distribution of each document
($\phi_{d_j}$) is viewed as a mixture of multiple topics, with each topic ($\beta_k$) a distribution
over words. In our setting, the corpus $D$ is the collection of tweet text of the
tweet posts. The likelihood of $D$ is computed by multiplying over all documents
and all words in each document as follows:
$$p(D \mid \phi, \beta, z) = \prod_{d_j \in D}\; \prod_{w \in d_j} \phi_{d_j, z_w}\, \beta_{z_w, w} \qquad (4.15)$$
where $z_w$ is the assigned topic index for each word $w$ in document $d_j$, $\phi_{d_j, z_w}$ is the
likelihood of topic $z_w$ for document $d_j$, and $\beta_{z_w, w}$ is the likelihood of choosing the
specific word $w$ for topic $z_w$.
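Given fixed topic assignments, Eq. 4.15 is a product of per-word probabilities. A tiny sketch with a one-document corpus and invented distributions (all names and values here are hypothetical):

```python
import math

# phi[d][k]: topic proportions per document; beta[k][w]: word probabilities per topic
phi = {"d1": [0.7, 0.3]}
beta = [{"cat": 0.5, "dog": 0.4, "tax": 0.1},
        {"cat": 0.1, "dog": 0.1, "tax": 0.8}]
z = {("d1", 0): 0, ("d1", 1): 1}      # topic assignment z_w per word position
doc_words = {"d1": ["dog", "tax"]}

log_lik = 0.0                          # log of Eq. 4.15
for d, words in doc_words.items():
    for pos, w in enumerate(words):
        k = z[(d, pos)]
        log_lik += math.log(phi[d][k]) + math.log(beta[k][w])

print(round(log_lik, 4))
```

Working in log space avoids underflow when the product runs over many words.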
The generative process for item adoption through a social stream can be formalized
as follows:

For each user $i$:
  Generate $u_i \sim \mathcal{N}(0, \sigma_u^2 I_K)$
  Generate $v_i \propto \sum_l G(1/(1+\delta_i); l)(1 - IG(\mu, \lambda; l))$
For each item $j$:
  Generate $\eta_j \sim \mathcal{N}(0, \sigma_\eta^2)$
  Generate $\phi_j \sim \text{Dirichlet}(\alpha)$
  Generate $\epsilon_j \sim \mathcal{N}(0, \sigma_\theta^2 I_K)$ and set $\theta_j = \phi_j + \epsilon_j$
  For each word $w_{jm}$:
    Generate topic assignment $z_{jm} \sim \text{Mult}(\phi_j)$
    Generate word $w_{jm} \sim \text{Mult}(\beta_{z_{jm}})$
For each user $i$:
  For each item $j$ on the news feed:
    Generate the adoption $r_{ij} \sim p(I(r_{ij}) \mid u_i, v, \theta, \eta, O_i)$
Lack of adoption by user $i$ of item $j$ ($r_{ij} = 0$) can be interpreted in two ways:
either the user saw the item but did not like it, or the user did not see the item but
may have liked it had she seen it. While other models partly account for the lack
of knowledge about non-adoptions using smoothing [Wang and Blei, 2011, Kang
and Lerman, 2013a], we properly model the visibility of items to users.
We model the user-item adoption with the Softmax function, which maps the
products of $K$-dimensional vectors into the $[0, 1]$ range:
$$p(I(r_{ij}) \mid u_i, v, \theta, \eta, O_i) = \frac{\exp\!\left(v_i\, g_r(\rho_{ij} + \eta_j)\right)}{\sum_{l \in O_i} \exp\!\left(v_i\, g_r(\rho_{il} + \eta_l)\right)} \qquad (4.16)$$
where $I(r_{ij})$ is the indicator function, with $I(r_{ij}) = 1$ when user $i$ adopted item $j$ and 0
otherwise, and $O_i$ is the set of items observed by user $i$. We define $g_r$ as a linear function
for simplicity.
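Eq. 4.16 is a standard softmax over the items in the user's stream $O_i$. A minimal sketch with invented per-item scores $v_i\, g_r(\rho_{il} + \eta_l)$:

```python
import math

def adoption_probability(scores, j):
    """Softmax of Eq. 4.16: scores map each item l in O_i to v_i * g_r(rho_il + eta_l)."""
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[j]) / z

# hypothetical stream O_i with precomputed scores
scores = {"item_a": 1.2, "item_b": 0.3, "item_c": -0.5}
probs = {j: adoption_probability(scores, j) for j in scores}
print(round(sum(probs.values()), 6))  # probabilities over O_i sum to 1
```

Because the normalization runs only over the observed set $O_i$, the model compares an adopted item against the other items the user could have chosen, rather than against the entire catalog.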
The main objective function is:

$$\ell = -\frac{1}{2\sigma_u^2}\sum_i^N u_i^T u_i - \frac{1}{2\sigma_\eta^2}\sum_j^D \eta_j^T \eta_j - \frac{1}{2\sigma_\theta^2}\sum_j^D (\theta_j - \phi_j)^T(\theta_j - \phi_j) + \sum_i^N \log\!\left(\sum_l^L \frac{1}{\delta_i + 1}\left(\frac{\delta_i}{\delta_i + 1}\right)^{l}\left(1 - IG(\mu, \lambda; l)\right)\right) - \sum_i^N \sum_j^D \left(\log\!\Big(\sum_{l \in O_i} \exp\!\left(v_i(\rho_{il} + \eta_l)\right)\Big) - v_i(\rho_{ij} + \eta_j)\right) \qquad (4.17)$$
The last term of the equation minimizes the error between the binary rating and
the predicted rating. The third term minimizes the error between the topics that
explain the recommendation and those that explain the content; the relative importance
of these two components can be controlled with $\sigma_\theta$. MAP estimation is
equivalent to maximizing the complete log likelihood ($\ell$) of $U$, $V$, $\theta$, $\phi$, and $r$
given $\sigma_u$, $\sigma_\theta$, $\sigma_\eta$, $\mu$, and $\lambda$.
Table 4.2: Model parameters used in this study.

Parameters                   Value
number of topics             $K = 100$
user topic profile           $\sigma_u^2 = 10^{-4}$
item topic profile           $\sigma_\theta^2 = 10^{-4}$
item fitness                 $\sigma_\eta^2 = 10$
law of surfing               $\mu = 14.0$, $\lambda = 14.0$
views per post               38
typical posting rate         1.4
4.2.1 Learning Parameters with Stochastic Optimization
To optimize Eq. (4.17), we develop a stochastic gradient descent algorithm. Given
a current estimate, we take the gradient of Eq. (4.17) with respect to $u_i$, $\theta_j$, and
$\eta_j$ and iteratively optimize the parameters $\{u_i, \theta_j, \eta_j\}$. The derived update equations
are given in Algorithm 1.
Algorithm 1 Stochastic Optimization
Initialize model parameters $U, V, \theta, \phi, \eta, r$
for $t = 1$ to $T$ do
  for $u_i$ in $U$ do
    Choose a random $|r_i|$-sized mini-batch $S_i$ from $D - r_i$
    Generate $O_i = r_i \cup S_i$
    for $j$ in $O_i$ do
      $u_i \leftarrow u_i - \tau\left[v_i \theta_j \nabla + \frac{1}{2|r_i|\sigma_u^2}\, u_i\right]$
      $\theta_j \leftarrow \theta_j - \tau\left[v_i u_i \nabla + \frac{1}{2|r_j|\sigma_\theta^2}\, (\theta_j - \phi_j)\right]$
      $\eta_j \leftarrow \eta_j - \tau\left[v_i \nabla + \frac{1}{2|r_j|\sigma_\eta^2}\, \eta_j\right]$
    end for
  end for
end for
where $|r_i|$ is the number of items adopted by user $i$ and $|r_j|$ is the number of users
who adopted item $j$. We generate the set of observed items $O_i$ by adding $|r_i|$ randomly
sampled items from the unadopted set ($D - r_i$), incrementally
learning from the unadopted and adopted item sets of each user. We use the
learning rate $\tau$, discounted by a factor of 0.9 in each iteration [Koren et al., 2009].
The equation for the gradient ($\nabla$) is as follows:

$$\nabla = \frac{\exp\!\left(v_i\, g_r(\rho_{ij} + \eta_j)\right)}{\sum_{l \in O_i} \exp\!\left(v_i\, g_r(\rho_{il} + \eta_l)\right)} - I(r_{ij}). \qquad (4.18)$$
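One inner step of Algorithm 1, combined with the gradient of Eq. 4.18, can be sketched for a single user and a single topic dimension ($K = 1$). The data, mini-batch, variances, and learning rate below are all invented, and $g_r$ is taken to be the identity:

```python
import math

lr = 0.05                              # learning rate tau (invented)
sigma_u2 = sigma_theta2 = sigma_eta2 = 1.0
v_i = 0.7                              # visibility of items to user i
u_i = 0.1                              # user topic profile (K = 1)
adopted = {"a"}                        # r_i: items user i adopted
sampled = {"x", "y"}                   # S_i: randomly sampled unadopted items
O_i = adopted | sampled
theta = {j: 0.1 for j in O_i}          # item topic profiles
phi = {j: 0.05 for j in O_i}           # LDA topic proportions
eta = {j: 0.0 for j in O_i}            # item fitness

def softmax_grad(j):
    """Eq. 4.18: softmax probability minus the adoption indicator."""
    z = sum(math.exp(v_i * (u_i * theta[l] + eta[l])) for l in O_i)
    p = math.exp(v_i * (u_i * theta[j] + eta[j])) / z
    return p - (1.0 if j in adopted else 0.0)

for j in O_i:                          # one pass of the inner loop
    g = softmax_grad(j)
    u_i      -= lr * (v_i * theta[j] * g + u_i / (2 * len(adopted) * sigma_u2))
    theta[j] -= lr * (v_i * u_i * g
                      + (theta[j] - phi[j]) / (2 * len(adopted) * sigma_theta2))
    eta[j]   -= lr * (v_i * g + eta[j] / (2 * len(adopted) * sigma_eta2))

print(eta["a"] > max(eta["x"], eta["y"]))  # adopted item's fitness pushed up
```

The gradient is negative for adopted items (probability below one) and positive for sampled non-adoptions, so each step raises the score of what the user chose relative to what she skipped.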
The proposed recommendation model can be updated incrementally to model
dynamic user adoptions in real time. It is also computationally efficient, since
it can be distributed by decomposing the data set over multiple computers.
4.2.2 Prediction
We evaluate the proposed model by using it to predict which items users will adopt.
For this task, user $i$'s adoption of item $j$ shared by a friend is obtained by point
estimation with the optimal variables $\{\theta^*, u^*, v^*, \eta^*\}$:
$$E[r_{ij} \mid \mathcal{D}] \approx E[v_i \mid \mathcal{D}]^T \left(E[\rho_{ij} \mid \mathcal{D}] + E[\eta_j \mid \mathcal{D}]\right), \qquad r^*_{ij} \approx v^*_i \left({u^*_i}^T \theta^*_j + \eta^*_j\right) \qquad (4.19)$$
where $\mathcal{D}$ is the training data. The adoption probability is decided by the user visibility
$v_i$, user topic profile $u_i$, item topic profile $\theta_j$, and item fitness $\eta_j$.
4.2.3 Model Selection
We demonstrate the utility of the proposed model by applying it to data from
the social media site Twitter (the Twitter 2012 data set) and evaluating its performance on
the prediction tasks. We use the same "law of surfing" parameters, $\mu = 14.0$ and
$\lambda = 14.0$, as [Kang and Lerman, 2015b, Hogg et al., 2013, Hogg and Lerman, 2012]
did in their studies of social media. The expected number of new posts including a
URL that user $i$ received, $\delta_i$, is computed as $rate_i^{(url\ posts\ received)} / rate_i^{(visits)}$. The rate
$rate_i^{(posts\ received)}$ is proportional to the number of friends ($N_{frd(i)}$) that $i$ follows and their
average posting frequency. To estimate the posting frequency of all users, we use the
typical URL posting rate of users from our data: $rate_i^{(posts\ received)} = 1.4\, N_{frd(i)}$.
We estimate user $i$'s visit rate ($rate_i^{(visits)}$) using the number of posts of user $i$
($N_{posts(i)}$). [Hogg et al., 2013] estimated that the average number of visits per post was
38 for Twitter users. Also, since around 20% of tweets include a URL [Chaudhry
et al., 2012], the visit rate of user $i$ becomes $rate_i^{(visits)} = 7.6\, N_{posts(i)}$ (Twitter
2012 data set).
For the model hyper-parameters, we vary $K \in \{10, 30, 50, 100, 200\}$ and
$\{\sigma_u^2, \sigma_\theta^2\} \in \{10^{-4}, 10^{-3}, \ldots, 10^{4}\}$ using grid search on a validation set.
Throughout this dissertation, we set the parameters $K = 100$, $\sigma_u^2 = 0.01$, and $\sigma_\theta^2 = 0.001$
both for PMF and CTR, as these performed the best for PMF. For the fitness parameter
of [Kang and Lerman, 2015b] and the proposed model, we vary $\sigma_\eta^2 \in \{10^{-4}, 10^{-3}, \ldots,
10^{4}\}$, while we fix the other parameters: $\sigma_\theta^2 = 10^{-4}$ and $\sigma_u^2 = 10^{-4}$. In this dissertation,
we set $\sigma_\eta^2 = 10$.
4.2.4 Experimental Results: User-Item Adoption Prediction
To evaluate the performance, we use precision (P), recall (R), and normalized discounted
cumulative gain (nDCG) for the top-x recommended posts.

P@x computes the fraction of the top-x items in the list that are adopted by each user.
We average the precision@x over all users.
Table 4.3: Overall prediction performance comparison using Precision@x (P@x),
Recall@x (R@x), normalized DCG@x (nDCG@x) on Twitter dataset.
Model Text P@10 R@10 nDCG@10
Random No 0.0483 0.3738 0.2410
Fitness No 0.0798 0.5924 0.3630
Relevance No 0.0647 0.4383 0.3170
Softmax-Vip No 0.0984 0.6446 0.4205
Softmax-CTR Yes 0.1047 0.6105 0.4123
Our Model Yes 0.1138 0.7022 0.4619
R@x computes the fraction of each user's adopted items that are successfully discovered in the top-x ranked list. We average the recall@x of all users.
nDCG@x computes the weighted score of adopted items based on their position in the top-x list. It penalizes adopted items near the bottom of the top-x list. We average the nDCG@x of all users.
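As a concrete sketch, the three metrics can be computed per user and then averaged over users. The function below is my illustration of these definitions (the function name and toy data are not from the dissertation):

```python
import numpy as np

def precision_recall_ndcg_at_x(ranked, adopted, x=10):
    """Compute P@x, R@x and nDCG@x for a single user.

    ranked  -- item ids ordered by predicted score (best first)
    adopted -- set of item ids the user actually adopted
    """
    top = ranked[:x]
    hits = [1.0 if item in adopted else 0.0 for item in top]
    p = sum(hits) / x                        # fraction of top-x that were adopted
    r = sum(hits) / max(len(adopted), 1)     # fraction of adoptions recovered
    # DCG discounts hits by rank; IDCG is the best achievable DCG for this user.
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    ideal = min(len(adopted), x)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(ideal))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return p, r, ndcg

# Toy call: ranked list ["a", "b", "c"], user adopted {"a", "c"}.
p, r, n = precision_recall_ndcg_at_x(["a", "b", "c"], {"a", "c"}, x=3)  # p = 2/3, r = 1.0
```

Averaging these per-user values over all users gives the figures reported in Table 4.3.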
We divide each user's adopted items into five folds and construct the training set and the test set. We use five-fold cross validation and compare the performance of the proposed model to five baseline models: Random, Fitness, Relevance, Vip, CTR. The Random baseline chooses items at random from among the items in user $i$'s stream, i.e., items adopted by $i$'s friends. The Fitness baseline uses the item fitness values learned by Vip to recommend the $k$ highest-fitness items. The Relevance baseline bases its recommendations on the user-topic and item-topic vectors learned by PMF. Collaborative Topic Regression (CTR) [Wang and Blei, 2011] was originally introduced to recommend scientific articles. It combines collaborative filtering (PMF) and probabilistic topic modeling (LDA). It captures two $K$-dimensional lower-rank user and item hidden variables from the user-item adoption matrix and the content of the items. This model uses textual information and negative dyads, but unlike our method it uses an $\ell_2$ function instead of a Softmax.
Here, for a fair comparison, we implemented a Softmax version. In our experiments, the Softmax versions of CTR and Vip outperformed the original $\ell_2$ function models. Note that recall@10 was 0.58 and 0.49 for the $\ell_2$ function versions of Vip and CTR, respectively. The improvement in prediction performance gained by using a Softmax function instead of an $\ell_2$ function shows that modeling binary responses with a Softmax function is critical in social media recommendation.
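For a single binary adoption, a Softmax over the two outcomes {adopt, not adopt} reduces to a logistic sigmoid, which gives one way to see the contrast with the $\ell_2$ objective. The toy comparison below is my illustration, not code from the dissertation:

```python
import numpy as np

def l2_loss(r, score):
    """Gaussian (l2) objective: treats the binary adoption r as a real value."""
    return 0.5 * (r - score) ** 2

def softmax_loss(r, score):
    """Softmax over {adopt, not adopt}; for two classes this is the
    logistic log-loss, which models r as a genuinely binary response."""
    p_adopt = 1.0 / (1.0 + np.exp(-score))
    return -(r * np.log(p_adopt) + (1 - r) * np.log(1 - p_adopt))

# A confident correct score is penalized by l2 beyond score = 1,
# while the log-loss keeps rewarding confidence -- one intuition for the gap.
```

This mismatch between a Gaussian likelihood and 0/1 responses is one plausible reason the Softmax versions above perform better.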
Table 4.3 shows the models' overall performance on the user-item adoption prediction task. In this dissertation, we set $x = 10$, since recommending too many items is not realistic. From our experiments, we found that the results are consistent for different values of $x$. While nDCG@x uses the position of correct answers in the top-x ranked list, it does not penalize unadopted items or missing adopted items in the top-x ranked list; therefore, one has to consider the performance of all three metrics together. Intuitively, a better model should have higher P@x, R@x, and nDCG@x.
The experimental results show that the proposed model dramatically outperforms the random model, by 135.61% on precision and 87.85% on recall. A comparison against the random model is important to uncover the complexity of the post-recommendation task. The Fitness and Relevance models yield 62.21% and 33.95% improvement over the random model in terms of precision, and 58.48% and 17.25% in terms of recall, respectively. The gain of Vip over Relevance is 52.08% on precision and 47.06% on recall, while that of CTR over Relevance is 61.82% on precision and 39.28% on recall. This shows that accounting for cognitive biases dramatically improves the predictability of user-item adoptions in social media, as much as accounting for the text description of items alone. Among all models, the proposed model yields the best performance, showing that modeling text, as well as visibility, is critical in social media recommendation.
4.3 Visibility vs Item Fitness vs Personal Relevance
We analyze URL cascades on Twitter by examining how the three factors learned by the Vip model, namely visibility, item fitness, and personal relevance, contribute to their success. First, we explore the impact of URL fitness on the size of the cascade it creates. Figure 4.6 (a) shows the cascade size of all URLs in the data set vs their expected fitness values ($E(I)$). As a reminder, item fitness measures the adoption probability per single view. Depending on the characteristics of the community that the URL has reached, fitness can vary. In our data set, URLs that have been shared within a community sharing a specific hobby (e.g., Japanese manga or guitar lessons) or interest (e.g., teen idol or celebrity fan club) tend to have high fitness values. Since members share common topic preferences, items received relatively high adoption rates per exposure, which also often translates into quick adoption. As shown in Fig. 4.6 (a), high fitness means high adoption rates per view, with a statistically significant 0.85 correlation with cascade size. However, 4% of the URLs have fitness values that are negatively correlated with cascade size (a statistically significant -0.51 correlation), and 40% of the URLs show no correlation between fitness and cascade size. Apparently fitness by itself cannot explain information adoptions, and other factors, such as visibility and personal relevance, also have to be considered.
We separate the effect of item quality from its visibility to the user. We define quality very loosely as the combined effect of an item's fitness and relevance to adopters, and measure it by the expected value of these variables. This definition aims to make quality specific to the item itself, and separate from the details of how users may discover it. Figure 4.6 (b) shows how the size of cascades in our data set
Figure 4.6: (a) Cascade size vs the expected values of item fitness E(I) of all items adopted through friends' recommendations. (b) Cascade size vs the expected values of item fitness plus personal relevance E(I+P) for all adopters. The size and color of each circle represent the expected value of that item's visibility.
Table 4.4: Cascade size, expected values, and descriptions of YouTube video URLs
Descriptions Cascade Size E(V) E(I) E(P)
Strongbow surfers Neon Night Surfing on Bondi Beach 84 49 -0.04 85.1
Jay-Z Music Video 141 50.5 -0.04 130.7
Parallels for Mac for Chrome OS and Windows 8 68 52.3 -0.05 71.6
Bahraini Activist Nabil Rajab 116 62.1 -0.03 127.6
The Swaggin' Wagon Brings a Marriage Proposal - Ellen DeGeneres Show 87 65.6 -0.13 80.2
UNICEF - Making headway toward an AIDS-free generation 102 71.7 -0.11 100.6
Whitney Houston Video 120 73.5 -0.13 127.9
Paul McCartney's message from Moscow 109 87 -0.22 102.2
Ian Somerhalder Foundation 143 98.6 -0.24 150.8
depends on item quality and visibility. Each circle represents a URL, with its color encoding the expected visibility of the URL. Not surprisingly, higher quality URLs have larger cascades. More interestingly, some of the variance in cascade size can be explained by visibility: for URLs of similar quality, the more visible URLs spread more widely. In other words, for items cascading through a network where users have similar topic preferences, the total size of the cascade is determined by their visibility.
Next we illustrate the contributions of the three factors using specific case studies. There were 205 URLs to YouTube videos in our data set, with examples shown
Table 4.5: Cascade size, expected values, and descriptions of news article URLs
Descriptions Cascade Size E(V) E(I) E(P)
Justin Bieber and Taylor Swift to Collaborate on New Song 94 36.6 -0.03 678.6
Fukuyama's 'Future of History': Is Liberal Democracy Doomed? 60 36.8 -0.05 1831.8
Apple Earns 80% of All Mobile Phone Profits 56 38.5 -0.05 305.3
Bobbi Kristina Denied Access to See Whitney's Body 73 38.8 -0.04 535.6
Neuroscience the new face of warfare: experts 69 47.1 -0.06 1893.4
Amazon launches 'Kindle Daily Deal' for UK readers 73 62.1 -0.05 528.59
Facebook posts can offer clues of depression 113 68.6 -0.11 218.33
Justin Bieber In Far East Movement's New Single 'Live My Life' 154 77 -0.07 1060.22
9 Ways Students Can Use Social Media to Boost Their Careers 120 79.2 -0.14 1385.39
Report: Broncos could deal Tebow if Manning signs on 125 82.1 -0.16 916.17
Mourinho: "We will fight to the end in each game to win the league" 116 86.8 -0.2 917.73
Internet accounts for 4.7% of U.S. economy 137 100.5 -0.21 709.86
in Table 4.4. Two of the most popular URLs in our data set were "Empire State Of Mind Music Video by Jay-Z" and "Ian Somerhalder Foundation Video", which were both adopted more than 140 times through friends' recommendations. The fitness of the "Jay-Z Music Video" is six times higher than that of the "Ian Somerhalder Foundation Video", but it has half the expected visibility. Therefore, the high fitness value of the "Jay-Z Music Video" makes up for its relatively low visibility, and it reaches a similar number of adoptions as the "Ian Somerhalder Foundation Video". "Parallels for Mac for Chrome OS and Windows 8" has similar visibility and fitness as the "Jay-Z Music Video"; however, its cascade size is almost half. This is mainly due to the low personal relevance of the URL to adopters. In other words, the item could have been adopted by many more users, had it been sent to the right users who were interested in the content. "Paul McCartney's message from Moscow" had higher expected visibility than "UNICEF - Making headway toward an AIDS-free generation"; however, it had a similar cascade size due to low fitness. "Strongbow surfers Neon Night Surfing on Bondi Beach" has the lowest visibility in Table 4.4 but the highest fitness. The visibility and fitness values of "Strongbow surfers Neon Night Surfing on Bondi Beach" were almost identical to those of the "Jay-Z Music Video", but the expected value of its personal relevance to adopters was low; hence, it had a smaller cascade.
We also looked at URLs to 600 news articles, examples of which are shown in Table 4.5. Two of the most retweeted news articles were "Justin Bieber In Far East Movement's New Single 'Live My Life'" and "Internet accounts for 4.7% of U.S. economy", but their high numbers of adoptions come from different factors. While "Internet accounts for 4.7% of U.S. economy" became popular because of high visibility, "Listen To Justin Bieber In Far East Movement's New Single 'Live My Life'" became popular due to high fitness and personal relevance.
4.4 Conclusion
We proposed Vip, a model that captures the mechanisms of information spread in social media. Vip can recommend items to users, as well as predict how large an item's cascade will grow, based on how easily users find the item in their stream (visibility), how well the item aligns with their interests (relevance), and the item's propensity to be adopted upon exposure (fitness). Prediction is surprisingly accurate, considering the crude estimates of visibility. Knowing visibility more accurately will further improve prediction performance. The good performance of our model on both user-item recommendation and cascade growth prediction tasks suggests that information spread in social media is more predictable than previously believed.
In the second part of this chapter, we improved the Vip model in three main ways: we applied a Softmax function to model binary responses (user-item adoption), incorporated the text of messages in a generative model of information spread, and improved the efficiency of learning parameters with stochastic optimization. Our stochastic inference algorithm handles many user-item dyads and can be distributed for efficient computation. Furthermore, with the help of a probabilistic topic model, we can provide an interpretable low-dimensional representation of information. As a bonus, we also validated the importance of modeling users' binary responses with a Softmax function through a comparison with models that place normal distribution assumptions on adoptions.
Chapter 5
Scalable Mining of Social Data
Social data is often highly heterogeneous and characterized by a long-tailed distribution [Kwak et al., 2010, Cheng et al., 2008, Lerman and Ghosh, 2010, Kang and Lerman, 2012b]. For social recommendation applications, this translates into a few users rating many items and many users rating few items, and vice versa. To make use of the data, statistical inference algorithms have been developed to estimate the hidden parameters of Bayesian models in a tractable way [Andrieu et al., 1999, Zinkevich et al., 2010, Bradley et al., 2011, Mimno et al., 2012, Gemulla et al., 2011]. Some recently proposed methods apply stochastic optimization for large-scale Markov Chain Monte Carlo (MCMC) methods to Bayesian models. The Stochastic Gradient Langevin Dynamics method (SGLD) [Welling and Teh, 2011] approximates the gradient over the whole data set using a small number of samples of data (a mini-batch), with Gaussian noise injected to avoid collapsing the estimate to a local Maximum a Posteriori (MAP) point. Stochastic Gradient Fisher Scoring (SGFS) [Ahn et al., 2012] extends SGLD to speed up inference with the "Bayesian Central Limit Theorem". This enables SGFS to efficiently estimate parameters for large and dense data. Unfortunately, SGFS does not perform well on social media data. The long-tailed distribution of social media data means that it has both very dense and very sparse regions, with few instances available for sampling the mini-batch in the sparse regions. Since SGFS needs to see a large enough mini-batch for unbiased learning, its performance degrades on such data.
In this chapter, we explore the feasibility of distributed SGFS for scalable social data mining, specifically focusing on the task of social recommendation using probabilistic matrix factorization (PMF) [Salakhutdinov and Mnih, 2008b] and the collaborative topic regression model (CTR) [Wang and Blei, 2011] (Section 5.1). We chose PMF and CTR because they serve as bases for several other Bayesian models used in social data mining applications [Salakhutdinov and Mnih, 2008a, Koren et al., 2009, Ma et al., 2008, Purushotham et al., 2012, Kang and Lerman, 2013a, Yang et al., 2011, Kang and Lerman, 2015b,a]. In Section 5.2, we compare three standard inference algorithms: Gibbs sampling, Gradient descent, and Stochastic Gradient Fisher Scoring, and propose a hybrid version of SGFS to deal with the challenges outlined above. In Section 5.3, we evaluate the performance of distributed SGFS on a social recommendation task using PMF and CTR, and compare it to that of standard inference algorithms. We show that SGFS outperforms Gibbs sampling and gradient descent on the MovieLens data, a large and dense data set of movie ratings. However, SGFS underperforms other inference methods when it cannot construct a large enough mini-batch for an unbiased sample of sparse social media data. We propose a hybrid solution [Kang and Lerman, 2013c] that takes advantage of the SGFS speed-up for dense data and uses a standard inference method that is fast for sparse data. We show that the hybrid distributed SGFS inference algorithm is better able to predict held-out ratings than distributed SGFS that does not take the sparseness of social media data into account.
(a) PMF (b) CTR
Figure 5.1: Graphical representation of (a) PMF and (b) CTR models. It captures the user interests (U), item topics (V) for recommendation, and item topics ($\theta$) to explain contents. The user-item ratings (R) and the words (W) of the items are observed variables.
5.1 Probabilistic Models for Social Data Mining
In this section, we describe two models (PMF and CTR) that have been used in social data mining applications [Salakhutdinov and Mnih, 2008a, Koren et al., 2009, Ma et al., 2008, Purushotham et al., 2012, Kang and Lerman, 2013a, Yang et al., 2011, Kang and Lerman, 2015b,a]. We will use these models to explore the performance of different inference methods on social data sets.
5.1.1 Probabilistic Matrix Factorization
Probabilistic Matrix Factorization (PMF) [Salakhutdinov and Mnih, 2008a] is a probabilistic linear model with Gaussian observation noise that handles very large and possibly sparse data, e.g., when users provide very few ratings. PMF models the user-item adoption as a product of two $K$-dimensional lower-rank user and item hidden vectors. The rating value of user $i$ for item $j$ ($R_{ij}$) is defined as follows:

$r_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma)$   (5.1)
where $\mathcal{N}$ is the probability density function of the Gaussian distribution with mean $u_i^T v_j$ and variance $\sigma$. The user and item hidden variables are drawn from normal distributions defined as:

$u_i \sim \mathcal{N}(0, \sigma_U I_K)$
$v_j \sim \mathcal{N}(0, \sigma_V I_K)$   (5.2)
We place zero-mean $K$-dimensional spherical Gaussian priors on both the user and item latent vectors. Figure 5.1(a) presents the PMF model in graphical form. The model describes the relationships between $N$ users and $D$ items. We represent the latent matrix of user interests as $U \in \mathbb{R}^{K \times N}$, the latent matrix of item topics as $V \in \mathbb{R}^{K \times D}$, and the observed user ratings of items as $R \in \mathbb{R}^{N \times D}$.
The conditional distribution over the observed ratings can be represented as

$P(R | U, V, \sigma) = \prod_{i}^{N} \prod_{j}^{D} \mathcal{N}(r_{ij} | u_i^T v_j, \sigma)^{I_{ij}}$   (5.3)

where $I_{ij}$ is the indicator function that is 1 if user $i$ rated item $j$ and 0 otherwise. Given the parameters, computing the full posterior of $u_i$, $v_j$ is intractable. In Section 5.2 we describe three statistical inference methods to estimate the hidden variables of PMF.
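The generative story of PMF can be sketched by direct sampling. The sizes and hyper-parameter values below are illustrative assumptions, not the settings used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 5, 8, 3                          # users, items, latent dimension
sigma_U, sigma_V, sigma = 0.1, 0.1, 0.01   # illustrative variance values

# Draw user and item latent vectors from their spherical Gaussian priors.
U = rng.normal(0.0, np.sqrt(sigma_U), size=(K, N))
V = rng.normal(0.0, np.sqrt(sigma_V), size=(K, D))

# Observed ratings: r_ij ~ N(u_i^T v_j, sigma), masked by the indicator I_ij.
I = rng.random((N, D)) < 0.3               # which user-item dyads are observed
R = rng.normal(U.T @ V, np.sqrt(sigma)) * I
```

Inference (Section 5.2) works in the opposite direction: given the observed R and I, it recovers U and V.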
5.1.2 Collaborative Topic Regression
Collaborative Topic Regression (CTR) [Wang and Blei, 2011] combines collaborative filtering (PMF) and probabilistic topic modeling. It captures the user interests ($u$) and item topics ($v$) from user-item adoption observations and the descriptions, or words, of items. The topics ($\theta_j$) of items that explain their contents are generated from Latent Dirichlet Allocation (LDA) [Blei et al., 2003], and a latent variable $\epsilon_j$ offsets the topic proportions $\theta_j$ to the topic proportions $v_j$. Figure 5.1 (b) presents CTR in graphical form, where $N$ is the number of users, $D$ is the number of items, $K$ is the number of topics, and $M$ is the size of the vocabulary. Ratings ($r$) and words ($w$) in the item descriptions are observed variables. A rating is equal to 1 when a user adopts an item and 0 otherwise. Each variable is drawn from a normal distribution defined as follows:
$u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$
$r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$   (5.4)
In LDA, the topic distribution ($\theta_{1:D}$) of each document is viewed as a mixture of multiple topics, with each topic ($\beta_{1:K}$) a distribution over words from a vocabulary of size $M$. The latent variable $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ captures the differences between the topics that explain the contents of documents ($\theta$) and those that explain recommendations ($v$). Depending on the choice of $\lambda_v$, item $j$'s topic distribution $\theta_j$ is perturbed to create the latent vector $v_j$, which could be similar to $\theta_j$ or diverge from it:

$v_j \sim \mathcal{N}(\theta_j, \lambda_v^{-1} I_K)$   (5.5)
The generative process of the CTR model is as follows:

For each user $i$:
    Generate $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$
For each item $j$:
    Generate $\theta_j \sim \mathrm{Dirichlet}(\alpha)$
    Generate $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ and set $v_j = \theta_j + \epsilon_j$
    For each word $w_{jm}$:
        Generate the topic assignment $z_{jm} \sim \mathrm{Mult}(\theta_j)$
        Generate the word $w_{jm} \sim \mathrm{Mult}(\beta_{z_{jm}})$
For each user $i$:
    For each adopted item $j$:
        Choose the rating $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$
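The generative process above can be sketched by direct sampling. All sizes and hyper-parameter values below are illustrative assumptions, not the experimental settings:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, K, M = 4, 6, 3, 20                 # users, items, topics, vocabulary size
lam_u, lam_v, alpha = 0.01, 10.0, 1.0    # illustrative hyper-parameter values

beta = rng.dirichlet(np.ones(M), size=K)            # topic-word distributions
U = rng.normal(0, 1 / np.sqrt(lam_u), size=(N, K))  # user interests u_i
theta = rng.dirichlet(alpha * np.ones(K), size=D)   # item topic proportions
eps = rng.normal(0, 1 / np.sqrt(lam_v), size=(D, K))
V = theta + eps                                     # v_j = theta_j + eps_j

docs = []
for j in range(D):                                  # words of item j
    z = rng.choice(K, size=30, p=theta[j])          # topic assignments z_jm
    docs.append([rng.choice(M, p=beta[zk]) for zk in z])

R_mean = U @ V.T                                    # mean of each rating r_ij
```

Note how the same $v_j$ drives both the words (through $\theta_j$) and the ratings, which is the coupling that distinguishes CTR from plain PMF plus a separate LDA.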
Similar to PMF (Section 5.1.1), we define $\lambda_u = \sigma_r^2 / \sigma_u^2$ and $\lambda_v = \sigma_r^2 / \sigma_v^2$, and the latent vectors $u$, $v$ and $\theta$ are in a shared $K$-dimensional space. The precision parameters $c_{ij}$ serve as confidence parameters for the rating $r_{ij}$; here, for simplicity, we set $c_{ij}$ to 1 if the user rated the item and to 0 otherwise.
5.2 Learning Parameters
Given the probabilistic model, which captures the generative process by which the data was created, we infer the values of the parameters of interest using statistical inference. In this section we compare the performance of four inference algorithms on probabilistic models: Gibbs sampling, Gradient descent, Stochastic Gradient Fisher Scoring (SGFS), and hybrid SGFS. We then compare the performance of the different inference algorithms on the user-item rating prediction task using PMF and investigate the performance of SGFS on social media data using CTR.
5.2.1 Inference for PMF
Gibbs Sampling
Using the simplest Markov chain Monte Carlo (MCMC) algorithm, Gibbs sampling, we cycle through the latent variables, sampling each one from its distribution conditional on the currently assigned values of all other variables. The conditional distribution over the user feature vector $u_i$, conditioned on the item latent matrix $V$, the observed user ratings $R$, and the hyperparameters, is the following:
$P(u_i | R, V, \sigma_U, \sigma_V, \sigma) = \mathcal{N}(u_i | \mu_i, \Lambda_i^{-1}) \propto \prod_{j=1}^{D} \mathcal{N}(r_{ij} | u_i^T v_j, \sigma)^{I_{ij}} \, p(u_i | 0, \sigma_U I_K)$   (5.6)

where

$\Lambda_i = \frac{1}{\sigma_U} I_K + \frac{1}{\sigma} \sum_{j=1}^{D} v_j v_j^T I_{ij}$,   $\mu_i = \Lambda_i^{-1} \left( \frac{1}{\sigma} \sum_{j=1}^{D} [v_j r_{ij}] I_{ij} \right)$
Similarly, the conditional distribution over the item feature vector $v_j$, conditioned on the user latent matrix $U$, the observed ratings $R$, and the hyperparameters, is:

$P(v_j | R, U, \sigma_U, \sigma_V, \sigma) = \mathcal{N}(v_j | \mu_j, \Lambda_j^{-1})$   (5.7)

where

$\Lambda_j = \frac{1}{\sigma_V} I_K + \frac{1}{\sigma} \sum_{i=1}^{N} u_i u_i^T I_{ij}$,   $\mu_j = \Lambda_j^{-1} \left( \frac{1}{\sigma} \sum_{i=1}^{N} [u_i r_{ij}] I_{ij} \right)$
The Gibbs sampling algorithm for PMF takes the following form:

Algorithm 2 Gibbs sampling for PMF
  Initialize model parameters $U$, $V$
  for $t = 1$ to $T$ do
    for each $u$ in $U$ do
      sample $u \sim p(u | R, V)$ from Eq. 5.6
    end for
    for each $v$ in $V$ do
      sample $v \sim p(v | R, U)$ from Eq. 5.7
    end for
  end for
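A minimal sketch of one conditional draw for a user vector, following Eq. 5.6 (the draw for $v_j$ in Eq. 5.7 is symmetric, with the roles of U and V swapped). The function name is mine, and $\sigma$, $\sigma_U$ are treated as variances as in the equations above:

```python
import numpy as np

def sample_user(rng, V_all, r_i, obs, sigma_U, sigma, K):
    """One Gibbs draw of u_i from its Gaussian conditional (Eq. 5.6).

    V_all -- (D, K) item latent matrix, r_i -- user i's rating vector,
    obs   -- indices j with I_ij = 1 (only rated items contribute).
    """
    Vo = V_all[obs]
    precision = np.eye(K) / sigma_U + Vo.T @ Vo / sigma   # Lambda_i
    cov = np.linalg.inv(precision)
    mean = cov @ (Vo.T @ r_i[obs] / sigma)                # mu_i
    return rng.multivariate_normal(mean, cov)
```

Cycling this draw over all users, then the symmetric draw over all items, yields one iteration of Algorithm 2.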
Gradient Descent

Algorithm 3 Gradient Descent for PMF
  Initialize model parameters $U$, $V$, $\nabla U$, $\nabla V$
  for $t = 1$ to $T$ do
    for each $u$ in $U$ do
      $\nabla u \leftarrow \eta [V (V^T u - R_u) + \lambda_u u] + m \nabla u$
      $u \leftarrow u - \nabla u$
    end for
    for each $v$ in $V$ do
      $\nabla v \leftarrow \eta [U (U^T v - R_v) + \lambda_v v] + m \nabla v$
      $v \leftarrow v - \nabla v$
    end for
  end for
Using gradient descent, we develop an EM-style algorithm to learn the Maximum a Posteriori (MAP) estimates. Maximization of the posterior is equivalent to maximizing the complete log likelihood of $U$, $V$ and $R$ given $\sigma_U$ and $\sigma_V$, or to minimizing the sum-of-squared-error objective function with quadratic regularization terms:

$E = \frac{\lambda_u}{2} \sum_{i}^{N} u_i^T u_i + \frac{\lambda_v}{2} \sum_{j}^{D} v_j^T v_j + \frac{1}{2} \sum_{i}^{N} \sum_{j}^{D} (r_{ij} - u_i^T v_j)^2$   (5.8)

where $\lambda_u = \sigma / \sigma_U$ and $\lambda_v = \sigma / \sigma_V$. A local minimum of the objective function given by the above equation can be found by performing gradient descent in $U$ and $V$. We simply take steps proportional to the negative of the gradient of the function at each state. To help the optimization escape poor local minima, we use momentum ($m$) and a learning rate ($\eta$).
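A sketch of one momentum update for a single user vector under this objective; the helper names are mine, not from the dissertation:

```python
import numpy as np

def grad_user(u, V, r, lam_u):
    """Gradient of the Eq. 5.8 objective w.r.t. one user vector,
    restricted to that user's rated items: V^T (V u - r) + lam_u * u."""
    return V.T @ (V @ u - r) + lam_u * u

def gd_step(u, grad, velocity, eta=0.0005, m=0.8):
    """One momentum gradient-descent update, mirroring Algorithm 3:
    the new step blends the current gradient with the previous step."""
    velocity = eta * grad + m * velocity
    return u - velocity, velocity
```

The item update is symmetric, with the roles of U and V exchanged.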
Stochastic Gradient Fisher Scoring
Algorithm 4 Stochastic Gradient Fisher Scoring for PMF
  Initialize model parameters $U$, $V$, $\hat{I}_U$, $\hat{I}_V$
  for $t = 1$ to $T$ do
    for each $u$ in $U$ do
      Choose a random mini-batch $r_u$ of size $n_u$ from $R_u$
      $g_n(u) \leftarrow \frac{1}{n_u} \sum_{i \in r_u} (r_i - u^T V_i) V_i$
      $V(u) \leftarrow \frac{1}{n_u - 1} \sum_{i \in r_u} \{(r_i - u^T V_i) V_i - g_n(u)\}\{(r_i - u^T V_i) V_i - g_n(u)\}^T$
      $\hat{I}_u \leftarrow (1 - \kappa_t) \hat{I}_u + \kappa_t V(u)$
      Draw $\nu \sim \mathcal{N}[0, \frac{4B}{\epsilon}]$
      $u \leftarrow u + 2 \left( \frac{n_u + N_u}{n_u} N_u \hat{I}_u + \frac{4B}{\epsilon} \right)^{-1} \{-\lambda_u u + N_u g_n(u) + \nu\}$
    end for
    for each $v$ in $V$ do
      Choose a random mini-batch $r_v$ of size $n_v$ from $R_v$
      $g_n(v) \leftarrow \frac{1}{n_v} \sum_{i \in r_v} (r_i - U_i^T v) U_i$
      $V(v) \leftarrow \frac{1}{n_v - 1} \sum_{i \in r_v} \{(r_i - U_i^T v) U_i - g_n(v)\}\{(r_i - U_i^T v) U_i - g_n(v)\}^T$
      $\hat{I}_v \leftarrow (1 - \kappa_t) \hat{I}_v + \kappa_t V(v)$
      Draw $\nu \sim \mathcal{N}[0, \frac{4B}{\epsilon}]$
      $v \leftarrow v + 2 \left( \frac{n_v + N_v}{n_v} N_v \hat{I}_v + \frac{4B}{\epsilon} \right)^{-1} \{-\lambda_v v + N_v g_n(v) + \nu\}$
    end for
  end for

where $\kappa_t = 1/t$ and the step size is $\epsilon = a(b + t)^{-\gamma}$.
When the size of the data set is really large, an MCMC algorithm is not able to generate correct samples efficiently, since it requires computations over the whole data set for the Metropolis-Hastings accept/reject step. Stochastic Gradient Langevin Dynamics (SGLD) [Welling and Teh, 2011] instead uses an efficient MCMC sampling algorithm that combines stochastic gradients [Robbins and Monro, 1951] with Langevin dynamics [Neal, 2010]. Here, the stochastic gradient is used to approximate the gradient over the whole data set using a mini-batch of the data, while Langevin dynamics is used to avoid collapsing to the MAP estimate by injecting Gaussian noise into the parameter estimates. The authors show that SGLD will sample from the correct posterior distribution when the step sizes are annealed to zero at a certain rate. For a hidden parameter vector $\theta$ with prior distribution $p(\theta)$ and likelihood $p(X | \theta)$ for a data set $X$, the update equation of the SGLD algorithm is the following:

$\theta \leftarrow \theta + \frac{\epsilon C}{2} \{\nabla \log p(\theta) + N g_n(\theta)\} + \nu, \quad \text{where } \nu \sim \mathcal{N}[0, \epsilon C]$   (5.9)
where $g_n(\theta_t)$ is the average gradient of the log likelihood w.r.t. $\theta$ over the given mini-batch, and $\nabla \log p(\theta_t)$ is the gradient of the log of the prior distribution. $\epsilon$ is the step size and $C$ is the preconditioning matrix [Girolami and Calderhead, 2011]. The injected Gaussian noise ($\nu$) has zero mean and covariance matrix $\epsilon C$.
Even though SGLD succeeds in generating samples at a very small cost, it requires a large number of iterations due to its slow mixing rate. Stochastic Gradient Fisher Scoring (SGFS) [Ahn et al., 2012] extends SGLD by leveraging the "Central Limit Theorem". SGFS solves the slow mixing rate problem by sampling from an approximate Gaussian distribution at high mixing rates, while sampling from an accurate approximation of the posterior at slower mixing rates. For the same hidden parameter vector $\theta$ with prior distribution $p(\theta)$ and likelihood $p(X | \theta)$ for a data set $X$, the SGFS update equation is:

$\theta \leftarrow \theta + 2 \left( \frac{n + N}{n} N \hat{I} + \frac{4B}{\epsilon} \right)^{-1} \{\nabla \log p(\theta) + N g_n(\theta) + \nu\}, \quad \text{where } \nu \sim \mathcal{N}\left[0, \frac{4B}{\epsilon}\right]$   (5.10)
where $n$ is the number of data points, or observations, in a mini-batch and $N$ is the total number of data points in the whole data set. $\hat{I}$ is the empirical Fisher information (an online average of the empirical covariance of the gradients) and $B$ is a free symmetric positive definite matrix.
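A sketch of one SGFS update step following Eq. 5.10, assuming (as in the update equation) that the injected noise has covariance $4B/\epsilon$; the function name and argument names are mine:

```python
import numpy as np

def sgfs_step(rng, theta, grad_logprior, g_n, I_hat, n, N, B, eps):
    """One SGFS update (Eq. 5.10): a preconditioned, noisy ascent step.

    g_n   -- mini-batch average gradient of the log likelihood
    I_hat -- running empirical Fisher information estimate
    n, N  -- mini-batch size and total number of observations
    B     -- free symmetric positive definite matrix, eps -- step size
    """
    gamma = (n + N) / n
    noise_cov = (4.0 / eps) * B
    nu = rng.multivariate_normal(np.zeros(theta.shape[0]), noise_cov)
    precond = np.linalg.inv(gamma * N * I_hat + noise_cov)
    return theta + 2.0 * precond @ (grad_logprior + N * g_n + nu)
```

For small $\epsilon$ the noise term dominates the preconditioner and the update approaches SGLD; for large mini-batches the Fisher term dominates, which is the source of the speed-up.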
5.2.2 Inference for CTR
Distributed SGFS
Algorithm 5 Distributed SGFS for CTR
  Initialize model parameters $U$, $V$, $\hat{I}_U$, $\hat{I}_V$
  for $t = 1$ to $T$ do
    for all $u$ in $U_t$ do in parallel
      Choose a random mini-batch $r_u$ of size $n_u$ from $R_u$
      $g_n(u) \leftarrow \frac{1}{n_u} \sum_{i \in r_u} (r_i - u^T V_i) V_i$
      $V(u) \leftarrow \frac{1}{n_u - 1} \sum_{i \in r_u} \{(r_i - u^T V_i) V_i - g_n(u)\}\{(r_i - u^T V_i) V_i - g_n(u)\}^T$
      $\hat{I}_u \leftarrow (1 - \kappa_t) \hat{I}_u + \kappa_t V(u)$
      Draw $\nu \sim \mathcal{N}[0, \frac{4B}{\epsilon}]$
      $u \leftarrow u + 2 \left( \frac{n_u + N_u}{n_u} N_u \hat{I}_u + \frac{4B}{\epsilon} \right)^{-1} \{-\lambda_u u + N_u g_n(u) + \nu\}$
    end for
    for all $v$ in $V_t$ do in parallel
      Choose a random mini-batch $r_v$ of size $n_v$ from $R_v$
      $g_n(v) \leftarrow \frac{1}{n_v} \sum_{i \in r_v} (r_i - U_i^T v) U_i$
      $V(v) \leftarrow \frac{1}{n_v - 1} \sum_{i \in r_v} \{(r_i - U_i^T v) U_i - g_n(v)\}\{(r_i - U_i^T v) U_i - g_n(v)\}^T$
      $\hat{I}_v \leftarrow (1 - \kappa_t) \hat{I}_v + \kappa_t V(v)$
      Draw $\nu \sim \mathcal{N}[0, \frac{4B}{\epsilon}]$
      $v \leftarrow v + 2 \left( \frac{n_v + N_v}{n_v} N_v \hat{I}_v + \frac{4B}{\epsilon} \right)^{-1} \{-\lambda_v v + N_v g_n(v) + \nu\}$
    end for
  end for
Stochastic Gradient Fisher Scoring can be run in parallel to speed up the inference algorithm. At each iteration, we randomly select and distribute users or items,
then $P$ processors update each user or item individually. We develop a distributed SGFS for CTR that we will use to analyze social media data. We discard samples from the burn-in period and collect samples for $U$ and $V$ in each iteration. We developed the distributed Stochastic Gradient Fisher Scoring algorithm using OpenMP¹.
Distributed Hybrid SGFS
In sparse regions of the data, where there are few observations, SGFS will not perform well, since there are not enough data points for constructing an unbiased mini-batch. We propose a hybrid SGFS inference scheme that can handle data that follows a long-tailed distribution. The idea behind the scheme is that hidden variables with enough observations for the Central Limit Theorem to hold ($N \geq N_{CLT}$) will be inferred using Stochastic Gradient Fisher Scoring, while those with few observations will be inferred using Gradient Descent with learning rate $\eta$ and momentum $m$.
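The dispatch rule of the hybrid scheme can be sketched as follows. The threshold value is a hypothetical placeholder, since the dissertation leaves $N_{CLT}$ as a parameter:

```python
N_CLT = 100   # hypothetical threshold for "enough observations for the CLT"

def hybrid_update(u, n_obs, sgfs_step_fn, gd_step_fn):
    """Per-variable dispatch of the hybrid scheme: SGFS in dense regions,
    gradient descent (with momentum) in the sparse long tail."""
    if n_obs >= N_CLT:
        return sgfs_step_fn(u)   # unbiased mini-batches are available
    return gd_step_fn(u)         # too few observations for SGFS
```

Each latent vector is routed independently, so a single pass over the data mixes both update rules.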
5.3 Experimental Results
In this section, we evaluate the utility of the distributed Stochastic Gradient Fisher Scoring inference algorithm for social data mining on a variety of data sets. Note that we measure data sparseness as the fraction of zeros in the user-item adoption matrix.
¹ http://openmp.org
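The sparseness measure used throughout this section is simple to state in code; a minimal sketch on a toy adoption matrix:

```python
import numpy as np

def sparseness(R):
    """Fraction of zero entries in a user-item adoption matrix."""
    return 1.0 - np.count_nonzero(R) / R.size

# Toy matrix: 2 users x 4 items, 6 of the 8 entries are zero.
R = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])
s = sparseness(R)   # 0.75
```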
5.3.1 Data Sets
KDD Cup 2012
In KDD Cup 2012 track 1², Tencent Weibo, one of the largest micro-blogging websites in China, released a dataset for the user-item adoption prediction task. Like other social networking sites, Tencent Weibo lets users follow other users and adopt items to share interests. The original data set includes 2.3 million users, 6 thousand items, 50 million links in the social network, 5.2 million user-item adoptions, user profiles (including user keywords, age, gender, tags, and number of tweets), item categories (a preassigned topic hierarchy), item keywords, and user activities.
In this dissertation, we only use the user-item adoption history and item descriptions. We examine only active users who adopted at least 30 items in the training set, resulting in 24,962 users, 6,095 items, and 2 million user-item adoptions. For evaluation, we use the provided test set of the KDD Cup competition (0.8 million items recommended by their system) and the answers containing the items actually adopted by users. The sparseness of this data is 98.75%, with users adopting 77 items on average and 58 items at the median. While the frequency distributions of the number of items adopted by users and the number of users adopting each item are long-tailed, they are less power law-like than the other social media data sets we studied.
CiteULike
At CiteULike, users create personal reference libraries to bookmark and organize
a set of articles using their own tags, follow other users, and subscribe to groups
² http://www.kddcup2012.org/c/kddcup2012-track1
with similar interests in research areas. Each article has information about its authors, a title, an abstract, tags, and reviews and notes by users. We obtained a copy of the CiteULike³ data set in Feb 2013, containing the public information about users, papers, the tags that have been used by users to describe papers, and the group information that links users to groups by subscription. Since the title and abstract information are not available in the copy of CiteULike, we merged all tags used by all users to describe each article and use them as the item description. We remove stop words (using the Python NLTK package) and use tf-idf to choose the vocabulary. We selected users who adopted at least 10 papers and papers that have been adopted at least 5 times, resulting in 5,204 users and 31,204 items, with 303,077 total adoptions and 29,321 tags. The sparseness of this data set is 99.81%, and users adopted an average of 9 items and a median of 8 items. Note that the frequency distributions of each user's library size (number of articles) and of the number of people who adopted each article are both long-tailed, with about half of the users having fewer than 30 articles in their library.
MovieLens
The MovieLens⁴ data set contains 1 million anonymous ratings of 3.9 thousand movies, made by 6 thousand users since 2000. The sparseness of the data is 76.6%, and users adopted an average of 166 items, with a median of 96 items. We used the same training and test data set split as Salakhutdinov and Mnih [2008a] (0.9 million training and 0.1 million test ratings).
³ http://www.citeulike.org/faq/data.adp
⁴ http://www.grouplens.org/
Twitter 2012
For this experiment, we removed all URLs that had been tweeted by fewer than 10 distinct users, which resulted in 27 thousand URLs. To construct the item description, we merge all tweets containing the same URL. We focused the data further by selecting URLs that contain at least 5 terms and have been tweeted by at least 5 distinct users, and users who adopted at least 5 URLs, resulting in 6,526 URLs, 14,922 users, and 121,483 tweets. The sparseness of the dataset is 99.87%; users adopted an average of 8 items and a median of 6 items. Further details of the Twitter 2012 data set can be found in Section 4.1.4.
5.3.2 Performance of SGFS using PMF
First, we use PMF to predict user-item ratings in the MovieLens data set. We use this data set as a benchmark to compare the performance of different inference algorithms, including Gibbs sampling, Gradient descent and SGFS, on the user-item prediction task. For the PMF model parameters, we set $\sigma = 0.01$, $\sigma_U = \sigma_V = 0.1$, and the number of topics $K = 10$. We take the first 50 iterations as the burn-in period, both for Gibbs sampling and SGFS. We set the momentum ($m$) to 0.8 and the learning rate ($\eta$) to 0.0005 for Gradient descent. For Gradient descent and Gibbs sampling, we use the whole training data set to update the latent variables in each iteration, whereas for SGFS we only use a mini-batch (of size 20, 30, 50, or 100) of randomly selected users and items in each iteration. We ran all three algorithms for 2000 iterations, and the training set log-likelihood increased in all cases.
Figure 5.2 shows the results of the different inference algorithms on the rating prediction task on the held-out set using Root Mean Squared Error (RMSE). The best performance in CPU time was achieved by Gradient descent with 5.0 sec per
Figure 5.2: Held-out set prediction RMSE (root mean squared error) using Gradient descent, Gibbs sampling, and Stochastic Gradient Fisher Scoring (SGFS, with mini-batch sizes n = 20, 30, 50, 100) inference algorithms on MovieLens data.
iteration, followed by SGFS with 20 data samples at 7.3 sec per iteration. While Gradient descent suffers from over-fitting, with degraded prediction results (increasing RMSE) after some number of iterations, the Markov Chain Monte Carlo inference algorithms (both SGFS and Gibbs sampling) show consistent improvement of prediction performance. The performance of Gibbs sampling is the worst among the three in both speed and accuracy. As the size of the SGFS mini-batch increases, the RMSE improves quickly, at the cost of per-iteration CPU time. Even though prior work [Ahn et al., 2012] recommends using mini-batches larger than 100 in size to ensure the Central Limit Theorem holds, our experiments with the MovieLens data set demonstrate that mini-batches as small as 20 work fine for this data. Note that we also tested SGFS with mini-batch sizes smaller than 20 samples; however, the training set log-likelihood kept decreasing rather than increasing due to error propagation.
111
5.3.3 Performance of SGFS using CTR
Next, we study the performance of SGFS inference algorithm on social media data.
We apply Collaborative Topic Regression model on user-item adoption prediction
task using KDD Cup 2012 Tencent Weibo, CiteULike and Twitter data sets. After
all the parameters are learned, we used in-matrix prediction with user's interest
(u) and item's topic (v). As Wang and Blei [2011] mentioned, in-matrix prediction
refers to the prediction task for user's rating on item that has been rated at least
once by other users, while out-of-matrix prediction refers to predicting user's rating
on a new item that has no rating history.
For in-matrix prediction with user interest (u), the point estimate of user i's rating
for item j is:

    E[r_ij | D] ≈ E[u_i | D]^T (E[θ_j | D] + E[ε_j | D]),   i.e.,   r*_ij ≈ u*_i^T v*_j        (5.11)

where the prediction of the user's rating is determined by the user interest profile u_i
and the item topic profile v_j. To evaluate performance, we consider a user bookmarking an
article on CiteULike or tweeting a URL on Twitter as an item adoption event by
a user with a rating of one. A rating of zero (non-adoption) means that either the
user did not like the item or did not see it; therefore, we only consider rated items
while evaluating the test set. We use recall@X to measure the model's performance on
the prediction task (Eq. 3.15). We average recall values over all users to summarize
performance of the prediction algorithm. Note that a better model should provide
higher recall@X.
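The per-user recall@X metric described above can be sketched as follows (a minimal sketch; the function and variable names are ours, not from the original code):

```python
import numpy as np

def recall_at_x(scores, adopted, X=50):
    """recall@X for one user: the fraction of the user's test-set adoptions
    that appear among the X highest-scored items."""
    top = np.argsort(scores)[::-1][:X]            # indices of the X top-ranked items
    return len(set(top) & set(adopted)) / len(adopted)

# toy check: items 0 and 1 score highest, and the user adopted items 1 and 4
scores = np.array([0.9, 0.8, 0.1, 0.2, 0.3])
print(recall_at_x(scores, adopted=[1, 4], X=2))   # 0.5
```

Averaging this value over all users gives the summary statistic reported in the experiments.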
For the KDD Cup data set, we use the answers provided by the competition as ground
truth. For CiteULike and Twitter data, we randomly split the data into training
Figure 5.3: Evaluation results on the KDD Cup 2012, CiteULike, and Twitter
2012 data sets (Section 4.1.4) using distributed SGFS. The top row reports
log-likelihood and the bottom row reports recall@X.
and test sets, and performed 5-fold cross validation. High data sparseness makes
the prediction task extremely challenging, with less than a one in a hundred chance
of successfully predicting adoptions if items are picked randomly. We used a polynomial
annealing schedule ε_t = a(b + t)^{-γ} for the algorithm step size. We monitored the
training data set likelihood to select the values and used a = 1, b = 10^4, and γ = 1.1,
where the step size ranges from 4 × 10^{-4} to 2 × 10^{-6} during 10^5 iterations.
We take mini-batches with n = 50 for KDD Cup, and n = 30 for CiteULike and Twitter.
For CTR, we used λ_v = 100, λ_u = 0.01, and 50 topics.
Figures 5.3 (a), (b) and (c) show the training set log-likelihood at each iteration,
and Figures 5.3 (d), (e), and (f) show recall@X for in-matrix prediction for CTR
with the learned hidden variables at each iteration, for KDD Cup 2012, CiteULike and
Twitter respectively. The training set log-likelihood for KDD Cup 2012 shows a big
jump because the algorithm requires some number of iterations to cover the whole data set
with a small mini-batch. Except for the initial stage in KDD Cup 2012, the training
set log-likelihood keeps increasing as more iterations take place. We divide users
into several classes depending on their activity level and show them in different
colors in Figures 5.3 (d), (e), and (f). In KDD Cup 2012, we categorize users into
all users and users who adopted at least 30, 50, and 100 items in the training set,
respectively. Even though prediction performance differs across classes, the
recall@10 grows in all cases.
On the other hand, for the CiteULike and Twitter data sets, prediction performance,
as measured by recall@50, degrades with more iterations of the inference
algorithm. This occurs despite an increase in log-likelihood, which suggests that
parameter estimates continue to improve. This happens because both CiteULike
and Twitter data have a long tail containing few observations per variable: in
these data sets more than 80% of users adopted fewer than 30 items, while fewer
than 20% of users adopted more than 30 items. Because of the large population in
the long tail, the overall accuracy of SGFS degrades with more iterations as errors
propagate. Note that the number of users adopting at least 30 papers is 664, which
is 2% of all users in our CiteULike data, and the number of users who tweeted at
least 30 URLs is 192, which is 1% of all users in our Twitter data, making these
data very sparse.
5.3.4 Performance of hybrid SGFS using CTR
We address the challenge of long-tailed data using the hybrid inference
algorithm described in the previous section. We evaluate the performance of the
hybrid algorithm on the same prediction task. We used a polynomial annealing
Figure 5.4: Recall@X evaluation results on (a) KDD Cup 2012, (b) CiteULike,
and (c) Twitter 2012 using distributed hybrid SGFS.
schedule with a = 1, b = 10^4, and γ = 1.1, with the momentum (m) set to 0.7 and the learning
rate (η) set to 0.0005, 0.0001, and 0.0001 for KDD Cup 2012, CiteULike, and Twitter
respectively. Depending on the fraction of data for which the Central Limit Theorem holds,
we applied different η and N_CLT: for KDD Cup 2012 we use N_CLT = 50, and for
CiteULike and Twitter we use N_CLT = 30. We take 2000 iterations for burn-in
and collect samples from SGFS to update parameters.
Figures 5.4 (a), (b), and (c) show the recall@X for in-matrix prediction for
CTR. Compared to the original SGFS in Figures 5.3 (d), (e), and (f), hybrid SGFS
shows improved prediction performance for all iterations, in all user categories, in all
three data sets. Furthermore, as the algorithm learns users in different categories
correctly, errors coming from SGFS do not propagate. Note that, with the same
settings, recall@X performance with Gradient descent is 0.51, 0.22, and 0.18 for the
KDD Cup, CiteULike, and Twitter data sets respectively. Hybrid SGFS handily
outperforms both SGFS and Gradient descent on this prediction task.
5.3.5 Scalability of distributed SGFS
Next, we explore the scalability of distributed SGFS by scaling up both the size
of the mini-batch and the number of threads. We tested scalability performance
Figure 5.5: The average CPU time per epoch with different sample sizes
on the KDD Cup 2012 data set. The number of threads is fixed at 1.
on a multi-processor workstation with two Intel Xeon E5-2680 processors with
8 cores at 2.7 GHz and 64 GB of memory.
Size of mini-batch
In distributed SGFS, the gradient of the log-likelihood over the whole data set
(N) is approximated from a mini-batch of n sample data points. When the size
of the mini-batch for each variable is large enough for the Central Limit Theorem to
hold, the posterior is close to the Bernstein-von Mises approximation in SGFS. Under
the assumption that our choice of n is large enough for the Central Limit
Theorem to hold, so that accuracy is not affected, the choice of the mini-batch
size determines the per-iteration computational complexity, reducing the
per-iteration computation time to O(n) instead of O(N).
Figure 5.6: The average CPU time per epoch with different numbers of threads
on the KDD Cup 2012 data set. The sample size is fixed at 200.
As we saw in Section 5.3.2, as we increase the size of the mini-batch, RMSE
improves quickly while computation speed degrades. We vary the size of the
mini-batch (n) to see the trade-off between prediction accuracy and computation
speed using the PMF model. With a small mini-batch, once we
run the algorithms long enough to cover most of the data, the accuracies of different
mini-batch sizes become almost identical. To study the effect of increasing the
size of the mini-batch on speed, we used the KDD Cup data and applied CTR on a
single thread. Figure 5.5 displays the per-iteration CPU
time for different mini-batch sizes. As we decrease the size of the mini-batch, the per-
iteration CPU time decreases asymptotically. As long as the mini-batch size is
large enough for the Central Limit Theorem to hold (about 20), we can improve
the speed of inference by decreasing the size of the mini-batch.
Number of Threads (Processors)
We can run the distributed SGFS algorithm in parallel on P processors to improve
learning efficiency. For each iteration, we split the set of users or the set of items
into P parts to speed up inference computations.
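This user/item partitioning across P workers can be sketched with Python threads (a simplified sketch; the helper names are ours, and `update_fn` stands for the per-part latent-variable update):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_update(items, update_fn, P=4):
    """Split `items` (e.g., user or item indices) into P parts and apply
    `update_fn` to each part on its own thread, preserving order."""
    parts = np.array_split(np.asarray(list(items)), P)
    with ThreadPoolExecutor(max_workers=P) as pool:
        results = list(pool.map(update_fn, parts))
    return np.concatenate(results)

# toy update: each part is processed independently
out = parallel_update(range(10), lambda part: part * 2, P=4)
print(list(out))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Because each user's (or item's) latent vector is updated independently within an iteration, the parts require no locking.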
Figure 5.6 shows scale-up results as we increase the number of threads while
keeping the mini-batch size constant at 200. The number of threads has a much
stronger impact on computation time than the size of the mini-batch, mainly
because of the nature of social media data sets. The number of observations per
user or item in social media data often follows a long-tailed distribution, so
not all users or items benefit from taking a small mini-batch.
5.4 Conclusion
In this chapter, we explored the scalable inference algorithm SGFS for mining social
data. We showed that the algorithm scales up with both the number of processors
and the size of the mini-batch. However, distributed SGFS fails to learn good models
of long-tailed data, resulting in poor prediction performance. In fact,
many social media data sets have such long-tailed distributions, containing sparse
regions with few observations per variable. To address this problem, and to enable
us to use SGFS for mining social media data, we proposed the hSGFS algorithm,
which combines SGFS and Gradient descent to provide more efficient performance. Our
method showed significant performance improvements in both speed and prediction
accuracy. This advantage can help in mining large-scale data that follows a long-
tailed distribution efficiently and without sacrificing accuracy. We believe that
hSGFS is an initial step toward further work on efficient MCMC sampling based on
stochastic gradients for long-tailed data.
Algorithm 6 Distributed Hybrid SGFS for CTR
  Initialize model parameters U, V, Î_U ← 0, Î_V ← 0
  for t = 1 to T do
    for all u in U_t do in parallel
      if N_u ≥ N_CLT then
        Choose a random mini-batch r_u of n_u ratings from R_u
        g_n(u) ← (1/n_u) Σ_{i∈r_u} λ(r_i − u^T V_i) V_i
        V(u) ← (1/(n_u − 1)) Σ_{i∈r_u} {λ(r_i − u^T V_i) V_i − g_n(u)}{λ(r_i − u^T V_i) V_i − g_n(u)}^T
        Î_u ← (1 − κ_t) Î_u + κ_t V(u)
        Draw η ∼ N[0, 4B/ε]
        u ← u + 2 (γ ((n_u + N_u)/n_u) N_u Î_u + 4B/ε)^{−1} {−λ_u u + N_u g_n(u) + η}
      else
        ∇u ← η [V (V^T u − R_u) + λ_u u] + m ∇u
        u ← u − ∇u
      end if
    end for
    for all v in V_t do in parallel
      if N_v ≥ N_CLT then
        Choose a random mini-batch r_v of n_v ratings from R_v
        g_n(v) ← (1/n_v) Σ_{i∈r_v} λ(r_i − U_i^T v) U_i
        V(v) ← (1/(n_v − 1)) Σ_{i∈r_v} {λ(r_i − U_i^T v) U_i − g_n(v)}{λ(r_i − U_i^T v) U_i − g_n(v)}^T
        Î_v ← (1 − κ_t) Î_v + κ_t V(v)
        Draw η ∼ N[0, 4B/ε]
        v ← v + 2 (γ ((n_v + N_v)/n_v) N_v Î_v + 4B/ε)^{−1} {λ_v(θ_v − v) + N_v g_n(v) + η}
      else
        ∇v ← η [U (U^T v − R_v) + λ_v (v − θ_v)] + m ∇v
        v ← v − ∇v
      end if
    end for
  end for
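The core routing logic of Algorithm 6, sending each latent variable to the SGFS sampling update when it has enough observations for the Central Limit Theorem to hold and falling back to a momentum Gradient descent update otherwise, can be sketched as follows (a simplified single-variable sketch; the function names are ours, and the toy updates only tag which branch ran):

```python
import numpy as np

def hybrid_step(theta, n_obs, sgfs_update, gd_update, n_clt=30):
    """Route one latent variable to SGFS or Gradient descent based on its
    number of observations, as in the hybrid SGFS algorithm."""
    if n_obs >= n_clt:
        return sgfs_update(theta)   # noisy MCMC-style update (enough data)
    return gd_update(theta)         # deterministic update (long-tail variable)

# toy updates that just record which branch was taken
head = hybrid_step(np.zeros(2), n_obs=100,
                   sgfs_update=lambda th: ("sgfs", th),
                   gd_update=lambda th: ("gd", th))
tail = hybrid_step(np.zeros(2), n_obs=5,
                   sgfs_update=lambda th: ("sgfs", th),
                   gd_update=lambda th: ("gd", th))
print(head[0], tail[0])  # sgfs gd
```

This is the mechanism that prevents the long-tail variables from injecting noisy SGFS updates whose errors would otherwise propagate through the shared parameters.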
Chapter 6
Information Access in Online
Social Networks
The position of a person in a social network determines how much information she
receives. Studies of social and organizational networks identified the importance
of so-called brokerage positions, which link individuals to otherwise unconnected
people [Granovetter, 1973, Burt, 1995, 2005, Aral and Van Alstyne, 2011]. Social
media provides us with new data for testing and generalizing information broker-
age theories. In contrast to email and phone interactions, where information is
exchanged between a pair of social contacts, social media users broadcast infor-
mation to all their contacts. While users can access more information by adding
more friends, cognitive (and temporal) constraints limit an individual's capacity to
manage social interactions [Dunbar, 1992, Goncalves et al., 2011b, Miritello et al.,
2013] and process the information they receive [Weng et al., 2012, Hodas and Ler-
man, 2012]. In addition, social media users vary greatly in the effort they expend
engaging with the site, leading to a large variation in user activity, as measured
by the number of messages posted on the site [Wilkinson, 2008].
In this chapter, we begin to study the social news aggregator Digg to investi-
gate the relationship between the structure of the social graph, user activity, and
access to information (Section 6.1). Next, we use data from the microblogging site
Twitter to study the interplay between network structure, the eort Twitter users
are willing to invest in engaging with the site, and the diversity of information
they receive from their contacts (Section 6.2). While previous studies of the role
of networks in individuals' access to information were limited in their ability to
measure the diversity of information, using bag-of-words [Aral and Van Alstyne,
2011] or predened categories [Kang and Lerman, 2013d] for this task, we learn
topics of interest to social media users from the messages they share with their
followers using theVip model (Section 4.2). We use learned topics to measure the
diversity of information users receive from their contacts. This enables us to study
the factors that aect access to diverse information in online social networks.
6.1 Structural Bottlenecks to Information
Access
The relationship between tie strength and access to novel information is subtle.
Though weak ties deliver novel information [Granovetter, 1973], such as new job
prospects, the volume of communication along these ties is low, since these ties
represent infrequent, low-intensity social interaction. This was confirmed by Aral
& Van Alstyne's analysis [Aral and Van Alstyne, 2011, Aral and David, 2012] of
email communication within a corporate recruiting firm. They showed that structurally
diverse network positions provide access to diverse and novel information,
though the positive effects of structural diversity are offset by lower volumes of
communication (bandwidth), what they call the "diversity-bandwidth trade-off."
To date, little is known about how these factors operate in online social net-
works and how they compare to real-world and email networks. Ties in online social
networks, including social media sites Digg and Twitter, are often non-reciprocal,
with users sharing messages with both friends they know in real life and strangers.
We explore how users can broaden their access to information
Variable    Description
S           number of active friends
ND          network diversity
O           volume of outgoing info. (# votes by user)
I           volume of incoming info. (friend recommendations)
B           avg friend activity
uB          user activity (# adopted recommendations)
TD          friend topic diversity
NRI         novel information
NRI_frds    novel information friends are exposed to
NAR         fraction of novel information adopted by user
FNAR        fraction of novel information adopted by friends
Table 6.1: Variables used in the study.
by controlling their position within the network and their activity level, using the Digg
2009 (Section 3.1.2) and Digg 2010 (Section 3.2.5) data sets.
In Section 6.1, we study the interplay between network structure and information
content by analyzing how users of the social news aggregator Digg adopt
stories recommended by friends, i.e., users they follow [Kang and Lerman, 2013d].
We measure the impact different factors, such as network position and activity
rate, have on access to novel information, which in Digg's case means the set of
distinct news stories.
6.1.1 Methods
Following Aral & Van Alstyne [Aral and Van Alstyne, 2011, Aral and David, 2012],
we define a set of variables to characterize access to information in networks.
Network Variables
A social network can be represented as a graph G = (U, E) consisting of a set
of users U and a set of edges E between them. There exists an edge e_ij ∈ E if
user i follows user j. While in traditional social networks friendship links are
reciprocated, resulting in an undirected graph, online social networks (e.g., Twitter
and Digg) form a directed graph. This allows users to follow people with certain
interests without having a reciprocal relationship. The neighborhood N_i of user i
consists of both the friends N_i^frd and the followers N_i^fol of user i.
Network Size Network size is an important variable that shows the breadth
of contacts each user has. We define the size of i's network, S_i, as the number
of friends from whom user i received messages during a certain time period T,
which we take to be the time over which data was collected. Since not all friends
were active during that period and thus had a chance to influence user i's votes, we
focus on active friends, i.e., friends who recommended stories during T.
Therefore, network size is defined as

    S_i = Σ_{l ∈ N_i^frd} I(r_l)        (6.1)

where N_i^frd is the set of friends of user i and the indicator function I(r_l) is one if
and only if friend l voted during the time period T and zero otherwise. Note that
ten is the minimum number of messages to cover all topic categories in the Digg 2009
and Digg 2010 data sets.
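Computed from raw activity logs, Eq. 6.1 amounts to counting friends with at least one vote in the observation period. A minimal sketch (the data structures here are illustrative assumptions, not the original processing code):

```python
def network_size(friends, votes_in_period):
    """S_i: the number of friends who recommended (voted for) at least one
    story during the observation period T.

    friends         : set of user ids that user i follows
    votes_in_period : dict mapping user id -> number of votes during T
    """
    return sum(1 for f in friends if votes_in_period.get(f, 0) > 0)

# only friend "a" was active during the period
print(network_size({"a", "b", "c"}, {"a": 4, "c": 0}))  # 1
```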
Network Diversity Network diversity of user i represents how many otherwise
unconnected neighbors (both friends and followers) user i interacts with. We
measure network diversity using the local clustering coefficient [Watts and Strogatz,
1998], C_i, which quantifies how often the neighbors of user i are linked together
(regardless of the direction of the edge):

    C_i = |{e_jk : j, k ∈ N_i, e_jk ∈ E}| / (|N_i|(|N_i| − 1))        (6.2)

where N_i is the set of neighbors of user i and |N_i| is the number of neighbors.
The total number of possible connections among neighbors is |N_i|(|N_i| − 1). A high
clustering coefficient implies low network diversity, and vice versa. Therefore, we
define the network diversity of user i as ND_i = 1 − C_i. Aral & Van Alstyne [Aral and
Van Alstyne, 2011, Aral and David, 2012] defined network diversity as the lack of
structural holes using the first and second order dimensions of link redundancy.
We prefer to follow the definition of Watts et al. [Watts and Strogatz, 1998], since
clustering coefficients are more evenly distributed over the range from 0 to 1.
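Eq. 6.2 and the resulting ND_i = 1 − C_i can be sketched directly over a directed edge set (a minimal sketch; the function name and input representation are ours):

```python
from itertools import permutations

def network_diversity(neighbors, edges):
    """ND_i = 1 - C_i, where C_i is the local clustering coefficient: the
    fraction of ordered neighbor pairs (j, k) connected by a directed edge.

    neighbors : set of user i's neighbors (friends and followers)
    edges     : set of directed edges (j, k) meaning j follows k
    """
    n = len(neighbors)
    if n < 2:
        return 1.0  # convention: no neighbor links are possible
    linked = sum(1 for j, k in permutations(neighbors, 2) if (j, k) in edges)
    return 1.0 - linked / (n * (n - 1))

# three neighbors with one directed link among the six possible ordered pairs
print(network_diversity({"a", "b", "c"}, {("a", "b")}))  # 1 - 1/6
```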
User Activity Variables
Access to information in a social network depends on the activity levels of users. In
friendship networks, the strength of a tie defines the frequency and intensity of interaction
of a pair of individuals [Granovetter, 1973]. Close friends (strong ties) interact
more frequently than acquaintances (weak ties). In their analysis of email communications,
Aral & Van Alstyne used the quantity channel bandwidth to represent
the strength of a tie. They defined bandwidth as the number of messages sent
across the tie. One-to-many directed broadcasts in social media differ in nature
from email communication. We find it useful to separate activities into incoming
messages and outgoing messages.
Average Friend Activity In social media, friends' activity determines the total
volume of incoming information I_i over a time period T. We measure I_i as the
total number of stories friends of user i recommended, i.e., voted for, during the
time period T. Hence, we define the average (per link) volume of incoming
information during T as:

    B_i = I_i / S_i        (6.3)
User Activity Most social media sites, including Digg and Twitter, display
items from friends as a chronologically sorted list, with the newest items at the
top of the list. A user scans the list and, if he finds an item interesting, he may share
it with his followers, e.g., by voting for it. He will continue scanning the list until he
loses interest, gets bored or distracted [Hodas and Lerman, 2012]. When the user gets
bored, he can start to inspect new information from outside his social network
and recommend new information to all his followers. User i's activity is the sum
of the number of new stories O_i^s (seeded messages) the user discovered from outside
his network by browsing the Web and other sections of Digg, and the number of
stories O_i^a he adopted from friends' recommendations. In the analysis presented in
this thesis we focus on the component of user activity that corresponds to adoption
events, i.e., cases where a user votes for a story after a friend recommended it.
Therefore, we measure the activity of user i as the number of adoptions the
user made during the time period T:

    uB_i = O_i^a        (6.4)
Information Variables
We model the information content in a user's network using the topic diversity of
information and the total volume of novel information. We use the topics Digg assigns
to each story to represent its content. Figure 3.4 shows the distribution of Digg-
assigned topics in our Digg 2009 and 2010 data sets, that is, the percentage of
stories assigned to each topic. In both data sets, "Offbeat," "Entertainment,"
"Lifestyle" and "Technology" were the most popular topics, while "Sports" and
"Gaming" were the least popular. Overall, there is no dominant topic in
either data set, and the popularity rankings of the different topics are almost identical.
The topics assigned to stories by Digg provide useful evidence for identifying users'
topic preferences. We represent user i's topic interest vector θ_i by computing the
fraction of votes he made on each topic.
Topic Diversity The variance of topics to which a user is exposed by his
friends has important implications for modeling information in a social network.
The topic diversity of a user's network neighborhood measures the variance of friends'
topic interests: when most friends have orthogonal interests, topic diversity will
be high, whereas when most friends have similar topic interests it will be low.
Diversity can be captured in several different ways. Aral & Van Alstyne [Aral
and Van Alstyne, 2011, Aral and David, 2012] defined topic diversity as the average
cosine distance of friends' topic vectors from their mean topic vector, aggregated over
all friends. Based on our experiments, Aral & Van Alstyne's measure is not
able to capture topic diversity correctly for users with the same mean (based on
friend topic vectors) but different numbers of friends. Instead, we define a user i's
topic interest vector θ_i in terms of the Digg-defined categories. Each component
of θ_i represents the fraction of all votes made by user i on stories belonging to
that category. Then, we define the topic diversity of a user's network by averaging
pair-wise cosine distances of friends' topic interest vectors:

    TD_i = ( Σ_{j=1}^{N_i^frd} Σ_{k=1}^{N_i^frd} (1 − Cos(θ_j, θ_k)) ) / S_i^2        (6.5)
Novel Information The total amount of novel information is another important
measure of the information content of networks. In many social media services, the
same message or piece of information can be recommended multiple times by
multiple friends. Since most social media services provide a unique identifier for
each message (e.g., the original tweet id on Twitter or story id on Digg), we can
measure the amount of novel information that a user is exposed to during time
period T by counting the number of distinct messages, or stories on Digg, to
which the user's friends expose her. Following Aral & Van Alstyne, we refer to this
quantity as NRI_i, or non-redundant information, although in Aral & Van Alstyne's
studies this quantity was not measured directly but derived from topic diversity
and friend activity. In addition to the amount of novel information, we can also
measure the novel information rate in a user's social network as R_i = NRI_i / I_i.
Of the total volume of novel information (NRI_i) user i is exposed to through
friends' recommendation activities, she adopts a subset O_i^a based on the topics or
popularity of the information. We measure the novel information adoption rate as
NAR_i = O_i^a / NRI_i.
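Counting NRI_i from per-friend recommendation logs reduces to a distinct-story count (a minimal sketch; the input representation is an illustrative assumption):

```python
def novel_information(friend_recommendations):
    """NRI_i: the number of distinct stories user i's friends recommended.

    friend_recommendations : dict mapping friend id -> list of story ids
    """
    distinct = set()
    for stories in friend_recommendations.values():
        distinct.update(stories)
    return len(distinct)

# stories 2 and 3 are recommended redundantly by both friends
recs = {"f1": [1, 2, 3], "f2": [2, 3, 4]}
print(novel_information(recs))  # 4

# the novel information rate R_i = NRI_i / I_i, where I_i = 6 votes here
print(novel_information(recs) / 6)
```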
Novel Information Potential In social media, a user's access to novel information
is mainly determined by the activities of their friends. We introduce a new
variable to capture the volume of novel information that a user could potentially be
exposed to if his friends adopted all the information they themselves were exposed
to. We measure NRI_i^frds, the potential amount of novel information user i could
access, by counting the number of distinct stories that all friends of user i are
exposed to. While the friends of user i have access to NRI_i^frds novel information,
they adopt a subset of this information based on their interests, exposing user i to
NRI_i novel information. We measure the friend novel information adoption rate as
FNAR_i = NRI_i / NRI_i^frds.
6.1.2 Access to Information
In their study of email communication within a corporate recruiting firm, Aral & Van
Alstyne observed that both the total volume of novel information (NRI_i) flowing
to recruiters and its diversity (TD_i) increased with their network size, network
diversity and channel bandwidth (the number of emails they received along each
tie). We tested whether the same conclusions hold for the online social network
of Digg: specifically, whether larger networks (S_i), structurally diverse networks (ND_i) or
high friend activity (B_i) are likely to deliver more novel information (NRI_i) and
more topically diverse information (TD_i).
Effect of Network Size
One of the simplest ways users can control their position within a network is by adding
friends. But does having more friends improve access to information in online
social networks? Figure 6.1 shows the volume of novel information (NRI) users
can access as a function of the number of friends (S). The amount of novel information
increases as users add more friends, but saturates quickly. Surprisingly, no
single user had access to all the information available in the network (shown as a
line in Figure 6.1). The highest number of distinct stories any user was exposed
to in the Digg 2010 dataset was 29,558, or 80% of the total (36,883 distinct stories).
It appears that adding more friends in an online social network improves
Figure 6.1: Amount of novel information (NRI_i) a user is exposed to as a function
of the number of active friends (S_i) in the Digg 2010 data set. The line represents
the total number of distinct stories in the data set.
access to novel information, but very quickly, after about 100 friends, it becomes
counterproductive, since doubling the number of friends raises the volume of novel
information only a few percentage points.
Effect of User Activity
In addition to creating new social links, a user can choose to link to more active
users in order to improve his access to information in a social network. Does
having active friends, i.e., friends who recommend many stories, lead to greater
access to novel information? Figure 6.2 shows the relationship between the volume
of novel information in a user's network as a function of average friend activity
(referred to as channel bandwidth by Aral & Van Alstyne). Figure 6.2 (a) shows
Figure 6.2: Novel information in a user's network in the Digg 2010 data set. (a)
The total amount of novel information that the user's friends (NRI_frds) and the user
(NRI) are exposed to as a function of average friend activity (or channel bandwidth
B). Solid symbols show smoothed data, and the line represents the total amount
of information in the network (number of distinct stories in the data set). (b)
Novel information rate as a function of friend activity.
the amount of novel information that the user's friends are exposed to (NRI_i^frds). The
solid line represents the total amount of information in the network, i.e., distinct
stories in the data set. The potential amount of novel information rises quickly as a
function of friend activity, approaching the near-maximum. However, the amount of
novel information to which the user is exposed is just a fraction of this maximum,
as shown in Figure 6.2 (a). Interestingly, the amounts of potential novel information
and novel information available to the user both decrease as friend activity grows
past 2000. Our results indicate that while linking to more active users does initially
improve access to novel information in a social network, after a certain point higher
friend activity no longer increases the amount of novel information available to the
user, but may even slightly suppress it.
Figure 6.2 (b) shows the rate at which users receive novel information, i.e.,
the fraction of novel information in their information stream, as a function of the
average activity of their friends (channel bandwidth B_i). The figure clearly shows
that as friends become more active, voting for more stories, the fraction of novel
information in the user's social stream drops precipitously. As we show later, this
is due to higher redundancy of incoming information. In online social networks,
friends' activity is an important factor in deciding the extent and the amount of
novel information available to the user.
Effect of Network Structure
Next, we study the interplay between network structure and user activity and
their impact on access to information. Aral & Van Alstyne demonstrated that
while structurally diverse networks provide greater access to information, their
benefits are offset by a lower rate of communication along structurally diverse ties.
Due to this "diversity-bandwidth trade-off," people can increase their exposure to
Figure 6.3: Scatterplot showing network diversity vs. average friend activity (channel
bandwidth) for Digg users, who are divided into three populations based on the
number of friends, in the Digg 2010 data set. The plot demonstrates the diversity-
bandwidth trade-off.
topically diverse and novel information either by placing themselves in structurally
diverse network positions or by linking to people with higher bandwidth who will
communicate with them more frequently.
We examine whether \diversity{bandwidth trade-o" exists on Digg. Digg
users can increase their \bandwidth" by linking to friends who vote for more sto-
ries. However, as friends' activity increases, it worsens the user's cognitive load, or
the volume of incoming information the user has to process. We divide users into
dierent populations based on the total volume of incoming information, which is,
on average, proportional to the number of active friends S
i
they have. Figure 6.3
shows the relationship between network diversity ND and average friend activity
133
(or channel bandwidth) B for each user, where users are broken into three populations: those with more than 322 friends, between 131 and 322 active friends,
and 130 or fewer active friends. The thresholds were chosen to produce equal-size
populations. The correlations between network diversity and average friend activity for the three populations are -0.54 (p<.01), -0.58 (p<.01) and -0.50 (p<.01)
respectively. Overall (over all populations of users), there is still a strong negative relationship (-0.47, p<.01) between network diversity ND_i and bandwidth
B_i, confirming the "diversity-bandwidth trade-off" [Aral and Van Alstyne, 2011]:
users who place themselves into positions of greater network diversity within the
Digg follower graph on average receive fewer story recommendations from friends
than users who place themselves into positions of smaller network diversity. For
the 2009 data set, we also divided users into three populations: those with more than
87 friends, between 26 and 87 active friends, and 25 or fewer active friends. The
correlations between network diversity and average friend activity are -0.54 (p<.01),
-0.59 (p<.01) and -0.03 (p<.01) respectively, and over all populations of users in
the 2009 data set, the correlation is -0.13 (p<.01). The differences mainly come from
the incomplete history of users' activities in the 2009 data set, which contains only a
subset of users' behaviors on front-page stories, whereas we have users' complete
voting history in the Digg 2010 data set.
In both the Digg 2009 and 2010 data sets, users in positions of greater network diversity
within the follower graph on average receive fewer story recommendations from
friends than users who place themselves into positions of smaller network diversity.
We observed that users connected by strong ties are more active, recommending
more stories than users who are connected by weak ties. Similarly, as users'
networks become more diverse, friends' activities contract.
Figure 6.4: (a) Topical diversity (TD) and (b) novelty (NRI) of information to
which Digg users are exposed as a function of their network diversity (ND) and
average friend activity (or channel bandwidth B) in the Digg 2010 data set.
Figure 6.4 shows how much (a) topically diverse information (TD) and (b) novel
information (NRI) Digg users are exposed to as a function of their position in the
network (network diversity ND) and friend activity (B). Users whose friends are
more active can access more novel information (NRI), whereas users in positions
of higher network diversity can access more topically diverse information (TD).
            Digg 2009            Digg 2010
            NRI       TD         NRI       TD
B           0.04**    -0.11**    0.69**    -0.83**
ND          -0.09**   0.48**     -0.15**   0.41**
Table 6.2: Pairwise correlations between variables in the Digg 2009 and 2010 data
sets. Asterisks (**) denote statistically significant correlations with p<.01.
This is in contrast to the findings of Aral & Van Alstyne, which demonstrated
that users could increase both the topic diversity and amount of non-redundant
(novel) information they are exposed to by increasing either their network diversity
or channel bandwidth. On Digg, on the other hand, users who place themselves in
positions of high structural diversity can access more topically diverse information
(the correlation between ND_i and TD_i was 0.41 (p<.01)), rather than more novel
information. There was a strong negative relationship (Table 6.2) between B_i
and TD_i, which shows that users in a strongly tied network have similar topical interests.
A detailed investigation showed that this strong negative relationship is due not to
friends' uniform preferences over a variety of topics but to high similarities
between friends' topic preferences in a highly clustered network. Intensifying friends'
activity, on the other hand, led to more novel information (the correlation between B_i
and NRI_i was 0.69 (p<.01)), but less topically diverse information. These results
demonstrate that activity is an important feature for accessing novel information
in online social networks, while structural diversity can be used to gain access to
more topically diverse information.
Last, we examine the characteristics of users who inject new information into
their networks by voting for stories they found outside of their friends' recom-
mendations, e.g., on the Web or on other sections of Digg. Figure 6.5 shows the
network diversity vs friend activity plot with colored dots representing users who
introduce, or seed, new stories in their network. We divide these users into two
Figure 6.5: Amount of new information injected into the network by users (colored
symbols) in different positions of network diversity (ND) and friend activity (B).
Symbol size represents the relative number of seeded stories. Seeding users are
divided into classes based on the number of friends.
classes based on the number of friends they have. The size of the symbol represents
the relative number of seeded stories (the difference between the total votes made by
user i and those adopted through friends' recommendations). The x-axis is shown
in log-scale to highlight the differences between classes of users. Users with many
friends (blue symbols) who are very active (high B) inject relatively more new
stories into their network than users with many, but less active, friends. These
users are also in less structurally diverse positions, i.e., they are more strongly tied
to their friends. At first, it seems counterintuitive that these users, who already
receive many recommendations, would take the time to look for new information.
These could be the dedicated top users, who consider it their responsibility to look
for new stories to post on Digg, or users who are so overwhelmed with the quantity and redundancy of their friends' recommendations that they choose to find
new information on their own. Users with few friends (red symbols) also tend to
have less active friends (lower B). These users inject more information into their
network when their network diversity is high or, as shown earlier, when their friends
have diverse interests. Such users cannot rely on their network to expose them
to interesting information; instead, they seek it out themselves by seeding new
stories.
6.1.3 Bottlenecks to Information Access
Why do users in positions of high network diversity receive less novel information
even though they are connected to more topically diverse friends? To answer
this question, we measured NRI^frds_i, the total amount of novel information that
friends of user i are exposed to. Figure 6.6 (a) shows that this quantity depends
both on network diversity (ND) and friend activity (B). In most cases, friends are
collectively exposed to a large quantity of novel information (also demonstrated
by Figure 6.2 (a)), almost all of the 36,883 distinct stories in the Digg 2010 data
set. Although most of the users could potentially be exposed to nearly all of the
information in the network, in fact, as shown in Figure 6.2, they receive far less
novel information. We get some insight into this puzzle from Figure 6.6 (b), which
shows the friends' novel information adoption rate, i.e., the fraction of stories in
their stream friends voted for, as a function of the user's network position and
friend activity rates. Friends of users in positions of high network diversity fail
to adopt most of the novel information they are exposed to. However, users with
highly active friends (high channel bandwidth B region) are exposed to more novel
information because their friends adopt it at a higher rate.
Figure 6.6: (a) Total amount of novel information that user's friends are exposed to
(NRI^frds_i) as a function of user's network diversity (ND_i) and friend activity (B_i)
in the Digg 2010 data set. (b) Fraction of novel information adopted (FNAR_i) by
friends (NRI_i / NRI^frds_i) as a function of network diversity and friend activity in
the Digg 2010 data set.
This could explain the difference between our study and the findings of Aral &
Van Alstyne. In their study, users could increase their access to diverse and non-redundant (novel) information by increasing their network diversity or channel
bandwidth. In our study, however, users in positions of high friend activity (high
channel bandwidth) increase their access to novel information since their friends
adopt a large portion of the novel information that they themselves are exposed to.
Users in positions of high network diversity are exposed to more diverse information, but since their friends have interests that are different from their own, they
do not adopt much of the information they are exposed to. In addition, in Aral &
Van Alstyne's study, novelty and diversity were not independent variables: non-redundant information (novelty) was the product of topic diversity and channel
bandwidth. Hence, it may not be surprising that both were correlated highly with
network diversity and channel bandwidth. We treat these variables as independent:
while topic diversity of a network neighborhood is measured based on
the Digg topic assignments of the stories friends adopted, non-redundant, or novel,
information is measured simply by the number of distinct stories friends adopted.
Our study demonstrates that friends in structurally diverse positions and in positions
of high activity constrain a user's information exposure in different ways.
Increasing friend activity affects novel information access, while increasing network
diversity provides access to more topically diverse friends, but not the other
way around.
Other factors that affect access to information are the cognitive constraints that
limit a user's ability to process incoming information. These cognitive constraints
are similar to those that limit the number of stable social relationships a human
can maintain [Dunbar, 1992]. In the social media domain, they have been shown to
limit the number of conversations users engage in [Goncalves et al., 2011b] and limit
Figure 6.7: Number of stories adopted by the user as a function of the number of
active friends (S) in the Digg 2010 data set.
the spread of information [Hodas and Lerman, 2012]. We believe that cognitive
constraints also play a role in access to information on networks. As the volume
of incoming information increases, users compensate by increasing their activity
to cope with the greater amount of new information. This works, but only up to a
point. Unfortunately, we cannot directly measure how much information users
process, e.g., how many stories they read, but on average this quantity should
be proportional to the number of recommended stories they vote for. We call
this quantity user activity uB_i. Figure 6.7 shows user activity as a function of
the number of active friends the user has. While user activity initially increases
with the number of friends, it reaches a maximum around 200 friends and then
decreases somewhat. Cognitive constraints prevent users from keeping up
with the rising volume of new information friends expose them to. This, in turn,
limits the amount of new information to which they expose their followers. This
effect is more dramatic for users whose friends have topically diverse interests.
These friends tend to vote on fewer recommended stories, either because they lack
interest in processing recommended information or because they are less willing to
devote the greater cognitive resources required to process this diverse information.
Understanding the complex interplay between cognitive constraints and network
structure is the subject of our ongoing research.
6.2 User Effort and Network Structure
In the previous section, we found that the diversity of information that social
media users are exposed to via friends depends on their position in the network.
Users embedded within a community of strongly tied individuals are likely to
share information on topics that other community members are interested in, while
users in brokerage positions that bridge different communities receive information
on more diverse topics. Users can increase their access to novel information by
adding more friends.
However, cognitive (and temporal) constraints limit an individual's capacity
to manage social interactions [Dunbar, 1992, Goncalves et al., 2011b, Miritello
et al., 2013] and process the information they receive [Weng et al., 2012, Hodas
and Lerman, 2012]. In addition, social media users vary greatly in the effort they
expend engaging with the site, leading to a large variation in user activity, as
measured by the number of messages posted on the site [Wilkinson, 2008]. The
impact of this variation on the information individuals receive and their position
in the network is not known. Do users who are able (or at least willing) to be more
active on the site receive more diverse information? Do they curate their social
links so as to move themselves into network positions that provide more diverse
information?
In this section, we use data from the microblogging site Twitter (Twitter 2012
and 2014 data sets) to study the interplay between network structure, the effort
Twitter users are willing to invest in engaging with the site, and the diversity of
information they receive from their contacts [Kang and Lerman, 2015a]. Previous
studies of the role of networks in an individual's access to information were limited
in their ability to measure the diversity of information, using bag-of-words [Aral
and Van Alstyne, 2011] or predefined categories [Kang and Lerman, 2013d] for this
task. We learn topics of interest to social media users from the messages they share
with their followers using the Vip (Section 4.2) model. We use the learned topics to
measure the diversity of information users receive from their contacts. This enables
us to study the factors that affect the diversity of information in networks.
6.2.1 Data Sets (Twitter 2014)
Twitter 2014
The 2014 data set contains tweets from 5,600 initial seed users [Smith et al.,
2013] and their friends from March 2014 to October 2014. Starting with the 5,600
initial seed users, they collected all their friends and at least the first 200 tweets from
each timeline. The data set includes 23.8M tweets from 1.9M users with 17.8M social
network links.
6.2.2 Methods
We use the topics learned by the Vip (Section 4.1) model to study how informa-
tion is distributed in a network and what users can do to increase the diversity of
Var.     Description
S_i      number of active friends
ND_i     network diversity
O_i      avg. vol. of outgoing info. (# tweets/day)
u_i      user-topic vector (k-dimensional vector)
FTD_i    friend topic diversity
Table 6.3: Variables used in the study.
information they receive from their social media friends. In order to use the messages users posted, in addition to friends' messages they retweeted, we changed
the model by assigning visibility equal to one to each original message a user posted.
Following [Aral and Van Alstyne, 2011, Aral and David, 2012], we define a set
of variables we use to characterize users, their network position, and information
diversity.
Network size We define the network size S_i of user i as the number of friends
from whom user i received messages during a time period t, which we take to be
the data collection period. We only consider active friends, i.e., friends who posted
messages during t. Network size is defined as

S_i = \sum_{l \in N^{frd}_i} I(r_l)    (6.6)

where N^{frd}_i is the set of friends of user i and the indicator function I(r_l) is one if
and only if friend l tweeted during the time period t and zero otherwise.
Network diversity A user's position in a network significantly impacts the diversity of received information. Position can be characterized by its structural diversity, which represents how many otherwise unconnected contacts user i has. We
measure structural diversity of a network position using the local clustering coefficient [Watts and Strogatz, 1998], C_i, which quantifies how often user i's contacts
are linked (regardless of the direction of the link):

C_i = \frac{2 |\{e_{jk} : j, k \in N^{frd}_i, e_{jk} \in E\}|}{S_i (S_i - 1)}    (6.7)

The variable e_{jk} = 1 if user j follows user k or vice versa; otherwise, e_{jk} = 0.
The total number of possible connections among contacts is S_i(S_i - 1). A high
clustering coefficient implies low network diversity, and vice versa. Therefore, we
define network diversity of user i as ND_i = 1 - C_i. Note that brokerage positions
have high network diversity, while individuals in tightly-knit communities are
in positions with low network diversity.
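A direct translation of Equations 6.6 and 6.7 into code might look like the following (a hedged sketch; the adjacency representation and function names are illustrative assumptions, not the original implementation):

```python
from itertools import combinations

def network_size(friends, active):
    """S_i (Eq. 6.6): number of friends who posted during the period."""
    return sum(1 for f in friends if f in active)

def network_diversity(friends, edges):
    """ND_i = 1 - C_i, with C_i the local clustering coefficient (Eq. 6.7).

    `friends` is the set of user i's active friends; `edges` is a set of
    frozensets {j, k}, one per linked pair (link direction ignored).
    """
    s = len(friends)
    if s < 2:
        return 0.0  # clustering coefficient undefined for fewer than 2 friends
    linked = sum(1 for j, k in combinations(friends, 2)
                 if frozenset((j, k)) in edges)
    c = 2.0 * linked / (s * (s - 1))
    return 1.0 - c

# Tiny example: a user follows friends 1, 2, 3; only the pair 1-2 is linked.
friends = {1, 2, 3}
edges = {frozenset((1, 2))}
print(network_diversity(friends, edges))  # 1 - 2*1/(3*2) = 0.6666666666666667
```

A fully connected neighborhood gives ND = 0 (a tightly-knit community), while a neighborhood with no links among friends gives ND = 1 (a brokerage position).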
User effort Most social media sites, including Twitter, display items from
friends as a chronologically ordered list, with the newest items at the top. A
user scans the list and, if she finds an item interesting, she may share it with her
followers by retweeting it. She will continue scanning the list until she loses interest
or is distracted [Hodas and Lerman, 2012]. It is difficult to quantify how much of the
list a user processes, since the site does not provide this information. Instead, we
use user activity as a heuristic for the effort users are willing (or able) to invest in
Twitter. We measure user i's activity by the average number of messages the user
tweets and retweets per day:

O_i = \frac{|r_i|}{t}    (6.8)

where |r_i| is the number of tweets from user i.
Friend topic diversity We measure the diversity of information user i receives
from friends by the variance of friends' topic interests: when most friends
Table 6.4: Keywords associated with the top 10 topics of users in different positions
within the network. Users are divided into two populations based on their network
diversity (ND).
#  Users in a Low ND              Users in a High ND
1  lesson weight loss             acoustic profession connect profile
   lose motive guitar
   flash gain                     webdesign bigdata update
2  pet dog animal adopt           praise children parent surgery inch
   cat rescue love mate           relax anxiety obesity autism
3  read book review kindle        united kingdom stadium
   novel cover publish buddha     arena holland yankees
4  good happy hope morn           prosecute labour governor
   birthday wish love like        palestinian nationwide peru
5  yoga workout exercise jump     ferguson pray brooklyn
   doctor fit body back diet      documentary oakland
6  graphic japanese poetry        art center science exhibit
   manga cinema photo culture     paper draw museum
7  oil kale gene napa sausage     camera shoot timeline canon
   wrap aspire coal trainer       len accent timeline possess
8  children parent common         worldcup shout football
   journey ready pack escape      soccer illinois player sold
9  home design studio site        space mars nasa planner
   interior built lawn layout     newton isaac modern
10 beauty summer city park        free win get email gift
   resort nation beach island     chance enter offer ticket
have distinct, non-overlapping interests, topic diversity will be high, whereas when
most friends have similar topic interests it will be low. We define friend topic
diversity as the average pair-wise cosine distance of friends' topic interest vectors:

FTD_i = \frac{2 \sum_{j,k \in N^{frd}_i, j<k} (1 - \cos(u_j, u_k))}{S_i (S_i - 1)}    (6.9)
6.2.3 Information and Network Structure
Information is not uniformly distributed in a network: users in brokerage positions are interested in systematically different topics than users within denser
communities. To study the user-topic distribution, we rank users according to network diversity (ND) and split them into two equal-sized groups: high and low
network diversity. Table 6.4 compares the representative keywords of the top ten
topics from the topic profiles of users in these two groups. Users in high network
diversity positions tend to be interested in more general topics, such as sports
("worldcup", "yankees", "lad"), current events ("ferguson", "oakland"), business
("profession", "big data"), health ("surgery", "obesity"), politics ("peru", "palestinian"), arts ("art", "exhibit", "camera"), science ("science", "nasa", "space"),
promotion ("gift", "offer"), etc. According to sociological theory, users in such
brokerage positions spanning multiple unconnected communities are exposed to
diverse information [Burt, 1995]; therefore, it makes sense that the topics they
have in common are the more general topics. On the other hand, users in positions
of low network diversity focus on more specialized topics, such as hobbies
("guitar", "book", "yoga", "manga"), pets ("dog", "cat"), family ("birthday",
"children"), food ("oil", "kale"), vacation ("journey", "escape", "island"), and home &
garden ("home", "interior").
6.2.4 Increasing Exposure to Diverse Information
How do users increase the amount of diverse information they receive in social
media? Do they follow more people to increase the volume of information received?
Or do they move themselves into special network positions? Perhaps users who
are willing and able to exert the effort needed to process information actually put
themselves into network positions where they receive more diverse information? To
examine how user effort affects information access, we split users into four classes
based on the average number of tweets they post daily (O). The top quartile
contains the most active users, who post more than 5.3 tweets per day, the second
Figure 6.8: Diversity of received information as a function of user's network size.
Users are divided into four populations based on their effort: red circles represent the most active users (who post more than 5.3 tweets per day on average),
green stars represent the 2nd quartile (3.1 <= O_i < 5.3), black triangles represent the 3rd
quartile (1.9 <= O_i < 3.1), and blue squares represent the bottom quartile users
(who post fewer than 1.9 tweets per day on average). We discretize values into
equal-sized bins for each quartile.
quartile contains users who post from 3.1 to 5.3 tweets per day, and the third and
bottom quartiles contain users who post from 1.9 to 3.1 and fewer than 1.9 tweets
per day, respectively.
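The quartile split by daily posting rate can be reproduced with numpy along these lines (a sketch on synthetic rates; the 1.9/3.1/5.3 cut points quoted in the comment come from the text, everything else is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic average tweets-per-day values O_i for a population of users.
o = rng.lognormal(mean=1.0, sigma=0.8, size=10_000)

# Empirical quartile boundaries (in the Twitter 2014 data these fell at
# roughly 1.9, 3.1, and 5.3 tweets per day).
q1, q2, q3 = np.quantile(o, [0.25, 0.5, 0.75])

# Assign each user a class: 0 = bottom quartile ... 3 = top quartile.
user_class = np.digitize(o, [q1, q2, q3])

for c, label in enumerate(["bottom", "3rd", "2nd", "top"]):
    print(f"{label} quartile: {np.sum(user_class == c)} users, "
          f"mean O = {o[user_class == c].mean():.2f}")
```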
Figure 6.8 shows the relationship between diversity of received information,
measured by friend topic diversity (FTD), and user's network size (S), for these
classes of Twitter users. The trends among these four classes of users are somewhat different, indicating that people use different strategies to access information
in networks. Active users who expend more effort on Twitter (red circles in Figure 6.8) increase their exposure to diverse information by adding more friends
(0.1874, p<.01). However, when the bottom quartile users (blue squares in Figure 6.8) add friends, this actually decreases the diversity of information they are
exposed to until around 100 friends. After that point, information diversity slowly
increases. For the same network size, the less active users actually receive more
diverse information than the more active users until around 100 friends. Apparently, network size by itself cannot provide access to diverse information (when
S > 100), since the network structure can vary significantly.
Figure 6.9 shows the relationship between friend topic diversity (FTD_i) and
structural network diversity (ND_i) for the four classes of users divided according to
their effort. For bottom quartile users (blue squares in Figure 6.9), there is a strong
correlation (0.9212, p<.01) between network position and information diversity;
correlation values decrease with increasing user effort (3rd quartile 0.9162 (p<.01)
and 2nd quartile 0.7774 (p<.01)). When these users place themselves in more
structurally diverse positions within the Twitter network, they receive on average
more topically diverse tweets from friends than users who place themselves in less
structurally diverse network positions. However, the correlation between FTD and
ND for active users (red circles in Figure 6.9) is far lower, 0.3248 (p<.01). These
users are generally exposed to more diverse information than the less active users,
regardless of their network position. Also, active users in low network diversity
positions receive more diverse information than the less active users in similar
positions. These results demonstrate that the effort users are willing to invest in
using social media is an important factor in access to diverse information.
Why are highly active users exposed to more diverse information? To address
this question, we study how network diversity changes as users add more friends.
Figure 6.9: Friend topic diversity (FTD_i) of a user as a function of network
diversity (ND_i) in the Twitter 2014 data set. We show the average of FTD_i for
users with the same network diversity (ND_i), with their standard deviation ranges in
grey. Users in higher network diversity positions tend to be exposed
to more diverse information, with active users receiving more diverse information
regardless of their position in the network structure. We group ND values into
equal-sized bins and compute the mean of both ND and FTD within each bin.
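The equal-count binning used for this kind of plot can be sketched as follows (an illustrative helper, not the original plotting code; the synthetic ND/FTD values stand in for the real data):

```python
import numpy as np

def binned_means(x, y, n_bins=20):
    """Group x into equal-sized (equal-count) bins and return the mean of
    both x and y within each bin, as used for the FTD-vs-ND plot."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    x_means, y_means = [], []
    for chunk_x, chunk_y in zip(np.array_split(x_sorted, n_bins),
                                np.array_split(y_sorted, n_bins)):
        x_means.append(chunk_x.mean())
        y_means.append(chunk_y.mean())
    return np.array(x_means), np.array(y_means)

rng = np.random.default_rng(2)
nd = rng.uniform(0, 1, 5000)                  # network diversity
ftd = 0.1 * nd + rng.normal(0, 0.02, 5000)    # friend topic diversity
nd_m, ftd_m = binned_means(nd, ftd, n_bins=10)
print(nd_m.round(2))  # bin means of ND increase monotonically
```

Binning by equal counts rather than equal widths keeps each plotted point supported by the same number of users, so sparse regions of ND do not produce noisy averages.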
Figure 6.10 shows this relationship for users separated into two classes based on
their activity, or effort. Overall, network diversity increases with network size (after
around 100 friends), which is not surprising, since probabilistically, as the number
of people in a network grows, any two people are less likely to be connected to each
other. Active users overall place themselves in more structurally diverse positions.
Surprisingly, network diversity initially decreases with network size for both
user populations, reaching a minimum around S = 100. A potential explanation
of this effect involves the Dunbar number. Dunbar [Dunbar, 1992] argued that
Figure 6.10: Network diversity (ND) as a function of the number of active friends
(S) in the Twitter 2014 data set. We use equal-sized bins for each class.
finite human cognitive capacity constrains the number of social interactions individuals can manage, limiting the size of social groups to about 100-200 individuals.
Research has validated the impact of cognitive constraints on online social interactions [Goncalves et al., 2011a, Kang and Lerman, 2013d]. Similar arguments could
apply to our setting. Minimum network diversity corresponds to maximal social
connectivity, which in our Twitter data set occurs when users have around 100
friends. While their social networks can grow beyond that size, increasing network
diversity implies that new friends are less likely to form a community.
The minimum in network diversity for the less active users occurs at lower
values than for the more active users. This suggests that active users who invest
more effort into using Twitter can manage larger communities of connected friends
than the less active users. This observation is in line with cognitive limits on social
Figure 6.11: Histograms of network diversity (ND) of users in the Twitter 2014
data set. Users are divided into two populations based on their effort (O). The
peak for the top 50% of users is higher than for the bottom 50%, while the bottom
50% of users tend to have higher ND.
interactions theory: users who have a greater capacity for social interactions (or
who may simply be willing to invest more time and effort in social interactions)
will have more interactions on Twitter (higher activity), and they will also tend
to belong to larger social groups (higher network size), simply because they are
better able to manage their social connections. At this time we cannot prove
this intriguing possibility, and leave it as a question for future research.
6.3 Conclusion
The idea that network structure affects the novelty and diversity of information
people receive from their social contacts has long fascinated sociologists [Granovetter, 1973, Burt, 1995]. In social media, some of these theoretical arguments are
supported [Grabowicz et al., 2011, Centola and Macy, 2007, Centola, 2010]. We
studied the social news aggregator Digg to investigate the relationship between the
structure of the social graph, user activity, and information diversity. We showed
that the amount of novel information a user is exposed to increases as she adds more
friends, but saturates quickly. Similarly, linking to friends who are more active
improves access to novel information, but as redundancy increases, higher
friend activity can no longer increase the amount of novel information accessible to the user. In addition, we validated the "diversity-bandwidth trade-off"
in online social media. In two different data sets, users in positions of greater
network diversity in the follower graph on average receive fewer story recommendations from friends than users who place themselves into positions of high friend
activity (high bandwidth). Users in positions of higher network diversity can access
more topically diverse information, while users whose friends are more active can
access more novel information.
We analyzed the importance of users' and friends' activities in a social
network to demonstrate that a user's effort in social media explains the diversity of
information people receive. We found that friends' activities act as an "information bottleneck," blocking their followers' potential access to novel information.
Our work also demonstrates the importance of cognitive factors: just as humans
have a finite capacity to maintain social relations, they also have a finite capacity
to process information [Dunbar, 1992].
The interplay between network structure and cognitive constraints has impor-
tant implications for how people gain access to information in social networks in
general, and on social media in particular. In this chapter, we explored these ques-
tions using data from a popular social media platform Twitter, where users create
links in order to receive information, in the form of short text messages called
tweets, from other people.
One of the challenges we faced is measuring the diversity of information users
receive from their friends on Twitter. We addressed this challenge by using a
probabilistic model, Vip, to learn users' topics of interest from the messages they
receive and share on Twitter. We then used learned topics to measure diversity of
the information a user is exposed to as the variance of topic interests of the user's
friends.
By quantifying information diversity, we can study the factors that affect information access in networks. We confirmed that network position plays an important
role: users can increase the amount of diverse information they receive by increasing the structural diversity of their network position, rather than simply increasing the number of people they follow. However, we also identified user effort as an
important factor mediating access to information in networks. Users who post (and
consume) more messages place themselves in positions of higher network diversity
than the less active users. Even when they are in structurally similar positions,
the more active users receive more diverse information. This suggests that users
who invest greater effort into using Twitter may have higher cognitive capacity for
processing information, or they may simply be able to devote more time to such
interactions [Miritello et al., 2013]. These users curate their links so as to increase
the diversity of information they receive. One mechanism for accomplishing this is
to break links so as to reduce the redundancy of received information. Even when
154
these actions do not change a user's structural position within the network, they
serve to increase information diversity. Our work underscores the importance of
cognitive factors and variation in eort in access to information in networks.
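One common way to operationalize the structural diversity discussed above is to count the connected components in the subgraph induced by a user's friends; this is an illustrative choice, not necessarily the exact measure used in this chapter:

```python
from collections import defaultdict

def structural_diversity(friends, edges):
    """Structural diversity of a user's network position: the number
    of connected components in the subgraph induced by her friends.
    More components means friends come from more distinct social
    contexts, so the information they relay is less redundant."""
    friends = set(friends)
    adj = defaultdict(set)
    for a, b in edges:
        # Keep only edges between the user's friends (induced subgraph).
        if a in friends and b in friends:
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), 0
    for f in friends:
        if f in seen:
            continue
        components += 1
        stack = [f]             # depth-first traversal of one component
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
    return components
```

Under this measure, three mutually unacquainted friends yield diversity 3, while three friends who all know each other yield diversity 1.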
Chapter 7
Conclusions and Future Directions
People use their social contacts to gain access to information in social networks [Granovetter, 1973, Burt, 2004], which they can then leverage for personal advantage. However, information in social networks is non-uniformly distributed, leading sociologists to explore the relationship between an individual's network position and the novelty and diversity of information she receives through her social contacts. The ubiquity of social media services and the large, and growing, amount of social data present new opportunities for studying information access in social networks. In contrast to offline, email, or phone interactions, where information is exchanged between a pair of social contacts, social media users broadcast information to all their contacts. These changes suggest that social media users can easily increase their access to novel and diverse information by creating more links. However, cognitive constraints limit an individual's capacity to manage social interactions [Dunbar, 1992, Goncalves et al., 2011b, Miritello et al., 2013] and process the information they receive [Weng et al., 2012, Hodas and Lerman, 2012]. So it is important to understand how individuals access information, what factors need to be considered in modeling information adoptions, and how individuals can increase their access to novel and diverse information in social media.
The dissertation presents a combination of (a) analysis of important mechanisms of individuals' information adoptions in social media and the design of generative probabilistic frameworks that model social and cognitive factors, (b) empirical comparisons of various state-of-the-art algorithms and models on social media data, and (c) a study of the interplay between network structure and individuals' activities and efforts in access to novel and diverse information in social media.
7.1 Summary of Contributions
The three main research questions of this dissertation are to (a) propose generative probabilistic frameworks and analyze the results of realistic models, (b) verify scalable inference algorithms on social media data, and (c) understand diverse and novel information access in social media. The main contributions of this dissertation are the following.
Psycho-socially Motivated Predictive Models:
We closed the gap between behavioral and statistical recommendation models by presenting limited attention models (LA-LDA and LA-CTR) and a visibility, item fitness, and personal relevance model (Vip).
We developed probabilistic approaches (LA-LDA and LA-CTR) to learn from noisy social data how users divide their attention over their social contacts and topics.
We proposed the Vip model to properly account for cognitive factors underlying unobservable user behaviors in social media.
We showed that models of user behavior that account for human cognitive biases can more accurately predict whether users adopt items recommended by friends than models that ignore visibility.
We showed that Vip can also predict the growth of information cascades better than alternative models.
We showed that the psycho-socially motivated model provides more in-depth insight by quantifying the different impacts of visibility, item fitness, and personal relevance on information diffusion in social media.
Verification and Improvement of Algorithms on Large Volumes of Real-time, Noisy, and Sparse Social Media Data:
We extended the social (LA-LDA) and cognitive (Vip) user behavior models to take into account the textual information of items, overcoming the sparsity of social media data and improving explanatory power as well as prediction accuracy.
We explored the feasibility of Stochastic Gradient Fisher Scoring (SGFS) for social data mining.
We proposed hybrid distributed SGFS (hSGFS) and evaluated its performance on a variety of social data sets.
We found that hSGFS was better able to predict held-out items in data sets that have a long-tailed distribution.
Empirical Study to Understand Novel and Diverse Information Access:
We investigated the relation between the structure of the social network, the content of information, the effort of users, and their friends' activity.
We validated the existence of a trade-off between network diversity and friends' activity in online social networks.
We showed that increasing friends' activity affects novel information access, while increasing network diversity, which bridges otherwise disconnected regions of the follower graph, provides access to more topically diverse friends, but not the other way around.
We showed that a user can improve her information access by linking to active users, though this becomes less effective as redundancy increases: beyond that point, higher friend activity can no longer increase the amount of novel information accessible to the user.
We found that user effort is an important variable mediating access to information in networks. Users who invest more effort into their activity on the site not only place themselves in more structurally diverse positions within the network than the less engaged users, but they also receive more diverse information when located in similar network positions.
7.2 Vision for Future Work
7.2.1 Recommending Diverse Information
Recommender systems [Herlocker et al., 1999, Sarwar et al., 2001, Karypis, 2000] examine the item ratings (or adoptions) of many people to discover their preferences and recommend new items that were liked by similar people. Latent-factor models, such as probabilistic matrix factorization [Salakhutdinov and Mnih, 2008b, Koren et al., 2009, Wang and Blei, 2011], have shown promise in creating better recommendations by incorporating personal relevance into the model. Many social recommender systems have been proposed that apply matrix factorization techniques to both users' social networks and their item rating histories [Ma et al., 2008]. In addition to modeling user-item adoptions, researchers integrate social correlation between users [Purushotham et al., 2012], topic influences of friends [Kang and Lerman, 2013a], and cognitive biases of users [Kang and Lerman, 2015b] into social recommender systems.
Recommender and personalized systems often focus on understanding user preferences from the history of observed actions in order to recommend likely future likes and interests. As a result, users may experience isolation in their own self-interest bubble; for example, users may only see updates from specific friends or topics in a personalized social media news stream. This is also known as the "filter bubble" [Pariser, 2011], the result of a personalization algorithm under which users become isolated from friends or information that disagrees with their viewpoints. A key challenge for the recommender and personalized systems community is therefore how to increase the variety of recommended items without sacrificing accuracy. The trade-off between exploration and exploitation is important to prevent over-specialization, where we never recommend items outside the history of a user's actions. Most current approaches focus on proposing new intra-list diversity metrics [Ziegler et al., 2005, Agrawal et al., 2009] to diversify recommendations. Our study shows that users increase activity to access diverse information. We can estimate how open a user is to diverse information by taking into account the user's engagement level as well as network diversity. This line of research will allow us to provide promising solutions that can practically resolve this issue.
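As a concrete illustration of an intra-list diversity metric in the spirit of Ziegler et al. [2005], the sketch below scores a recommendation list by the average pairwise cosine dissimilarity of its items. The function name and feature representation are illustrative assumptions:

```python
import numpy as np

def intra_list_diversity(item_vectors):
    """Intra-list diversity of a recommendation list: the average
    pairwise cosine dissimilarity (1 - cosine similarity) between the
    feature vectors of the recommended items. Higher values mean a
    more varied list."""
    X = np.asarray(item_vectors, dtype=float)
    n = len(X)
    if n < 2:
        return 0.0
    norms = np.linalg.norm(X, axis=1)
    sims = (X @ X.T) / np.outer(norms, norms)  # cosine similarity matrix
    i, j = np.triu_indices(n, k=1)             # distinct pairs only
    return float(np.mean(1.0 - sims[i, j]))
```

A diversification strategy can then trade this score off against predicted accuracy when assembling the final list.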
7.2.2 Scalable Approach for Real-time Analysis
One of our research questions focuses on scalable online methods that can handle large and growing social media data. Throughout this dissertation, we focused on scalable inference algorithms for Bayesian probabilistic models and tested them in multi-threaded computing environments. We plan to explore map-reduce computing frameworks for large-scale computing in the future. Furthermore, for real-time analysis, we plan to simplify the algorithms. We will develop realistic models and algorithms to efficiently learn the hidden parameters of the models from social media data. Working with massive datasets at terabyte scale allows us to observe patterns and characteristics of social media users' behaviors that we cannot observe at small scales.
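SGFS preconditions its updates with an estimated Fisher information matrix; the closely related, simpler stochastic gradient Langevin dynamics update conveys the core idea of minibatch Bayesian inference. The sketch below is a minimal illustration, not the dissertation's hSGFS implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld_step(theta, grad_log_prior, grad_log_lik, minibatch, n_total, eps):
    """One stochastic gradient Langevin dynamics step: a noisy SGD
    update whose injected Gaussian noise is matched to the step size,
    so the iterates approximately sample the posterior instead of
    collapsing to a point estimate. SGFS additionally preconditions
    this update with an estimated Fisher information matrix."""
    scale = n_total / len(minibatch)  # reweight the minibatch gradient
    grad = grad_log_prior(theta) + scale * sum(grad_log_lik(theta, x)
                                               for x in minibatch)
    noise = rng.normal(0.0, np.sqrt(eps))
    return theta + 0.5 * eps * grad + noise

# Toy posterior: the mean of Gaussian data under a broad N(0, 100) prior.
data = rng.normal(3.0, 1.0, size=1000)
theta = 0.0
for _ in range(2000):
    batch = rng.choice(data, size=50)
    theta = sgld_step(theta,
                      lambda t: -t / 100.0,  # gradient of log N(0, 100) prior
                      lambda t, x: x - t,    # gradient of log N(t, 1) likelihood
                      batch, len(data), eps=1e-4)
```

Because each step touches only a small minibatch, updates of this family scale to data sets far too large for batch MCMC.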
7.2.3 Understanding Cognitive Factors of Users in Access
to Information
We identified user effort as an important factor mediating access to information in networks. Users who post (and consume) more messages place themselves in positions of higher information diversity than the less active users. Even when they are in structurally similar positions, the more active users receive more diverse information. This suggests that users who invest greater effort into using Twitter may have higher cognitive capacity for processing information, or they may simply be able to devote more time to such interactions [Miritello et al., 2013]. These users curate their links so as to increase the diversity of information they receive. One mechanism for accomplishing this is to break links so as to reduce the redundancy of received information. Even when these actions do not change a user's structural position within the network, they serve to increase information diversity. We plan to explore the evolution of individuals' network structure in access to novel and diverse information in social media. Our work underscores the importance of cognitive factors and variation in effort in access to information in networks. Further work is needed to disentangle social media users' effort and cognitive factors.
This dissertation has presented theoretically grounded social media user behavior models and information adoption theories that will be useful for many problems in several areas, from computer science to social science, communication, and cognitive science. Despite the many successes of recommendation systems, many interesting problems and important questions remain. We believe that the contributions in this dissertation will provide inspiration for future work.
Reference List
Deepak Agarwal and Bee-Chung Chen. Regression-based latent factor models. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 19–28, New York, NY, USA, 2009. ISBN 978-1-60558-495-9.
Deepak Agarwal and Bee-Chung Chen. fLDA: Matrix factorization through latent Dirichlet allocation. In ACM International Conference on Web Search and Data Mining, 2010. ISBN 978-1-60558-889-6.
Vinti Agarwal and K.K. Bharadwaj. A collaborative filtering framework for friends recommendation in social networks based on interaction intensity and adaptive user similarity. Social Network Analysis and Mining, 3(3):359–379, 2013.
Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14. ACM, 2009.
S. Ahn, A. Korattikara, and M. Welling. Bayesian posterior sampling via stochastic gradient Fisher scoring. The International Conference on Machine Learning, 2012.
T.J. Allen. Managing the flow of technology: Technology transfer and the dissemination of technological information within the R&D organization. MIT Press Books, 1, 2003.
Christophe Andrieu, Nando De Freitas, and Arnaud Doucet. Sequential MCMC for Bayesian model selection. IEEE Signal Processing Workshop on Higher Order Statistics, 1999.
S. Aral and V. David. The anatomy & dynamics of vision advantages. In 33rd International Conference on Information Systems, 2012.
Sinan Aral and Marshall W. Van Alstyne. The diversity-bandwidth trade-off. American Journal of Sociology, 117(1):90–171, January 2011.
Sinan Aral and Dylan Walker. Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Science, 57(9):1623–1639, 2011.
Eytan Bakshy, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. Everyone's an influencer: quantifying influence on Twitter. In ACM International Conference on Web Search and Data Mining, 2011.
Eytan Bakshy, Itamar Rosenn, Cameron Marlow, and Lada Adamic. The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web, pages 519–528. ACM, 2012.
Jonah A. Berger and Katherine L. Milkman. What makes online content viral? Available at Social Science Research Network 1528077, 2009.
D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003.
Niels J. Blunch. Position bias in multiple-choice questions. Journal of Marketing Research, 21(2):216–220, 1984.
Joseph K. Bradley, Aapo Kyrola, Danny Bickson, and Carlos Guestrin. Parallel coordinate descent for L1-regularized loss minimization. The International Conference on Machine Learning, 2011.
Garrett Brown, Travis Howe, Micheal Ihbe, Atul Prakash, and Kevin Borders. Social networks and context-aware spam. In Proceedings of the 2008 ACM conference on Computer supported cooperative work, pages 403–412. ACM, 2008.
Ronald Burt. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge, MA, 1995.
Ronald S. Burt. Structural holes and good ideas. The American Journal of Sociology, 110(2):349–399, 2004. ISSN 00029602.
Ronald S. Burt. Brokerage and closure: An introduction to social capital. Oxford University Press, 2005.
Georg Buscher, Edward Cutrell, and Meredith Ringel Morris. What do you see when you're surfing?: using eye tracking to predict salient regions of web pages. In Special Interest Group on Computer-Human Interaction (SIG-CHI), pages 21–30. ACM, 2009.
D. Centola. The spread of behavior in an online social network experiment. Science, 329(5996):1194–1197, 2010.
D. Centola and M. Macy. Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3):702–734, 2007.
Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, and Krishna P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In International AAAI Conference on Web and Social Media, 2010.
Aafia Chaudhry, L. Michael Glodé, Matt Gillman, and Robert S. Miller. Trends in Twitter use by physicians at the American Society of Clinical Oncology annual meeting, 2010 and 2011. Journal of Oncology Practice, 8(3):173–178, 2012.
Jilin Chen, Werner Geyer, Casey Dugan, Michael Muller, and Ido Guy. Make new friends, but keep the old: recommending people on social networking sites. In Special Interest Group on Computer-Human Interaction (SIG-CHI), pages 201–210. ACM, 2009a.
Wen-Yen Chen, Jon-Chyuan Chu, Junyi Luan, Hongjie Bai, Yi Wang, and Edward Y. Chang. Collaborative filtering for Orkut communities: discovery of user latent behavior. In Proceedings of the 18th international conference on World wide web, pages 681–690. ACM, 2009b.
Xu Cheng, Cameron Dale, and Jiangchuan Liu. Statistics and social network of YouTube videos. In IEEE/ACM International Symposium on Quality of Service, pages 229–238. IEEE, 2008.
Nicholas A. Christakis and James H. Fowler. The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4):370–379, July 2007. ISSN 0028-4793. doi: 10.1056/NEJMsa066082.
Freddy Chong Tat Chua, Hady W. Lauw, and Ee-Peng Lim. Generative models for item adoptions using social correlation. IEEE Transactions on Knowledge and Data Engineering, 99, 2012. ISSN 1041-4347. doi: 10.1109/TKDE.2012.137.
J.S. Coleman. Social capital in the creation of human capital. American Journal of Sociology, pages 95–120, 1988.
Scott Counts and Kristie Fisher. Taking it all in? visual attention in microblog consumption. In International AAAI Conference on Web and Social Media, 2011.
David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg, and Siddharth Suri. Feedback effects between similarity and social influence in online communities. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 160–168, New York, NY, 2008. ISBN 978-1-60558-193-4. doi: 10.1145/1401890.1401914.
Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. An experimental comparison of click position-bias models. In ACM International Conference on Web Search and Data Mining, 2008.
R. I. M. Dunbar. Neocortex size as a constraint on group size in primates. Journal of Human Evolution, 22(6):469–493, June 1992.
Robin Dunbar. Evolution of the social brain. Science, 302(5648):1160–1161, 2003.
Scott L. Feld. The focused organization of social ties. The American Journal of Sociology, 86(5):1015–1035, 1981.
Linton C. Freeman. A set of measures of centrality based on betweenness. Sociometry, pages 35–41, 1977.
Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011.
Rumi Ghosh and Kristina Lerman. Predicting influential users in online social networks. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining workshop on Social Network Analysis (SNA-KDD), 2010.
Rumi Ghosh, Kristina Lerman, Tawan Surachawala, Konstantin Voevodski, and Shang-Hua Teng. Non-conservative diffusion and its application to social network analysis. Computing Research Repository, abs/1102.4639, 2011a.
Rumi Ghosh, Tawan Surachawala, and Kristina Lerman. Entropy-based classification of 'retweeting' activity on Twitter. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining workshop on Social Network Analysis (SNA-KDD), August 2011b.
Eric Gilbert and Karrie Karahalios. Predicting tie strength with social media. In Special Interest Group on Computer-Human Interaction (SIG-CHI), 2009. ISBN 978-1-60558-246-7. doi: 10.1145/1518701.1518736.
M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123–214, 2011.
Michael Goldhaber. The attention economy and the net. First Monday, 2(4-7), 1997.
B. Gonçalves, N. Perra, and A. Vespignani. Modeling users' activity on Twitter networks: validation of Dunbar's number. PLoS One, 6(8):e22656, 2011a.
Bruno Gonçalves, Nicola Perra, and Alessandro Vespignani. Validation of Dunbar's number in Twitter conversations. arXiv.org, 2011b.
Przemyslaw A. Grabowicz, José J. Ramasco, Esteban Moro, Josep M. Pujol, and Víctor M. Eguíluz. Social features of online networks: the strength of weak ties in online social media. Computing Research Repository, abs/1107.4009, 2011.
M.S. Granovetter. The strength of weak ties. The American Journal of Sociology, 78(6):1360–1380, 1973.
Ido Guy and David Carmel. Social recommender systems. In Proceedings of the 20th international conference companion on World wide web, pages 283–284. ACM, 2011.
J.L. Herlocker, J.A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In ACM SIGIR conference on Research and development in information retrieval, pages 230–237. ACM, 1999.
Nathan Hodas and Kristina Lerman. How limited visibility and divided attention constrain social contagion. In SocialCom, 2012.
Nathan O. Hodas and Kristina Lerman. The simple rules of social contagion. Scientific Reports, 4, 2014. doi: 10.1038/srep04343.
Tad Hogg and Kristina Lerman. Social dynamics of Digg. EPJ Data Science, 1(5), June 2012.
Tad Hogg, Kristina Lerman, and Laura M. Smith. Stochastic models predict user behavior in social media. In ASE/IEEE International Conference on Social Computing, 2013.
B. A. Huberman. Strong regularities in World Wide Web surfing. Science, 280(5360):95–97, April 1998.
Bernardo A. Huberman, Daniel M. Romero, and Fang Wu. Crowdsourcing, attention and productivity. Journal of Information Science, 35(6):758–765, December 2009.
J.L. Iribarren and E. Moro. Affinity paths and information diffusion in social networks. Social Networks, 33(2):134–142, 2011.
Tom N. Jagatic, Nathaniel A. Johnson, Markus Jakobsson, and Filippo Menczer. Social phishing. Communications of the ACM, 50(10):94–100, 2007.
Salman Jamali. Comment mining, popularity prediction, and social network analysis. PhD thesis, George Mason University, 2009.
Daniel Kahneman. Attention and effort. Prentice Hall, 1973.
Daniel Kahneman. Thinking, fast and slow. Macmillan, 2011.
Jeon-Hyung Kang and Kristina Lerman. Using lists to measure homophily on Twitter. In AAAI Conference on Artificial Intelligence workshop on Intelligent Techniques for Web Personalization and Recommendation, July 2012a.
Jeon-Hyung Kang and Kristina Lerman. Using lists to measure homophily on Twitter. In Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012b.
Jeon-Hyung Kang and Kristina Lerman. LA-CTR: A limited attention collaborative topic regression for social media. AAAI Conference on Artificial Intelligence, 2013a.
Jeon-Hyung Kang and Kristina Lerman. Structural and cognitive bottlenecks to information access in social networks. In Proceedings of 24th ACM Conference on Hypertext and Social Media (HyperText), 2013b.
Jeon-Hyung Kang and Kristina Lerman. Scalable mining of social data using stochastic gradient Fisher scoring. In Proceedings of the 2013 workshop on Data-driven user behavioral modeling and mining from social media, pages 21–24. ACM, 2013c.
Jeon-Hyung Kang and Kristina Lerman. Structural and cognitive bottlenecks to information access in social networks. In The ACM Hypertext and Social Media conference, 2013d.
Jeon-Hyung Kang and Kristina Lerman. User effort and network structure mediate access to information in networks. In Proceedings of the International AAAI Conference on Web and Social Media, 2015a.
Jeon-Hyung Kang and Kristina Lerman. VIP: Incorporating human cognitive biases in a probabilistic model of retweeting. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP), 2015b.
Jeon-Hyung Kang, Kristina Lerman, and Lise Getoor. LA-LDA: A limited attention model for social recommendation. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP), 2013.
G. Karypis. Evaluation of item-based top-n recommendation algorithms. Technical report, DTIC Document, 2000.
Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H. Eugene Stanley, and Hernán A. Makse. Identification of influential spreaders in complex networks. Nature Physics, 6(11):888–893, 2010.
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
Yehuda Koren, Stephen C. North, and Chris Volinsky. Measuring and extracting proximity graphs in networks. ACM Trans. Knowl. Discov. Data, 1(3), December 2007. ISSN 1556-4681. doi: 10.1145/1297332.1297336.
Gueorgi Kossinets and Duncan J. Watts. Origins of homophily in an evolving social network. The American Journal of Sociology, 115(2), 2009. ISSN 00029602. doi: 10.2307/20616117.
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, pages 591–600. ACM, 2010.
Paul F. Lazarsfeld and Elihu Katz. Personal influence: the part played by people in the flow of mass communications. Glencoe, Illinois, 1955.
Janette Lehmann, Bruno Gonçalves, José J. Ramasco, and Ciro Cattuto. Dynamical classes of collective attention in Twitter. In Proceedings of the 21st international conference on World Wide Web, pages 251–260. ACM, 2012.
Kristina Lerman and Rumi Ghosh. Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. In International AAAI Conference on Web and Social Media, 2010.
Kristina Lerman and Tad Hogg. Using stochastic models to describe and predict social dynamics of web users. ACM Transactions on Intelligent Systems and Technology, 3(4), September 2012.
Kristina Lerman and Tad Hogg. Leveraging position bias to improve peer recommendation. PLoS One, 9(6):e98914, 2014.
Kristina Lerman, Suradej Intagorn, Jeon-Hyung Kang, and Rumi Ghosh. Using proximity to predict activity in social networks. In International World Wide Web Conference (WWW), 2012.
Jure Leskovec and Eric Horvitz. Planetary-scale views on a large instant-messaging network. In Proceeding of the 17th international conference on World Wide Web, pages 915–924. ACM, 2008. ISBN 978-1-60558-085-2.
Jure Leskovec, Lars Backstrom, and Jon Kleinberg. Meme-tracking and the dynamics of the news cycle. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 497–506. ACM, 2009.
David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci., 58(7):1019–1031, 2007. doi: 10.1002/asi.20591.
Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, December 2010. ISSN 03784371. doi: 10.1016/j.physa.2010.11.027.
H. Ma, H. Yang, M.R. Lyu, and I. King. SoRec: social recommendation using probabilistic matrix factorization. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 931–940. ACM, 2008.
Ian Mackinnon. Age and geographic inferences of the LiveJournal social network. In Statistical Network Analysis Workshop, 2006.
M. McPherson, L. Smith-Lovin, and J.M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, pages 415–444, 2001.
David Mimno, Matt Hoffman, and David Blei. Sparse stochastic inference for latent Dirichlet allocation. The International Conference on Machine Learning, 2012.
Giovanna Miritello, Esteban Moro, Rubén Lara, Rocío Martínez-López, John Belchamber, Sam G. B. Roberts, and Robin I. M. Dunbar. Time as a limited resource: Communication strategy in mobile phone networks. Social Networks, 35(1):89–95, January 2013. ISSN 03788733. doi: 10.1016/j.socnet.2013.01.003.
R.M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 54:113–162, 2010.
J.P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.L. Barabási. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 104(18):7332–7336, 2007.
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. 1999.
Eli Pariser. The filter bubble: What the Internet is hiding from you. Penguin UK, 2011.
Stanley L. Payne. The Art of Asking Questions. Princeton University Press, 1951.
Anon Plangprasopchok and Kristina Lerman. Modeling social annotation: a Bayesian approach. ACM Transactions on Knowledge Discovery from Data, 5(1):4, 2010.
S. Purushotham, Y. Liu, and C.C.J. Kuo. Collaborative topic regression with social matrix factorization for recommendation systems. arXiv preprint arXiv:1206.4684, 2012.
R. Reagans and B. McEvily. Network structure and knowledge transfer: The effects of cohesion and range. Administrative Science Quarterly, 48(2):240–267, 2003.
R. Reagans and E.W. Zuckerman. Networks, diversity, and productivity: The social capital of corporate R&D teams. Organization Science, 12(4):502–517, 2001.
R.A. Rensink, J.K. O'Regan, and J.J. Clark. To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8(5):368, 1997.
H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
Daniel M. Romero, Wojciech Galuba, Sitaram Asur, and Bernardo A. Huberman. Influence and passivity in social media. In International World Wide Web Conference (WWW), 2011a. ISBN 978-1-4503-0637-9.
Daniel M. Romero, Brendan Meeder, and Jon Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter. In International World Wide Web Conference (WWW), 2011b.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning, pages 880–887. ACM, 2008a.
R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. Advances in Neural Information Processing Systems, 20:1257–1264, 2008b.
Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In International World Wide Web Conference (WWW), 2001.
Hossam Sharara, William Rand, and Lise Getoor. Differential adaptive diffusion: Understanding diversity and learning whom to trust in viral marketing. In International AAAI Conference on Web and Social Media, 2011.
Börkur Sigurbjörnsson and Roelof Van Zwol. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th international conference on World Wide Web, pages 327–336. ACM, 2008.
Laura M. Smith, Linhong Zhu, Kristina Lerman, and Zornitsa Kozareva. The role of social media in the discussion of controversial topics. In ASE/IEEE International Conference on Social Computing, 2013.
Bongwon Suh, Lichan Hong, Peter Pirolli, and Ed H. Chi. Want to be retweeted? large scale analytics on factors impacting retweet in Twitter network. In Social Computing (SocialCom), 2010 IEEE Second International Conference on, pages 177–184. IEEE, 2010.
Gabor Szabo and Bernardo A. Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8):80–88, 2010.
Michael Trusov, Anand V. Bodapati, and Randolph E. Bucklin. Determining influential users in Internet social networks. Journal of Marketing Research, XLVII:643–658, 2010.
B. Uzzi. The sources and consequences of embeddedness for the economic performance of organizations: The network effect. American Sociological Review, pages 674–698, 1996.
B. Uzzi. Social structure and competition in interfirm networks: The paradox of embeddedness. Administrative Science Quarterly, pages 35–67, 1997.
Greg Ver Steeg and Aram Galstyan. Information transfer in social media. In International World Wide Web Conference (WWW), 2012.
Chong Wang and David M. Blei. Collaborative topic modeling for recommending scientific articles. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011.
Dashun Wang, Chaoming Song, and Albert-László Barabási. Quantifying long-term scientific impact. arXiv preprint arXiv:1306.3293, 2013.
D. Watts and S. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440–442, 1998.
M. Welling and Y.W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. The International Conference on Machine Learning, 2011.
L. Weng, A. Flammini, A. Vespignani, and F. Menczer. Competition among memes in a world with limited attention. Scientific Reports, 2, 2012.
Lilian Weng, Filippo Menczer, and Yong-Yeol Ahn. Virality prediction and community structure in social networks. arXiv preprint arXiv:1306.0158, 2013.
Dennis M. Wilkinson. Strong regularities in online peer production. In EC '08: Proceedings of the 9th ACM conference on Electronic commerce, pages 302–309, New York, NY, USA, 2008.
Fang Wu and Bernardo A. Huberman. Novelty and collective attention. Proceedings of the National Academy of Sciences, 104(45):17599–17601, November 2007.
Shuang-Hong Yang, Bo Long, Alex Smola, Narayanan Sadagopan, Zhaohui Zheng, and Hongyuan Zha. Like like alike: joint friendship and interest propagation in social networks. In Proceedings of the 20th international conference on World wide web, pages 537–546. ACM, 2011.
Kai Yu, John Lafferty, Shenghuo Zhu, and Yihong Gong. Large-scale collaborative prediction using a nonparametric random effects model. In The International Conference on Machine Learning, pages 1185–1192, 2009. ISBN 978-1-60558-516-1. doi: 10.1145/1553374.1553525.
Jichang Zhao, Junjie Wu, and Ke Xu. Weak ties: A subtle role in the information diffusion of online social networks. Computing Research Repository, abs/1001.3181, 2010.
Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang. Predicting missing links via local information. The European Physical Journal B - Condensed Matter and Complex Systems, 71(4):623–630, October 2009. ISSN 1434-6028. doi: 10.1140/epjb/e2009-00335-8.
Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web, pages 22–32. ACM, 2005.
Martin Zinkevich, Markus Weimer, Alex Smola, and Lihong Li. Parallelized stochastic gradient descent. Conference on Neural Information Processing Systems (NIPS), 23(23):1–9, 2010.
Aaron Zinman and Judith S. Donath. Is Britney Spears spam? In Conference on Email and Anti-spam, 2007.
Appendix
A.1 Derivation of Gibbs Sampling Formula for LA-LDA
The total probability of the model is:

\begin{aligned}
P(A, X, Y, Z, \theta, \phi, \psi, \rho;\, \alpha, \beta, \gamma, \eta) =
&\prod_{j} P(\phi_{j}; \beta) \prod_{x,z} P(\rho_{x,z}; \eta)
 \prod_{i} P(\theta_{i}; \alpha) \prod_{x} P(\psi_{(i,x)}; \gamma) \\
&\prod_{j \in A_{i}} P(A_{(i,j)} \mid Z_{(i,j)}, X_{(i,j)})\,
 P(X_{(i,j)} \mid \theta_{i})\, P(Z_{(i,j)} \mid \phi_{j})\,
 P(Y_{(i,j)} \mid \psi_{(i, X_{(i,j)})})
\end{aligned}
\tag{1}
where A_i is the list of items adopted by user i, φ_j is the topic distribution of item j, and Z_{(i,j)} is the topic assignment for item j adopted by user i. Since θ, φ, ψ, and ρ are independent given the item adoptions, we can treat them separately. Here, for simplicity, we use symmetric Dirichlet priors. These priors are conjugate to θ, φ, ψ, and ρ, which allows us to compute the joint distribution in closed form.
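The conjugacy claim can be sanity-checked numerically. The sketch below is illustrative only (the counts and prior value are made up, not taken from the model): it compares the closed-form Dirichlet-multinomial marginal, the kind of integral performed in Eq. (2), against a Monte Carlo average of the likelihood over the prior.

```python
import numpy as np
from math import lgamma, exp

# Closed-form marginal of a sequence with counts n_k under a symmetric
# Dirichlet(b) prior over K outcomes:
#   P(n | b) = Gamma(K b)/Gamma(b)^K * prod_k Gamma(n_k + b) / Gamma(n + K b)
rng = np.random.default_rng(0)
b, counts = 0.5, np.array([3, 1, 0])
K, n = len(counts), counts.sum()

log_closed = (lgamma(K * b) - K * lgamma(b)
              + sum(lgamma(c + b) for c in counts) - lgamma(n + K * b))

# Monte Carlo estimate of the same integral: average the sequence
# likelihood prod_k phi_k^{n_k} over draws phi ~ Dirichlet(b).
phis = rng.dirichlet([b] * K, size=200_000)
mc = np.mean(np.prod(phis ** counts, axis=1))
print(exp(log_closed), mc)  # the two estimates should agree closely
```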
\begin{aligned}
P(A, X, Y, Z;\, \alpha, \beta, \gamma, \eta)
&= \iiiint P(A, X, Y, Z, \theta, \phi, \psi, \rho;\, \alpha, \beta, \gamma, \eta)
   \, d\theta \, d\phi \, d\psi \, d\rho \\
&= \left(\frac{\Gamma(N_x \alpha)}{\Gamma(\alpha)^{N_x}}\right)^{N_i}
   \left(\frac{\Gamma(N_{friend(i)} \gamma)}{\Gamma(\gamma)^{N_{friend(i)}}}\right)^{N_i N_x}
   \left(\frac{\Gamma(N_j \eta)}{\Gamma(\eta)^{N_j}}\right)^{N_z N_x}
   \left(\frac{\Gamma(N_z \beta)}{\Gamma(\beta)^{N_z}}\right)^{N_j} \\
&\quad \times \iiiint
   \prod_{i,x} \theta_{i,x}^{\,n^{x}_{i} + \alpha - 1}
   \prod_{j,z} \phi_{j,z}^{\,n^{z}_{j} + \beta - 1}
   \prod_{i,x,y} \psi_{(i,x),y}^{\,n^{y}_{(i,x)} + \gamma - 1}
   \prod_{x,z,\, j \in A(i)} \rho_{x,z,j}^{\,n^{x,z}_{j} + \eta - 1}
   \, d\theta \, d\phi \, d\psi \, d\rho
\end{aligned}
\tag{2}
By integrating out θ, φ, ψ, and ρ and rearranging, we get:

\begin{aligned}
P(A, X, Y, Z;\, \alpha, \beta, \gamma, \eta) =
&\prod_{j} \frac{\Gamma(N_z \beta)}{\Gamma(\beta)^{N_z}}
  \frac{\prod_{z} \Gamma(n^{z}_{j} + \beta)}{\Gamma(n_{j} + N_z \beta)}
 \prod_{i} \frac{\Gamma(N_x \alpha)}{\Gamma(\alpha)^{N_x}}
  \frac{\prod_{x} \Gamma(n^{x}_{i} + \alpha)}{\Gamma(n_{i} + N_x \alpha)} \\
&\prod_{i,x} \frac{\Gamma(N_{friends(i)} \gamma)}{\Gamma(\gamma)^{N_{friends(i)}}}
  \frac{\prod_{y} \Gamma(n^{y}_{i,x} + \gamma)}{\Gamma(n_{i,x} + N_{friends(i)} \gamma)}
 \prod_{i,x,z} \frac{\Gamma(N_j \eta)}{\Gamma(\eta)^{N_j}}
  \frac{\prod_{j \in A(i)} \Gamma(n^{x,z}_{j} + \eta)}{\Gamma(n^{x,z} + N_j \eta)}
\end{aligned}
\tag{3}
Computing the posterior distribution is intractable because of the summation in the denominator, so we apply Markov chain Monte Carlo sampling to approximate the conditional distribution P(X, Z | A, Y; α, β, γ, η). Since P(A, Y; α, β, γ, η) is invariant for any value of X or Z, we can derive the Gibbs sampling equations from P(A, X, Y, Z; α, β, γ, η).
\begin{aligned}
&P(Z_{(i,j)} = z \mid A, X, Y, Z_{\neg(i,j)};\, \alpha, \beta, \gamma, \eta)
 \propto P(Z_{(i,j)} = z,\, A, X, Y, Z_{\neg(i,j)};\, \alpha, \beta, \gamma, \eta) \\
&= \left(\frac{\Gamma(N_z \beta)}{\Gamma(\beta)^{N_z}}\right)^{N_j}
   \frac{\prod_{z'} \Gamma(n^{z'}_{\neg(i,j)} + \beta)}{\Gamma(n^{(\cdot)}_{\neg(i,j)} + N_z \beta)}
   \cdot \frac{\Gamma(n^{z}_{\neg(i,j)} + \beta + 1)}{\Gamma(n^{z}_{\neg(i,j)} + \beta)}
   \cdot \frac{\Gamma(n^{(\cdot)}_{\neg(i,j)} + N_z \beta)}{\Gamma(n^{(\cdot)}_{\neg(i,j)} + N_z \beta + 1)} \\
&\quad \times \prod_{i'} \left(\frac{\Gamma(N_x \alpha)}{\Gamma(\alpha)^{N_x}}
   \frac{\prod_{x'} \Gamma(n^{x'}_{(i',\cdot)} + \alpha)}{\Gamma(n^{(\cdot)}_{(i',\cdot)} + N_x \alpha)}\right)
   \left(\frac{\Gamma(N_j \eta)}{\Gamma(\eta)^{N_j}}\right)^{N_x N_z} \\
&\quad \times \prod_{i',x'} \left(\frac{\Gamma(N_{friends(i')} \gamma)}{\Gamma(\gamma)^{N_{friends(i')}}}
   \frac{\prod_{y} \Gamma(n^{y}_{(i',x')} + \gamma)}{\Gamma(n^{(\cdot)}_{(i',x')} + N_{friends(i')} \gamma)}\right) \\
&\quad \times \prod_{x',z'} \frac{\Gamma(n^{x',z'}_{\neg(i,j)} + \eta)}{\Gamma(n^{x',\cdot}_{\neg(i,j)} + N_j \eta)}
   \cdot \frac{\Gamma(n^{x,z}_{\neg(i,j)} + \eta + 1)}{\Gamma(n^{x,z}_{\neg(i,j)} + \eta)}
   \cdot \frac{\Gamma(n^{x,\cdot}_{\neg(i,j)} + N_j \eta)}{\Gamma(n^{x,\cdot}_{\neg(i,j)} + N_j \eta + 1)}
\end{aligned}
\tag{4}
where n^z_{¬(i,j)} denotes the number of times topic z is assigned to item j, excluding the current assignment of Z_{(i,j)}. Since we do not need to compute the exact probability, we only need the ratios among the values that Z_{(i,j)} can take. We can further simplify Eq. (4) by ignoring constant factors that do not depend on Z_{(i,j)}.
\begin{aligned}
&\propto \frac{\Gamma(n^{z}_{\neg(i,j)} + \beta + 1)}{\Gamma(n^{z}_{\neg(i,j)} + \beta)}
   \cdot \frac{\Gamma(n^{(\cdot)}_{\neg(i,j)} + N_z \beta)}{\Gamma(n^{(\cdot)}_{\neg(i,j)} + N_z \beta + 1)}
   \cdot \frac{\Gamma(n^{x,z}_{\neg(i,j)} + \eta + 1)}{\Gamma(n^{x,z}_{\neg(i,j)} + \eta)}
   \cdot \frac{\Gamma(n^{x,\cdot}_{\neg(i,j)} + N_j \eta)}{\Gamma(n^{x,\cdot}_{\neg(i,j)} + N_j \eta + 1)} \\
&= \frac{n^{z}_{\neg(i,j)} + \beta}{n^{(\cdot)}_{\neg(i,j)} + N_z \beta}
   \cdot \frac{n^{x,z}_{\neg(i,j)} + \eta}{n^{x,\cdot}_{\neg(i,j)} + N_j \eta}
\end{aligned}
\tag{5}

where the last step uses the identity Γ(m + 1) = m Γ(m).
Here n^z_{¬(i,j)} is the number of times topic z is assigned to item j, excluding the current assignment of Z_{(i,j)}, and n^{x,z}_{¬(i,j)} is the number of items with topic assignment z and interest assignment x, excluding the current topic assignment for item j adopted by user i. Similarly, we can obtain the posterior probability of X_{(i,j)}.
\begin{aligned}
P(Z_{(i,j)} = z \mid Z_{\neg(i,j)}, X, Y, A) &\propto
  \frac{n^{z}_{\neg(i,j)} + \beta}{n^{(\cdot)}_{\neg(i,j)} + N_z \beta}
  \cdot \frac{n^{x,z}_{\neg(i,j)} + \eta}{n^{x,\cdot}_{\neg(i,j)} + N_j \eta} \\
P(X_{(i,j)} = x \mid X_{\neg(i,j)}, Y, Z, A) &\propto
  \frac{n^{x}_{\neg(i,j)} + \alpha}{n^{(\cdot)}_{\neg(i,j)} + N_x \alpha}
  \cdot \frac{n^{y}_{\neg(i,j)} + \gamma}{n^{(\cdot)}_{\neg(i,j)} + N_{friends(i)} \gamma}
  \cdot \frac{n^{x,z}_{\neg(i,j)} + \eta}{n^{\cdot,z}_{\neg(i,j)} + N_j \eta}
\end{aligned}
\tag{6}
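The sampling equations in Eq. (6) translate directly into code. Below is a minimal sketch of the collapsed Gibbs update for Z_{(i,j)}, not the dissertation's implementation: the count arrays and their names (n_z_j, n_xz, and so on) are hypothetical bookkeeping structures, and all counts are assumed to already exclude the current assignment.

```python
import numpy as np

def sample_z(x_ij, n_z_j, n_j, n_xz, n_x_dot, beta, eta, N_z, N_j, rng):
    """Collapsed Gibbs update for Z_(i,j), following Eq. (6).

    n_z_j[z]   : times topic z is assigned to item j (excluding Z_(i,j))
    n_j        : total topic assignments for item j (excluding Z_(i,j))
    n_xz[x, z] : adoptions under interest x and topic z (excluding Z_(i,j))
    n_x_dot[x] : total adoptions under interest x (excluding Z_(i,j))
    """
    # (n^z + beta) / (n^(.) + N_z beta): item j's smoothed topic proportions
    topic_term = (n_z_j + beta) / (n_j + N_z * beta)
    # (n^{x,z} + eta) / (n^{x,.} + N_j eta): smoothed adoption counts per (x, z)
    adopt_term = (n_xz[x_ij] + eta) / (n_x_dot[x_ij] + N_j * eta)
    p = topic_term * adopt_term
    p /= p.sum()  # normalize over the N_z topics
    return rng.choice(N_z, p=p)
```

A full sampler would loop over all (i, j) adoptions, decrement the counts for the current assignment, draw a new z, and increment the counts again; the X_{(i,j)} update of Eq. (6) has the same shape with three factors.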
A.2 Derivation of Optimization Formula for LA-CTR
By integrating out the remaining latent variables and Z, the total probability of the model is:

P(\Theta, U, S, V \mid R) = P(\Theta \mid S, U)\, P(R \mid \Theta, V)\, P(V)\, P(S)\, P(U)
\tag{7}
We define each variable as follows:

\begin{aligned}
\theta_{il} &\sim \mathcal{N}(s_{il}\,\hat{u}_{i},\; \lambda_{\theta}^{-1} I_K) \\
S_{il} &\sim \mathcal{N}(0,\; \lambda_{S}^{-1} I_K) \\
u_{i} &\sim \mathcal{N}(0,\; \lambda_{U}^{-1} I_K) \\
P(R \mid V, \Theta) &= \prod_{i}^{N} \prod_{j}^{D} \prod_{l}^{A(i)}
 \mathcal{N}\!\left(r_{ijl} \mid g_{r}(\theta_{il}^{T} v_{j}),\; \sigma_{R}^{2}\right)^{\mathbb{1}^{R}_{ijl}}
\end{aligned}
\tag{8}
where N is the number of users, D is the number of documents, and A(i) is the number of attention friends of user i. We define û_i as a diagonal matrix obtained by multiplying u_i with the K-dimensional identity matrix (û_i = u_i I_K).
\begin{aligned}
\mathcal{L} =
&-\frac{\lambda_{u}}{2} \sum_{i}^{N} u_{i}^{T} u_{i}
 -\frac{\lambda_{v}}{2} \sum_{j}^{D} (v_{j} - \theta_{j})^{T}(v_{j} - \theta_{j})
 +\sum_{j}^{D} \sum_{t}^{W(j)} \log\!\Big(\sum_{k}^{K} \theta_{jk}\, \beta_{k, w_{jt}}\Big) \\
&-\sum_{i}^{N} \sum_{j}^{D} \sum_{l}^{A(i)} \frac{c^{r}_{ijl}}{2}
   \big(r_{ijl} - g_{r}(\theta_{il}^{T} v_{j})\big)^{2}
 -\frac{\lambda_{S}}{2} \sum_{i}^{N} \sum_{l}^{A(i)} s_{il}^{T} s_{il}
 -\sum_{i}^{N} \sum_{l}^{A(i)} \frac{c_{il}}{2}
   \big(\theta_{il} - g(s_{il}\,\hat{u}_{i})\big)^{2}
\end{aligned}
\tag{9}
Setting \frac{d\mathcal{L}}{du} = 0, \frac{d\mathcal{L}}{dv} = 0, \frac{d\mathcal{L}}{dS} = 0, and \frac{d\mathcal{L}}{d\theta} = 0 yields the updates:

\begin{aligned}
u_{i} &\leftarrow \big(\lambda_{u} I_K + S_{i} C_{i} S_{i}^{T}\big)^{-1}
  S_{i} C_{i}\, \tilde{\theta}_{i}^{T} \\
v_{j} &\leftarrow \big(\lambda_{v} I_K + \hat{\Theta} C^{r}_{j} \hat{\Theta}^{T}\big)^{-1}
  \big(\hat{\Theta} C^{r}_{j} R_{j} + \lambda_{v}\, \theta_{j}\big) \\
s_{il} &\leftarrow \big(\lambda_{s} I_K + \hat{u}_{i} C_{il} \hat{u}_{i}^{T}\big)^{-1}
  C_{il}\, \hat{u}_{i}\, \theta_{il} \\
\theta_{il} &\leftarrow \big(C_{il} I_K + V C^{r}_{il} V^{T}\big)^{-1}
  \big(V C^{r}_{il} R_{il} + C_{il}\, \hat{u}_{i}\, s_{il}\big)
\end{aligned}
\tag{10}
where \tilde{\theta}_i = \sum_{k=1}^{K} \theta_{ik}, \hat{\Theta} is a K-by-NA matrix, and R_j is a vector of 0/1 values over user-friend pairs.
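Each update in Eq. (10) is a regularized least-squares solve. As an illustration, here is a hedged sketch of the v_j update in numpy; the array names (Theta_hat for \hat{\Theta}, C_r_j for the diagonal of C^r_j) are assumptions for this sketch, not the dissertation's code.

```python
import numpy as np

def update_v(Theta_hat, C_r_j, R_j, theta_j, lam_v):
    """v_j <- (lam_v I_K + Theta_hat C^r_j Theta_hat^T)^{-1}
              (Theta_hat C^r_j R_j + lam_v theta_j)

    Theta_hat : K x M matrix of user-friend latent factors
    C_r_j     : length-M vector (diagonal of the confidence matrix C^r_j)
    R_j       : length-M 0/1 adoption vector for item j
    theta_j   : K-dim topic proportions of item j
    """
    K = Theta_hat.shape[0]
    # Theta_hat * C_r_j scales each column m by its confidence c_m
    A = lam_v * np.eye(K) + (Theta_hat * C_r_j) @ Theta_hat.T
    b = Theta_hat @ (C_r_j * R_j) + lam_v * theta_j
    return np.linalg.solve(A, b)  # solve A v_j = b rather than inverting A
```

The u_i, s_{il}, and θ_{il} updates have the same structure and can reuse this pattern; iterating the four updates to convergence gives the usual coordinate-ascent scheme.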
A.3 Synthetic Cascades Data
Our synthetic data generator uses a follower graph from Digg to specify the social network component; the dataset is available online at http://www.isi.edu/lerman/downloads/digg2009.html. We used the social network links among the top 5,000 most active users in the 2009 dataset, who are followed by 81.8 other users on average (maximum 984, median 11). We begin generating synthetic data by creating N_i items according to the LA-LDA item topic model and N_u users according to the LA-LDA user interest model.
We model the propagation of items through the social network over a period of N_day days. We first choose a set of seeders (S%) from the N_u users. Seeders are able to introduce new items into the network. We introduce a special source node, which has adopted all of the generated items, and seeders have the source node as one of their friends.
Every user u is assigned a fixed attention budget V_u, which determines the total number of items from friends that u can attend to in a day. For simplicity, we represent V_u as a function of a global attention limit parameter v_g and the number of friends the user has. This is motivated by the observation that, at least on Digg, user activity (number of votes) is correlated with the number of friends they follow (the correlation coefficient is 0.1626 to 0.1701 in our datasets). Intuitively, the number of items a user adopts (e.g., the number of stories a user votes for) is some fraction of the number of stories to which the user attends (e.g., the number of stories a user views); here, to simplify matters, we assume that a user's attention budget is simply proportional to the number of friends she follows.
Synthetic cascades are generated as follows. Each day, every user, within her allotted attention budget, checks whether her friends have any items that match her interests. Initially, when the cascade starts, the source node is the only friend that has items, so only seed nodes are able to adopt and share items. However, as time progresses and items begin flowing through the network, users eventually exhaust their attention budget without being able to attend to all the items that their friends shared with them. When a user chooses to attend to an item i that has been shared by a friend y, she chooses without replacement, so that an item will only be attended to once from a particular friend y. However, we do allow a user to attend to the same item from different friends. Once an item has been chosen, the user adopts (and shares) it with probability ρ_{x,z}. In our model, we assume users share all of their adoptions with their friends, i.e., adoptions are broadcast. As mentioned earlier, more nuanced models could also be supported.
function GenerateSyntheticData
    for day = 1 to N_day do
        for u = 1 to N_u do
            for attention = 1 to V_u do
                choose interest x ~ Mult(θ^(u))
                choose friend y ~ Mult(ψ^(u,x))
                choose a new item i randomly from the items that y has shared
                choose topic z ~ Mult(φ^(i))
                adopt and share item i with probability ρ_{x,z}
            end for
        end for
    end for
end function
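The generator above can be sketched in Python. This is an illustrative re-implementation, not the dissertation's code: the data structures (a dict-of-lists friend graph, a pre-seeded initial_shared dict for the source node) and the parameter names theta, psi, phi, rho are assumptions matching the notation of Section A.1.

```python
import numpy as np

def generate_cascades(friends, theta, psi, phi, rho, V, n_days,
                      initial_shared, rng):
    """Simulate adoption cascades, one attention-budgeted pass per day.

    friends[u]     : list of users u follows (seeders include the source node)
    theta[u]       : interest distribution of user u
    psi[u][x]      : attention distribution over friends[u], per interest x
    phi[i]         : topic distribution of item i
    rho[x, z]      : adoption probability for interest x and topic z
    V[u]           : daily attention budget of user u
    initial_shared : items pre-shared by some nodes (e.g. the source node)
    """
    shared = {u: list(initial_shared.get(u, [])) for u in friends}
    attended = {u: set() for u in friends}   # (friend, item) pairs already seen
    adoptions = []
    for day in range(n_days):
        for u in friends:
            if not friends[u]:
                continue
            for _ in range(V[u]):
                x = rng.choice(len(theta[u]), p=theta[u])            # interest
                y = friends[u][rng.choice(len(friends[u]), p=psi[u][x])]
                fresh = [i for i in shared[y] if (y, i) not in attended[u]]
                if not fresh:
                    continue                  # nothing new from this friend
                i = fresh[rng.integers(len(fresh))]
                attended[u].add((y, i))       # without replacement per friend
                z = rng.choice(len(phi[i]), p=phi[i])                # topic
                if rng.random() < rho[x, z]:  # adopt and broadcast
                    shared[u].append(i)
                    adoptions.append((day, u, i))
    return adoptions
```

With ρ derived from the LA-LDA parameters and V_u set proportional to the number of friends, this reproduces the procedure described above.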
By varying the parameters (S and v_g) and hyperparameters (α, β, γ, and η) we can create different synthetic datasets and study how these parameters affect diffusion.
Table A.1: Synthetic dataset characteristics for different parameter values.

(a) Friends (γ), with α = 0.05, S = 30%
γ                   0.001   0.01    0.1     0.5     1.0
Min Adoptions       41      41      42.2    42      44.6
Max Adoptions       656.8   1485.2  1298.5  1488.2  1590.2
Mean Adoptions      104.62  129.6   140.9   142.9   144.7
Median Adoptions    94.2    107.1   115     117.2   118

(b) Interests (α), with γ = 0.05, S = 30%
α                   0.001   0.01    0.1     0.5     1.0
Min Adoptions       42.3    42.5    42.3    47      40.4
Max Adoptions       1093.3  1042.7  1413.7  1684    1749
Mean Adoptions      128.7   132.8   139.8   141.5   142.2
Median Adoptions    109     110     115     116.7   118

(c) Seeders (S), with α = 0.1, γ = 0.1
S                   10%     30%     50%     70%     90%
Max Diam. (>1)      16.6    34.9    28.4    28.1    23.5
Mean Diam. (>1)     3.4     4.2     3.8     3.9     3.6
Median Diam. (>1)   2       2.15    2.14    2.17    2.05
Min Adoptions       11.9    27.25   43      58      83
Max Adoptions       943     1589.8  1385.2  1664.2  1729.3
Mean Adoptions      45.6    97      142.5   183.4   229.2
Median Adoptions    35.6    77.1    116.7   154.2   196.6
The values we used in these experiments are v_g = 2, β = 0.1, and η = 0.1, and we vary the three parameters α, γ, and S. Each synthetic dataset consists of 1000 items, 5000 users, 50 topics, and 8 interests.
Table A.1 shows the minimum, maximum, mean, and median number of item adoptions for the specified parameter values. First, we evaluated γ = 0.001, 0.01, 0.1, 0.5, and 1.0 to vary the concentration of limited attention over friends, while we fixed α = 0.05 and S = 30% (the percentage of seeders among the N_u users). Large values of γ let users pay attention to their friends uniformly, while small values of γ make users concentrate their attention on a smaller number of friends. As a result, for large values of γ, the total number of users' adoptions also increases.
Next, we varied α = 0.001, 0.01, 0.1, 0.5, and 1.0 to control the uniformity of limited attention over interests, while we fixed γ = 0.05 and S = 30%. For large values of α, users pay attention to interests uniformly, while for small values, users pay attention to only a few interests. In Table A.1(b), as we increase the variation in users' interests, the number of adoptions also increases. We also varied the percentage of seeders S who introduce new items by choosing 10%, 30%, 50%, 70%, and 90% of the N_u users, with α and γ fixed at 0.1. As we increase the number of seeders, not surprisingly, the number of votes also increases. However, the average depth of cascades decreases, since users are exposed to the same items via multiple paths.
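The qualitative effect of these Dirichlet hyperparameters can be checked directly with numpy: small concentration values produce sparse attention distributions, large values near-uniform ones. The numbers below are illustrative and unrelated to the dissertation's datasets.

```python
import numpy as np

rng = np.random.default_rng(42)
n_friends = 20

# average share of attention going to each user's single most-attended friend
shares = []
for gamma in (0.01, 0.1, 1.0):
    psi = rng.dirichlet([gamma] * n_friends, size=1000)
    shares.append(psi.max(axis=1).mean())
    print(f"gamma={gamma}: average top-friend attention share = {shares[-1]:.2f}")
```

Consistent with the discussion above, small gamma concentrates attention on a few friends (top-friend share near 1), while gamma = 1 spreads it close to uniformly.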
Abstract

The spread of information in an online social network is a complex process that depends on the nature of the information, the structure of the network, and the behavior of social media users. Understanding this process will allow us to forecast information diffusion for early detection of trending topics and to mitigate the problem of information overload by filtering out irrelevant information. Probabilistic models can be used to learn users' preferences from their historical information adoption behaviors and, in turn, to recommend new relevant items or to predict how far a given item of information will spread. However, current models ignore social and cognitive factors that shape user behavior. One such factor is attention, the cognitive mechanism that integrates perceptual features to select the items the user will consciously process. Research suggests that people have limited attention, which they divide non-uniformly over all incoming messages from their social contacts. We propose a collaborative topic regression model that learns which of their social contacts users pay attention to, and use it to analyze user decisions to spread items recommended by their online friends.

Another consequence of limited attention is that people attend more to items near the top of their message stream than to items lower down, which take more effort to discover. We use visibility to capture the effects of limited attention. Visibility of an item depends on its position in a message stream and is determined by a number of factors, including the number of new messages arriving in the user's stream and the frequency with which the user visits the site. We propose a probabilistic model that accounts for users' limited attention in their information adoption behavior. The model incorporates the user's interests as well as the popularity and visibility of items to the user. We use the model to study information spread on a popular social media site. By accounting for the visibility of items, we can learn a better, more predictive model of user interests and item topics. This work shows that models of user behavior that account for cognitive factors can better describe and predict individual and collective behavior in social media.

Another central topic of my dissertation is understanding how users can increase their access to diverse and novel information in online social networks. Social scientists have developed influential theories about the role of network structure in information access. However, previous studies of the role of networks in information access were limited in their ability to measure the diversity of information. Furthermore, it is not clear how these theories generalize to online networks, which differ from real-world social networks in important respects, including the asymmetry of social links and users' limited capacity to manage a huge volume of information. We study the interplay between network structure, the effort Twitter users are willing to invest in engaging with the site, and the diversity of information they receive from their contacts. We address this problem by learning the topics of interest to social media users by applying the proposed models to the messages they share with their followers. We confirm that users in structurally diverse network positions, which bridge otherwise disconnected regions of the follower graph, are exposed to more diverse information. In addition, we identify user effort as an important variable that mediates access to diverse information in social media. These findings indicate that the relationship between network structure and access to information in networks is more nuanced than previously thought.
Asset Metadata
Creator
Kang, Jeon-Hyung
(author)
Core Title
Modeling social and cognitive aspects of user behavior in social media
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
07/22/2015
Defense Date
04/09/2015
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
modeling information adoption,OAI-PMH Harvest,probabilistic models,recommendation system,role of network structure,social media,social network,understanding information diffusion,user behavior,user effort
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Lerman, Kristina (
committee chair
), Bar, Francois (
committee member
), Nevatia, Ramakant (
committee member
)
Creator Email
jeonhyuk@usc.edu,jeonhyungkang@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-603277
Unique identifier
UC11301511
Identifier
etd-KangJeonHy-3670.pdf (filename),usctheses-c3-603277 (legacy record id)
Legacy Identifier
etd-KangJeonHy-3670.pdf
Dmrecord
603277
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Kang, Jeon-Hyung
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA