Neural Matrix Factorization Model Combining Auxiliary Information for Movie
Recommender System
by
Weidi Pan
A Thesis Presented to the
FACULTY OF THE USC DORNSIFE COLLEGE OF LETTERS, ARTS AND
SCIENCES
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF ARTS
(APPLIED MATHEMATICS)
May 2022
Copyright 2022 Weidi Pan
Table of Contents
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Research Background and Significance
1.2 Related Research
1.3 Contribution
Chapter 2: Data
2.1 Rating Dataset
2.2 Content-based Data
Chapter 3: Feature Engineering
3.1 Item Features
3.1.1 Poster Picture
3.1.2 Genres
3.2 User Features
Chapter 4: Traditional Matrix Factorization Models
4.1 SVD
4.2 SVD++
4.3 Limitations
Chapter 5: Neural Matrix Factorization Models
5.1 Generalized Matrix Factorization (GMF)
5.2 Multi-Layer Perceptron (MLP)
5.3 Fusion of GMF and MLP (NeuMF)
5.4 GMF combining auxiliary information (GMF++)
5.5 MLP++
5.6 NeuMF++
Chapter 6: Results
6.1 Experiment Settings
6.2 Performance Comparison
6.3 Future Work
References
Appendices
A Code
A.1 Python code for SVD
A.2 Python code for SVD++
A.3 Python code for GMF
A.4 Python code for MLP
A.5 Python code for NeuMF
A.6 Python code for GMF++
A.7 Python code for MLP++
A.8 Python code for NeuMF++

List of Tables
2.1 MovieLens-1M statistics.
2.2 User and item attributes.
3.1 Summary of attribute types and feature engineering techniques.
3.2 The number of different movie genres.
3.3 Three genre features after TF-IDF and one-hot encoding.
3.4 The number of genders.
6.1 RMSE of different compared models on MovieLens-1M.

List of Figures
3.1 VGG16 Architecture.
3.2 Age distribution.
3.3 Frequency of each occupation.
5.1 GMF model.
5.2 MLP model.
5.3 NeuMF model.
5.4 GMF++ model.
5.5 MLP++ model.
5.6 NeuMF++ model.
6.1 Training loss of compared models over 60 iterations.
6.2 Validation loss of compared models over 60 iterations.
Abstract
In this era of information explosion, recommender systems are personalized and efficient approaches to recommending filtered information. One main task is to give item recommendations based on user-item interaction data such as movie rating data. A traditional matrix factorization model is an approach to modeling the rating process as the inner product of user and item latent factors. However, the model has limitations such as the sparsity problem, the cold-start problem and underfitting by the inner product. Sometimes, neural networks are introduced to improve collaborative filtering models. This thesis focuses on building neural matrix factorization models by combining user, item and interaction data to generate movie recommendations.
This thesis is divided into three parts including feature engineering, mathematical modeling and model evaluation. In feature engineering, this work uses a pre-training model to extract features from image data and performs TF-IDF to quantify text data such as genres. In the process of building models, based on traditional MF models such as SVD and SVD++, neural networks are introduced to learn user and item latent factors and to model the user-item interaction. This work proposes three neural matrix factorization models combining user, item and interaction data called GMF++, MLP++ and NeuMF++. These models alleviate the traditional MF models' limitations such as the cold-start problem. In the model evaluation section, this work compares the results of different models that use the MovieLens-1M data in conjunction with picture data obtained from scraping posters from IMDB. The results show that the three new models perform better than traditional MF models, and in particular that GMF++ has the best performance on the test dataset.
Keywords: Matrix Factorization Model, Neural Networks, Auxiliary Information, Feature Extractors
Chapter 1
Introduction
1.1 Research Background and Significance
The goal of a recommendation system is to assist people in overcoming the problem of information
overload. Information overload means that the things you need and are interested in are drowning
in a sea of similar items. As a result, you have to put in a lot of time and effort to find what
you want. People have gone through three stages to solve the problem of information overload:
classified catalogs, search engines, and recommender systems.
One of the most well-known events in the field of recommender systems is the Netflix Prize
competition. The competition not only attracted a large number of professionals to participate in
research in the field of recommender systems, but it also brought technology from the academic
world to the business world. Recommender systems sparked lively debates and gradually became
the technical core of many businesses.
1.2 Related Research
The key to building a recommender system is the modeling of user interest based on explicit or
implicit user feedback on items, also known as collaborative filtering [1]. Among the different
collaborative filtering models, the matrix factorization model is one of the most popular models
[2, 3]. Matrix factorization aims to map users and items into the latent space and represent users
and items by latent factors. The traditional MF model uses the inner product to model the interaction between users and items, and several improvements have been proposed on top of it, such as adding user and item bias terms [4]. However, the traditional MF model has three limitations. First, the user-item interaction may not be well fitted by an inner product [5]. Second, the sparsity of the user-item interaction matrix leads to overfitting, which limits the performance of the recommender system [3, 6]. Third, the model suffers from the cold-start problem [4]. These limitations make it difficult for traditional MF models to further improve recommendation precision.
Neural networks have been proven capable of approximating any continuous function [7] and have been found effective in fields ranging from computer vision and speech recognition to text processing [8–10]. More and more researchers have started to introduce neural networks into recommendation models. Ouyang experimented with autoencoders to explore high-level user-item relationships and proposed a three-layer autoencoder [11]. Wu proposed the Collaborative Denoising AutoEncoder, which utilizes a Denoising AutoEncoder (DAE) to perform collaborative filtering [12]. He pointed out the deficiencies of the inner product in collaborative filtering and introduced DNNs into the collaborative filtering model so that it can better learn the nonlinear relationships between users and items, and experimented with three models, GMF, MLP, and NeuMF, for comparison [5]. Liu proposed a deep hybrid recommender system based on the NCF framework and an auto-encoder for noise reduction of user and item feature information [13].
1.3 Contribution
Researchers have explored improving recommendation performance either by designing models based on neural networks or by introducing auxiliary information via neural networks. However, using neural networks for both purposes at once is rare. This work focuses on building a neural matrix factorization model based on user ratings that also extracts features from image and text data. The network extracts user and item features from the auxiliary data and uses them as user and item latent factors for neural collaborative filtering. In addition, this work proposes three models based on the NCF framework: GMF++ (Generalized Matrix Factorization++), MLP++ (Multi-Layer Perceptron++) and NeuMF++ (Neural Matrix Factorization++), all of which integrate user, item and user-item interaction information. These models alleviate the problems of the traditional models discussed in Section 1.2.
Chapter 2
Data
2.1 Rating Dataset
The dataset used in this work is MovieLens-1M [14]. The dataset is a collection of 1,000,209 anonymous ratings from users who joined MovieLens in 2000. It covers 6,040 users and 3,706 movies, with a sparsity of 95.53%. Each user has rated at least 20 movies, and ratings range from 1 to 5. On average, a movie receives about 270 ratings and a user rates about 165.6 movies.
TABLE 2.1: MovieLens-1M statistics.
Dataset Number of ratings Number of users Number of items Sparsity (%)
MovieLens-1M 1,000,209 6,040 3,706 95.53
2.2 Content-based Data
Traditional matrix factorization models suffer from the cold-start problem [15]: without a user's rating data or implicit feedback, it is hard for the models to predict that user's preferences. To solve this problem, content-based data is introduced in this work. Such information is not generated by user-item interaction, so the cold-start problem can be addressed by using content-based features to represent users and items. For example, if a user is known to be young, the model can learn to recommend cartoons to them. The following table summarizes the user attributes and item attributes used in this work.
TABLE 2.2: User and item attributes.
Dataset User attributes Item attributes
MovieLens-1M Age, gender, occupation Poster picture, genre
Chapter 3
Feature Engineering
This chapter mainly focuses on feature engineering on user attributes and item attributes. The
attributes are classified into different data types including ordinal data, nominal data, and numerical
data. The different types of data were processed by different feature engineering techniques. The
following table summarizes all attributes and the corresponding feature engineering techniques.
TABLE 3.1: Summary of attribute types and feature engineering techniques.
Side Attribute names Data types Preprocessing methods
Item Poster picture Image Feature extraction using a CNN
Item Genres Nominal TF-IDF [16], one-hot encoding
User Gender Nominal One-hot encoding
User Age Ordinal Ordinal encoding
User Occupation Nominal One-hot encoding
In the following sections, each attribute will be discussed one by one.
3.1 Item Features
3.1.1 Poster Picture
Clean up movies without poster pictures
The poster pictures are downloaded through IMDB's API. Since some poster pictures cannot be found, the movies without posters are removed. In total, 36 movies in the MovieLens-1M data (0.971%) have no poster picture, and these movies appear in 2,374 rating records (0.237%). Since this percentage is quite small, these records are simply removed.
Pre-trained convolutional neural networks as feature extractors
To extract features from image information in a small dataset, a common and efficient way is
to use a pre-trained neural network model trained on a large dataset. The training dataset is so
large that the spatial hierarchy of features learned by the model can be effectively used as a general
model of the visual world [17]. These features can be used for a variety of different computer
vision tasks, even if the new problem involves a completely different class than the original task.
In this work, the VGG16 model [18] is used for feature extraction from image data.
FIGURE 3.1: VGG16 Architecture.
Since the input and output shapes of the VGG16 model are fixed, the poster picture of each movie is first converted into a 224×224×3 array and then passed into the VGG16 model to obtain a 1000×1 feature vector for the poster:
\[ X^{\text{poster}} = \mathrm{VGG16}\!\left( X^{\text{poster}}_{\text{in}} \right) \tag{3.1} \]
where $X^{\text{poster}}_{\text{in}}$ is a $224 \times 224 \times 3$ array and $X^{\text{poster}}$ is a $1000 \times 1$ feature vector of a poster picture.
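The following is a minimal sketch of this extraction step using the publicly available Keras VGG16 API; the poster file path is a hypothetical example and not a path used in the thesis.

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

vgg = VGG16(weights="imagenet")  # full network with the 1000-way output layer

def poster_features(path):
    # Resize the poster to the fixed 224 x 224 x 3 input shape and preprocess it.
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    # The model output is the 1000-dimensional feature vector used as X_poster.
    return vgg.predict(x)[0]

features = poster_features("posters/1.jpg")  # hypothetical file location
print(features.shape)  # (1000,)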
3.1.2 Genres
The following table shows the number of movies for each genre.
TABLE 3.2: The number of different movie genres.
Genre Number of genre Genre Number of genre
Action 503 Film-Noir 44
Adventure 283 Horror 343
Animation 105 Musical 114
Children’s 251 Mystery 106
Comedy 1200 Romance 471
Crime 211 Sci-Fi 276
Documentary 127 Thriller 492
Drama 1603 War 143
Fantasy 68 Western 68
Note that one movie can have multiple genres. The total number of movies is 3670 after data
cleaning.
TF-IDF
TF-IDF was originally widely used in the field of text analysis. For an article, the more frequently a word appears in it, the more likely it is to be a keyword of the article, which is the original purpose of the Term Frequency (TF) indicator. However, the frequency of function words like "the", "an" and "but" can be very high, so the Inverse Document Frequency (IDF) was proposed to balance the word weights. In this work, TF-IDF is used to weight the genre features: the more common a genre is, the lower its weight, meaning the genre is less useful for personalized recommendations.
Let $\mathrm{tf}(g,i)$ be the ratio of genre $g$ to the number of genres of movie $i$:
\[ \mathrm{tf}(g,i) = \frac{f_{g,i}}{\sum_{g' \in G_i} f_{g',i}} \tag{3.2} \]
where $f_{g,i}$ is a dummy variable whose value 0 or 1 indicates the absence or presence of genre $g$ for movie $i$, and $G_i$ is the set of genres of movie $i$.
\[ \mathrm{idf}(g,I) = \log \frac{|I|}{|\{\, i \in I : g \in G_i \,\}|} \tag{3.3} \]
where $|I|$ is the total number of movies and $|\{\, i \in I : g \in G_i \,\}|$ is the number of movies in which genre $g$ appears. Then TF-IDF is calculated as
\[ \mathrm{tfidf}(g,i,I) = \mathrm{tf}(g,i) \cdot \mathrm{idf}(g,I) \tag{3.4} \]
Then, because the genre attribute is nominal data, it is encoded in a one-hot layout, with each present genre carrying its TF-IDF weight. Examples from the dataset are as follows.
TABLE 3.3: Three genre features after TF-IDF and one-hot encoding.
Adventure Animation Children’s Comedy Fantasy Romance ...
Movie ID 1 0 0.7294 0.5911 0.3444 0 0 ...
Movie ID 2 0.5001 0 0.5163 0 0.6952 0 ...
Movie ID 3 0 0 0 0.5743 0 0.8186 ...
Note that the “Comedy” genre in the above example has relatively low weights (0.3444 and 0.5743) because it is more common, while the relatively rare “Fantasy” and “Romance” genres have higher weights (0.6952 and 0.8186).
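As an illustration, the weighting above can be computed directly from the 0/1 genre indicator matrix. The sketch below follows Equations 3.2-3.4 literally (without the extra normalization a library implementation such as scikit-learn's TfidfTransformer may apply); `genre_dummies` is a hypothetical pandas DataFrame with one row per movie and one 0/1 column per genre.

import numpy as np
import pandas as pd

def genre_tfidf(genre_dummies: pd.DataFrame) -> pd.DataFrame:
    # tf(g, i): divide each movie's genre indicators by its number of genres, Eq. (3.2).
    tf = genre_dummies.div(genre_dummies.sum(axis=1), axis=0)
    # idf(g, I): log of total movies over movies containing the genre, Eq. (3.3).
    idf = np.log(len(genre_dummies) / genre_dummies.sum(axis=0))
    # tfidf(g, i, I): element-wise product, Eq. (3.4).
    return tf * idf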
3.2 User Features
Gender
Gender is composed of “M” and “F”.
TABLE 3.4: The number of genders.
Gender Number of users
M 4331
F 1709
Since gender is nominal data, the common feature engineering technique is one-hot encoding. After one-hot encoding, this binary attribute is represented as a one-hot vector, i.e., [1, 0] for “M” and [0, 1] for “F”.
Age
In MovieLens-1M, age is ordinal data and chosen from the following ranges: “Under 18”,
“18-24”, “25-34”, “35-44”, “45-49”, “50-55”, “56+”.
FIGURE 3.2: Age distribution.
For this data type, it is common to use an ordinal encoding. In this work, the age is mapped to
integers from 0 to 6.
Occupation
Users' occupations contain 21 types, including an “other or not specified” category. Their frequencies are shown in Figure 3.3.
FIGURE 3.3: Frequency of each occupation.
Because the attribute is nominal rather than ordinal, one-hot encoding is applied to it. For example, “writer” maps to [1, 0, ..., 0], “unemployed” maps to [0, 1, ..., 0], and so on.
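A minimal sketch of these user-side encodings with pandas is shown below; `users` is assumed to be a DataFrame with the MovieLens-1M columns "gender", "age" and "occupation", and the age codes (1, 18, 25, 35, 45, 50, 56) are the raw values used in that dataset.

import pandas as pd

AGE_ORDER = {1: 0, 18: 1, 25: 2, 35: 3, 45: 4, 50: 5, 56: 6}  # ordinal encoding, 0-6

def encode_user_features(users: pd.DataFrame) -> pd.DataFrame:
    gender = pd.get_dummies(users["gender"], prefix="gender")       # one-hot M/F
    occupation = pd.get_dummies(users["occupation"], prefix="occ")  # one-hot, 21 types
    age = users["age"].map(AGE_ORDER).rename("age_ordinal")         # ordinal 0-6
    return pd.concat([gender, age, occupation], axis=1)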
Chapter 4
Traditional Matrix Factorization Models
In this chapter, we introduce two traditional Matrix Factorization models (MF) as baselines. The
matrix factorization model is one type of collaborative filtering recommendation model. The idea
of MF is to decompose the matrix by mapping both users and items into an f -dimensional latent
factor space. Each user is associated with a vector $p_u \in \mathbb{R}^f$ and each item with a vector $q_i \in \mathbb{R}^f$, where these vectors are implicit interest factors. In this latent factor space, the rating process is modeled as an operation between vectors. Note that the following SVD and SVD++ models are based on the dot product, but it is possible that the interaction between user and item vectors cannot be modeled properly by a dot product, so a neural network is introduced to address this problem in Chapter 5.
4.1 SVD
SVD is short for singular value decomposition [19] which is a factorization of a matrix. The
decomposition of an $m \times n$ matrix $M$ is a factorization of the form
\[ M = U \Sigma V^{T} \tag{4.1} \]
where $U$ is an $m \times m$ orthogonal matrix, $\Sigma$ is an $m \times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $V$ is an $n \times n$ orthogonal matrix.
Unfortunately, SVD cannot be applied to explicit ratings in the collaborative filtering based
approach since the user does not rate the majority of the items, resulting in a large number of
missing values in the user-item rating matrix. Furthermore, large sparse matrices could lead to
overfitting [20]. In this work, the item-user rating matrix in MovieLens-1M dataset is extremely
sparse with a sparsity of 95.53%. Therefore, a learning model based on bias predictors is proposed
below.
Biases [4]
CF models capture the interactions between users and items that produce the varying rating levels. However, independent of any interaction, many of the observed rating values are due to effects associated with either users or items alone: for example, some users systematically give higher ratings than others, and some items systematically receive higher ratings than others.
In this work, biases are introduced to encapsulate those effects, which do not involve user-item
interaction. Denote by $\mu$ the overall average rating. The baseline estimate $\hat{b}_{ui}$ for an unknown rating $r_{ui}$ accounts for the user and item effects:
\[ \hat{b}_{ui} = \mu + b_u + b_i \tag{4.2} \]
where the parameters $b_u$ and $b_i$ represent the observed deviations of user $u$ and item $i$ from the average, respectively.
Model
For the SVD model, the user-item interactions are modeled as dot products in the latent factor space of dimensionality $f$. Each user is associated with a vector $p_u \in \mathbb{R}^f$ and each item with a vector $q_i \in \mathbb{R}^f$. The resulting dot product, $q_i^T p_u$, captures the overall interest of user $u$ in the characteristics of item $i$. The final rating is built by adding the aforementioned biases that depend only on the user and item. Thus, the SVD model is
\[ \hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u \tag{4.3} \]
The loss function is the regularized squared error. In order to learn the model parameters ($b_u$, $b_i$, $p_u$ and $q_i$), the loss function is minimized:
\[ \min_{b_*,\, q_*,\, p_*} \sum_{(u,i) \in K} \left( r_{ui} - \mu - b_u - b_i - q_i^T p_u \right)^2 + \lambda_1 \left( b_u^2 + b_i^2 + \|q_i\|^2 + \|p_u\|^2 \right) \tag{4.4} \]
where $K$ is the set of $(u,i)$ pairs with known ratings and the constant $\lambda_1$ controls the extent of regularization; it is usually determined by cross validation. The first term, $\sum_{(u,i) \in K} ( r_{ui} - \mu - b_u - b_i - q_i^T p_u )^2$, strives to find $b_u$'s, $b_i$'s, $Q = (q_1^T, q_2^T, \ldots)$ and $P = (p_1^T, p_2^T, \ldots)$ that fit the given ratings. The regularizing term, $\lambda_1 ( b_u^2 + b_i^2 + \|q_i\|^2 + \|p_u\|^2 )$, avoids overfitting by penalizing the magnitudes of the parameters.
Minimization is performed by stochastic gradient descent, which was popularized by Funk
[21]. The algorithm loops through all ratings in the training data.
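For concreteness, a minimal sketch of one such SGD pass over the training ratings is given below; the learning rate and regularization values are placeholder assumptions, not the settings used in this thesis.

import numpy as np

def sgd_epoch(ratings, mu, bu, bi, P, Q, lr=0.005, reg=0.02):
    # One SGD pass for the biased MF model of Eq. (4.3)/(4.4).
    # ratings: iterable of (u, i, r) triples; bu, bi: bias arrays;
    # P, Q: user/item factor matrices whose rows are p_u and q_i.
    for u, i, r in ratings:
        e = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])   # prediction error
        bu[u] += lr * (e - reg * bu[u])
        bi[i] += lr * (e - reg * bi[i])
        pu_old = P[u].copy()
        P[u] += lr * (e * Q[i] - reg * P[u])
        Q[i] += lr * (e * pu_old - reg * Q[i])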
4.2 SVD++
SVD++ [4] is an improved SVD model with the addition of implicit data. Implicit data is information that is not provided intentionally but gathered from available data streams such as browsing and purchase history. Data like ratings, in contrast, are explicit data, i.e., information users actively provide on their own. When more implicit feedback sources are available, they can be used to better
understand user behavior. This helps to overcome data scarcity and is especially useful for users
who have few explicit ratings. To handle implicit feedback, extensions are described for the SVD
model.
For a dataset such as the MovieLens-1M dataset, a user implicitly tells us about her prefer-
ences by choosing to voice her opinion and vote for a high or low rating. This results in a binary
matrix, with “1” denoting “rated” and “0” denoting “not rated.” While this binary data may not
be as helpful as other independent forms of implicit input, adding it increases prediction accuracy
significantly. The advantage of using the binary data is closely related to the fact that ratings are
not absent at random; users pick which items to rate deliberately (see Marlin et al. [22]).
Model
The SVD++ model is as follows:
\[ \hat{r}_{ui} = \mu + b_i + b_u + q_i^T \left( p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j \right) \tag{4.5} \]
where $R(u)$ is the set of items rated by user $u$, and $y_j \in \mathbb{R}^f$ is a new factor vector for item $j$. The new term $|R(u)|^{-1/2} \sum_{j \in R(u)} y_j$ characterizes users based on the set of items that user $u$ rated. Now, $p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j$ is used to model user $u$. Note that the sum is normalized by $|R(u)|^{-1/2}$ in order to stabilize its variance across the range of observed values of $|R(u)|$.
The loss function is the regularized squared error. The SGD algorithm is used to minimize this
loss function.
\[ \min_{b_*,\, q_*,\, p_*} \sum_{(u,i) \in K} \left( r_{ui} - \hat{r}_{ui} \right)^2 + \lambda_2 \left( b_i^2 + b_u^2 \right) + \lambda_3 \left( \|q_i\|^2 + \|p_u\|^2 + \sum_{j \in R(u)} \|y_j\|^2 \right) \tag{4.6} \]
4.3 Limitations
SVD and SVD++ have the following limitations:
• One hypothesis of the SVD and SVD++ models is that the user-item interaction can be modeled by the dot product, which can limit the expressiveness of MF [5].
• The models rely on user-item interaction data and therefore suffer from the cold-start problem [15].
Chapter 5
Neural Matrix Factorization Models
Note that any continuous function can be approximated by a neural network if it contains at least
one hidden layer and utilizes non-linear activations [23]. Therefore, He [5] introduced neural
networks into the collaborative filtering model, presented a general neural collaborative filtering
model framework (NCF) and proposed the GMF, MLP and NeuMF models based on it. In this chapter, GMF, MLP and NeuMF applied to explicit data are introduced first. Then, novel models are proposed that support a wide range of user-side and item-side auxiliary information such as content-based data. Note that the cold-start problem can be handled by leveraging content features to represent users and items, since such a general feature representation can be used for the inputs.
5.1 Generalized Matrix Factorization (GMF)
GMF is the simplest model under the NCF framework and the one most similar to the dot-product-based MF model. Like the vectors mentioned before, $p_u$ is the embedding vector for user $u$ and $q_i$ is the embedding vector for item $i$. Then, GMF is defined as follows:
\[ \hat{r}_{ui} = a_{\text{out}}\!\left( h^T (p_u \odot q_i) \right) \tag{5.1} \]
where $\odot$ denotes the element-wise product of vectors, $h$ denotes the edge weights of the output layer, and $a_{\text{out}}$ denotes the activation function. In this work, the model uses the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ as $a_{\text{out}}$ and learns $h$ from the data with the squared error loss. Note that when $h$ is a uniform vector of ones and $a_{\text{out}}$ is the identity function, the model reduces to the linear MF model.
FIGURE 5.1: GMF model.
This model is a generalized MF model, which is more flexible than the traditional MF, but
cannot fit arbitrary continuous functions. Therefore, the MLP model is presented below.
5.2 Multi-Layer Perceptron (MLP)
It was pointed out by He that modeling the user-item interaction as a dot product may limit the expressiveness of MF [5]. Therefore, He introduced the multi-layer perceptron into MF to model the user-item interaction.
FIGURE 5.2: MLP model.
The main idea of the model is to first embed the user ID and item ID into dense vectors $p_u$ and $q_i$, and then use a multi-layer perceptron to model the interaction between users and items and obtain the score. The model is as follows:
\[
\begin{aligned}
z_1 &= \phi_1(p_u, q_i) = \begin{bmatrix} p_u \\ q_i \end{bmatrix}, \\
\phi_2(z_1) &= a_2\!\left( W_2^T z_1 + b_2 \right), \\
&\;\;\vdots \\
\phi_L(z_{L-1}) &= a_L\!\left( W_L^T z_{L-1} + b_L \right), \\
\hat{r}_{ui} &= \sigma\!\left( h^T \phi_L(z_{L-1}) \right)
\end{aligned} \tag{5.2}
\]
where $W_x$, $b_x$ and $a_x$ denote the weight matrix, bias vector and activation function for the $x$-th layer's perceptron, respectively. $\phi_1$ concatenates $p_u$ and $q_i$, $\phi_2, \ldots, \phi_L$ form the multi-layer perceptron, and $\sigma$ is the output layer.
5.3 Fusion of GMF and MLP (NeuMF)
However, it is non-trivial for a multilayer perceptron to learn a dot product [5]. Hence, the NeuMF model integrates GMF and MLP.
FIGURE 5.3: NeuMF model.
In the NeuMF model, different embeddings are used for GMF and MLP in order to increase
flexibility. Embeddings are then brought into GMF and MLP respectively, and the results of the
two parts are concatenated and brought to the output layer for the final output score.
\[
\begin{aligned}
\phi^{\mathrm{GMF}} &= p_u^G \odot q_i^G, \\
\phi^{\mathrm{MLP}} &= a_L\!\left( W_L^T \left( a_{L-1}\!\left( \cdots\, a_2\!\left( W_2^T \begin{bmatrix} p_u^M \\ q_i^M \end{bmatrix} + b_2 \right) \cdots \right) \right) + b_L \right), \\
\hat{r}_{ui} &= \sigma\!\left( h^T \begin{bmatrix} \phi^{\mathrm{GMF}} \\ \phi^{\mathrm{MLP}} \end{bmatrix} \right)
\end{aligned} \tag{5.3}
\]
where $p_u^G$ and $q_i^G$ denote the user and item latent factors for GMF, and similarly $p_u^M$ and $q_i^M$ for MLP. ReLU is used as the activation function of all MLP layers. For modeling user-item latent structures, this model combines the linearity of MF with the non-linearity of DNNs.
5.4 GMF combining auxiliary information (GMF++)
Based on GMF, this work proposes a new model that can extract information from different sources. The main idea is to use a multilayer perceptron to learn the user latent factors and the item latent factors from user information (user ID embedding and user features) and item information (item ID embedding and item features), respectively. The "++" in GMF++ indicates that user, item and interaction data are all combined in the model.
FIGURE 5.4: GMF++ model.
The left half of the model in Figure 5.4, which produces the user latent factor $p_u$, is as follows:
\[
\begin{aligned}
z_1^U &= \phi_1\!\left( P^T v_u^U,\, X \right) = \begin{bmatrix} P^T v_u^U \\ X \end{bmatrix}, \\
\phi_2(z_1^U) &= a_2^U\!\left( [W_2^U]^T z_1^U + b_2^U \right), \\
&\;\;\vdots \\
p_u &= \phi_{L_U}(z_{L_U-1}^U) = a_{L_U}^U\!\left( [W_{L_U}^U]^T z_{L_U-1}^U + b_{L_U}^U \right)
\end{aligned} \tag{5.4}
\]
where $W_x^U$, $b_x^U$ and $a_x^U$ denote the weight matrix, bias vector and activation function for the $x$-th layer's perceptron, respectively. $X = [X_1^T, X_2^T, \ldots]^T$, where each $X_i$ is a user feature vector. $v_u^U = (0, 0, \ldots, 1, \ldots, 0, 0)$ is a $|U|$-length one-hot vector for the $u$-th user with a 1 at the $u$-th position. $P$ is an embedding matrix, so $P^T v_u^U$ is the $u$-th user's corresponding embedding. The item part is modeled in the same way:
\[ q_i = a_L^I\!\left( [W_L^I]^T \left( a_{L-1}^I\!\left( \cdots\, a_2^I\!\left( [W_2^I]^T \begin{bmatrix} Q^T v_i^I \\ Y \end{bmatrix} + b_2^I \right) \cdots \right) \right) + b_L^I \right) \tag{5.5} \]
where $Y$ denotes the concatenation of all item feature vectors, analogous to $X$. The final layer is the same as in GMF (Equation 5.1) and produces $\hat{r}_{ui}$.
In this work, the MLP layers use the Rectifier (ReLU) activation function, and the output layer is linear.
5.5 MLP++
Just like the GMF++ model, MLP++ extends MLP to incorporate auxiliary information.
FIGURE 5.5: MLP++ model.
MLP++ obtains the user and item latent factors $p_u$ and $q_i$ in the same way as GMF++, via Equation 5.4 and Equation 5.5. For modeling the user-item interaction, MLP++ is the same as MLP and obtains $\hat{r}_{ui}$ from $p_u$ and $q_i$ via Equation 5.2.
5.6 NeuMF++
FIGURE 5.6: NeuMF++ model.
In NeuMF++, the MF vectors $p_u^G$, $q_i^G$ and the MLP vectors $p_u^M$, $q_i^M$ are learned as in the GMF++ and MLP++ models via Equation 5.4 and Equation 5.5. Once the latent factors are obtained, the rest of the model is the same as NeuMF in Equation 5.3. Note that, to give the fused model more flexibility, GMF++ and MLP++ learn their own latent factors, and the two models are combined by concatenating their last hidden layers. The model is depicted in Figure 5.6.
Chapter 6
Results
6.1 Experiment Settings
Dataset
For the MovieLens-1M dataset, the whole dataset is randomly split into train set, validation set,
and test set in the ratio of 80%, 10%, and 10%. The train set is for model training, the validation set
is for hyperparameter tuning of the model, and the test set is used to evaluate competing models.
It is worth pointing out that all the results in this paper come from evaluation on the test set. In addition, the random seed for splitting the dataset is fixed for all models. The main purposes of fixing the seed are to make the experiments reproducible and to ensure comparability between models.
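A minimal sketch of this split is shown below, assuming the ratings are held in a pandas DataFrame named `ratings` (a hypothetical variable name); the 80%/10%/10% ratio follows the text, while the seed value itself is a placeholder.

from sklearn.model_selection import train_test_split

SEED = 42  # placeholder; the thesis fixes a seed but does not report its value

# 80% for training, then split the remaining 20% evenly into validation and test.
train, holdout = train_test_split(ratings, test_size=0.2, random_state=SEED)
val, test = train_test_split(holdout, test_size=0.5, random_state=SEED)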
Evaluation Index
The root mean square error (RMSE) is employed in this work, as given in Equation 6.1. Our loss function is directly related to RMSE: the lower the RMSE, the more accurate the recommendation. Notably, even small RMSE improvements have been shown to have a significant influence on the quality of the top few provided recommendations [2, 24].
\[ \mathrm{RMSE} = \sqrt{ \frac{ \sum_{u=1}^{m} \sum_{i=1}^{n} \left( r_{ui} - \hat{r}_{ui} \right)^2 }{N} } \tag{6.1} \]
where the sum runs over the $N$ rated user-item pairs in the evaluation set.
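For reference, a one-line NumPy version of this metric over held-out predictions:

import numpy as np

def rmse(r_true, r_pred):
    # Root of the mean squared difference over the N held-out ratings.
    return np.sqrt(np.mean((np.asarray(r_true) - np.asarray(r_pred)) ** 2))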
Parameter Settings
The proposed models are implemented in Keras [25]. All models are learned by optimizing the mean squared error. We randomly initialized the model parameters with a Gaussian distribution (mean 0, standard deviation 0.01) and then optimized the models with mini-batch Adam [26]. We used a batch size of 1024 and a learning rate of 0.001. The embedding size is 16. The multi-layer perceptrons for the latent factors $p_u$ and $q_i$ use the same layer sizes [128, 64, 64, 16] in GMF++ and MLP++. The second multi-layer perceptron in MLP and MLP++ uses layer sizes [32, 16, 8].
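As an illustration, these settings map onto the GMF++ builder from Appendix A.6 roughly as follows; the feature matrices and index/rating arrays (`user_feats`, `item_feats`, `train_u`, `train_i`, `train_r`, and the validation counterparts) are hypothetical variable names.

from tensorflow.keras.optimizers import Adam

model = gmfpp(num_users=6040, num_items=3670,
              num_user_features=user_feats.shape[1],
              num_item_features=item_feats.shape[1],
              embedding_dim=16,
              user_layers=[128, 64, 64, 16], item_layers=[128, 64, 64, 16])
model.compile(optimizer=Adam(learning_rate=0.001), loss="mse")
model.fit([train_u, train_i, user_feats[train_u], item_feats[train_i]], train_r,
          validation_data=([val_u, val_i, user_feats[val_u], item_feats[val_i]], val_r),
          batch_size=1024, epochs=60)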
Pre-training
Gradient-based optimization approaches can only provide locally optimal solutions due to the
non-convexity of NeuMF++'s objective function. Initialization is known to have a significant impact on the convergence and performance of deep learning models [27]. We train GMF++ and MLP++
using random initializations until convergence and then utilize their model parameters as the ini-
tialization for the relevant parts of NeuMF++’s parameters because NeuMF++ is an ensemble of
GMF++ and MLP++.
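A sketch of this warm start in Keras is given below. The layer names follow the appendix listings, where the GMF++/MLP++ towers reappear in NeuMF++ with "GMF_"/"MLP_" prefixes; the weighting of the two prediction heads by alpha is an assumption borrowed from the original NeuMF pre-training scheme [5], not a detail stated in this thesis.

import numpy as np

def init_neumfpp_from_pretrained(neumfpp, gmfpp_model, mlppp_model, alpha=0.5):
    # Copy the trained GMF++ tower into the GMF_-prefixed layers of NeuMF++.
    for layer in gmfpp_model.layers:
        if layer.get_weights() and layer.name != "prediction":
            neumfpp.get_layer("GMF_" + layer.name).set_weights(layer.get_weights())
    # Copy the trained MLP++ tower; its 'mlp_layer*' layers are named 'MLP_layer*' in NeuMF++.
    for layer in mlppp_model.layers:
        if layer.get_weights() and layer.name != "prediction":
            name = layer.name.replace("mlp_", "") if layer.name.startswith("mlp_layer") else layer.name
            neumfpp.get_layer("MLP_" + name).set_weights(layer.get_weights())
    # Final layer: concatenate the two heads' kernels and blend the biases.
    gw, gb = gmfpp_model.get_layer("prediction").get_weights()
    mw, mb = mlppp_model.get_layer("prediction").get_weights()
    neumfpp.get_layer("prediction").set_weights(
        [np.concatenate([alpha * gw, (1 - alpha) * mw], axis=0), alpha * gb + (1 - alpha) * mb])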
6.2 Performance Comparison
Since the convergence speed of SVD++ is relatively slow, the following mainly compares SVD, GMF++, MLP++ and NeuMF++ over 60 iterations.
FIGURE 6.1: Training loss of compared models over 60 iterations.
FIGURE 6.2: Validation loss of compared models over 60 iterations.
Figure 6.1 shows that the training loss of the pre-trained NeuMF++ decreases very quickly over the first 60 iterations. Figure 6.2 shows that MLP++ converges relatively quickly on the validation set.
TABLE 6.1: RMSE of different compared models on MovieLens-1M.
Method Training RMSE Validation RMSE Testing RMSE
SVD 0.8937 0.9422 0.9318
SVD++ 0.7793 0.9021 0.8959
GMF 0.8071 0.8765 0.8757
MLP 0.8666 0.9001 0.8992
NeuMF (pre-train) 0.8265 0.8882 0.8852
GMF++ 0.8156 0.8769 0.8740
MLP++ 0.8566 0.8953 0.8942
NeuMF++ (pre-train) 0.8125 0.8891 0.8846
As Table 6.1 shows, GMF++ outperforms all the other compared models with a testing RMSE of 0.8740, while MLP++ slightly underperforms GMF++. Note that MLP can be further improved by adding more hidden layers; here we only show the performance with three layers.
6.3 Future Work
This thesis does not include a comparison of different hyperparameters due to limitations of computational resources and time. In future work, the hyperparameters, such as the dimension of the embedding vectors and the settings of the multi-layer perceptrons, can be analyzed in more detail. By spending more time on hyperparameter tuning, the models should perform better. In addition, the models can be extended to many other types of auxiliary information such as ontology [28]. We expect that by incorporating more auxiliary information into the recommender system, the recommendation precision will improve even further.
References
1. Sarwar, B., Karypis, G., Konstan, J. & Riedl, J. Item-based collaborative filtering recommen-
dation algorithms in Proceedings of the 10th international conference on World Wide Web
(2001), 285–295.
2. Koren, Y. Factorization meets the neighborhood: a multifaceted collaborative filtering model
in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery
and data mining (2008), 426–434.
3. Li, S., Kawale, J. & Fu, Y. Deep collaborative filtering via marginalized denoising auto-
encoder in Proceedings of the 24th ACM international on conference on information and
knowledge management (2015), 811–820.
4. Koren, Y. & Bell, R. Advances in collaborative filtering. Recommender systems handbook,
77–118 (2015).
5. He, X., Liao, L., Zhang, H., Nie, L., Hu, X. & Chua, T.-S. Neural collaborative filtering in
Proceedings of the 26th international conference on world wide web (2017), 173–182.
6. Wang, H., Wang, N. & Yeung, D.-Y . Collaborative deep learning for recommender systems
in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery
and data mining (2015), 1235–1244.
7. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal
approximators. Neural networks 2, 359–366 (1989).
8. Collobert, R. & Weston, J. A unified architecture for natural language processing: Deep
neural networks with multitask learning in Proceedings of the 25th international conference
on Machine learning (2008), 160–167.
9. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition in Pro-
ceedings of the IEEE conference on computer vision and pattern recognition (2016), 770–
778.
10. Hong, R., Hu, Z., Liu, L., Wang, M., Yan, S. & Tian, Q. Understanding blooming human
groups in social networks. IEEE Transactions on Multimedia 17, 1980–1988 (2015).
11. Ouyang, Y., Liu, W., Rong, W. & Xiong, Z. Autoencoder-based collaborative filtering in
International conference on neural information processing (2014), 284–291.
12. Wu, Y., DuBois, C., Zheng, A. X. & Ester, M. Collaborative denoising auto-encoders for
top-n recommender systems in Proceedings of the ninth ACM international conference on
web search and data mining (2016), 153–162.
13. Liu, Y., Wang, S., Khan, M. S. & He, J. A novel deep hybrid recommender system based on
auto-encoder with neural collaborative filtering. Big Data Mining and Analytics 1, 211–221
(2018).
14. Harper, F. M. & Konstan, J. A. The movielens datasets: History and context. Acm transactions
on interactive intelligent systems (tiis) 5, 1–19 (2015).
15. Kluver, D. & Konstan, J. A. Evaluating recommender behavior for new users in Proceedings
of the 8th ACM Conference on Recommender Systems (2014), 121–128.
16. Wikipedia contributors. Tf–idf — Wikipedia, The Free Encyclopedia [Online; accessed 4-
March-2022]. 2022.
17. George Karimpanal, T. & Bouffanais, R. Self-organizing maps for storage and transfer of
knowledge in reinforcement learning. Adaptive Behavior 27, 111–126 (2019).
18. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556 (2014).
19. Wikipedia contributors. Singular value decomposition — Wikipedia, The Free Encyclopedia
[Online; accessed 4-March-2022]. 2022.
20. Sarwar, B., Karypis, G., Konstan, J. & Riedl, J. Application of dimensionality reduction in
recommender system-a case study tech. rep. (Minnesota Univ Minneapolis Dept of Computer
Science, 2000).
21. Funk, S. Netflix update: Try this at home 2006.
22. Marlin, B., Zemel, R. S., Roweis, S. & Slaney, M. Collaborative filtering and the missing at
random assumption. arXiv preprint arXiv:1206.5267 (2012).
23. Barron, A. R. Universal approximation bounds for superpositions of a sigmoidal function.
IEEE Transactions on Information theory 39, 930–945 (1993).
24. Koren, Y. Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Trans-
actions on Knowledge Discovery from Data (TKDD) 4, 1–24 (2010).
25. Elkahky, A. M., Song, Y. & He, X. A multi-view deep learning approach for cross domain
user modeling in recommendation systems in Proceedings of the 24th international confer-
ence on world wide web (2015), 278–288.
26. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:
1412.6980 (2014).
27. Erhan, D., Courville, A., Bengio, Y. & Vincent, P. Why does unsupervised pre-training help
deep learning? in Proceedings of the thirteenth international conference on artificial intelli-
gence and statistics (2010), 201–208.
28. Nilashi, M., Ibrahim, O. & Bagherifard, K. A recommender system based on collaborative
filtering using ontology and dimensionality reduction techniques. Expert Systems with Appli-
cations 92, 507–520 (2018).
Appendices
A Code
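The listings below are reproduced from the thesis; the import statements are not shown there, so the following header is a reconstruction of what they assume (TensorFlow 2.x Keras is an assumption). Listing A.2 additionally uses a custom WeightedAvgOverTime layer that is not defined in the appendix and is assumed to be provided elsewhere in the author's code.

# Imports assumed by the listings in this appendix (reconstructed, not part of the thesis).
from tensorflow.keras import backend as K
from tensorflow.keras import initializers
from tensorflow.keras.initializers import Zeros
from tensorflow.keras.layers import (Concatenate, Dense, Dot, Embedding, Flatten,
                                     Input, Lambda, Multiply, Reshape, add)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2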
A.1 Python code for SVD
def svd(u, user_dim, item_dim, factor_dim):
user_input = Input(shape=(1, ), dtype="int32", name="user_input")
item_input = Input(shape=(1, ), dtype="int32", name="item_input")
pu = Embedding(input_dim=user_dim, output_dim=factor_dim, name =
"user_embedding", embeddings_regularizer=l2(0.00001))(user_input)
pu = Reshape((factor_dim, ))(pu)
qi = Embedding(input_dim=item_dim, output_dim=factor_dim, name =
"item_embedding", embeddings_regularizer=l2(0.00001))(item_input)
qi = Reshape((factor_dim, ))(qi)
pred = Dot(axes=-1)([pu, qi])
bu = Reshape((1,))(Embedding(input_dim=user_dim, output_dim=1, name =
"user_bias", embeddings_regularizer=l2(0.00001))(user_input))
bi = Reshape((1,))(Embedding(input_dim=item_dim, output_dim=1, name =
"item_bias", embeddings_regularizer=l2(0.00001))(item_input))
pred = add([pred, bi, bu])
pred = Lambda(lambda x: x + K.constant(u, dtype=K.floatx()))(pred)
return Model(inputs=[user_input, item_input], outputs=pred)
A.2 Python code for SVD++
def svdpp(u, num_users, num_items, factor_dim, R):
user_input = Input(shape=(1, ), dtype="int32", name="user_input")
item_input = Input(shape=(1, ), dtype="int32", name="item_input")
pu = Embedding(input_dim=num_users, output_dim=factor_dim, name =
"user_embedding", embeddings_regularizer=l2(0.00001))(user_input)
pu = Reshape((factor_dim, ))(pu)
qi = Embedding(input_dim=num_items, output_dim=factor_dim, name =
"item_embedding", embeddings_regularizer=l2(0.00001))(item_input)
qi = Reshape((factor_dim, ))(qi)
yu = Embedding(num_users, R.shape[1], trainable=False,
weights=[R])(user_input)
yu = Reshape((R.shape[1],), name = "R")(yu)
yu = Embedding(num_items, factor_dim, embeddings_initializer=Zeros,
embeddings_regularizer=l2(0.00001), mask_zero=True)(yu)
yu = WeightedAvgOverTime()(yu)
pu = add([pu, yu])
pred = Dot(axes=-1)([pu, qi])
bu = Reshape((1,))(Embedding(input_dim=num_users, output_dim=1, name =
"user_bias", embeddings_regularizer=l2(0.00001))(user_input))
bi = Reshape((1,))(Embedding(input_dim=num_items, output_dim=1, name =
"item_bias", embeddings_regularizer=l2(0.00001))(item_input))
pred = add([pred, bi, bu])
pred = Lambda(lambda x: x + K.constant(u, dtype=K.floatx()))(pred)
return Model(inputs=[user_input, item_input], outputs=pred)
A.3 Python code for GMF
def gmf(num_users, num_items, embedding_dim=16, user_layers=[128,64,64,16],
        item_layers=[128,64,64,16], regs=0):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    item_input = Input(shape=(1,), dtype='int32', name='item_input')

    MF_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                  name='user_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)
    MF_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                  name='item_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)

    user_vector = Flatten()(MF_Embedding_User(user_input))
    item_vector = Flatten()(MF_Embedding_Item(item_input))

    # Element-wise product of user and item embeddings
    predict_vector = Multiply()([user_vector, item_vector])

    # Final prediction layer
    pred = Dense(1, activation='linear', name='prediction')(predict_vector)

    model = Model(inputs=[user_input, item_input], outputs=pred)
    return model
A.4 Python code for MLP
def mlp(num_users, num_items, embedding_dim=16, mlp_layers=[32, 16, 8], regs=0):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    item_input = Input(shape=(1,), dtype='int32', name='item_input')

    MF_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                  name='user_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)
    MF_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                  name='item_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)

    user_vector = Flatten()(MF_Embedding_User(user_input))
    item_vector = Flatten()(MF_Embedding_Item(item_input))

    predict_vector = Concatenate()([user_vector, item_vector])
    for idx, dim in enumerate(mlp_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu', name='mlp_layer%d' % idx)
        predict_vector = layer(predict_vector)

    # Final prediction layer
    pred = Dense(1, activation='relu', name='prediction')(predict_vector)

    model = Model(inputs=[user_input, item_input], outputs=pred)
    return model
A.5 Python code for NeuMF
def neumf(num_users, num_items, embedding_dim=16, user_layers=[128,64,64,16],
          item_layers=[128,64,64,16], mlp_layers=[32, 16, 8], regs=0):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    item_input = Input(shape=(1,), dtype='int32', name='item_input')

    # ========================== GMF ==========================
    GMF_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                   name='GMF_user_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)
    GMF_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                   name='GMF_item_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)

    GMF_user_vector = Flatten()(GMF_Embedding_User(user_input))
    GMF_item_vector = Flatten()(GMF_Embedding_Item(item_input))

    # Element-wise product of user and item embeddings
    GMF_predict_vector = Multiply()([GMF_user_vector, GMF_item_vector])

    # ========================== MLP ==========================
    MLP_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                   name='MLP_user_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)
    MLP_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                   name='MLP_item_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)

    MLP_user_vector = Flatten()(MLP_Embedding_User(user_input))
    MLP_item_vector = Flatten()(MLP_Embedding_Item(item_input))

    MLP_predict_vector = Concatenate()([MLP_user_vector, MLP_item_vector])
    for idx, dim in enumerate(mlp_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu', name='MLP_layer%d' % idx)
        MLP_predict_vector = layer(MLP_predict_vector)

    # ========================== final prediction layer ==========================
    predict_vector = Concatenate()([GMF_predict_vector, MLP_predict_vector])
    pred = Dense(1, activation='linear', name='prediction')(predict_vector)

    model = Model(inputs=[user_input, item_input], outputs=pred)
    return model
A.6 Python code for GMF++
def gmfpp(num_users, num_items, num_user_features, num_item_features,
          embedding_dim=16, user_layers=[128,64,64,16], item_layers=[128,64,64,16],
          regs=0):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    item_input = Input(shape=(1,), dtype='int32', name='item_input')
    user_features = Input(shape=(num_user_features,), dtype='float64', name='user_features')
    item_features = Input(shape=(num_item_features,), dtype='float64', name='item_features')

    MF_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                  name='user_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)
    MF_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                  name='item_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)

    user_vector = Flatten()(MF_Embedding_User(user_input))
    user_vector = Concatenate()([user_vector, user_features])
    for idx, dim in enumerate(user_layers):
        layer = Dense(user_layers[idx], kernel_regularizer=l2(0), activation='relu',
                      name='user_layer%d' % idx)
        user_vector = layer(user_vector)

    item_vector = Flatten()(MF_Embedding_Item(item_input))
    item_vector = Concatenate()([item_vector, item_features])
    for idx, dim in enumerate(item_layers):
        layer = Dense(item_layers[idx], kernel_regularizer=l2(0), activation='relu',
                      name='item_layer%d' % idx)
        item_vector = layer(item_vector)

    # Element-wise product of user and item embeddings
    predict_vector = Multiply()([user_vector, item_vector])

    # Final prediction layer
    pred = Dense(1, activation='linear', name='prediction')(predict_vector)

    model = Model(inputs=[user_input, item_input, user_features, item_features], outputs=pred)
    return model
A.7 Python code for MLP++
def mlppp(num_users, num_items, num_user_features, num_item_features,
          embedding_dim=16, user_layers=[128,64,64,16], item_layers=[128,64,64,16],
          mlp_layers=[32, 16, 8], regs=0):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    item_input = Input(shape=(1,), dtype='int32', name='item_input')
    user_features = Input(shape=(num_user_features,), dtype='float64', name='user_features')
    item_features = Input(shape=(num_item_features,), dtype='float64', name='item_features')

    MF_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                  name='user_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)
    MF_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                  name='item_embedding',
                                  embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                  embeddings_regularizer=l2(0.00001), input_length=1)

    user_vector = Flatten()(MF_Embedding_User(user_input))
    user_vector = Concatenate()([user_vector, user_features])
    for idx, dim in enumerate(user_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu', name='user_layer%d' % idx)
        user_vector = layer(user_vector)

    item_vector = Flatten()(MF_Embedding_Item(item_input))
    item_vector = Concatenate()([item_vector, item_features])
    for idx, dim in enumerate(item_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu', name='item_layer%d' % idx)
        item_vector = layer(item_vector)

    predict_vector = Concatenate()([user_vector, item_vector])
    for idx, dim in enumerate(mlp_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu', name='mlp_layer%d' % idx)
        predict_vector = layer(predict_vector)

    # Final prediction layer
    pred = Dense(1, activation='linear', name='prediction')(predict_vector)

    model = Model(inputs=[user_input, item_input, user_features, item_features], outputs=pred)
    return model
A.8 Python code for NeuMF++
def neumfpp(num_users, num_items, num_user_features, num_item_features,
            embedding_dim=16, user_layers=[128,64,64,16], item_layers=[128,64,64,16],
            mlp_layers=[32, 16, 8], regs=0):
    # Input variables
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    item_input = Input(shape=(1,), dtype='int32', name='item_input')
    user_features = Input(shape=(num_user_features,), dtype='float64', name='user_features')
    item_features = Input(shape=(num_item_features,), dtype='float64', name='item_features')

    # ========================== GMFPP ==========================
    GMF_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                   name='GMF_user_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)
    GMF_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                   name='GMF_item_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)

    GMF_user_vector = Flatten()(GMF_Embedding_User(user_input))
    GMF_user_vector = Concatenate()([GMF_user_vector, user_features])
    for idx, dim in enumerate(user_layers):
        layer = Dense(user_layers[idx], kernel_regularizer=l2(0), activation='relu',
                      name='GMF_user_layer%d' % idx)
        GMF_user_vector = layer(GMF_user_vector)

    GMF_item_vector = Flatten()(GMF_Embedding_Item(item_input))
    GMF_item_vector = Concatenate()([GMF_item_vector, item_features])
    for idx, dim in enumerate(item_layers):
        layer = Dense(item_layers[idx], kernel_regularizer=l2(0), activation='relu',
                      name='GMF_item_layer%d' % idx)
        GMF_item_vector = layer(GMF_item_vector)

    # Element-wise product of user and item embeddings
    GMF_predict_vector = Multiply()([GMF_user_vector, GMF_item_vector])

    # ========================== MLPPP ==========================
    MLP_Embedding_User = Embedding(input_dim=num_users, output_dim=embedding_dim,
                                   name='MLP_user_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)
    MLP_Embedding_Item = Embedding(input_dim=num_items, output_dim=embedding_dim,
                                   name='MLP_item_embedding',
                                   embeddings_initializer=initializers.RandomNormal(stddev=0.01),
                                   embeddings_regularizer=l2(0.00001), input_length=1)

    MLP_user_vector = Flatten()(MLP_Embedding_User(user_input))
    MLP_user_vector = Concatenate()([MLP_user_vector, user_features])
    for idx, dim in enumerate(user_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu',
                      name='MLP_user_layer%d' % idx)
        MLP_user_vector = layer(MLP_user_vector)

    MLP_item_vector = Flatten()(MLP_Embedding_Item(item_input))
    MLP_item_vector = Concatenate()([MLP_item_vector, item_features])
    for idx, dim in enumerate(item_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu',
                      name='MLP_item_layer%d' % idx)
        MLP_item_vector = layer(MLP_item_vector)

    MLP_predict_vector = Concatenate()([MLP_user_vector, MLP_item_vector])
    for idx, dim in enumerate(mlp_layers):
        layer = Dense(dim, kernel_regularizer=l2(0), activation='relu', name='MLP_layer%d' % idx)
        MLP_predict_vector = layer(MLP_predict_vector)

    # ========================== final prediction layer ==========================
    predict_vector = Concatenate()([GMF_predict_vector, MLP_predict_vector])
    pred = Dense(1, activation='linear', name='prediction')(predict_vector)

    model = Model(inputs=[user_input, item_input, user_features, item_features], outputs=pred)
    return model
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Recurrent neural networks with tunable activation functions to solve Sylvester equation
Topics in selective inference and replicability analysis
Feature selection in high-dimensional modeling with thresholded regression
Validation of an alternative neural decision tree
High dimensional estimation and inference with side information
Asymptotic properties of two network problems with large random graphs
Equilibrium model of limit order book and optimal execution problem
An application of Markov chain model in board game revised
Asymptotic problems in stochastic partial differential equations: a Wiener chaos approach
Automatic tracking of protein vesicles
Multi-population optimal change-point detection
A rigorous study of game-theoretic attribution and interaction methods for machine learning explainability
Essays on nonparametric and finite-sample econometrics
Max-3-Cut performance of graph neural networks on random graphs
Supervised learning algorithms on factors impacting retweet
Deep learning for subsurface characterization and forecasting
A nonlinear pharmacokinetic model used in calibrating a transdermal alcohol transport concentration biosensor data analysis software
Theory of memory-enhanced neural systems and image-assisted neural machine translation
Dynamic network model for systemic risk
Large scale inference with structural information
Asset Metadata
Creator
Pan, Weidi
(author)
Core Title
Neural matrix factorization model combing auxiliary information for movie recommender system
School
College of Letters, Arts and Sciences
Degree
Master of Arts
Degree Program
Applied Mathematics
Degree Conferral Date
2022-05
Publication Date
04/13/2022
Defense Date
03/17/2022
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
auxiliary information,feature extractors,matrix factorization model,neural networks,OAI-PMH Harvest
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Lototsky, Sergey (
committee chair
), Mikulevicius, Remigijus (
committee member
), Tiruviluamala, Neelesh (
committee member
)
Creator Email
panweidi2018@gmail.com,weidipan@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC110963565
Unique identifier
UC110963565
Document Type
Thesis
Format
application/pdf (imt)
Rights
Pan, Weidi
Type
texts
Source
20220415-usctheses-batch-924
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu