Creating an MLP model in PyTorch to predict the rating a user will give to an unseen movie
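For my project, I am trying to predict the rating a user will give to a movie they have not seen, based on the ratings they gave to other movies. I am using the MovieLens dataset. The main folder, ml-100k, contains 100,000 movie ratings.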
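Before any processing, the main data (the ratings data) contains the user ID, the movie ID, the user's rating from 0 to 5, and a timestamp (not used in this project). I then used the sklearn library to split the data into a training set (80%) and a test set (20%).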
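A minimal sketch of that split, assuming the ratings are already in a NumPy array with the four columns listed above. The toy values and the `to_matrix` helper are hypothetical stand-ins, not the original preprocessing code; the real ml-100k file would be loaded from disk instead.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split

# Toy stand-in for the ratings data: user ID, movie ID, rating, timestamp.
# (Hypothetical values, used only to keep the example self-contained.)
ratings = np.array([
    [1, 1, 5, 874965758],
    [1, 2, 3, 876893171],
    [1, 3, 4, 878542960],
    [2, 1, 4, 888550871],
    [2, 3, 2, 888551441],
    [3, 2, 5, 889237455],
    [3, 3, 1, 889237224],
    [3, 1, 3, 889236939],
    [4, 2, 4, 876031948],
    [4, 3, 5, 876031897],
])

# 80% / 20% split over individual ratings (rows).
train_rows, test_rows = train_test_split(ratings, test_size=0.2, random_state=0)

nb_users = int(ratings[:, 0].max())
nb_movies = int(ratings[:, 1].max())

def to_matrix(rows, nb_users, nb_movies):
    """Return a (nb_users, nb_movies) float tensor where 0 marks 'not rated'."""
    matrix = np.zeros((nb_users, nb_movies), dtype=np.float32)
    # IDs start at 1, so shift them to 0-based indices.
    matrix[rows[:, 0] - 1, rows[:, 1] - 1] = rows[:, 2]
    return torch.from_numpy(matrix)

training_set = to_matrix(train_rows, nb_users, nb_movies)
test_set = to_matrix(test_rows, nb_users, nb_movies)
print(training_set.shape, test_set.shape)
```

Each row of `training_set` is then one user's rating vector, which is the shape the autoencoder code expects as input.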
To build the recommendation system, I am using a Stacked-Autoencoder model. I am using PyTorch, and the code runs on Google Colab. The project is based on this article: https://towardsdatascience.com/stacked-auto-encoder-as-a-recommendation-system-for-movie-rating-prediction-33842386338
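I am new to deep learning, and I would like to compare this model (Stacked_Autoencoder) with another deep learning model, for example a Multilayer Perceptron (MLP). This is for research purposes. Below is the code that creates and trains the Stacked-Autoencoder model.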
### Part 1 : Architecture of the AutoEncoder
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# nb_users, nb_movies and training_set (a nb_users x nb_movies float tensor,
# where 0 means "not rated") are assumed to come from the preprocessing step.

# SAE is a child class of the parent class nn.Module
class SAE(nn.Module):
    def __init__(self):
        # Initialise nn.Module so that SAE can use all of its methods
        super().__init__()
        # Encode step
        # Fully connected layer n°1: nb_movies inputs, 20 neurons
        # (one neuron can capture a feature such as the genre of the movie)
        self.fc1 = nn.Linear(nb_movies, 20)
        # Fully connected layer n°2
        self.fc2 = nn.Linear(20, 10)
        # Decode step
        # Fully connected layer n°3
        self.fc3 = nn.Linear(10, 20)
        # Fully connected layer n°4
        self.fc4 = nn.Linear(20, nb_movies)
        # Sigmoid activation function
        self.activation = nn.Sigmoid()

    # Forward pass: activation of the neurons
    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        # No activation on the last layer: the output is linear
        x = self.fc4(x)
        # x is the vector of predicted ratings
        return x

# Create the AutoEncoder object
sae = SAE()
# MSE loss, from torch.nn
criterion = nn.MSELoss()
# RMSprop optimizer (updates the weights), from torch.optim
# sae.parameters() are the weights and biases adjusted during training
optimizer = optim.RMSprop(sae.parameters(), lr=0.01, weight_decay=0.5)

### Part 2 : Training of the SAE
# Number of epochs
nb_epochs = 200
# Epoch loop
for epoch in range(1, nb_epochs + 1):
    # At the beginning of each epoch the loss is zero
    train_loss = 0
    # Number of users who rated at least one movie
    s = 0.
    # Users loop
    for id_user in range(nb_users):
        # Add a batch dimension in first position: shape (1, nb_movies)
        input = training_set[id_user].unsqueeze(0)
        # Clone the input to obtain the target
        target = input.clone()
        # Skip users who have no rating > 0
        if torch.sum(target > 0) > 0:
            # Reset the gradients left over from the previous user
            optimizer.zero_grad()
            output = sae(input)
            # Don't compute the gradients with respect to the target
            target.requires_grad = False
            # Only deal with true ratings
            output[target == 0] = 0
            # Loss criterion
            loss = criterion(output, target)
            # Average the error over the movies that have non-zero ratings
            mean_corrector = nb_movies / float(torch.sum(target > 0) + 1e-10)
            # Direction of the backpropagation
            loss.backward()
            train_loss += np.sqrt(loss.item() * mean_corrector)
            s += 1.
            # Intensity of the backpropagation
            optimizer.step()
    print('epoch: ' + str(epoch) + ' loss: ' + str(train_loss / s))
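The role of `mean_corrector` in the training loop can be checked on a small example. This is an illustrative sketch with made-up ratings for one user and five movies: `nn.MSELoss` averages over all movies, so multiplying by `nb_movies / nb_rated` rescales the loss to an average over the rated movies only. The `1e-10` term from the loop is left out here because the toy user has ratings.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()

# One user, five movies; 0 means "not rated" (toy values).
target = torch.tensor([[4.0, 0.0, 2.0, 0.0, 5.0]])
output = torch.tensor([[3.0, 1.5, 2.0, 4.0, 3.0]])

# Zero the predictions for unrated movies, as the training loop does.
masked_output = output.clone()
masked_output[target == 0] = 0

nb_movies = target.shape[1]
nb_rated = int((target > 0).sum())

# Mean squared error over all 5 movies (unrated ones contribute 0).
loss_all = criterion(masked_output, target)
# Rescale so the average is taken over the 3 rated movies only.
mean_corrector = nb_movies / float(nb_rated)
loss_corrected = loss_all * mean_corrector

# Reference value: MSE computed directly on the rated entries.
rated = target > 0
loss_rated_only = criterion(output[rated], target[rated])

print(loss_all.item(), loss_corrected.item(), loss_rated_only.item())
```

The corrected loss and the loss computed on the rated entries agree, which is why the loop reports `sqrt(loss * mean_corrector)` as the per-user RMSE.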
If I want to train an MLP model instead, how can I implement the model class?
Also, which other deep learning models (besides MLP) could I compare with the Stacked-Autoencoder?
Thanks.
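An MLP is not well suited to recommendation. If you want to go down that road, you need to create one embedding for your userid and another for your itemid, then add linear layers on top of the embeddings. Your target is the rating for each userid-itemid pair.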
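A minimal sketch of such a model, under stated assumptions: the class name `EmbeddingMLP`, the embedding and hidden sizes, the Adam optimizer, and the toy batch are all illustrative choices, not something prescribed above. The user and movie counts are those of ml-100k (943 users, 1,682 movies), and indices are assumed to be 0-based.

```python
import torch
import torch.nn as nn

class EmbeddingMLP(nn.Module):
    """Predicts a rating from a (user index, movie index) pair."""

    def __init__(self, nb_users, nb_movies, emb_dim=16, hidden_dim=32):
        super().__init__()
        # One embedding table per ID type.
        self.user_emb = nn.Embedding(nb_users, emb_dim)
        self.movie_emb = nn.Embedding(nb_movies, emb_dim)
        # Linear layers stacked on top of the concatenated embeddings.
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, user_idx, movie_idx):
        x = torch.cat([self.user_emb(user_idx), self.movie_emb(movie_idx)], dim=1)
        # Shape (batch, 1) -> (batch,): one predicted rating per pair.
        return self.mlp(x).squeeze(1)

torch.manual_seed(0)
model = EmbeddingMLP(nb_users=943, nb_movies=1682)  # ml-100k sizes
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Toy batch of (user, movie, rating) triples with 0-based indices.
user_idx = torch.tensor([0, 0, 1, 2])
movie_idx = torch.tensor([10, 25, 10, 7])
rating = torch.tensor([5.0, 3.0, 4.0, 1.0])

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = criterion(model(user_idx, movie_idx), rating)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Unlike the autoencoder, which takes a whole user rating vector as input, this model is trained on individual (user, movie, rating) rows, so the 80/20 split over ratings can be used directly without building the user-by-movie matrix.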
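I suggest you take a look at variational autoencoders (VAE). They have given state-of-the-art results in recommender systems, and they also make for a fair comparison with your stacked autoencoder. Here is a research paper that applies VAEs to collaborative filtering: https://arxiv.org/pdf/1802.05814.pdf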
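To give an idea of what changes relative to the stacked autoencoder, here is a simplified sketch of a VAE over rating vectors. It is an assumption-laden illustration, not the model from the linked paper: the paper uses a multinomial likelihood, whereas this sketch keeps a masked squared error so that it stays comparable with the SAE code. The class name `RatingVAE`, the layer sizes, the `beta` weight, and the toy ratings are all hypothetical.

```python
import torch
import torch.nn as nn

class RatingVAE(nn.Module):
    """Encodes a user's rating vector into a latent Gaussian, then decodes it."""

    def __init__(self, nb_movies, hidden_dim=20, latent_dim=10):
        super().__init__()
        self.enc = nn.Linear(nb_movies, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        self.dec1 = nn.Linear(latent_dim, hidden_dim)
        self.dec2 = nn.Linear(hidden_dim, nb_movies)

    def forward(self, x):
        h = torch.tanh(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.dec2(torch.tanh(self.dec1(z)))
        return recon, mu, logvar

def vae_loss(recon, target, mu, logvar, beta=0.2):
    # Reconstruction error on rated movies only (0 means "not rated").
    mask = target > 0
    mse = ((recon - target)[mask] ** 2).mean()
    # KL divergence between N(mu, sigma^2) and N(0, I), averaged over users.
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return mse + beta * kl

torch.manual_seed(0)
nb_movies = 6
# Two toy users; each row is one user's rating vector.
ratings = torch.tensor([[5.0, 0.0, 3.0, 0.0, 1.0, 0.0],
                        [0.0, 4.0, 0.0, 2.0, 0.0, 5.0]])

vae = RatingVAE(nb_movies)
recon, mu, logvar = vae(ratings)
loss = vae_loss(recon, ratings, mu, logvar)
loss.backward()
print(recon.shape, loss.item())
```

The training loop would otherwise look like the SAE one: the same per-user rating vectors go in, and only the loss function and the sampling step differ.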